March 12, 2026
5 min read

Plateau Analysis: How to Distinguish a Robust Optimum from Overfitting

#algotrading
#backtest
#optimization
#overfitting
#plateau analysis
#parameter stability

Article 6 in the "Backtests Without Illusions" series

You ran study.optimize() and Optuna found a parameter set with +87% PnL. Excited, you prepare the strategy for production. Two weeks into live trading, PnL hovers around zero. What happened?

The optimizer found the tip of a needle in parameter space. The parameters are perfectly fitted to the historical sequence of trades — but the slightest deviation in market conditions destroys the entire construct. This is classic overfitting, and it could have been detected before launch.

In the previous article we compared coordinate descent with Bayesian optimization and showed why Optuna finds the optimum more efficiently. Today — the next step: how to make sure the found optimum is robust, rather than the result of fitting to noise.

Why Finding the "Best" Parameters Is Only Half the Work

An optimizer navigating a vast multidimensional parameter landscape in search of the true optimum

Strategy parameter optimization is a search for a maximum in a multidimensional space. The problem is that maxima come in two types:

  1. Plateau — a wide flat region where PnL is consistently high across parameter variations. Even if market conditions shift the effective parameters by 10-20%, the strategy will continue to profit.

  2. Sharp peak — a narrow summit where PnL is high only at the exact parameter value. A shift of one step collapses profitability. This is almost certainly overfitting: the optimizer found an artifact of historical data, not a stable pattern.

An alpinism metaphor: a plateau is a mountain tableland where you can walk safely. A sharp peak is the tip of a needle where you can only balance.
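The metaphor can be made concrete with a toy sketch. Both curves below are synthetic (the shapes and numbers are purely illustrative, not from any real backtest), but they show how the same 10% parameter shift plays out on a plateau versus a peak:

```python
import numpy as np

# Two synthetic PnL curves (illustrative shapes, not real backtest data):
# a flat-topped "plateau" and a narrow "sharp peak", both +87% at the optimum.
def pnl_plateau(p, opt=0.020, width=0.010):
    return 87.0 * np.exp(-((p - opt) / width) ** 4)   # flat top, gentle slopes

def pnl_peak(p, opt=0.020, width=0.001):
    return 87.0 * np.exp(-((p - opt) / width) ** 2)   # needle tip

opt = 0.020
shifted = opt * 1.10  # the same 10% parameter shift for both curves

drop_plateau = 1 - pnl_plateau(shifted) / pnl_plateau(opt)
drop_peak = 1 - pnl_peak(shifted) / pnl_peak(opt)

print(f"plateau: {drop_plateau:.1%} PnL drop")   # well under 1%
print(f"peak:    {drop_peak:.1%} PnL drop")      # near-total collapse
```

On the plateau the shift is barely noticeable; on the peak it wipes out nearly all the profit.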

Sharp Peak vs Flat Plateau — Visual Intuition

Left: a robust plateau (wide table mountain with gentle slopes). Right: a fragile sharp peak (needle tip surrounded by deep valleys)

Imagine a contour map where the axes are two strategy parameters and the color represents PnL. Two patterns are easy to distinguish visually:

Plateau (robust optimum):

  • Wide areas of the same color
  • Smooth transitions between PnL levels
  • Isolines far apart
  • Shifting from the optimum by +/-20% changes PnL by no more than 10%

Imagine a heatmap: in the center — a bright yellow rectangle roughly one-third the size of the entire map. The color gradually transitions to orange, then red toward the edges. The optimum is not a point, but a region.

Sharp peak (overfitting):

  • A narrow bright spot surrounded by cold colors
  • Abrupt transitions: a collapse right next to the optimum
  • Isolines compressed into tight rings
  • Shifting by +/-5% drops PnL by 50% or more

Imagine the same heatmap, but in the center — a tiny yellow dot immediately surrounded by blue and purple. A single "correct" parameter combination.

Parameter Sensitivity Analysis

Slice plots showing how PnL depends on individual parameter values — wide bands indicate robustness, narrow clusters indicate fragility

One-Dimensional Analysis: PnL vs One Parameter

The simplest approach is to fix all parameters except one and watch how PnL depends on its value. Optuna provides plot_slice for this:

import optuna
from optuna.visualization import plot_slice

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=500)  # objective: your backtest function returning PnL

fig = plot_slice(study, params=["htf_entry_sell", "ltf_momentum", "stop_loss_pct"])
fig.show()

What to look for on a slice plot:

  • Robust parameter: the point cloud forms a wide horizontal band near the optimum. The best trials are spread across a wide range of parameter values.
  • Fragile parameter: the best trials are concentrated in a narrow range. Shifting the parameter by one or two steps — and profitability collapses.

Two-Dimensional Analysis: Contour Plots (Heatmaps)

A contour plot shows the interaction of two parameters simultaneously. This is the key tool for plateau analysis, because parameters rarely act independently — entry and exit thresholds, timeframes, and position sizes are interconnected.

from optuna.visualization import plot_contour

fig = plot_contour(study, params=["htf_entry_sell", "htf_exit_buy"])
fig.show()

A contour plot for a robust parameter pair looks like a topographic map of a hilly plain: smooth wide isolines, large areas of the same color. A contour plot for a fragile pair — like a map of a volcanic cone: tight concentric rings around a single point.

For a strategy with 12 separation parameters, this gives \binom{12}{2} = 66 pairwise contour plots. You don't have to study them all — start with the parameters that Optuna rated as most important.
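The pair count follows directly from combinatorics; itertools makes it explicit:

```python
from itertools import combinations

# 12 separation parameters (placeholder names, for counting only)
params = [f"param_{i}" for i in range(12)]
pairs = list(combinations(params, 2))   # every unordered parameter pair

print(len(pairs))  # 66
```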

Multidimensional Analysis: Parameter Importance Ranking

Optuna can estimate each parameter's contribution to the objective function:

from optuna.visualization import plot_param_importances

fig = plot_param_importances(study)
fig.show()

The parameter importance chart is a horizontal histogram. Parameters are ranked by their contribution to PnL variance in descending order. The top 3-4 parameters usually explain 70-80% of the variance.

Rule: if a parameter explains less than 2% of PnL variance, its value is practically irrelevant to the result — it's robust almost by definition. Focus plateau analysis on the top-5 most important parameters.
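In code, this filtering step is a one-liner over the importance dict. The numbers below are hypothetical, shaped like the output of optuna.importance.get_param_importances (a parameter name to variance-share mapping):

```python
# Hypothetical fANOVA importances (same shape as the dict returned by
# optuna.importance.get_param_importances: parameter name -> variance share)
importances = {
    "htf_entry_sell": 0.312, "htf_entry_buy": 0.251,
    "ltf_momentum_threshold": 0.187, "stop_loss_pct": 0.098,
    "take_profit_pct": 0.072, "trailing_delta": 0.031,
    "cooldown_bars": 0.018,  # under the 2% threshold: skip it
}

# Drop parameters under 2% of explained variance, keep the top 5
focus = sorted(
    ((p, v) for p, v in importances.items() if v >= 0.02),
    key=lambda item: item[1], reverse=True,
)[:5]

print([p for p, _ in focus])
```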

Optuna Visualization Tools

Contour heatmaps showing the parameter interaction landscape alongside importance rankings

plot_slice — One-Dimensional Slices

import optuna
from optuna.visualization import plot_slice

fig = plot_slice(study, params=[
    "htf_entry_sell", "htf_entry_buy",
    "ltf_momentum_threshold", "stop_loss_pct",
    "take_profit_pct", "trailing_stop_pct"
])
fig.update_layout(height=800, title="Parameter Slice Plots")
fig.show()

The result — a grid of scatter plots. Each subplot shows the objective function value (PnL, Y-axis) against a single parameter value (X-axis). Points are individual trials. For a robust parameter, the best points (highest PnL) are distributed across a wide range of X. For a fragile one — grouped in a narrow column.

plot_contour — Two-Dimensional Contours

from optuna.visualization import plot_contour

important_pairs = [
    ["htf_entry_sell", "htf_entry_buy"],
    ["htf_entry_sell", "stop_loss_pct"],
    ["ltf_momentum_threshold", "take_profit_pct"],
]

for params in important_pairs:
    fig = plot_contour(study, params=params)
    fig.update_layout(title=f"Contour: {params[0]} vs {params[1]}")
    fig.show()

Each contour plot is a heatmap with two parameters on the axes. Color encodes the average PnL in a given region of parameter space. Yellow/green — high PnL, blue/purple — low. Isolines connect points with the same PnL.

plot_param_importances — Parameter Contributions

from optuna.visualization import plot_param_importances

fig = plot_param_importances(
    study,
    evaluator=optuna.importance.FanovaImportanceEvaluator()
)
fig.show()

fANOVA (functional ANOVA) decomposes the variance of the objective function across parameters and their interactions. This is more powerful than simple correlation because it accounts for nonlinear effects.

Quantitative Plateau Metrics

Sensitivity ratio, plateau width, and robustness score — three metrics that formalize plateau quality

Visual assessment is subjective. We need numbers. Here are three metrics that formalize the concept of a "plateau."

Sensitivity Ratio

The ratio of PnL change to parameter change:

S_i = \frac{\Delta \text{PnL} / \text{PnL}_{opt}}{\Delta p_i / p_{i,opt}}

where \Delta \text{PnL} is the PnL drop when parameter p_i deviates from the optimum by \Delta p_i.

Interpretation:

  • S_i < 0.5 — parameter is robust: a 10% shift causes less than a 5% PnL drop
  • 0.5 \leq S_i < 2.0 — moderate sensitivity
  • S_i \geq 2.0 — parameter is fragile: a 10% shift crashes PnL by 20%+

Plateau Width

The width of the parameter region within which PnL stays within X\% of the optimum:

W_i(X) = p_{i,max} - p_{i,min} \quad \text{subject to} \quad \text{PnL}(p_i) \geq (1 - X/100) \times \text{PnL}_{opt}

Relative plateau width:

W_i^{rel}(X) = \frac{W_i(X)}{p_{i,max}^{range} - p_{i,min}^{range}}

where the denominator is the full search range of the parameter.

Interpretation:

  • W_i^{rel}(10\%) > 0.3 — the plateau covers more than 30% of the range at the 10% threshold. Robust parameter.
  • W_i^{rel}(10\%) < 0.05 — the plateau is narrower than 5% of the range. Red flag.
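With a set of completed trials, the width is a filter and a subtraction. The trial data below is made up for illustration:

```python
import numpy as np

# Hypothetical (parameter value, PnL) pairs from completed trials
params = np.array([0.005, 0.010, 0.013, 0.018, 0.020, 0.024, 0.027, 0.035, 0.045])
pnls   = np.array([20.0,  41.0,  50.0,  53.0,  55.0,  52.0,  50.5,  38.0,  15.0])

threshold = pnls.max() * (1 - 10 / 100)   # within 10% of the optimum
good = params[pnls >= threshold]          # trials still "on the plateau"

abs_width = good.max() - good.min()       # W_i(10%)
rel_width = abs_width / (params.max() - params.min())
print(f"plateau [{good.min():.3f}, {good.max():.3f}], relative width {rel_width:.0%}")
```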

Robustness Score

A combined metric across all parameters:

R = \prod_{i=1}^{k} \left( W_i^{rel}(10\%) \right)^{w_i}

where w_i is the normalized importance of parameter i from fANOVA (\sum_i w_i = 1).

The product of weighted widths is a strict metric: if even one important parameter has a narrow plateau, R will be low. Unimportant parameters (with small w_i) have almost no effect.

Interpretation:

  • R > 0.1 — the strategy is robust
  • 0.01 < R \leq 0.1 — additional validation required (walk-forward)
  • R \leq 0.01 — overfitting is very likely

Python Code for Automated Plateau Detection

An automated system scanning the parameter landscape to identify robust plateaus and fragile peaks

import numpy as np
import optuna
from optuna.importance import FanovaImportanceEvaluator
from typing import Dict, Tuple

def compute_sensitivity_ratio(
    study: optuna.Study,
    param_name: str,
    n_steps: int = 20,
) -> float:
    """
    Compute sensitivity ratio for a single parameter.

    Fixes all parameters at their best values, varies param_name,
    estimates PnL drop through trial interpolation.
    """
    best_trial = study.best_trial
    best_value = best_trial.values[0]
    best_param = best_trial.params[param_name]

    all_trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]
    all_trials.sort(key=lambda t: t.values[0], reverse=True)
    top_trials = all_trials[:max(10, len(all_trials) // 5)]

    param_values = np.array([t.params[param_name] for t in top_trials])
    pnl_values = np.array([t.values[0] for t in top_trials])

    if best_param == 0 or best_value == 0:
        return float('inf')

    # Fit a quadratic to the top trials and take its derivative at the optimum
    coeffs = np.polyfit(param_values, pnl_values, deg=2)
    dpnl_dparam = 2 * coeffs[0] * best_param + coeffs[1]

    sensitivity = abs(dpnl_dparam * best_param / best_value)
    return sensitivity


def compute_plateau_width(
    study: optuna.Study,
    param_name: str,
    threshold_pct: float = 10.0,
) -> Tuple[float, float]:
    """
    Compute absolute and relative plateau width.

    Returns:
        (absolute_width, relative_width)
    """
    best_value = study.best_value
    threshold = best_value * (1 - threshold_pct / 100)

    trials = [t for t in study.trials if t.state == optuna.trial.TrialState.COMPLETE]
    good_trials = [t for t in trials if t.values[0] >= threshold]

    if not good_trials:
        return 0.0, 0.0

    good_params = [t.params[param_name] for t in good_trials]
    all_params = [t.params[param_name] for t in trials]

    plateau_min = min(good_params)
    plateau_max = max(good_params)
    absolute_width = plateau_max - plateau_min

    search_range = max(all_params) - min(all_params)
    relative_width = absolute_width / search_range if search_range > 0 else 0

    return absolute_width, relative_width


def compute_robustness_score(
    study: optuna.Study,
    threshold_pct: float = 10.0,
) -> Dict:
    """
    Compute combined robustness score.

    Returns:
        dict with per-parameter metrics and the final score
    """
    evaluator = FanovaImportanceEvaluator()
    importances = optuna.importance.get_param_importances(
        study, evaluator=evaluator
    )

    results = {}
    total_importance = sum(importances.values())

    for param_name, importance in importances.items():
        sensitivity = compute_sensitivity_ratio(study, param_name)
        abs_width, rel_width = compute_plateau_width(
            study, param_name, threshold_pct
        )

        weight = importance / total_importance
        results[param_name] = {
            "importance": importance,
            "weight": weight,
            "sensitivity_ratio": sensitivity,
            "plateau_width_abs": abs_width,
            "plateau_width_rel": rel_width,
        }

    log_score = sum(
        r["weight"] * np.log(max(r["plateau_width_rel"], 1e-10))
        for r in results.values()
    )
    robustness_score = np.exp(log_score)

    return {
        "robustness_score": robustness_score,
        "parameters": results,
        "verdict": (
            "robust" if robustness_score > 0.1
            else "check" if robustness_score > 0.01
            else "overfitting"
        ),
    }

Usage

report = compute_robustness_score(study, threshold_pct=10.0)

print(f"Robustness score: {report['robustness_score']:.4f}")
print(f"Verdict: {report['verdict']}")
print()

for name, metrics in report["parameters"].items():
    print(f"  {name}:")
    print(f"    Importance:       {metrics['importance']:.3f}")
    print(f"    Sensitivity:      {metrics['sensitivity_ratio']:.2f}")
    print(f"    Plateau width:    {metrics['plateau_width_rel']:.1%}")
    print()

Example output:

Robustness score: 0.1482
Verdict: robust

  htf_entry_sell:
    Importance:       0.312
    Sensitivity:      0.38
    Plateau width:    42.5%

  htf_entry_buy:
    Importance:       0.251
    Sensitivity:      0.45
    Plateau width:    38.1%

  ltf_momentum_threshold:
    Importance:       0.187
    Sensitivity:      1.21
    Plateau width:    22.3%

  stop_loss_pct:
    Importance:       0.098
    Sensitivity:      0.67
    Plateau width:    31.0%

  take_profit_pct:
    Importance:       0.072
    Sensitivity:      0.89
    Plateau width:    28.4%

  trailing_delta:
    Importance:       0.031
    Sensitivity:      0.22
    Plateau width:    55.2%

Practical Examples with Separation Strategies

Comparing Strategy A (wide plateau, robust), Strategy B (moderate), and Strategy C (sharp peak, overfitted)

Let's examine three strategies with 12 separation parameters. Each strategy underwent Optuna optimization with 500 trials.

Strategy A (~55% PnL, ~500 trades, ~15% time)

Strategy A's parameters form a wide plateau. Take the key parameter htf_entry_sell:

  • Optimal value: 0.020
  • PnL at 0.015: +51% (7% drop)
  • PnL at 0.025: +49% (11% drop)
  • PnL at 0.010: +43% (22% drop)
  • PnL at 0.030: +41% (25% drop)

If you imagine this as a one-dimensional plot (X-axis — htf_entry_sell value, Y-axis — PnL), you'll see a gentle parabola with a flat top. The range 0.010-0.030 is the plateau, where PnL stays within 25% of the optimum.

Sensitivity ratio: S = \frac{0.11}{0.25} = 0.44 — robust.

Plateau width at the 10% threshold: from 0.013 to 0.027, W^{rel} = \frac{0.014}{0.04} = 35\%.

Strategy B (~25% PnL, ~40 trades, ~5% time)

Strategy B is optimized on a small number of trades. Parameter htf_entry_sell:

  • Optimal value: 0.018
  • PnL at 0.015: +24% (4% drop)
  • PnL at 0.025: +9% (64% drop)
  • PnL at 0.012: +11% (56% drop)

On the plot — an asymmetric and steep curve. The plateau exists only in the narrow range 0.015-0.020. To the right of the optimum — a cliff.

Sensitivity ratio: S = \frac{0.64}{0.39} = 1.64 — moderate sensitivity, but with 40 trades this is a red flag. Small sample + narrow plateau = high probability of overfitting.

Plateau width at the 10% threshold: from 0.016 to 0.020, W^{rel} = \frac{0.004}{0.04} = 10\%.

Strategy C (~300% PnL, ~400 trades, ~45% time)

Strategy C shows stunning PnL, but plateau analysis reveals problems:

  • Optimal value of htf_entry_sell: 0.022
  • PnL at 0.020: +295% (2% drop)
  • PnL at 0.025: +142% (53% drop)
  • PnL at 0.019: +128% (57% drop)

On the plot — a characteristic "needle": a very high peak at 0.022, sharp drop in all directions. The contour plot would show a bright spot immediately surrounded by cold colors.

Sensitivity ratio: S = \frac{0.53}{0.14} = 3.79 — fragile. Despite 400 trades, the strategy is excessively dependent on the exact value of a single parameter.

Plateau width at the 10% threshold: from 0.021 to 0.023, W^{rel} = \frac{0.002}{0.04} = 5\%.

Summary Table

| Strategy   | PnL   | Trades | Sensitivity | Plateau width | Robustness score | Verdict              |
|------------|-------|--------|-------------|---------------|------------------|----------------------|
| Strategy A | +55%  | ~500   | 0.44        | 35%           | 0.148            | Robust               |
| Strategy B | +25%  | ~40    | 1.64        | 10%           | 0.032            | Check (small sample) |
| Strategy C | +300% | ~400   | 3.79        | 5%            | 0.008            | Overfitting          |

Paradox: Strategy C with PnL +300% has the worst robustness score. Strategy A with a "modest" +55% is the most robust. This is a typical plateau analysis result: impressive numbers often mask fragility.

Confidence intervals for each strategy can additionally be verified through Monte Carlo bootstrap — it will show PnL spread when resampling trades.
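A bootstrap sketch (synthetic trade returns, purely for illustration): resample the trade list with replacement many times and look at the spread of the resampled totals:

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic per-trade PnL in %, standing in for a real backtest trade list
trade_pnls = rng.normal(loc=0.11, scale=1.5, size=500)

# Resample trades with replacement; each resample gives one plausible total PnL
totals = np.array([
    rng.choice(trade_pnls, size=trade_pnls.size, replace=True).sum()
    for _ in range(2000)
])

lo, hi = np.percentile(totals, [2.5, 97.5])
print(f"95% bootstrap CI for total PnL: [{lo:.1f}%, {hi:.1f}%]")
```

A wide interval that straddles zero is itself a warning sign, independent of any plateau metric.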

3D Visualization and Heatmaps

A 3D surface plot of PnL over two parameters with contour lines projected onto the floor plane

For the most important parameter pairs, it's useful to build a 3D surface and heatmap. This provides intuitive understanding of the landscape shape.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection)
import optuna

def plot_parameter_landscape(
    study: "optuna.Study",
    param_x: str,
    param_y: str,
    grid_size: int = 50,
):
    """
    Build a 3D surface plot and heatmap for a pair of parameters.
    """
    trials = [t for t in study.trials
              if t.state == optuna.trial.TrialState.COMPLETE]

    x_vals = np.array([t.params[param_x] for t in trials])
    y_vals = np.array([t.params[param_y] for t in trials])
    z_vals = np.array([t.values[0] for t in trials])

    from scipy.interpolate import griddata

    xi = np.linspace(x_vals.min(), x_vals.max(), grid_size)
    yi = np.linspace(y_vals.min(), y_vals.max(), grid_size)
    Xi, Yi = np.meshgrid(xi, yi)
    Zi = griddata((x_vals, y_vals), z_vals, (Xi, Yi), method='cubic')

    fig = plt.figure(figsize=(18, 7))

    ax1 = fig.add_subplot(121, projection='3d')
    surf = ax1.plot_surface(Xi, Yi, Zi, cmap=cm.viridis, alpha=0.85,
                            edgecolor='none')
    ax1.set_xlabel(param_x)
    ax1.set_ylabel(param_y)
    ax1.set_zlabel('PnL, %')
    ax1.set_title('3D Parameter Landscape')
    fig.colorbar(surf, ax=ax1, shrink=0.5)

    ax2 = fig.add_subplot(122)
    hm = ax2.pcolormesh(Xi, Yi, Zi, cmap=cm.viridis, shading='auto')
    contours = ax2.contour(Xi, Yi, Zi, levels=10, colors='white',
                           linewidths=0.8, alpha=0.7)
    ax2.clabel(contours, inline=True, fontsize=8, fmt='%.0f%%')

    best = study.best_trial
    ax2.scatter(best.params[param_x], best.params[param_y],
                color='red', s=100, marker='*', zorder=5, label='Optimum')

    ax2.set_xlabel(param_x)
    ax2.set_ylabel(param_y)
    ax2.set_title('Contour Heatmap')
    ax2.legend()
    fig.colorbar(hm, ax=ax2)

    plt.tight_layout()
    plt.savefig(f'landscape_{param_x}_vs_{param_y}.png', dpi=150)
    plt.show()

A 3D surface plot for a robust strategy resembles a table mountain — a flat top with gentle slopes. For a fragile strategy — a sharp peak, like the Matterhorn. The heatmap complements the 3D view, showing the same information in a top-down projection with isolines.

Red Flags: When Optimization Results Are Suspicious

Warning indicators that signal potential overfitting in optimization results

Eight signs that optimization found overfitting rather than a real pattern:

1. Sensitivity Ratio > 2 for a Key Parameter

If PnL drops more than 20% with a 10% parameter shift — the optimum is fragile.

2. Plateau Width < 10% of the Search Range

If the "good" region occupies less than 10% of the explored range — the optimizer most likely found an artifact.

3. Top-3 Trials Yield PnL 2-3x Above the Median

If the best trials are outliers against the rest rather than the "hilltop" — it's not a plateau.

top_3_mean = np.mean(sorted([t.values[0] for t in study.trials
                              if t.state == optuna.trial.TrialState.COMPLETE],
                             reverse=True)[:3])
median_pnl = np.median([t.values[0] for t in study.trials
                         if t.state == optuna.trial.TrialState.COMPLETE])

outlier_ratio = top_3_mean / median_pnl if median_pnl > 0 else float("inf")
if outlier_ratio > 2.5:
    print(f"WARNING: Top trials are {outlier_ratio:.1f}x above median — possible overfitting")

4. Low Trade Count (< 50) with High PnL

Small sample + high PnL = high variance in the estimate. Plateau analysis on 40 trades is unreliable in itself. For such strategies, Monte Carlo bootstrap is critical.

5. One "Magic" Parameter Combination

If the contour plot shows a single bright dot amidst a gray field — this isn't a strategy, it's a data-fitted combination.

6. Too Many Parameters

For 12 parameters with 10 values each, the search space contains 10^{12} combinations. Optuna explores ~500. The probability of finding a "good" artifact in such a space is high. The more parameters, the stricter plateau analysis should be.

7. PnL Drops Sharply Out-of-Sample

If in-sample PnL is +87% and walk-forward shows +12% — the optimization fitted parameters to the training period. More about this in the Walk-Forward optimization article.

8. Parameters Are "Pinned" to Range Boundaries

If the optimal value coincides with the search grid boundary — the optimum may lie beyond the range. Expand the range and rerun the optimization.
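This check is easy to automate. A sketch with a hypothetical helper (plain dicts stand in for an Optuna study so the search ranges are explicit):

```python
def params_at_boundary(best_params, search_ranges, tol=0.02):
    """Return parameters whose optimum sits within `tol` of a search-range edge."""
    flagged = []
    for name, best in best_params.items():
        lo, hi = search_ranges[name]
        span = hi - lo
        # Distance from the optimum to the nearest edge, as a fraction of the range
        if span > 0 and min(best - lo, hi - best) < tol * span:
            flagged.append(name)
    return flagged

# Hypothetical optimum pinned to the top of its range
best = {"htf_entry_sell": 0.050, "stop_loss_pct": 0.015}
ranges = {"htf_entry_sell": (0.005, 0.050), "stop_loss_pct": (0.005, 0.030)}
print(params_at_boundary(best, ranges))  # ['htf_entry_sell'] -> expand this range
```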

Automated Plateau Analysis Report

Bringing it all together into a single report generated after each optimization:

import json
from datetime import datetime

def generate_plateau_report(
    study: "optuna.Study",
    strategy_name: str,
    n_trades: int,
    threshold_pct: float = 10.0,
) -> dict:
    """
    Generate a complete plateau analysis report.
    """
    robustness = compute_robustness_score(study, threshold_pct)

    red_flags = []

    sorted_params = sorted(
        robustness["parameters"].items(),
        key=lambda x: x[1]["importance"],
        reverse=True
    )
    for name, metrics in sorted_params[:3]:
        if metrics["sensitivity_ratio"] > 2.0:
            red_flags.append(
                f"High sensitivity for {name}: "
                f"S={metrics['sensitivity_ratio']:.2f}"
            )

    for name, metrics in robustness["parameters"].items():
        if metrics["plateau_width_rel"] < 0.05:
            red_flags.append(
                f"Narrow plateau for {name}: "
                f"W={metrics['plateau_width_rel']:.1%}"
            )

    all_values = sorted(
        [t.values[0] for t in study.trials
         if t.state == optuna.trial.TrialState.COMPLETE],
        reverse=True
    )
    if len(all_values) > 10:
        top3 = np.mean(all_values[:3])
        med = np.median(all_values)
        if med > 0 and top3 / med > 2.5:
            red_flags.append(
                f"Top trials are outliers: "
                f"{top3:.1f} vs median {med:.1f} "
                f"({top3/med:.1f}x)"
            )

    if n_trades < 50:
        red_flags.append(f"Low trade count: {n_trades}")

    report = {
        "strategy": strategy_name,
        "timestamp": datetime.now().isoformat(),
        "best_pnl": study.best_value,
        "n_trials": len(study.trials),
        "n_trades": n_trades,
        "robustness_score": robustness["robustness_score"],
        "verdict": robustness["verdict"],
        "red_flags": red_flags,
        "parameters": robustness["parameters"],
    }

    return report


report = generate_plateau_report(
    study, strategy_name="Strategy A", n_trades=491
)

print(json.dumps(report, indent=2, default=str))

Example output:

{
  "strategy": "Strategy A",
  "best_pnl": 55.2,
  "n_trials": 500,
  "n_trades": 491,
  "robustness_score": 0.1482,
  "verdict": "robust",
  "red_flags": [],
  "parameters": {
    "htf_entry_sell": {
      "importance": 0.312,
      "sensitivity_ratio": 0.44,
      "plateau_width_rel": 0.35
    }
  }
}

Relationship with Walk-Forward Validation

Parametric robustness (plateau analysis) and temporal robustness (walk-forward) as two complementary validation systems

Plateau analysis and walk-forward validation (WFO) are complementary methods:

  • Plateau analysis answers the question: "How stable is the optimum to small parameter shifts?" This is a check of parametric robustness.
  • Walk-forward answers the question: "Do the parameters work on data the optimizer hasn't seen?" This is a check of temporal robustness.

A strategy can pass plateau analysis (wide plateau) but fail walk-forward (market regime changed). And vice versa — it can pass walk-forward on fixed parameters but have a fragile optimum.

Recommendation: always use both methods. If a strategy passes plateau analysis (R > 0.1) and walk-forward (\text{PnL}_{OOS} > 50\% \times \text{PnL}_{IS}) — this is a strong signal of robustness. More details in the Walk-Forward optimization article.

To assess PnL confidence intervals at each stage, apply Monte Carlo bootstrap. And for correctly comparing strategies with different active time, use the PnL per active time metric.

Recommendations

Before Optimization

  1. Limit the number of parameters. The fewer parameters — the more reliable the plateau. 5-7 parameters is a reasonable maximum. 12 already requires heightened caution.

  2. Set meaningful ranges. Don't set htf_entry_sell from 0.001 to 1.0 if the realistic range is 0.005 to 0.05. Unnecessarily wide ranges create the illusion of a plateau.

  3. Use enough trials. For 12 parameters, a minimum of 300-500 trials. For reliable plateau analysis — 1000+.

During Optimization

  1. Watch convergence. If Optuna continues finding significantly better solutions after 400 trials — the process hasn't converged, and plateau analysis will be unreliable.

  2. Use pruning with caution. Aggressive pruning (MedianPruner) can cut trials that look bad in early steps but are important for building a complete landscape picture.

After Optimization

  1. Generate the plateau report automatically. Integrate generate_plateau_report() into the optimization pipeline. Don't rely on visual assessment — use numbers.

  2. Check the top-5 parameters. If fANOVA shows that 3 parameters explain 80% of the variance — the remaining 9 can be checked less thoroughly.

  3. Compare with the baseline strategy. If the strategy with default parameters (no optimization) shows +30%, and the optimized one +55% — the difference is only 25 pp, and the plateau is likely wide. If the default shows 0%, and the optimized one +300% — all profitability depends on precise parameter fitting.

  4. Final check — walk-forward. Plateau analysis is a necessary but not sufficient condition for robustness. Always validate out-of-sample.

Conclusion

Parameter optimization is a powerful tool, but without plateau analysis it's a game of roulette. You don't know whether you've found a stable pattern or fitted the model to noise.

Three rules of plateau analysis:

  1. Compute the robustness score. The product of weighted plateau widths gives a single number that summarizes the robustness of all parameters. R > 0.1 — green light.

  2. Sensitivity ratio < 1 for key parameters. If a 10% parameter shift causes less than 10% PnL drop — the parameter is robust. If more — be cautious.

  3. Visualize contour plots. No metric can replace understanding the landscape shape. A flat table mountain — good. A sharp needle — bad.

Plateau analysis takes 5 minutes after optimization and can save weeks of unprofitable live trading. It's a mandatory step between study.optimize() and launching the bot.


Useful Links

  1. Optuna Documentation — Visualization
  2. Hutter, F., Hoos, H., Leyton-Brown, K. — An Efficient Approach for Assessing Hyperparameter Importance (fANOVA, 2014)
  3. Pardo, R. — The Evaluation and Optimization of Trading Strategies
  4. Marcos Lopez de Prado — Advances in Financial Machine Learning, Chapter 11: Dangers of Backtesting
  5. Bailey, D.H. et al. — The Probability of Backtest Overfitting (2015)
  6. Optuna — optuna.visualization.plot_contour
  7. Optuna — optuna.importance.FanovaImportanceEvaluator
  8. Bergstra, J. & Bengio, Y. — Random Search for Hyper-Parameter Optimization (2012)

Citation

@article{soloviov2026plateauanalysis,
  author = {Soloviov, Eugen},
  title = {Plateau Analysis: How to Distinguish a Robust Optimum from Overfitting},
  year = {2026},
  url = {https://marketmaker.cc/en/blog/post/plateau-analysis-overfitting},
  version = {0.1.0},
  description = {Why finding the best strategy parameters is only half the work. How to visually and quantitatively distinguish a stable plateau from a fragile peak, and why Optuna contour plots are a mandatory step before launching an optimized strategy into production.}
}
Disclaimer: The information provided in this article is for educational and informational purposes only and does not constitute financial, investment, or trading advice. Trading cryptocurrencies involves significant risk of loss.

MarketMaker.cc Team

Quantitative Research & Strategy
