
Walk-forward optimisation: testing strategy robustness beyond the in-sample window
Standard backtesting optimises strategy parameters over the full historical dataset and reports how well the optimised strategy performed over that same dataset. The problem is self-referential: the parameters were chosen precisely because they worked on that data. Walk-forward optimisation breaks this circularity by separating training data from test data in a way that respects their time ordering. It is one of the most widely used methods for approximating out-of-sample performance before deploying a strategy live.
What walk-forward optimisation does
The method defines a training window (typically 1–5 years) and an out-of-sample test window (typically 3–12 months). The strategy is optimised over the training window, then applied without further changes to the test window. The test window's performance is recorded. The windows then advance by one test period, and the process repeats. At the end of the full historical dataset, the test-period returns are concatenated to form a synthetic out-of-sample equity curve. This curve is a closer approximation of live performance than any in-sample backtest because the parameters used in each test period were determined before those periods were observed.
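The loop described above can be sketched in a few lines. This is a minimal illustration, not pfolio's implementation; the `optimise` and `run` callables are hypothetical stand-ins for a parameter search and a strategy backtest, and the window lengths are counted in periods rather than calendar time.

```python
import numpy as np

def walk_forward(returns, train_len, test_len, optimise, run):
    """Advance a train/test window pair through the series and stitch the
    out-of-sample test returns into one synthetic equity-curve series.

    optimise(train_slice) -> params            (hypothetical callable)
    run(test_slice, params) -> test returns    (hypothetical callable)
    """
    oos = []
    start = 0
    while start + train_len + test_len <= len(returns):
        train = returns[start : start + train_len]
        test = returns[start + train_len : start + train_len + test_len]
        params = optimise(train)           # parameters fit on training data only
        oos.append(run(test, params))      # applied unchanged to unseen data
        start += test_len                  # advance by one test period
    return np.concatenate(oos)             # synthetic out-of-sample return series
```

Note that each test slice is strictly later than the data its parameters were fit on, which is the property that makes the concatenated curve a fair proxy for live behaviour.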
Expanding window vs rolling window
There are two variants. In the rolling window approach, the training window stays fixed in length and moves forward with each iteration—earlier data is dropped. This is appropriate when the investor believes that recent data is more predictive than older data, or when structural breaks make older data less relevant. In the expanding window approach, the training window starts at the beginning of the dataset and grows with each iteration—all historical data is always included. This is appropriate when there is no reason to discount older data and when sample size is a limiting factor in reliable parameter estimation. Rolling windows are the more common choice among systematic strategy developers.
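The difference between the two variants is easiest to see in the index ranges they generate. The sketch below (illustrative only; lengths are in periods) returns each window pair as ((train_start, train_end), (test_start, test_end)) half-open slices:

```python
def rolling_windows(n, train_len, test_len):
    """Fixed-length training window; the oldest data is dropped each step."""
    out = []
    start = 0
    while start + train_len + test_len <= n:
        out.append(((start, start + train_len),
                    (start + train_len, start + train_len + test_len)))
        start += test_len
    return out

def expanding_windows(n, min_train_len, test_len):
    """Training window anchored at index 0 that grows each step."""
    out = []
    end = min_train_len
    while end + test_len <= n:
        out.append(((0, end), (end, end + test_len)))
        end += test_len
    return out
```

For a 10-period series with a 4-period (minimum) training window and 2-period test window, the rolling variant produces training slices (0, 4), (2, 6), (4, 8), while the expanding variant produces (0, 4), (0, 6), (0, 8); the test slices are identical in both.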
Interpreting walk-forward results
A strategy that shows strong in-sample performance but weak walk-forward performance is likely overfit to the training data—the parameters capture noise rather than signal. A strategy where walk-forward performance is close to (or even exceeds) in-sample performance has demonstrated genuine robustness. The ratio of out-of-sample Sharpe to in-sample Sharpe is sometimes used as an overfitting diagnostic: a ratio above 0.5 is considered acceptable; below 0.3 is a warning sign. Walk-forward testing does not guarantee future performance, but it substantially narrows the gap between what a strategy looked like in development and what it may look like in deployment.
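The diagnostic ratio mentioned above is straightforward to compute. A minimal sketch, assuming daily returns, a zero risk-free rate, and the rule-of-thumb thresholds quoted in the text:

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio of a per-period return series
    (zero risk-free rate assumed)."""
    r = np.asarray(returns, dtype=float)
    sd = r.std(ddof=1)
    if sd == 0:
        return 0.0
    return r.mean() / sd * np.sqrt(periods_per_year)

def overfit_ratio(oos_returns, is_returns):
    """Out-of-sample Sharpe divided by in-sample Sharpe.
    Rule of thumb from the text: above ~0.5 acceptable, below ~0.3 a warning."""
    is_sharpe = sharpe(is_returns)
    if is_sharpe <= 0:
        return float("nan")  # diagnostic only meaningful for a positive in-sample Sharpe
    return sharpe(oos_returns) / is_sharpe
```

The guard for a non-positive in-sample Sharpe matters in practice: dividing by a near-zero or negative denominator would make the ratio look alarming (or flattering) for reasons unrelated to overfitting.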
Limitations
Walk-forward optimisation still suffers from the basic limitation of all historical testing: it assumes that historical market behaviour is representative of future behaviour. If the regime changes structurally—as it did during the zero-interest-rate period or during the COVID shock—walk-forward results from prior data may be misleading. The method is also computationally intensive: for a strategy with multiple parameters, each window requires a full grid search or numerical optimisation. And like all forms of testing, it is susceptible to multiple comparisons bias if the investor tests many strategy variants and reports only the best walk-forward result.
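The computational cost is easy to underestimate because it multiplies: every walk-forward window repeats the full parameter search. A small back-of-the-envelope sketch, using an entirely hypothetical three-parameter grid:

```python
def grid_size(param_grid):
    """Number of candidate parameter combinations in a full grid search."""
    size = 1
    for values in param_grid.values():
        size *= len(values)
    return size

# Hypothetical strategy parameters, for illustration only:
grid = {
    "fast_ma": range(5, 55, 5),      # 10 candidate values
    "slow_ma": range(50, 260, 10),   # 21 candidate values
    "stop_loss": [0.01, 0.02, 0.05], # 3 candidate values
}
# Total backtests = grid combinations x number of walk-forward windows.
```

With this grid, each window already requires 630 backtests, so a walk-forward run over 20 windows means 12,600 full backtests; this is why developers often replace exhaustive grids with coarser grids or numerical optimisers.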
Walk-forward optimisation in pfolio
pfolio uses walk-forward testing in its internal strategy development process to validate the robustness of the systematic allocation signals before deploying them in the portfolio construction engine. The platform's published backtesting methodology documentation describes the training and test window parameters used in each strategy category.

