Validation · 2026-06-10

Why your live results don't match your backtest

TL;DR

Live almost never matches the backtest exactly, and a small gap is normal. A large one has a cause: an overfit or cost-free backtest, execution frictions like slippage and spread, a market regime that shifted, or a thin live sample. Measure the gap instead of guessing. Quantprove's Validation scores how closely your live trades still track the backtest and reports it as a Stability Score.

Why don't your live results match your backtest?

Some gap is normal, because your live trades are a fresh, small sample and the backtest was the best version of the past you could build. A large gap is not, and it points to one of a few causes. The backtest may have been overfit, tuned to the past in detail that does not repeat. It may have left out costs that hit every live trade. Execution may differ, with slippage and spread eating fills the backtest assumed were clean. Or the market may have moved into a regime the strategy was never built for.

The useful move is to name which one is yours. Each cause leaves a different fingerprint in the live results, and each has a different fix.

A small gap is sample noise. A large gap has a cause you can name.

How much of the gap is just variance?

Part of every gap is luck. Your live record might be 40 trades against a backtest of 2,000, and 40 trades swing widely on their own. A strategy with a real edge still has stretches where live runs below the backtest average, the same way it has winning and losing months. Before you tear the strategy apart, check whether the gap sits inside the range a small sample can produce.

Sample size decides how much to trust the comparison. Quantprove's Validation has no confidence floor: a comparison built on very few live trades is discounted hard, because there is not enough live evidence yet to judge. "How many trades you need to validate a strategy" covers the math. In plain terms, 40 trades cannot convict anything. If the gap is small and the sample is thin, collect more trades before drawing a conclusion.
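One way to check whether a gap sits inside the range a small sample can produce is a bootstrap: draw samples the size of your live record from the backtest trades and see where the live average falls. The sketch below is a minimal illustration, not Quantprove's method. The function names, the 5% to 95% band, and the simulated trade data are all assumptions made for the example.

```python
import random
import statistics

def small_sample_range(backtest_returns, n_live, n_draws=5000, seed=0):
    """Bootstrap the range of mean returns that n_live trades can produce.

    Draws n_live trades with replacement from the backtest, n_draws times,
    and returns the 5th and 95th percentile of the resampled means.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(backtest_returns, k=n_live))
        for _ in range(n_draws)
    )
    low = means[int(0.05 * n_draws)]
    high = means[int(0.95 * n_draws) - 1]
    return low, high

def gap_is_within_noise(backtest_returns, live_returns):
    """True if the live mean sits inside the bootstrapped 5-95% band."""
    low, high = small_sample_range(backtest_returns, len(live_returns))
    live_mean = statistics.fmean(live_returns)
    return low <= live_mean <= high

# Hypothetical data: 2,000 backtest trades and 40 live trades, in R multiples.
rng = random.Random(1)
backtest = [rng.gauss(0.2, 1.0) for _ in range(2000)]
live = [rng.gauss(0.1, 1.0) for _ in range(40)]

print(small_sample_range(backtest, len(live)))
print(gap_is_within_noise(backtest, live))
```

If the live average lands inside the band, the gap is not yet distinguishable from luck. That does not confirm the edge. It only says 40 trades are too few to rule it out.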

Which backtest mistakes show up as a live shortfall?

Overfitting is the usual culprit. A curve-fit strategy looks excellent on the data it was tuned to and average or worse on everything else, so the edge that filled the backtest is missing live. Leaving costs out does the same in a quieter way: the backtest banked gross wins your account never sees once commissions, spread, and slippage come out. Lookahead bias, using data a trade could not have known yet, inflates the backtest and cannot be reproduced live.
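Lookahead bias is easiest to see in a toy example. The sketch below uses a made-up rule ("hold a bar after an up bar") and hypothetical prices. The only difference between the two runs is whether the signal for a bar reads that bar's own return or the previous one.

```python
def bar_returns(closes):
    """Simple close-to-close returns, one per bar after the first."""
    return [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]

def momentum_pnl(closes, lookahead):
    """Total return of a toy 'hold after an up bar' rule.

    lookahead=True decides whether to hold a bar using that bar's own
    return, which is only known after the bar has closed.
    lookahead=False uses the previous bar's return, which was known in time.
    """
    rets = bar_returns(closes)
    total = 0.0
    for t in range(1, len(rets)):
        signal_bar = t if lookahead else t - 1
        if rets[signal_bar] > 0:
            total += rets[t]
    return total

# Hypothetical closing prices that alternate up and down.
closes = [100, 101, 100, 102, 101, 103, 102, 104]

print("with lookahead:   ", momentum_pnl(closes, lookahead=True))
print("without lookahead:", momentum_pnl(closes, lookahead=False))
```

The lookahead run only ever holds bars that turned out to be winners, so it cannot lose. The honest run, on the same prices, holds the bars that follow a winner. No live account can trade the first version.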

These are the same errors that inflate an Edge Score in the first place. "The mistakes that inflate a backtest" covers how to catch each one, and overfitting is the single biggest reason a backtest edge does not survive contact with live trading.

How do execution and costs widen the gap?

A backtest fills at prices the market may not have given you. Slippage moves your entry and exit away from the signal price, spread takes a slice on every round turn, and commissions come out win or lose. On a strategy with a small average win, these frictions alone can turn a positive backtest into a flat live record, with the shape of the curve intact but the size shrunk.

Latency and partial fills widen it further for faster systems. The check is to rebuild the backtest with realistic costs and fills, then compare again. "How to build trading costs and slippage into your backtest" walks through the full rebuild. If the live shortfall matches the cost you stripped out, the edge is real and the backtest was simply gross.
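The rebuild-and-compare check can be sketched in a few lines. This is a simplified illustration that assumes flat per-trade costs. The cost figures, the tolerance, and the function names are placeholders, not measured values or part of any Quantprove API.

```python
def net_trades(gross_pnls, commission=1.0, spread=0.5, slippage=0.25):
    """Subtract per-trade frictions from gross P&L.

    All cost figures are placeholder values in account currency per round
    turn. Slippage is charged twice, once on entry and once on exit.
    """
    cost = commission + spread + 2 * slippage
    return [pnl - cost for pnl in gross_pnls]

def shortfall_explained_by_costs(gross_pnls, live_pnls, tolerance=0.25, **costs):
    """True if the live average is within `tolerance` of the net backtest average."""
    net = net_trades(gross_pnls, **costs)
    net_avg = sum(net) / len(net)
    live_avg = sum(live_pnls) / len(live_pnls)
    return abs(live_avg - net_avg) <= tolerance

# Hypothetical gross backtest trades and live trades, in account currency.
gross = [10, -5, 8, -4, 6]
live = [1.1, 0.9, 1.0]

print(net_trades(gross))
print(shortfall_explained_by_costs(gross, live))
```

With real data you would replace the flat figures with per-instrument commissions and measured slippage. The comparison stays the same: live against the net backtest, not the gross one.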

The frictions are small per trade and relentless per hundred.

Could a regime change explain it?

Markets move through regimes, and a strategy built for one can fade when the market shifts into another. A trend system stalls in a range. A mean-reversion system bleeds in a strong trend. If your backtest covered one regime and live trading landed in a different one, the edge can be real and still absent right now, because the conditions it needs are not present.

A regime gap looks different from overfitting: an overfit strategy is weak from the first live trade, while a strategy caught by a regime change usually worked first and faded as conditions turned. Watching a rolling score over the live record separates the two, since decay shows up as a trend rather than a flat miss. "How to know when your trading strategy stops working" covers that read in full.
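A rolling read of the live record can be approximated with a rolling mean and its slope. The sketch below is a rough stand-in for a rolling score, not Quantprove's calculation. The window length, the slope cutoff, and the three labels are assumptions chosen for illustration.

```python
def rolling_mean(values, window):
    """Mean of each consecutive `window`-length slice of `values`."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]

def slope(ys):
    """Least-squares slope of ys against their index 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def classify_gap(live_returns, backtest_mean, window=10, slope_cutoff=-0.01):
    """Rough label for the live record.

    'decay' if the rolling mean trends down, 'flat miss' if it sits below
    the backtest mean without a trend, otherwise 'tracking'.
    Needs at least window + 1 live trades.
    """
    rolled = rolling_mean(live_returns, window)
    if slope(rolled) < slope_cutoff:
        return "decay"
    if sum(rolled) / len(rolled) < backtest_mean:
        return "flat miss"
    return "tracking"

# Hypothetical live records: one that worked then faded, one weak throughout.
faded = [1.0] * 20 + [-1.0] * 20
weak = [0.0] * 40

print(classify_gap(faded, backtest_mean=0.5))
print(classify_gap(weak, backtest_mean=0.5))
```

The labels are only as good as the sample behind them. On a short live record, treat a downward slope as a prompt to keep watching, not as proof of a regime change.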

How do you measure the gap instead of guessing?

You measure it by scoring how closely your live trades still track the backtest. Quantprove's Validation does this directly: it compares the live record against the backtest across the return distribution, the drawdown profile, edge quality, and return composition, then reports a Stability Score. A score of 60 or above means live is behaving like the backtest. A large drop flags that the two have come apart.

The score turns a vague worry into a number with a cause attached. Instead of asking whether the gap feels too big, you read which part of the comparison broke: distribution, drawdown, edge quality, or return. That points you at the fix, whether it is costs, overfitting, or a regime your strategy is waiting out.

Where is your backtest-to-live gap coming from?

Each cause leaves a different mark in the live results. Match the symptom to the likely cause, then run the check.

What you see live	Likely cause	How to check
Same shape, smaller wins	Costs and slippage not in the backtest	Rebuild with net trades, compare again
Edge gone from the first trade	Overfitting or lookahead bias	Out of sample test, read the Stability Score
Worked, then faded	Regime change or edge decay	Watch a rolling score across the live record
Deeper drawdowns than expected	Underestimated tails or variance	Monte Carlo, check the drawdown profile
Slightly worse, within range	Normal sample variance	Collect more live trades, recheck Stability

A gap that matches one of the top rows has a fix. A gap in the bottom row is the cost of a small sample, and it shrinks as the live record grows. Either way, run your trades through Validation and read where the two come apart.
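The Monte Carlo check in the drawdown row can be sketched by resampling the backtest trades into many alternative sequences and reading off how deep the drawdowns get. This is a minimal version that assumes trades are independent and measures drawdown on a cumulative sum of per-trade returns. The function names, path count, and sample data are hypothetical.

```python
import random

def max_drawdown(returns):
    """Largest peak-to-trough drop of the cumulative sum of returns."""
    equity = peak = worst = 0.0
    for r in returns:
        equity += r
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def drawdown_percentile(backtest_returns, n_trades, pct=0.95, n_paths=2000, seed=0):
    """Resample n_trades from the backtest n_paths times and return the
    pct-quantile of the simulated maximum drawdowns."""
    rng = random.Random(seed)
    draws = sorted(
        max_drawdown(rng.choices(backtest_returns, k=n_trades))
        for _ in range(n_paths)
    )
    return draws[min(int(pct * n_paths), n_paths - 1)]

# Hypothetical backtest trades in R multiples, and a 40-trade live record.
rng = random.Random(1)
backtest = [rng.gauss(0.2, 1.0) for _ in range(2000)]

print("median simulated drawdown:", drawdown_percentile(backtest, 40, pct=0.5))
print("95th percentile drawdown: ", drawdown_percentile(backtest, 40, pct=0.95))
```

A live drawdown beyond the 95th percentile of the simulated ones suggests the backtest underestimated the tails. One inside the range is within what reshuffled backtest trades can produce on their own.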

Your live results will rarely match the backtest to the decimal, and they do not need to. What matters is whether the gap is sample noise or a real break. A backtest built on net trades, free of lookahead, and tested out of sample starts the live record close to plan. A real edge holds up out of sample; a curve-fit one comes apart the moment live trading begins.

Frequently asked questions

How much difference between backtest and live results is normal?

Some difference is normal, because live is a fresh, small sample and a backtest is the best fit of the past. A few points of slippage and a stretch of below-average trades are expected. A collapse, where the edge disappears, is not. Quantprove flags a live record that still tracks the backtest with a Stability Score of 60 or above.

Can a good strategy underperform its backtest?

Yes. Costs, slippage, and ordinary variance can pull live below the backtest without the edge being broken. The tell is that the shape holds and the gap stays within what a small sample and real costs explain. A good strategy underperforms by a margin; a broken one loses the edge entirely.

Does a gap between backtest and live always mean overfitting?

No. Overfitting is one cause, and the most common, but costs and slippage, lookahead bias, execution differences, a regime change, and plain sample variance all widen the gap. Each leaves a different fingerprint, so the fix depends on which one you have.

How many live trades do you need before the comparison is reliable?

Enough that a few trades cannot swing it. Quantprove's Validation has no confidence floor, so a comparison on very few live trades is discounted hard. A few dozen trades hint at the gap; a few hundred make it reliable. Collect more before drawing a firm conclusion.

Can Quantprove measure the gap between backtest and live?

Yes. Validation compares your live trades against the backtest across distribution, drawdown, edge quality, and return, then reports a Stability Score from 0 to 100. It shows not just how big the gap is but which part of the comparison came apart.
