The pros and cons of hypothesis testing and backtests
By Neil Dennis
11:23, 15 June 2017
Hypothesis testing is a tool in the financial market trader's toolbox that uses statistics to help guide investment strategy.
The use of charts and historical data is commonplace, but the use of statistical mathematics is rare among private investors.
What is hypothesis testing?
Hypothesis testing is a statistical method that uses data gathered from a sample to draw inferences about a much larger population, sometimes an entire country. In trading, it is often applied to the results of a backtest, which is a simulation of how a strategy would have performed on historical data.
The test starts with a default assumption, the "null hypothesis", typically that any effect observed is due to chance. The null hypothesis is retained unless the data provide sufficient evidence to reject it in favour of an "alternative hypothesis".
Example:
An automated trading strategy generates a number of profitable trades with returns greater than 10%.
The trader sets up the null hypothesis: the strategy has no real edge, so over the long run similar results cannot be regularly repeated.
For the test, the trader simulates price series that reproduce the statistical properties of the historical data, such as its volatility and mean return. The same trading strategy is applied to each simulated series, and the simulation is repeated 10,000 times.
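The simulation step described above can be sketched in Python. The article does not say how the price series are generated, so this is a minimal sketch under stated assumptions: NumPy is used for the random numbers, daily log returns are drawn as independent normal variables with the historical mean and volatility (a random walk with no exploitable pattern), and `momentum_strategy_return` is a hypothetical stand-in for the trader's strategy. All function names and parameter values are illustrative.

```python
import numpy as np


def simulate_prices(mu, sigma, n_days, n_paths, start_price=100.0, seed=None):
    """Simulate n_paths price series of n_days each.

    Assumption: daily log returns are i.i.d. normal with the historical mean
    (mu) and volatility (sigma), i.e. a random walk with no exploitable pattern.
    """
    rng = np.random.default_rng(seed)
    log_returns = rng.normal(mu, sigma, size=(n_paths, n_days))
    return start_price * np.exp(np.cumsum(log_returns, axis=1))


def momentum_strategy_return(prices):
    """Hypothetical strategy: hold the asset for a day only if the previous
    day's return was positive. Returns the total return of each price path."""
    daily = prices[:, 1:] / prices[:, :-1] - 1.0  # simple daily returns
    in_market = daily[:, :-1] > 0                 # signal from yesterday's return
    strategy_daily = np.where(in_market, daily[:, 1:], 0.0)
    return np.prod(1.0 + strategy_daily, axis=1) - 1.0


def count_at_least(observed_return, mu, sigma, n_days, n_sims=10_000, seed=None):
    """Run the strategy on n_sims simulated series and count how many runs
    return at least as much as the strategy did on real data."""
    prices = simulate_prices(mu, sigma, n_days, n_sims, seed=seed)
    simulated = momentum_strategy_return(prices)
    n_at_least = int(np.sum(simulated >= observed_return))
    return n_at_least, n_at_least / n_sims


# Hypothetical inputs: a 10% backtest return over one year of daily data,
# with a daily mean log return of 0.03% and daily volatility of 1%.
n_at_least, share = count_at_least(0.10, mu=0.0003, sigma=0.01, n_days=252, seed=0)
print(f"{n_at_least} of 10,000 simulated runs did at least as well ({share:.2%})")
```

Because the simulated series are pure noise by construction, any strategy profits on them only by luck; the share of runs that match the real result is what the next step interprets.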
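If the trader finds that only 100 of the 10,000 simulated results (1%) produce returns equal to or greater than those of the original strategy, the null hypothesis can be rejected at the 1% significance level: results that good would arise by chance only about 1% of the time if the strategy had no edge.
If 5,000 of the results (50%) produce returns equal to or greater than the original strategy, the null hypothesis cannot be rejected. The original results are entirely consistent with luck, and there is no statistical evidence for the alternative hypothesis.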
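Under the standard interpretation, the p-value in this kind of simulation is simply the share of simulated runs that matched or beat the real strategy, and the null hypothesis is rejected only when that share is at or below a chosen significance level. A minimal sketch of that decision rule follows; the function name `interpret` is hypothetical, and the 5% level is a common convention rather than something the example specifies.

```python
def interpret(n_at_least, n_sims, alpha=0.05):
    """Return (p_value, reject_null) for a simulation-based test.

    p_value is the share of simulated runs that did at least as well as the
    real strategy; the null ("no edge") is rejected when p_value <= alpha.
    """
    p_value = n_at_least / n_sims
    return p_value, p_value <= alpha


for count in (100, 5_000):
    p, reject = interpret(count, 10_000)
    verdict = ("reject the null: unlikely to be luck" if reject
               else "cannot reject the null: consistent with luck")
    print(f"{count:>5} of 10,000 -> p = {p:.2f}, {verdict}")
```

The first scenario gives p = 0.01 and rejects the null; the second gives p = 0.50 and does not. Note that the p-value is not the probability that the null hypothesis is true, only how often chance alone would produce a result at least this good.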
Who uses hypothesis testing?
This methodology is used in the finance industry, mainly by quantitative (quant) investment professionals.
Quant traders use many mathematical models and data-analysis techniques to identify trading opportunities, and hypothesis testing is just one of these tools.
Trading techniques used by quant traders include high-frequency trading and algorithmic trading.
Ernest Chan, a quantitative trader and expert in statistical models, says: "Hypothesis testing is useful to the extent that if we cannot reject a null hypothesis, we should abandon the strategy. However, just being able to reject a null hypothesis in no way guarantees that the strategy is sound, and will be profitable in live trading."
Hypothesis testing and markets
The technique tells us little about the markets themselves. It cannot measure market sentiment, nor can it predict unusual reactions to economic data or corporate results, so its usefulness to private traders is limited, unless you are investing in a quant fund.
Chan says: "Rather, it is using what we know about market statistics to determine whether our trading strategy has any statistical significance. That is, whether our backtest was based on random noise."
Therefore, trading strategies that use hypothesis testing are as vulnerable to market-moving events as any other, but no more so.
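One standard way to ask Chan's question, whether a backtest's returns could be random noise, is a one-sample t-test of the strategy's daily returns against a null hypothesis of zero mean return. This is a textbook test offered for illustration, not necessarily the method Chan uses. The sketch below assumes the Python standard library only, uses the normal approximation to Student's t distribution for the p-value, and generates hypothetical random returns in place of a real backtest; the function name `mean_return_test` is illustrative.

```python
import math
import random
import statistics


def mean_return_test(daily_returns):
    """One-sided test of H0: mean daily return <= 0 against H1: mean > 0.

    Returns (t_stat, p_value). The p-value uses the normal approximation to
    Student's t distribution, which is reasonable for several hundred returns.
    """
    n = len(daily_returns)
    if n < 2:
        raise ValueError("need at least two daily returns")
    mean = statistics.fmean(daily_returns)
    sd = statistics.stdev(daily_returns)  # sample standard deviation
    if sd == 0:
        raise ValueError("returns have zero variance")
    # Equivalently: daily Sharpe ratio (zero risk-free rate) * sqrt(n)
    t_stat = mean / (sd / math.sqrt(n))
    # P(Z >= t_stat) for a standard normal Z
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2))
    return t_stat, p_value


# Hypothetical backtest: 500 daily returns drawn at random for illustration.
rng = random.Random(42)
returns = [rng.gauss(0.0005, 0.01) for _ in range(500)]
t_stat, p_value = mean_return_test(returns)
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.3f}")
```

A small p-value only says the average return is unlikely to be zero given these data; as with the simulation approach, it says nothing about how the strategy will cope with future market-moving events.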
Critics
The main criticism is expressed by Jeff Gill in his paper The Insignificance of Null Hypothesis Significance Testing, through the following analogy:
- If a person is American, it is highly unlikely she is a member of Congress
- The person is a member of Congress
- Therefore it is highly unlikely she is an American
Although this example comes from a study of the use of hypothesis testing in the social sciences, it can also illustrate a similar absurdity in finance:
- If a trading strategy is bad, then it is highly unlikely it will be profitable in a backtest
- The trading strategy's backtest is profitable
- Therefore, it is highly unlikely the trading strategy is bad
Chan says: "You can see the absurdity of this 'deduction'. Just because a bad trading strategy typically produces an unprofitable backtest, it in no way guarantees that a trading strategy isn't bad when it generates a profitable backtest. It doesn't even suggest some kind of probability of soundness of the strategy."
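Chan's point can be made concrete with Bayes' theorem, which the article does not invoke but which shows why the syllogism fails: the probability that a strategy is bad, given a profitable backtest, depends heavily on how common bad strategies are in the first place. The function name and every probability below are hypothetical, chosen only for illustration.

```python
def prob_bad_given_profitable(prior_bad, p_profit_if_bad, p_profit_if_good):
    """Bayes' theorem: probability a strategy is bad, given a profitable backtest.

    prior_bad: share of candidate strategies that are bad (the base rate)
    p_profit_if_bad: chance a bad strategy still backtests profitably
    p_profit_if_good: chance a good strategy backtests profitably
    """
    prior_good = 1.0 - prior_bad
    evidence = p_profit_if_bad * prior_bad + p_profit_if_good * prior_good
    if evidence == 0:
        raise ValueError("a profitable backtest is impossible under these inputs")
    return p_profit_if_bad * prior_bad / evidence


# Hypothetical numbers: bad strategies rarely backtest profitably (5%),
# good ones usually do (90%). Only the base rate changes between the two cases.
for prior_bad in (0.99, 0.50):
    posterior = prob_bad_given_profitable(prior_bad, 0.05, 0.90)
    print(f"{prior_bad:.0%} of candidates bad -> P(bad | profitable backtest) = {posterior:.0%}")
```

With these made-up inputs, if 99% of candidate strategies are bad, roughly 85% of profitable backtests still come from bad strategies; if only half are bad, the figure falls to roughly 5%. The backtest evidence is identical in both cases, which is why a profitable backtest on its own implies no particular probability that a strategy is sound.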
In their book The Cult of Statistical Significance, Stephen Ziliak and Deirdre McCloskey argue that statistical significance is not the same as scientific significance, and that a misplaced emphasis on significance tests leads to many mistakes.
Conclusions
In my introduction to this article I referred to the “trader’s toolbox”, and this is a useful theme to return to here.
Tradesmen and women of every kind have toolboxes, but none of them would think of using a single tool for a job that requires many.
Hypothesis testing is just one of many tools, and even professional quant traders use it only in conjunction with the other tools at their disposal.
Remember the old adage: if all you have is a hammer, everything starts to look like a nail.
Additional reporting by Ernest Chan