Statistical analysis has been used in financial markets for many decades to help take the guesswork out of – or, at least, firm up the "gut feelings" of – some of the top names in investment.
The history of statistics goes back centuries, even millennia – think of biblical censuses and the Domesday Book – the gathering of data for the study of population demographics.
The use of charts and historical data is commonplace in private investment but the use of statistical analysis is more typically associated with quantitative investment and active fund management techniques.
The so-called "Oracle of Omaha" Warren Buffett is perhaps best known for his "get greedy when everyone else is scared, and get scared when everyone else is being greedy" line.
But he was a pioneer of analysis – starting out with tip sheets at the racetrack.
He graduated to stock trading, and part of his approach is to work out what price is right for him, based on the profits a company could be expected to earn in 10 years' time.
And there are methods of extrapolating this from historical data using some, or a combination of the techniques listed below.
Buffett's ex-daughter-in-law, Mary Buffett, wrote in her book on his trading style: "Warren has found, if the company is one of sufficient earning power and earns high rates of return on shareholders' equity, created by some kind of consumer monopoly, chances are good that accurate long-term projections of earnings can be made."
Limitations to data analysis techniques
Statistical analysis has its limitations. There's little room to account for "black swan" events – those sometimes catastrophic occurrences that no amount of number crunching can predict.
And during such periods – as in the volatile market conditions that followed events such as the dotcom bubble, 9/11 and the 2008 financial crisis – statistical analysis becomes something of a blunt tool, its predictive power neutered by unpredictability.
Statistical analysis tools
Below, then, are several techniques in the arsenal of statistical analysis.
We've removed most of the hard sums to leave you with just the ideas that make these tools useful.
Should these ideas prompt the desire for a deeper understanding of these types of investment methods, we've included a further reading list at the end of the article.
Measures of central value
There are three measures of central tendency in statistical analysis: the mean, median and mode. All three are summary measures that attempt to best describe a whole set of data in a single value that represents the core of that data set's distribution.
1. Mode

This is the most commonly occurring value in a data set.
Consider the following data set of the ages of 10 children:
4, 5, 5, 6, 6, 6, 7, 8, 8 and 9
The mode here is 6, as this is the most commonly occurring value. The mode, however, won't necessarily reflect the central value of a data set. Also, it is possible for there to be two or more modes in a data set or, indeed, no mode at all.
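As a quick sketch, Python's standard-library statistics module can find the mode – and, because a set may have more than one, the multimode function returns every most-common value:

```python
from statistics import mode, multimode

# The ages of the 10 children from the example above
ages = [4, 5, 5, 6, 6, 6, 7, 8, 8, 9]

print(mode(ages))       # the single most common value: 6
print(multimode(ages))  # every most-common value: [6]
```

For a data set with two modes – say [1, 1, 2, 2, 3] – multimode would return both, while mode would report only the first encountered.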
2. Arithmetic mean
The mean is the average value of a data set.
Consider the following data set:
2, 4, 5, 8 and 9
The arithmetic mean is arrived at by adding all the numbers together and then dividing the total by the number of data points in the set.
So, by adding 2+4+5+8+9 = 28, which we then divide by 5 (the number of data points, or numbers, in that set) we arrive at 5.6.
Mean values are useful in many circumstances in business.
Internet shopping sites always ask for your age range when you set up an account. This is not only useful to them, but also to other retailers and manufacturers of goods for targeting advertising to certain age groups.
In investment, particularly for institutions, it's becoming increasingly important to know the average buying prices at certain times of day to know whether your institution is arriving at best execution on its asset purchases.
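The arithmetic mean calculation above is a one-liner in Python – the sum of the values divided by the count:

```python
# The example data set from above
data = [2, 4, 5, 8, 9]

# Arithmetic mean: total of all values divided by the number of data points
mean = sum(data) / len(data)
print(mean)  # 5.6
```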
3. Median

The median is the middle number in a data set.
Consider the same data set as above:
2, 4, 5, 8 and 9
The median is simply the number in the middle: 5. This is easily arrived at if the data set has an odd number of data points, as above. But what if the data set were:
1, 2, 4, 5, 8 and 9
In the case of data set with an even number of data points, we take the average of the middle two numbers.
So, (4+5)/2 gives us a median of 4.5.
Median values are useful in statistical analysis because they are less prone to be skewed by anomalies or other unusual appearances in a data set. Consider the following set:
2, 4, 5, 8 and 798
In reality, such an extraordinary thing isn't likely to happen in such a small set, but the median of 5 is much more representative of the majority of that data set than the arithmetic mean of 163.4.
Imagine the example of salaries in a company. Let's say there are three broad ranges of salary: 80% of those salaries are for semi-skilled and unskilled workers, 15% are for skilled workers and supervisors, and just 5% are for senior managers and executives.
That top 5% skews the average salary upward.
A semi-skilled worker earning £30,000 a year isn't likely to be impressed to learn that the mean salary where he works is £45,000 a year. He knows he earns more than an unskilled worker, but the mean salary makes his route up the corporate ladder seem a terribly long one.
His salary is likely to be more closely related to the median given the percentage of workers in that group of the data set.
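A short sketch of the median rule described above – take the middle value for an odd-sized set, or the average of the middle two for an even-sized one – also shows how little the outlier of 798 moves it:

```python
def median(values):
    """Middle value of a data set; average of the middle two if even-sized."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]            # odd count: the single middle number
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the middle pair

print(median([2, 4, 5, 8, 9]))     # 5
print(median([1, 2, 4, 5, 8, 9]))  # 4.5
print(median([2, 4, 5, 8, 798]))   # still 5 – the outlier barely registers
```

(Python's statistics.median would do the same job; the function is written out here only to make the rule explicit.)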
4. Mathematical expectation
Mathematical expectation, also called the expected value (EV), is the number in probability theory one arrives at when a task with random outcomes is performed many times – such as rolling a single die.
The data set here is 1, 2, 3, 4, 5 and 6, and the probability of any of those numbers turning up on a single throw is 1 in 6 – 1/6 or, expressed as a decimal, roughly 0.16666.
The mathematical expectation or EV is arrived at by multiplying each of the possible outcomes by the probability of it occurring and adding the sums of all those values. Hence, with a dice roll:
(1 × 0.16666) + (2 × 0.16666) + . . . + (6 × 0.16666) = 3.5
Simply, the expected value is the arithmetic mean of all possible outcomes, so:
(1+2+3+4+5+6)/6 = 3.5
The law of large numbers dictates that the more often the die is thrown, the nearer the mean value of those throws approaches the EV. This is called convergence.