
12 data analysis techniques for CFD traders

Statistical analysis has been used in financial markets for many decades to help manage uncertainty – or, at least, to support the intuitive decisions of some of the top names in investment.
By Dan Mitchell

The history of statistics goes back centuries, even millennia – think of biblical censuses or the Domesday Book, early examples of data gathered to understand population trends.

The use of charts and historical data is commonplace in private investment, while statistical analysis is more often linked with quantitative trading and active fund management.

Warren Buffett

The so-called ‘Oracle of Omaha’, Warren Buffett, is perhaps best known for his phrase, ‘be fearful when others are greedy, and greedy when others are fearful’.

But he was also a pioneer in the use of analysis, initially creating tip sheets for horse racing before turning to the stock market.

He later focused on investing, and part of his approach involved determining a fair price for a company based on its expected profits over the next decade.

There are several ways to estimate this using historical data, including some of the techniques outlined below.

Buffett’s former daughter-in-law, Mary Buffett, wrote in her book on his trading style: 'Warren has found, if the company is one of sufficient earning power and earns high rates of return on shareholders' equity, created by some kind of consumer monopoly, chances are good that accurate long-term projections of earnings can be made.'

Limitations to data analysis techniques

Statistical analysis has its limitations, leaving little scope to account for ‘black swan’ events – rare and sometimes severe occurrences that no amount of data modelling can reliably predict.

During such periods – including the volatile market conditions that followed the dotcom bubble, 9/11, and the 2008 financial crisis – statistical analysis can lose effectiveness, with its predictive power reduced by heightened uncertainty.

Statistical analysis tools

Below are several commonly used techniques in statistical analysis. We’ve simplified the concepts to highlight the key ideas that make these tools useful. If you’d like to explore these techniques further, you’ll find a list of suggested reading at the end of the article.

Measures of central tendency

There are three measures of central tendency in statistical analysis: the mean, median, and mode. All three summarise a data set with a single value representing the central point of that data set’s distribution.

1. Mode

The mode is the most frequently occurring value in a data set.

Consider the following set of ages for ten children:

4, 5, 5, 6, 6, 6, 7, 8, 8, 9

The mode here is 6, as it appears most often. However, the mode doesn’t always represent the central value of a data set, and there may be more than one mode, or, in some cases, no mode at all.
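As an illustrative sketch, Python's built-in `statistics` module can find the mode directly, and its `multimode` function (Python 3.8+) covers data sets with more than one mode:

```python
import statistics

ages = [4, 5, 5, 6, 6, 6, 7, 8, 8, 9]

# The most frequently occurring value
most_common = statistics.mode(ages)
print(most_common)  # 6

# multimode returns every value tied for most frequent,
# covering data sets with more than one mode
modes = statistics.multimode([1, 1, 2, 2, 3])
print(modes)  # [1, 2]
```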

2. Arithmetic mean

The mean is the average value of a data set.

Consider the following numbers:

2, 4, 5, 8, 9

The arithmetic mean is calculated by adding all the values and dividing the total by the number of data points.

(2 + 4 + 5 + 8 + 9) ÷ 5 = 28 ÷ 5 = 5.6.

Mean values are useful in many contexts.

For instance, online retailers often collect age-range data from users to better understand audience demographics and tailor marketing efforts accordingly.

In financial analysis, particularly within institutions, knowing the average buying price at specific times of day can help assess whether transactions are being executed efficiently.
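The calculation above is a one-liner in Python; as a minimal sketch:

```python
import statistics

values = [2, 4, 5, 8, 9]

# Sum of all values divided by the number of data points
mean = sum(values) / len(values)  # 28 / 5
print(mean)  # 5.6

# The statistics module gives the same result
print(statistics.mean(values))  # 5.6
</imports>```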

3. Median

The median is the middle value in a data set.

Consider the same numbers:

2, 4, 5, 8, 9

The median is the middle number, which is 5. This is straightforward when the data set contains an odd number of values.

If the data set has an even number of values, for example:

1, 2, 4, 5, 8, 9

The median is found by taking the average of the two central numbers:

(4 + 5) ÷ 2 = 4.5.

Median values are particularly useful because they are less affected by extreme values or outliers.

For example:

2, 4, 5, 8, 798

Here, the median of 5 is far more representative of the data set than the arithmetic mean of 163.4.

In the context of salaries within a company, median values can also provide a more accurate picture.

Suppose 80% of employees are semi-skilled or unskilled, 15% are skilled workers or supervisors, and 5% are senior managers or executives. That top 5% can significantly skew the mean salary upwards.

A semi-skilled worker earning £30,000 may not find the mean salary of £45,000 particularly relevant, as it’s influenced by higher executive pay. In this case, the median salary provides a fairer indication of the typical earnings within their group.
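The examples above can be reproduced with `statistics.median`, which handles both odd and even data set sizes; the outlier case shows why the median can be the more representative figure:

```python
import statistics

# Odd number of values: the middle one
med_odd = statistics.median([2, 4, 5, 8, 9])
print(med_odd)  # 5

# Even number of values: the average of the two central numbers
med_even = statistics.median([1, 2, 4, 5, 8, 9])
print(med_even)  # 4.5

# An outlier distorts the mean but barely moves the median
data = [2, 4, 5, 8, 798]
print(statistics.mean(data))    # 163.4
print(statistics.median(data))  # 5
```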

Probability theory

4. Mathematical expectation

Also known as the expected value (EV), this concept estimates the average outcome of an event with random results when repeated many times – for example, rolling a single die.

The data set here is 1, 2, 3, 4, 5, 6, and the probability of any number appearing on one roll is 1 in 6, or 1/6 (approximately 0.1667).

The expected value is calculated by multiplying each possible outcome by its probability and then adding the results together:

(1 × 1/6) + (2 × 1/6) + … + (6 × 1/6) = 3.5.

In simple terms, the expected value is the arithmetic mean of all possible outcomes:

(1 + 2 + 3 + 4 + 5 + 6) ÷ 6 = 3.5.

According to the law of large numbers, the more times the die is rolled, the closer the average outcome will move towards the expected value – a process known as convergence.

In business and financial analysis, expected value is often applied in risk assessment and scenario modelling to evaluate whether a potential outcome aligns with an organisation’s risk tolerance.

Modern computational tools now allow expected values to be calculated from data sets that were once too large to process efficiently. This can be particularly useful when estimating potential returns or modelling risk scenarios, especially when used alongside measures such as variance and standard deviation (see below).
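The die example can be sketched as a probability-weighted sum; using exact fractions avoids rounding error in the probabilities:

```python
from fractions import Fraction

def expected_value(outcomes, probabilities):
    # Multiply each outcome by its probability and sum the results
    return sum(x * p for x, p in zip(outcomes, probabilities))

faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6  # a fair die: each face equally likely

ev = expected_value(faces, probs)
print(ev)         # 7/2
print(float(ev))  # 3.5
```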

Distribution models

5. Normal distribution

Normal distribution, also known as the Gaussian distribution, describes how values in a data set are spread across a range.

It can be illustrated along a single horizontal axis representing the full range of values within the data set. Half of the values fall above the mean, and half below it. Most data points cluster close to the mean, while the remainder gradually taper off on either side, forming the familiar bell-shaped curve.

Patterns of normal distribution in historical returns indicate that an asset’s performance has been relatively stable over time, with most outcomes close to the average.
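The clustering around the mean can be quantified: under a normal distribution, roughly 68% of values fall within one standard deviation of the mean. As an illustrative check, Python's `statistics.NormalDist` (Python 3.8+) gives this figure from the cumulative distribution function:

```python
from statistics import NormalDist

# The standard normal distribution (mean 0, standard deviation 1);
# any choice of mu and sigma behaves the same way
dist = NormalDist(mu=0, sigma=1)

# Probability mass within one standard deviation of the mean
within_one_sd = dist.cdf(1) - dist.cdf(-1)
print(round(within_one_sd, 4))  # 0.6827
```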

6. Skewness

Skewness measures the degree of symmetry, or asymmetry, within a distribution.

In a standard normal distribution, skewness equals zero.

In a negatively skewed distribution, the longer tail extends to the left of the peak; in a positively skewed one, it extends to the right.

When examining historical returns, analysts may note that a positively skewed distribution has a long right tail: most returns sit close to or below the mean, with occasional large gains pulling the average upwards.

However, interpreting skewness alone can be misleading. During market bubbles, for instance, positive skewness may reflect inflated prices rather than genuine performance, and valuations can fall sharply once the skew turns negative.

Statistical analysis provides context, but its usefulness depends on how effectively it’s applied.
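Skewness can be computed from the central moments of a data set. The sketch below uses the population (Fisher-Pearson) formula, the third central moment divided by the variance raised to the power 3/2:

```python
import statistics

def skewness(data):
    # Population (Fisher-Pearson) skewness:
    # third central moment / variance ** 1.5
    n = len(data)
    m = statistics.mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# A long right tail produces positive skewness
right_tailed = skewness([1, 1, 1, 2, 10])
print(right_tailed > 0)  # True

# A symmetric data set has zero skewness
print(skewness([1, 2, 3, 4, 5]))  # 0.0
```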

7. Kurtosis

Kurtosis measures how a distribution differs from a normal one by focusing on the frequency of extreme values. This introduces the concept of ‘tail risk’ – the likelihood of outcomes that deviate significantly from the mean.

A distribution described as having ‘fat tails’ shows higher kurtosis, meaning extreme outcomes occur more often than expected under a normal distribution.

Tail risk reflects the possibility of results more than three standard deviations from the mean, and is therefore a key consideration when assessing volatility.
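Kurtosis follows the same pattern, using the fourth central moment. Subtracting 3 gives 'excess' kurtosis, so a normal distribution scores zero; the sketch below shows how occasional extreme values push the figure up:

```python
import statistics

def excess_kurtosis(data):
    # Fourth central moment over squared variance, minus 3 so that
    # a normal distribution scores zero ('excess' kurtosis)
    n = len(data)
    m = statistics.mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

# Evenly spread values: thin tails, negative excess kurtosis
thin = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Mostly stable values with occasional extremes: fat tails
fat = [5, 5, 5, 5, 5, 5, 5, -20, 30]

print(excess_kurtosis(fat) > excess_kurtosis(thin))  # True
```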

Divergence from the mean

8. Variance

Variance is a statistical measure used to examine how each value in a data set differs from its mean.

Consider the data set 2, 4, 5, 8, 9.

The arithmetic mean is 5.6.

Subtract the mean from each number: −3.6, −1.6, −0.6, 2.4, 3.4. The sum of these deviations equals zero.

To calculate variance, each deviation is squared and averaged:

12.96, 2.56, 0.36, 5.76, 11.56.

The mean of these squared values is 6.64, representing the variance.

Variance is used in risk and performance analysis to understand the degree of variation within a set of returns, often alongside standard deviation.
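The worked example above can be reproduced step by step, and checked against `statistics.pvariance`, which computes the same population variance:

```python
import statistics

values = [2, 4, 5, 8, 9]
mean = sum(values) / len(values)          # 5.6
deviations = [x - mean for x in values]   # -3.6, -1.6, -0.6, 2.4, 3.4

# Square each deviation and average the results
variance = sum(d ** 2 for d in deviations) / len(values)
print(round(variance, 2))  # 6.64

# statistics.pvariance gives the same population variance
print(statistics.pvariance(values))  # 6.64
```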

9. Standard deviation

Standard deviation is the square root of the variance and one of the most widely used measures in statistical analysis.

Continuing the example above, a variance of 6.64 produces a standard deviation of 2.577.

In financial analysis, standard deviation is used to describe historical volatility – how much returns have fluctuated over time.

If most returns fall within one standard deviation of the mean, the data set is relatively stable.

If several returns fall outside this range, it shows greater variability.
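Continuing the same data set, `statistics.pstdev` returns the population standard deviation, the square root of the variance calculated above:

```python
import statistics

values = [2, 4, 5, 8, 9]

# Population standard deviation: the square root of the variance (6.64)
sd = statistics.pstdev(values)
print(round(sd, 3))  # 2.577
```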

Measures of similarity

10. Covariance

Covariance measures how two or more variables move relative to each other.

When two or more assets move in the same direction, they are said to have positive covariance.

Positive covariance can increase portfolio risk, as multiple assets may rise or fall together. Conversely, negative covariance can enhance diversification by balancing performance between assets.
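A minimal sketch of population covariance, the average product of paired deviations from each series' mean (the illustrative series below are hypothetical):

```python
import statistics

def covariance(xs, ys):
    # Population covariance: average product of paired deviations
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]  # moves with a
c = [10, 8, 6, 4, 2]  # moves against a

print(covariance(a, b) > 0)  # True: positive covariance
print(covariance(a, c) < 0)  # True: negative covariance
```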

11. Correlation coefficient

Simple correlations can often be observed visually by comparing two charts side by side, where similarities between peaks and troughs are easy to identify.

For precision, the correlation coefficient is calculated by dividing the covariance of two variables by the product of their standard deviations.

The result ranges from +1 to −1.

  • A positive value indicates a direct relationship; the closer to +1, the stronger the correlation.
  • A negative value shows an inverse relationship, meaning one variable tends to move in the opposite direction to the other.

Correlation analysis is widely used to compare an asset’s movement against a benchmark index.
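The calculation described above, covariance divided by the product of the standard deviations, can be sketched as follows; a perfectly linear relationship gives a coefficient of +1:

```python
import statistics

def correlation(xs, ys):
    # Covariance divided by the product of the standard deviations
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys))

asset = [1, 2, 3, 4, 5]
benchmark = [2, 4, 6, 8, 10]  # a perfectly linear relationship

r = correlation(asset, benchmark)
print(round(r, 6))  # 1.0
```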

12. Regression (R-squared)

R-squared (R²) measures the strength of the relationship between a fund, asset, or security and its benchmark index.

For instance, an equity fund focused on a specific sector would typically move closely in line with that sector’s sub-index within a broader stock market index.

R-squared values are expressed as percentages.

An R-squared of 100% indicates that a fund’s performance is entirely explained by its benchmark, while lower values suggest other factors also influence returns.

As a general guide, an R-squared value below 70% indicates a weaker relationship between an asset and its benchmark.
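For a single benchmark, R-squared is the square of the correlation coefficient between the two return series. The sketch below applies this to hypothetical fund and index returns:

```python
import statistics

def r_squared(asset, benchmark):
    # R-squared against a single benchmark is the square of the
    # correlation coefficient between the two return series
    mx, my = statistics.mean(asset), statistics.mean(benchmark)
    cov = sum((x - mx) * (y - my)
              for x, y in zip(asset, benchmark)) / len(asset)
    r = cov / (statistics.pstdev(asset) * statistics.pstdev(benchmark))
    return r ** 2

# Hypothetical returns for a fund that closely tracks its benchmark
fund = [1.0, 2.1, 2.9, 4.2, 4.8]
index = [1.0, 2.0, 3.0, 4.0, 5.0]

print(round(r_squared(fund, index) * 100, 1))  # as a percentage: 99.0
```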

Conclusions and further reading

Statistical analysis is most effective when considered alongside a broader understanding of market conditions. Without this context, its value as a standalone tool is limited.

While relying solely on intuition offers little clarity, combining statistical analysis with financial data – such as balance sheets, profit and loss statements, and historical returns – can help provide a clearer framework for interpreting market behaviour.

It’s always advisable to review reliable and up-to-date information before making any trading decisions.

This article, along with the suggested reading below, can support further learning and a deeper understanding of how statistical principles apply to financial analysis.

You may also find it useful to explore our educational resources and trading guides, which cover related topics in more depth.

Further reading

  • Introduction to Statistical Methods for Financial Models – Thomas A. Severini (2017)
  • Stock Market Probability: Using Statistics to Predict and Optimise Investment Outcomes – Joseph E. Murphy (1994)
  • Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals – David Aronson (2006)
  • Statistical Models and Methods for Financial Markets – Tze Leung Lai and Haipeng Xing (2008)

FAQ

What is statistical analysis in trading?

Statistical analysis involves examining historical data to identify patterns, relationships, and probabilities in market behaviour. In CFD trading, it can help traders understand how assets have behaved over time and how factors such as volatility or correlation may influence price movements. It’s a tool for interpretation rather than prediction, and is most effective when used alongside broader market and economic analysis.

What is the difference between variance and standard deviation?

Variance measures how far individual data points deviate from the average (mean) of a data set. Standard deviation, the square root of variance, provides a clearer indication of how widely values – such as returns – differ from the mean. In trading analysis, standard deviation is often used to assess historical volatility, or the degree to which asset prices fluctuate over time.

What does a normal distribution tell traders?

A normal distribution describes how values in a data set – such as returns – are spread around an average. In a perfectly normal distribution, most results cluster near the mean, with fewer extreme highs and lows. While useful for modelling historical data, it’s important to note that financial markets often experience irregular or unpredictable events that fall outside this pattern.

How is correlation used in portfolio analysis?

Correlation measures how closely two or more assets move in relation to each other. A positive correlation means assets tend to move in the same direction, while a negative correlation means they move in opposite directions.

What are the limitations of statistical analysis in trading?

Statistical analysis cannot fully account for rare or extreme events, often referred to as ‘black swan’ events. It also relies on historical data, which may not always represent future market conditions. For that reason, statistical models are best viewed as supportive analytical tools rather than predictive ones, and are most effective when combined with broader financial and economic insight.
