The Normal Distribution (or Gaussian)

©Trading Research - www.trading-research.com - December 2014 - Author: Roberto Ambrogi

I want to start with an article on the normal distribution (or Gaussian). As you may have noticed, the normal distribution appears in a stylized form in our Trading Research logo. We made this choice because it represents a key element of some market representation tools that we use (e.g. Volume Profile and Market Profile) and, moreover, because it is a reference tool for most of the statistical studies that we make.

Those who have a statistical background can safely skip ahead, and we apologize to them if the treatment of the subject is too brief and simplified. However, I consider it very important that every trader and every student of the markets has a clear understanding of the concepts that define the normal distribution.
[Figure: normal distribution]
The Gaussian is a very special frequency distribution (or probability function) that approximates many everyday phenomena and is widely used in research (it is also called the curve of accidental errors).
When a phenomenon can be approximated by a normal distribution, we can easily draw some practical conclusions about it.
In many real phenomena, the extreme values are the rarest and the central values, those around the mean, are the most frequent.
Consider, for example, the distribution of heights in a large enough population sample: you will find few cases at the extremes (very tall and very short people) and a growing number of people (frequencies) as you approach the average height.
Before delving a little into the theory of the normal distribution, it is useful to define position indices and dispersion indices:
a) POSITION INDEXES
ARITHMETIC MEAN OR AVERAGE (µ)
It is the sum of all the values of the variable in a population divided by the number of units in the population (N).
Commonly known simply as "the mean", it is a position index: a value that estimates the center of a set of numbers.
While the arithmetic mean is often used to report central tendency, it is not a robust statistic, meaning that it is greatly influenced by outliers (values that are much larger or smaller than most of the other values).
This is the formula:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where N is the total number of values and xi (x1, x2, ..., xN) are the individual values in the dataset.
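As a minimal sketch of the definition, and of the sensitivity to outliers mentioned above, the following Python snippet computes the mean by hand and with the standard library. The data are the sample values quoted in the MODE section of this article; the extra value 5000 is a hypothetical outlier, added only for illustration.

```python
from statistics import mean

# Sample values quoted in the MODE section of this article.
sample = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]

# Definition: sum of all the values divided by the number of values (N).
mu = sum(sample) / len(sample)

# Hypothetical outlier, added only to show how strongly it moves the mean.
mu_with_outlier = mean(sample + [5000])

print(round(mu, 2))               # 955.77
print(round(mu_with_outlier, 2))  # 1244.64
```

A single extreme value shifts the mean by almost 300 points here, which is what "not robust" means in practice.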

MEDIAN (Me)
The median is the value that occupies the central position in an ordered set of data. It is a robust measure, as it is little affected by the presence of abnormal data. It represents the value for which 50% of the data are lower and 50% are higher. The formula for calculating the median when the number of observed data is odd is:
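$$Me = x_{\left(\frac{n+1}{2}\right)}$$
When the number of observed data is even, the median is the average of the two central values:

$$Me = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$
where n represents the total number of data and x_(k) denotes the k-th value in the ordered set.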
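The odd/even rule can be sketched in a few lines of Python. The function name `median_by_position` is my own, introduced only for this illustration; the data are the sample values quoted in the MODE section, with the last value dropped to obtain an even-sized set.

```python
from statistics import median

def median_by_position(values):
    """Median via the positional rule: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # 1-based position (n + 1) / 2, converted to a 0-based index
        return ordered[(n + 1) // 2 - 1]
    # 1-based positions n / 2 and n / 2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_sample = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]
even_sample = odd_sample[:-1]  # 12 observations

print(median_by_position(odd_sample))   # 980
print(median_by_position(even_sample))  # 977.5

# The standard library applies the same rule.
assert median_by_position(odd_sample) == median(odd_sample)
assert median_by_position(even_sample) == median(even_sample)
```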
MODE (Mo)
The mode is the most frequent value of a distribution or, more precisely, the modality (category) with the highest frequency.
For example, consider this data sample:
962 1005 1003 768 980 965 1030 1005 975 989 955 783 1005
 
 
The mode of this sample is 1005 because it appears 3 times.
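The count can be verified with a short Python sketch that tallies how often each value occurs in the sample above (the variable names are mine):

```python
from collections import Counter

sample = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]

# Count occurrences of each value and take the most frequent one.
counts = Counter(sample)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 1005 3
```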

b) DISPERSION INDEXES

STANDARD DEVIATION
The standard deviation is the square root of the variance (see below). It is one of the dispersion indices: an indicative measure of how much the individual values differ from the average.
The formula for the standard deviation of an entire population is:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$
where N is the population size and μ is the arithmetic mean in the population.

VARIANCE
The variance is a dispersion index that measures how the values in the data set may differ from the average. It is the arithmetic mean of the squared differences of the individual values from the mean. The squaring ensures that negative and positive differences do not cancel each other out.
The formula for the variance of an entire population is:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
where N is the population size and μ is the arithmetic mean in the population.
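A minimal Python sketch of the two population formulas, applied to the sample values quoted in the MODE section and cross-checked against the standard library functions `pvariance` and `pstdev`. Treating that sample as an entire population is an assumption made only for this illustration.

```python
from math import sqrt
from statistics import pstdev, pvariance

sample = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]

N = len(sample)
mu = sum(sample) / N

# Variance: arithmetic mean of the squared differences from the mean.
variance = sum((x - mu) ** 2 for x in sample) / N

# Standard deviation: square root of the variance.
sigma = sqrt(variance)

# Cross-check against the standard library (population versions).
assert abs(variance - pvariance(sample)) < 1e-6
assert abs(sigma - pstdev(sample)) < 1e-9

print(round(variance, 1), round(sigma, 2))
```

Note that `statistics.variance` and `statistics.stdev` (without the `p` prefix) divide by N - 1 instead of N, which is the convention for a sample drawn from a larger population.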

MEAN ABSOLUTE DEVIATION (MAD)
The mean absolute deviation is another index of dispersion. Like the variance, it measures how much the individual values of the set differ from the average. The absolute value is used to prevent deviations of opposite sign from cancelling each other out. The formula is:
$$MAD = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \bar{x} \right|$$
where n represents the number of observed values, x̄ (x-bar) the average of the observed values, and xi the individual values.
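As a short sketch, the mean absolute deviation of the sample values quoted in the MODE section can be computed directly from the definition (the variable names are mine):

```python
sample = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]

n = len(sample)
x_bar = sum(sample) / n

# Mean of the absolute differences from the average.
mad = sum(abs(x - x_bar) for x in sample) / n

print(round(mad, 2))
```

Because deviations are not squared, large deviations weigh less here than in the variance, so the MAD is never larger than the standard deviation of the same data.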

RANGE
The range is a term commonly used in technical analysis, and it is also a well-defined index in statistics: it is calculated by simply subtracting the minimum value from the maximum value of the set under consideration.
The formula is simply:

Range = max(xi) - min(xi)

where xi is the set of values.
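In Python this is a one-line calculation, shown here on the sample values quoted in the MODE section:

```python
sample = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]

# Range: maximum value minus minimum value.
value_range = max(sample) - min(sample)

print(value_range)  # 262
```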

INTERQUARTILE RANGE (IQR)
The interquartile range (IQR) is the difference between the third quartile and the first quartile. This simple formula is used to calculate the interquartile range:
$$IQR = x_U - x_L$$
where xU is the third quartile and xL is the first quartile.
Quartiles are the three values that divide the ordered set of values into four equal groups.
The first quartile, or 25th percentile, xL (also written as Q1), is the value below which 25% of the values in the data set fall.
The second quartile, or 50th percentile, xM (also written as Q2), coincides with the median. It represents the value for which 50% of the observed values are lower and 50% are higher.
The third quartile, or 75th percentile, xU (also written as Q3), is the value below which 75% of the observed values fall.
The IQR is not particularly important for our purposes, but it is good to know what it is. In the normal distribution, the IQR includes 50% of the frequencies:
[Figure: IQR in the normal distribution]
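The following Python sketch computes the quartiles and the IQR of the sample values quoted in the MODE section, and then the IQR of a standard normal distribution. Several conventions exist for interpolating quartiles; the choice of `method="inclusive"` is an assumption of this example, and other tools may return slightly different values.

```python
from statistics import NormalDist, quantiles

sample = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]

# Quartiles of the sample ("inclusive" is one of several conventions).
q1, q2, q3 = quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 962.0 980.0 1005.0 43.0

# Quartiles of a standard normal distribution (mu = 0, sigma = 1).
z = NormalDist()
normal_q1, normal_q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
normal_iqr = normal_q3 - normal_q1                   # about 1.349 sigma
share_inside = z.cdf(normal_q3) - z.cdf(normal_q1)   # 0.5 by construction
print(round(normal_iqr, 3), round(share_inside, 3))
```

In a normal distribution the IQR therefore spans roughly 1.35 standard deviations, centered on the mean.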
Having briefly described all these indices, let's come back to the normal distribution. In short, these are its characteristics:
1) it is symmetric around the mean value (μ)
2) the mean, the median and the mode coincide: μ = Me = Mo
3) it is asymptotic to the x axis on both sides (positive and negative), tending to zero
4) it has two inflection points, at μ-σ and μ+σ
5) the area under the curve is equal to 1, because the probability that a value falls in the range (-∞, +∞) is 1
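These properties can be checked numerically. The sketch below writes out the standard normal density formula, which is not given in this article, and tests properties 1, 2, 4 and 5. The parameters μ = 100 and σ = 15 are arbitrary illustrative values, and the function names are mine.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, sigma = 100.0, 15.0  # arbitrary illustrative parameters
dist = NormalDist(mu, sigma)

# 1) Symmetry: same density at equal distances on either side of the mean.
left, right = normal_pdf(mu - 20, mu, sigma), normal_pdf(mu + 20, mu, sigma)

# 2) Mean, median and mode coincide.
same_center = dist.mean == dist.median == dist.mode

# 4) Inflection at mu +/- sigma: the curvature (second difference) is
#    negative between the two points and positive outside them.
def curvature(x, h=0.1):
    return normal_pdf(x - h, mu, sigma) - 2 * normal_pdf(x, mu, sigma) + normal_pdf(x + h, mu, sigma)

inside, outside = curvature(mu + 0.5 * sigma), curvature(mu + 1.5 * sigma)

# 5) Area under the curve: midpoint sum over mu +/- 10 sigma.
steps = 30_000
width = 20 * sigma / steps
area = width * sum(normal_pdf(mu - 10 * sigma + (k + 0.5) * width, mu, sigma) for k in range(steps))

print(left == right, same_center, inside < 0 < outside, round(area, 6))
```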
If we take the Gaussian as a reference, we can define other distributions relative to it. In particular, we can define positively or negatively skewed (asymmetrical) distributions, with the longer tail to the right or to the left, depending on where the mean lies with respect to the mode and the median.
 
[Figure: skewed distributions]
This can be quantified easily by calculating the asymmetry with the SKEW function in Excel.
When we observe a frequency distribution and see that the mean and median, while not coincident, are close to each other, and in particular when the SKEW value is between -2 and +2, the distribution can be approximated by a normal distribution. This will also be visible in Excel, where the chart of the frequencies will take the shape of the typical Gaussian bell curve.
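For readers who prefer to work outside a spreadsheet, here is a sketch of the same calculation in Python. It assumes that Excel's SKEW uses the adjusted sample skewness, n / ((n-1)(n-2)) · Σ((xi - x̄)/s)³ with s the sample standard deviation, as described in Microsoft's documentation. The function name `excel_style_skew` is mine.

```python
from math import sqrt

def excel_style_skew(values):
    """Adjusted sample skewness, assumed to match Excel's SKEW function."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are required")
    x_bar = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

# Sample values quoted in the MODE section of this article.
sample = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]
skew = excel_style_skew(sample)

print(round(skew, 2))
```

The result is negative for this sample because the two low values (768 and 783) pull the left tail out, which also explains why its mean lies below its median.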
The most used and interesting range of values is the one from μ-σ to μ+σ which, as you can see from the chart, contains about 68% of the frequencies (this is where the Market Profile concept of Value Area comes from). The interval from μ-2σ to μ+2σ encloses about 95.45% of the frequencies, that is, almost all cases.
Identifying these ranges allows us to easily state where the vast majority of a phenomenon's frequencies occurred and where it is reasonable, other factors being equal, to find them in the future.
This has very important practical implications, since we can easily define the values that can be considered "normal" for that phenomenon.
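The share of frequencies inside each interval can be computed from the cumulative distribution function. The sketch below uses the standard normal distribution (μ = 0, σ = 1); the result is the same for any μ and σ, since the intervals are expressed in units of σ.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Probability of falling within one and two standard deviations of the mean.
within_1_sigma = z.cdf(1) - z.cdf(-1)
within_2_sigma = z.cdf(2) - z.cdf(-2)

print(round(within_1_sigma, 4))  # 0.6827
print(round(within_2_sigma, 4))  # 0.9545
```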
Some readers may find this topic a bit tough, perhaps confusing, too academic and far from real trading. That is true to some extent, but in my opinion it is a necessary theoretical foundation for:
- understanding the graphical representation of Market Profile and Volume Profile
- understanding the Auction Market Theory and Market Profile concepts
- doing research on any given market parameter, using historical data to define, for a given period of time, the "normal" value of that parameter, such as the "normal volume", the "normal range", the "normal divergence" and so on.
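As a rough sketch of the last point, the "normal" values of a parameter can be taken as the band from μ-σ to μ+σ computed on its history. The function `normal_band` and its parameter `k` are hypothetical names introduced here. The history is simply the sample quoted in the MODE section, reused as a stand-in for real market data, and the approach assumes that the parameter is approximately normally distributed.

```python
from statistics import mean, pstdev

def normal_band(history, k=1.0):
    """Return (low, high) = (mu - k*sigma, mu + k*sigma) for a list of past values."""
    mu = mean(history)
    sigma = pstdev(history)
    return mu - k * sigma, mu + k * sigma

# Stand-in for historical values of some market parameter (illustrative only).
history = [962, 1005, 1003, 768, 980, 965, 1030, 1005, 975, 989, 955, 783, 1005]

low, high = normal_band(history)

# Values outside the band are the ones that are not "normal" for this history.
unusual = [x for x in history if not low <= x <= high]

print(round(low, 1), round(high, 1), unusual)
```

Passing `k=2.0` widens the band to μ±2σ, which would flag only the most extreme observations.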
We will cover these more practical applications in detail during our courses and in upcoming free articles on this website.
Good Trading