The central limit theorem is one of the least understood and most powerful results in statistics. It explains, at some deep level, the connection between dice games and the shapes of Fisher’s iris petals. The central limit theorem can also help you devise statistical tests (or interpret data) where the underlying observations are drawn from unknown or badly behaved distributions.

The key to understanding the central limit theorem is that it is about the distribution of means (averages), not the observations themselves. The amazing result is that, as the sample size grows, the distribution of the means approaches a Gaussian, regardless of the distribution of the underlying data, as long as the observations are independent and drawn from a distribution with a finite mean and variance. A little more formally,

• The observations (or sample) are $X_1, X_2, \ldots, X_N$.

• $A(X) = \frac{1}{N}\sum_{i=1}^{N} X_i$ is just the ordinary average, which will give a different answer each time $N$ datapoints are sampled from the pool.

• If $N$ is large enough, the distribution of $A(X)$ will be approximately Gaussian, $N(\mu, \sigma^2)$, where $\mu = E[X]$ is the true expectation of $X$, and $\sigma^2 = V[X]/N$ is the true variance of $X$ divided by the sample size.
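The mean and variance in the last bullet can be checked directly, assuming the $X_i$ are independent and identically distributed. This part needs no Gaussian assumption and holds for any $N$; only the Gaussian shape requires $N$ to be large. The constant $1/N$ comes out of the variance squared, and independence lets the variances add:

```latex
\begin{aligned}
E[A(X)] &= \frac{1}{N}\sum_{i=1}^{N} E[X_i] = \frac{1}{N}\,N\,E[X] = E[X] = \mu,\\
V[A(X)] &= \frac{1}{N^{2}}\sum_{i=1}^{N} V[X_i] = \frac{1}{N^{2}}\,N\,V[X] = \frac{V[X]}{N} = \sigma^{2}.
\end{aligned}
```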

This means that for samples of badly behaved (non-Gaussian) data, like the CD4 data shown in the example, the distribution of averages will still be approximately Gaussian.
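A small simulation can make this concrete. The sketch below is an illustration, not the CD4 analysis: it assumes an exponential distribution as a stand-in for badly behaved data, and the helper names `sample_means` and `skewness` are hypothetical. It compares the skewness of raw observations with the skewness of averages of $N = 50$ draws, and checks the mean and variance of those averages against $E[X]$ and $V[X]/N$.

```python
import random
import statistics


def sample_means(draw, n, reps, rng):
    """Return `reps` averages, each over `n` independent draws from draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def skewness(xs):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    m = statistics.fmean(xs)
    var = statistics.fmean((x - m) ** 2 for x in xs)
    third = statistics.fmean((x - m) ** 3 for x in xs)
    return third / var ** 1.5


def exponential(rng):
    # Assumption: Exponential(rate=1) stands in for skewed, non-Gaussian data.
    # For this distribution E[X] = 1, V[X] = 1 and the skewness is 2.
    return rng.expovariate(1.0)


rng = random.Random(0)  # fixed seed so the run is reproducible
N, REPS = 50, 20000

raw = [exponential(rng) for _ in range(REPS)]
means = sample_means(exponential, N, REPS, rng)

print(f"skewness of raw observations: {skewness(raw):.2f} (theory: 2)")
print(f"skewness of averages, N={N}: {skewness(means):.2f} (theory: {2 / N ** 0.5:.2f})")
print(f"mean of averages: {statistics.fmean(means):.3f} (E[X] = 1)")
print(f"variance of averages: {statistics.pvariance(means):.4f} (V[X]/N = {1 / N:.4f})")
```

Skewness is used as a rough check because a Gaussian has skewness 0. For an average of $N$ independent draws, the skewness of the underlying distribution is divided by $\sqrt{N}$, so the averages look more Gaussian as $N$ grows even though each observation is strongly skewed.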
