The probability distribution that will be used most of the time in this book is the so-called t-distribution. The t-distribution is very similar in shape to the normal distribution but works better for small samples. In large samples the t-distribution converges to the normal distribution.
Properties of the t-distribution
In the previous section we explained how we could transform a normal random variable with an arbitrary mean and an arbitrary variance into a standard normal variable. That was under the condition that we knew the values of the population parameters. Often it is not possible to know the population variance, and we have to rely on the sample value. In small samples the transformation formula would then have a distribution that is different from the normal. It would instead be t-distributed.
Assume that you have a sample of 60 observations and you found that the sample mean equals 5 and the sample variance equals 9. You would like to know if the population mean is different from 6. We state the following hypothesis:
$$ H_0: \mu = 6 \qquad H_1: \mu \neq 6 $$
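We use the transformation formula to form the test function, with $\mu_0$ denoting the value of the population mean under the null hypothesis:
$$ t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} $$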
1. The t-distribution is symmetric around its mean.
2. The mean equals zero just as for the standard normal distribution.
3. The variance equals k/(k-2) for k > 2, with k being the degrees of freedom.
Observe that the expression for the standard deviation contains an S, which represents the sample standard deviation. Since it is based on a sample it is a random variable, just like the sample mean. The test function therefore contains two random variables. That implies more variation, and therefore a distribution that deviates from the standard normal. It is possible to show that this test function follows the t-distribution with n-1 degrees of freedom, where n is the sample size. Hence in our case the test value equals
$$ t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{5 - 6}{\sqrt{9}/\sqrt{60}} \approx -2.58 $$
The test value has to be compared with a critical value. If we choose a significance level of 5%, the critical values according to the t-distribution with 59 degrees of freedom are approximately -2.0 and 2.0, which gives the interval [-2.0; 2.0]. Since the test value (about -2.58) is located outside the interval, we reject the null hypothesis in favor of the alternative hypothesis. That we have no information about the population mean is not a problem, because we assume that the population mean takes the value stated in the null hypothesis. Hence, we proceed as if we knew the true population mean. That is part of the test procedure.
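The calculation in this example can be retraced in a few lines. The following is a minimal sketch using only the Python standard library; the variable names are illustrative, and the critical value 2.0 is simply the rounded table value quoted in the text rather than something the code computes.

```python
import math

# Values taken from the example in the text.
n = 60              # sample size
sample_mean = 5.0   # sample mean
sample_var = 9.0    # sample variance
mu_0 = 6.0          # population mean under the null hypothesis
t_crit = 2.0        # rounded 5% two-sided critical value quoted in the text

# Test function: (sample mean - hypothesized mean) / (S / sqrt(n))
std_error = math.sqrt(sample_var) / math.sqrt(n)
t_value = (sample_mean - mu_0) / std_error

# Two-sided test: reject when the test value falls outside [-t_crit, t_crit].
reject_h0 = abs(t_value) > t_crit

print(f"t = {t_value:.3f}, reject H0: {reject_h0}")
# prints: t = -2.582, reject H0: True
```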
The Chi-square distribution
Until now we have talked about the population mean and performed tests related to the mean. Often it is interesting to make inference about the population variance as well. For that purpose we are going to work with another distribution, the Chi-square distribution.
Statistical theory shows that the square of a standard normal variable is distributed according to the Chi-square distribution with one degree of freedom, denoted $\chi^2_{(1)}$. It turns out that the sum of k squared independent standard normal variables is also Chi-square distributed, with k degrees of freedom. We have:
$$ Z_1^2 + Z_2^2 + \dots + Z_k^2 = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_{(k)} $$
Properties of the Chi-squared distribution
1. The Chi-square distribution takes only positive values
2. It is skewed to the right when the degrees of freedom are few, and converges to the normal distribution as the degrees of freedom go to infinity
3. The mean value equals k and the variance equals 2k, where k is the degrees of freedom
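The properties listed above can be checked informally by simulation. The sketch below builds Chi-square draws as sums of squared standard normal draws and compares the simulated mean and variance with k and 2k. The choice of k = 5, the number of draws, and the seed are arbitrary assumptions made for illustration only.

```python
import random
import statistics

random.seed(0)      # fixed seed so the run is repeatable
k = 5               # degrees of freedom (arbitrary illustrative choice)
n_draws = 50_000    # number of simulated Chi-square values


def chi_square_draw(df):
    """Return the sum of df squared independent standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))


draws = [chi_square_draw(k) for _ in range(n_draws)]

sim_mean = statistics.fmean(draws)     # should be close to k
sim_var = statistics.variance(draws)   # should be close to 2k

print(f"simulated mean = {sim_mean:.2f} (theory: {k})")
print(f"simulated variance = {sim_var:.2f} (theory: {2 * k})")
print(f"smallest draw = {min(draws):.4f} (never negative)")
```

With this many draws the simulated values should land close to the theoretical ones, although they will not match exactly.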
In order to perform a test related to the variance of a population using the sample variance, we need a test function with a known distribution that incorporates both components. In this case we may rely on statistical theory, which shows that for a sample from a normal population the following function works:
$$ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{(n-1)} $$
where $S^2$ represents the sample variance, $\sigma^2$ the population variance, and n-1 the degrees of freedom used to calculate the sample variance. How could this function be used to perform a test related to the population variance?
Consider a population whose variance in a given year was $\sigma^2 = 400$. Some years later we suspect that the population variance has increased and would like to test whether that is the case. We collect a sample of 25 observations and state the following hypotheses:
$$ H_0: \sigma^2 = 400 \qquad H_1: \sigma^2 > 400 $$
Using the 25 observations we found a sample variance equal to 600. Using this information we set up the test function and calculate the test value:
$$ \frac{(n-1)S^2}{\sigma^2} = \frac{(25-1) \times 600}{400} = 36 $$
We choose a significance level of 5% and, for 24 degrees of freedom, find a critical value in Table A3 equal to 36.415. Since the test value of 36 is lower than the critical value, we cannot reject the null hypothesis. Hence we cannot say that the population variance has increased.
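The same test can be written out as a short calculation. This is a minimal sketch with illustrative variable names; the critical value is the Table A3 figure quoted in the text and is entered by hand rather than computed.

```python
# Values taken from the example in the text.
n = 25               # sample size
sample_var = 600.0   # sample variance
sigma2_0 = 400.0     # population variance under the null hypothesis
chi2_crit = 36.415   # 5% critical value from Table A3, 24 degrees of freedom

# Test function: (n - 1) * S^2 / sigma^2
chi2_value = (n - 1) * sample_var / sigma2_0

# One-sided test: reject only when the test value exceeds the critical value.
reject_h0 = chi2_value > chi2_crit

print(f"test value = {chi2_value:.1f}, reject H0: {reject_h0}")
# prints: test value = 36.0, reject H0: False
```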
The final distribution to be discussed in this chapter is the F-distribution. In shape it is very similar to the Chi-square distribution, but it is constructed as a ratio of two independent Chi-square distributed random variables, each divided by its degrees of freedom. An F-distributed random variable therefore has two sets of degrees of freedom, since each variable in this ratio has its own degrees of freedom. That is:
$$ F = \frac{\chi^2_{(k_1)}/k_1}{\chi^2_{(k_2)}/k_2} \sim F_{(k_1,\,k_2)} $$
Properties of the F-distribution
1. The F-distribution is skewed to the right and takes only positive values
2. The F-distribution converges to the normal distribution when the degrees of freedom become large
3. The square of a t-distributed random variable with k degrees of freedom is F-distributed with 1 and k degrees of freedom: $t_k^2 = F_{(1,\,k)}$
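The third property follows from how the two variables are constructed. As a sketch, assume the standard construction of a t-distributed variable, which is not spelled out in this chapter: a standard normal variable Z divided by the square root of an independent Chi-square variable over its k degrees of freedom. Squaring it gives a ratio of two Chi-square variables, each divided by its degrees of freedom, which is the form of an F-distributed variable:

$$ t_k = \frac{Z}{\sqrt{\chi^2_{(k)}/k}} \quad\Longrightarrow\quad t_k^2 = \frac{Z^2/1}{\chi^2_{(k)}/k} = \frac{\chi^2_{(1)}/1}{\chi^2_{(k)}/k} \sim F_{(1,\,k)} $$

The middle step uses the fact that the square of a standard normal variable is Chi-square distributed with one degree of freedom.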
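The F-distribution can be used to test population variances. It is especially useful when we would like to know whether the variances of two different populations differ from each other. Statistical theory says that the ratio of two sample variances, each divided by its population variance, forms an F-distributed random variable with $n_1 - 1$ and $n_2 - 1$ degrees of freedom:
$$ \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{(n_1-1,\,n_2-1)} $$
When the two population variances are equal, this reduces to the ratio of the sample variances, $S_1^2/S_2^2$.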
Assume that we have two independent populations and we would like to know if their variances are different from each other. We therefore take two samples, one from each population, and form the following hypotheses:
$$ H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \neq \sigma_2^2 $$
Using the two samples we calculate the sample variances, $S_1^2 = 8.38$ and $S_2^2 = 13.14$, with $n_1 = 26$ and $n_2 = 30$. Under the null hypothesis we know that the ratio of the two sample variances is F-distributed with 25 and 29 degrees of freedom. Hence we form the test function and calculate the test value:
$$ F = \frac{S_1^2}{S_2^2} = \frac{8.38}{13.14} = 0.638 $$
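This test value has to be compared with a critical value. Assume that we choose a significance level of 5%. Using Table A4 in the appendix, we have to find critical values for a two-sided test. Since the area outside the interval should sum to 5%, we must find the upper critical point that corresponds to 2.5%. If we look for that value in the table we find 2.154. We call this upper point $F_{0.025}$. In order to find the lower point we can use the following formula:
$$ F_{0.975} = \frac{1}{F_{0.025}} = \frac{1}{2.154} = 0.464 $$
Strictly, the upper point in the denominator should be read from the table with the two degrees of freedom interchanged (29 and 25). Because the two are close here, using 2.154 changes the lower point only slightly.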
We have therefore obtained the following interval: [0.464; 2.154]. The test value of 0.638 lies within this interval, which means that we are unable to reject the null hypothesis. It is therefore quite possible that the two population variances are the same.
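The steps of this test can be summarized in a short calculation. The sketch below uses illustrative variable names and takes the upper critical point 2.154 from Table A4 as quoted in the text. The lower point is formed as its reciprocal, which follows the calculation in the text rather than an independent table lookup.

```python
# Values taken from the example in the text.
s1_sq, s2_sq = 8.38, 13.14   # sample variances
n1, n2 = 26, 30              # sample sizes
f_upper = 2.154              # upper 2.5% point from Table A4, as quoted

# Degrees of freedom of the ratio under the null hypothesis.
df = (n1 - 1, n2 - 1)

# Test function under H0: ratio of the two sample variances.
f_value = s1_sq / s2_sq

# Lower critical point as the reciprocal of the upper one (as in the text).
f_lower = 1.0 / f_upper

# Two-sided test: reject when the test value falls outside the interval.
reject_h0 = not (f_lower <= f_value <= f_upper)

print(f"df = {df}, F = {f_value:.3f}")
print(f"interval = [{f_lower:.3f}; {f_upper:.3f}], reject H0: {reject_h0}")
# prints: df = (25, 29), F = 0.638
#         interval = [0.464; 2.154], reject H0: False
```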