The adjusted coefficient of determination (Adjusted R2)
The coefficient of determination can be used to describe the strength of the linear relationship between the dependent and independent variables in a regression model. Its size, however, depends on the degrees of freedom: adding an explanatory variable can never lower R2, even when the variable is irrelevant. It is therefore not meaningful to compare R2 between two models with different degrees of freedom. A solution to this problem is to control for the degrees of freedom and adjust the coefficient of determination accordingly, which can be done in the following way:

$$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)} \qquad (5.7)$$
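where $\bar{R}^2$ denotes the adjusted coefficient of determination, n is the number of observations and k is the number of estimated parameters, including the intercept. The adjusted coefficient of determination can also be expressed as a function of the unadjusted coefficient of determination:

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k}$$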
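It turns out that the adjusted R2 is never larger than the unadjusted R2 (it is strictly smaller whenever k > 1 and R2 < 1), and the two converge to each other as the number of observations increases. Equation (5.7) shows that the adjusted coefficient of determination is a function of the variance of Y as well as the variance of the residual. Rearranging (5.7) gives

$$\hat{\sigma}^2 = \frac{RSS}{n-k} = (1 - \bar{R}^2)\,\frac{TSS}{n-1} = (1 - \bar{R}^2)\,S_Y^2$$

where $\hat{\sigma}^2$ is the residual variance and $S_Y^2$ is the sample variance of Y.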
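As a quick numerical check, the sketch below computes the adjusted R2 from the sums of squares and from the unadjusted R2, and verifies the rearranged form. It reuses ESS = 51190, RSS = 5232 and n = 145 from the ANOVA example later in this section and assumes k = 2 parameters (intercept and slope); the function names are hypothetical.

```python
def adjusted_r2_from_ss(rss: float, tss: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [RSS/(n - k)] / [TSS/(n - 1)]; k counts the intercept."""
    return 1.0 - (rss / (n - k)) / (tss / (n - 1))


def adjusted_r2_from_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# Sums of squares from the ANOVA example; k = 2 is an assumption (intercept + slope).
ess, rss, n, k = 51190.0, 5232.0, 145, 2
tss = ess + rss
r2 = ess / tss

adj_ss = adjusted_r2_from_ss(rss, tss, n, k)
adj_r2 = adjusted_r2_from_r2(r2, n, k)

# Rearranged form: residual variance = (1 - adjusted R^2) * sample variance of Y.
resid_var = rss / (n - k)
var_y = tss / (n - 1)

print(f"R2 = {r2:.4f}, adjusted (SS form) = {adj_ss:.4f}, adjusted (R2 form) = {adj_r2:.4f}")
print(f"residual variance = {resid_var:.4f}, (1 - adj R2) * var(Y) = {(1 - adj_ss) * var_y:.4f}")
```

Both forms give the same number, slightly below the unadjusted R2, and the last line reproduces the residual variance from the adjusted R2 and the variance of Y.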
As can be seen, for a given variance of Y, the larger the adjusted R2, the smaller the residual variance. Another interesting feature of the adjusted R2 is that it can be negative, which is impossible for the unadjusted R2. This happens exactly when R2 < (k - 1)/(n - 1), so it is especially likely when the number of observations is small and the unadjusted R2 is small, say around 0.06.
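The sketch below illustrates when the adjusted R2 turns negative, holding the unadjusted R2 at 0.06 and assuming a simple regression with k = 2 parameters. The function name and the chosen sample sizes are illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k); k includes the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


r2 = 0.06  # a weak fit, as in the text
k = 2      # assumed: simple regression (intercept + one slope)

for n in (5, 10, 17, 18, 50, 200):
    print(f"n = {n:3d}: adjusted R2 = {adjusted_r2(r2, n, k):+.4f}")
```

With k = 2 the adjusted value is negative up to n = 17 and positive from n = 18 on, since 0.06 < 1/(n - 1) holds only for n ≤ 17. As n grows it approaches the unadjusted value of 0.06.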
Another important point to understand is that the coefficient of determination can only be compared between models when the dependent variable is the same. If we have Y in one model and ln(Y) in another, the dependent variable has been transformed and should not be treated as the same. For that reason it is not meaningful to compare the adjusted or unadjusted R2 between these two models.
The analysis of variance table (ANOVA)
Almost all econometric software generates an ANOVA table together with the regression results. An ANOVA table summarizes the sums of squares calculated above:
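Table 5.1 The ANOVA Table

Source of variation   Sum of squares   Degrees of freedom   Mean square
Explained             ESS              1                    ESS/1
Residual              RSS              n - 2                RSS/(n - 2)
Total                 TSS              n - 1                TSS/(n - 1)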
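To make the table concrete, here is a sketch that estimates Y = B0 + B1X + U by OLS on a small data set and fills in the three rows of the ANOVA table for the simple regression model (degrees of freedom 1, n - 2 and n - 1). The data and the function name are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def anova_simple_regression(x, y):
    """Return (sum of squares, degrees of freedom, mean square) for each ANOVA row."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    # OLS estimates of the slope and intercept.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    # Decomposition of the sample variation in Y.
    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return {
        "Explained": (ess, 1, ess / 1),
        "Residual": (rss, n - 2, rss / (n - 2)),
        "Total": (tss, n - 1, tss / (n - 1)),
    }


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
table = anova_simple_regression(x, y)
for source, (ss, df, ms) in table.items():
    print(f"{source:<10} SS={ss:9.4f} df={df:2d} MS={ms:9.4f}")
```

Because the model includes an intercept, the explained and residual sums of squares add up to the total sum of squares.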
The decomposition of the sample variation in Y can be used as an alternative approach to performing tests within the regression model. We will look at two examples that apply to the simple regression model. In the multiple regression case there is an even more important use, described in chapter 6, which is related to simultaneous tests on subsets of parameters.
Assume that we are working with the following model:

$$Y = B_0 + B_1 X + U$$

Using a random sample, we have calculated the components of the ANOVA table and want to test the following hypothesis:

$$H_0: B_1 = 0 \quad \text{vs.} \quad H_1: B_1 \neq 0$$
Remember that the ANOVA table contains information about the explained and unexplained variation. Hence, if the explained part increases sufficiently when X is included, we can reject the null hypothesis in favor of the alternative. One way to measure this increase is the following ratio:

$$F = \frac{(ESS - 0)/(1 - 0)}{RSS/(n-2)} \qquad (5.8)$$
The numerator of equation (5.8) contains the change in the explained sum of squares divided by the change in degrees of freedom that comes from including an additional variable in the regression model. Since this is a simple regression model, the explained part starts from zero (no other explanatory variables are included), and the degrees of freedom increase by one. Hence the numerator is simply ESS. The denominator is the variance of the residual. Under the null hypothesis, and assuming normally distributed error terms, the ratio of the two components follows a known distribution that is tractable to work with:

$$F = \frac{ESS/1}{RSS/(n-2)} \sim F_{1,\,n-2} \qquad (5.9)$$
Hence, we have a test function that is F-distributed with 1 and n - 2 degrees of freedom.
Assume that we have a sample of 145 observations and that we would like to know whether the random variable X has any effect on the dependent variable Y. To answer this question we form a simple regression model and the following hypothesis: $H_0: B_1 = 0$ vs. $H_1: B_1 \neq 0$. Use the following information to perform the test:
ESS = 51190, RSS = 5232
To carry out the test, we form the test function and calculate the corresponding test value. Using (5.9) we obtain:

$$F = \frac{51190/1}{5232/143} \approx 1399.1$$
With a significance level of 5 percent, the upper critical value is $F_{0.05}(1, 143) \approx 3.91$ (the F-test is one-sided, because only large values of F count against $H_0$). This is far below the test value. Hence we can reject the null hypothesis and conclude that X has a significant effect on Y.
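The arithmetic of the test value can be reproduced with a few lines. This is a minimal sketch of (5.9) for the simple regression model; the function name is hypothetical, and the critical value still has to be looked up in an F table (or with a statistics library) and compared with the result.

```python
def overall_f(ess: float, rss: float, n: int) -> float:
    """Test value F = (ESS/1) / (RSS/(n - 2)) for the simple regression model."""
    numerator = ess / 1          # explained sum of squares per degree of freedom (one slope)
    denominator = rss / (n - 2)  # residual variance
    return numerator / denominator


f_value = overall_f(ess=51190.0, rss=5232.0, n=145)
print(round(f_value, 1))  # prints 1399.1
```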
When the ANOVA table is used to test the parameters of the model, we call this the test of overall significance. In the simple regression model it involves just a single parameter, but in the multiple regression case the test considers the joint hypothesis that the coefficients of all included explanatory variables are zero. We will say more about this in the next chapter.
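In the simple regression case, the F-test corresponds to the ordinary t-test of the slope coefficient. But how are these two test statistics connected? To see this, note that in the simple regression model $ESS = b_1^2 \sum_i (X_i - \bar{X})^2$, so the F-statistic can be rewritten in the following way:

$$F = \frac{ESS/1}{RSS/(n-2)} = \frac{b_1^2 \sum_i (X_i - \bar{X})^2}{\hat{\sigma}^2} = \frac{b_1^2}{\hat{\sigma}^2 / \sum_i (X_i - \bar{X})^2} = \left(\frac{b_1}{se(b_1)}\right)^2 = t^2$$

where $b_1$ is the OLS estimate of $B_1$, $\hat{\sigma}^2 = RSS/(n-2)$ and $se(b_1) = \sqrt{\hat{\sigma}^2 / \sum_i (X_i - \bar{X})^2}$.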
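The F-statistic in this case is nothing more than the square of the t-statistic of the slope coefficient. Since the critical values are linked in the same way, $F_{\alpha}(1, n-2) = [t_{\alpha/2}(n-2)]^2$, the F-test and the two-sided t-test at the same significance level always lead to the same conclusion.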
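The identity F = t^2 is easy to confirm numerically. The sketch below fits the simple regression by OLS on a small hypothetical sample, then computes both the t-statistic of the slope and the overall F-statistic; the data are made up for illustration.

```python
import math
from statistics import fmean

# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.8, 7.4, 8.1, 11.5, 12.2, 15.9, 16.4]
n = len(x)

# OLS estimates of B0 and B1 in Y = B0 + B1*X + U.
x_bar, y_bar = fmean(x), fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Sums of squares and the residual variance RSS/(n - 2).
ess = sum((fi - y_bar) ** 2 for fi in fitted)
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sigma2 = rss / (n - 2)

# t-statistic of the slope and the overall F-statistic.
t_stat = b1 / math.sqrt(sigma2 / sxx)
f_stat = (ess / 1) / sigma2

print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
```

The two printed numbers agree up to floating-point rounding, as the derivation above predicts.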