Autocorrelation and diagnostics
Autocorrelation, or serial correlation, often appears when working with time series data. For autocorrelation to appear, observations must be correlated across a sequential order. In statistical terms, for the error term U, this can be expressed as:

$$\operatorname{Cov}(U_t, U_s) \neq 0 \quad \text{for } t \neq s$$
Hence, autocorrelation is a problem that frequently appears when working with data that have a time dimension. It is therefore meaningless to look for autocorrelation in cross-sectional data, which are usually based on random samples from a population at a given point in time. Cross-sectional data have no natural ordering that could generate a correlation. If a correlation is found anyway, it is almost certainly a fluke of how the observations happen to be ordered and has nothing to do with any underlying process.
As an example, consider a random sample of individuals drawn from a population to analyze their earnings. Finding a correlation between the earnings of two randomly chosen individuals in this sample is unlikely. However, if we follow the same individual over time, pairwise observations are almost certain to be correlated. They are the earnings of the same individual, and a given individual's earnings do not change very much over short time intervals.
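The contrast between a cross-section and a single individual followed over time can be illustrated with a small simulation. This is a minimal sketch, not data. The earnings figures are hypothetical values chosen for illustration: a mean of 30,000 with a spread of 8,000 across individuals, and a persistence parameter of 0.9 for the individual's earnings. The helper `lag1_autocorr` is introduced here and computes the usual sample first-order autocorrelation. It sums the products of deviations from the mean at adjacent positions and divides by the sum of squared deviations.

```python
import random
import statistics


def lag1_autocorr(series):
    """Sample first-order autocorrelation of a sequence (order matters)."""
    mean = statistics.fmean(series)
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    den = sum(d * d for d in dev)
    return num / den


rng = random.Random(42)
n = 500

# Cross-section: n individuals sampled independently at one point in time.
# Their order in the data set is arbitrary, so neighbours are unrelated.
cross_section = [rng.gauss(30_000, 8_000) for _ in range(n)]

# One individual followed for n periods. Hypothetical assumption: earnings
# move only a little from one period to the next (persistence 0.9).
one_person = [30_000.0]
for _ in range(n - 1):
    one_person.append(30_000 + 0.9 * (one_person[-1] - 30_000) + rng.gauss(0, 1_000))

print(f"cross-section:              {lag1_autocorr(cross_section):+.2f}")
print(f"same individual over time:  {lag1_autocorr(one_person):+.2f}")
```

The cross-sectional value should land close to zero. The value for the single individual should be close to 0.9. The exact numbers depend on the random draws.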
This chapter discusses the most important issues related to autocorrelation that an applied researcher needs to be aware of: its effect on the estimated parameters when it is ignored, how to detect it, and how to solve the problem when it is present.
Definition and the nature of autocorrelation
An autocorrelated error term can take a range of different specifications, each describing a correlation between pairwise observations. The most basic form is referred to as first-order autocorrelation and is specified in the following way:

$$U_t = \rho U_{t-1} + V_t \tag{10.2}$$
where U refers to the error term of the population regression function. As can be seen from (10.2), the error term in period t is a function of itself in the previous period, t-1, multiplied by the coefficient ρ. This coefficient is referred to as the first-order autocorrelation coefficient (ρ is the Greek letter rho, pronounced "row"). The last term, V, is a so-called white noise error term and is supposed to be completely random. It is often assumed to be normally distributed with mean zero and constant variance, in the simplest case standard normal.
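The AR(1) scheme in (10.2) is easy to simulate, and a simulation is a useful way to build intuition for what ρ does. The sketch below draws $V_t$ as standard normal white noise. It assumes $-1 < \rho < 1$ so that the process is stationary. That condition is not stated in the text above, but the scheme needs it to settle around zero. The function name `simulate_ar1_errors` and the values $\rho = \pm 0.7$ are illustrative choices.

```python
import random
import statistics


def simulate_ar1_errors(rho, n, seed=0):
    """Generate U_t = rho * U_{t-1} + V_t, with V_t standard normal white noise."""
    if not -1 < rho < 1:
        raise ValueError("this sketch assumes a stationary scheme, -1 < rho < 1")
    rng = random.Random(seed)
    # Start from the stationary distribution of U, whose variance is 1 / (1 - rho^2).
    u = [rng.gauss(0, (1 / (1 - rho**2)) ** 0.5)]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0, 1))
    return u


def lag1_autocorr(series):
    """Sample correlation between consecutive values of the series."""
    mean = statistics.fmean(series)
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    return num / sum(d * d for d in dev)


for rho in (0.7, -0.7):
    u = simulate_ar1_errors(rho, n=2_000, seed=1)
    print(f"rho = {rho:+.1f}: sample lag-1 autocorrelation = {lag1_autocorr(u):+.2f}")
```

In a long simulated series, the sample lag-1 autocorrelation should be close to the ρ used to generate it, including its sign.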
This type of autocorrelation is called autoregression because the error term is a function of its own past values. Since U depends on itself only one period back, as opposed to several periods, we call it the first-order autoregressive error scheme, denoted AR(1). The specification can be generalized to include up to n lagged terms. We would then refer to it as nth-order autocorrelation, AR(n), specified as follows:

$$U_t = \rho_1 U_{t-1} + \rho_2 U_{t-2} + \cdots + \rho_n U_{t-n} + V_t$$
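Generalizing a simulation to the nth-order scheme only requires keeping the last n values of U. The following sketch uses a hypothetical helper, `simulate_ar_errors`, that takes the coefficients $\rho_1, \ldots, \rho_n$ as a list. It assumes standard normal white noise and coefficients that make the process stationary. It also discards a burn-in period so that the zero starting values do not matter. The AR(2) coefficients 0.5 and 0.3 are arbitrary illustrative values. For a stationary AR(2) scheme, the theoretical lag-1 autocorrelation is $\rho_1 / (1 - \rho_2)$, which gives a simple check on the output.

```python
import random
import statistics


def simulate_ar_errors(rhos, length, seed=0, burn_in=500):
    """Generate U_t = rho_1*U_{t-1} + ... + rho_n*U_{t-n} + V_t, V_t ~ N(0, 1).

    `rhos` lists rho_1, ..., rho_n, so the order n is len(rhos). The first
    `burn_in` draws are discarded so the zero starting values do not matter.
    """
    rng = random.Random(seed)
    order = len(rhos)
    u = [0.0] * order  # zero starting values for the initial lags
    for _ in range(burn_in + length):
        # rhos[k] multiplies the value lagged k + 1 periods, u[-(k + 1)].
        ar_part = sum(rho * u[-(k + 1)] for k, rho in enumerate(rhos))
        u.append(ar_part + rng.gauss(0, 1))
    return u[order + burn_in:]


def lag_autocorr(series, s):
    """Sample autocorrelation between U_t and U_{t-s}."""
    mean = statistics.fmean(series)
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t - s] for t in range(s, len(dev)))
    return num / sum(d * d for d in dev)


# AR(2) example with illustrative coefficients rho_1 = 0.5, rho_2 = 0.3.
u = simulate_ar_errors([0.5, 0.3], length=5_000, seed=2)
print(f"sample lag-1 autocorrelation: {lag_autocorr(u, 1):.2f}")
print(f"theoretical value rho_1 / (1 - rho_2): {0.5 / (1 - 0.3):.2f}")
```

Passing a single coefficient, such as `[0.4]`, reduces the helper to the AR(1) case in (10.2).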
First-order autocorrelation is probably the most common type of autocorrelation and is therefore the main focus of our discussion. The autocorrelation can be positive or negative, depending on the sign of the autocorrelation coefficient ρ in (10.2). One way to find out whether a model suffers from autocorrelation, and whether it is positive or negative, is to plot the residuals against their own lagged values.
Figure 10.1 Scatter plots of $e_t$ against $e_{t-1}$
Figure 10.1 presents two examples of how such plots may look when the error term is autocorrelated. The graph to the left represents the case of positive autocorrelation with a coefficient equal to 0.3. A regression line is also fitted to the points to make the direction of the correlation easier to see. Sometimes you will come across plots in which the dependent variable or the residual is followed over time. However, when the correlation is below 0.5 in absolute terms, it may be difficult to identify any pattern in those plots. Scatter plots of the residual against its lagged value are therefore preferable.
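The diagnostic behind Figure 10.1 can be sketched numerically in three steps. First, simulate a regression with AR(1) errors. Second, estimate it by OLS. Third, fit a line through the pairs $(e_{t-1}, e_t)$. The sign of that slope indicates the sign of the autocorrelation. In this minimal sketch, the true model $y_t = 1 + 2x_t + U_t$, the sample size and the regressor distribution are all hypothetical. The value $\rho = 0.3$ matches the left panel described above. The value $\rho = -0.3$ is an assumed negative counterpart added for contrast. Only the standard library is used, so nothing is actually plotted. Passing `e[:-1]` and `e[1:]` to a plotting library would produce a scatter of the kind shown in the figure.

```python
import random


def ols_fit(x, y):
    """Simple OLS of y on x with an intercept; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_lag_slope(rho, n=1_000, seed=3):
    """Estimate y = b0 + b1*x by OLS when the error follows AR(1) with
    coefficient rho, then return the slope of e_t regressed on e_{t-1}."""
    rng = random.Random(seed)
    x = [rng.uniform(0, 10) for _ in range(n)]
    u = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0, 1))  # AR(1) error, white noise V ~ N(0, 1)
    y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # hypothetical true model

    b0, b1 = ols_fit(x, y)
    e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # OLS residuals
    # The pairs (e_{t-1}, e_t) are the points of a scatter plot like Figure 10.1;
    # the slope below belongs to the line fitted through them.
    _, slope = ols_fit(e[:-1], e[1:])
    return slope


for rho in (0.3, -0.3):
    print(f"rho = {rho:+.1f}: slope of e_t on e_(t-1) = {residual_lag_slope(rho):+.2f}")
```

With these settings, the first slope should be clearly positive and the second clearly negative, mirroring the sign of ρ. A slope near zero would suggest no first-order autocorrelation.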