# Example: Factor Model

Fama and French (1993) suggested a three-factor model to explain the expected stock return premium required by investors. The three factors are

• The excess return of the market portfolio ($R_{Mt} - r_{ft}$);

• The difference between the expected returns on portfolios of small and large firms ($SMB_t$); the small and large stock portfolios include all stocks with market capitalization in the lower and upper deciles of the sample median;

• The difference between the expected returns on portfolios of stocks with high and low book-to-market ratios ($HML_t$).

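Thus, the expected excess return of stock $i$ can be represented as

$$E(R_{it}) - r_{ft} = \beta_1\,[E(R_{Mt}) - r_{ft}] + \beta_2\,E(SMB_t) + \beta_3\,E(HML_t). \tag{2.2.5}$$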
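For estimation, relation (2.2.5) is usually written as a time-series regression with an intercept and an error term. The display below is a sketch of that form, with $R_{Mt}$ the market return and $r_{ft}$ the risk-free rate; the symbols $\alpha$ and $\varepsilon_{it}$ are notation assumed here rather than taken from the text.

```latex
R_{it} - r_{ft} = \alpha + \beta_1 (R_{Mt} - r_{ft}) + \beta_2\, SMB_t + \beta_3\, HML_t + \varepsilon_{it},
\qquad t = 1, \dots, T .
```

In the EViews output the intercept $\alpha$ corresponds to the regressor C, and the three slopes correspond to the three factor series.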

As an example we consider monthly returns on IBM stock for the period January 1950 to September 2007. The data, together with the Fama-French factors, are available in IBM1.xls. The variables in the data set are

• *ibm* - monthly returns on IBM stock;

• *Mkt* - monthly returns on the market index;

• *rf* - monthly risk-free rate;

• *SMB* and *HML* - Fama-French size and book-to-market risk factors, respectively.

In order to estimate relation (2.2.5) we have to construct excess returns on IBM stock and on the market portfolio. In EViews they can be created as new series by subtracting *rf* from *ibm* and from *Mkt*.
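A minimal sketch of the two commands, typed in the command window. The series names `ibm_ex` and `mkt_ex` are assumptions chosen for illustration; the text does not show the names it uses.

```eviews
' excess return on IBM stock (series name ibm_ex is assumed)
series ibm_ex = ibm - rf
' excess return on the market portfolio (series name mkt_ex is assumed)
series mkt_ex = mkt - rf
```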

There are two ways of estimating a linear regression in EViews. The first, and more powerful, is through the main menu Quick/Estimate Equation. In the Equation specification window, type the equation to be estimated. Using arithmetic operators, the equation can be written out explicitly: the dependent variable, an equals sign, and then the sum of the coefficients multiplied by their regressors.

Note that the coefficients in such an expression must always be written as C(1), C(2), etc. However, if the model is linear, it is more common to omit the operators and coefficients and simply list the variables.

Note that in the list form the dependent variable must come first. The term C indicates that the model is estimated with an intercept; if it is omitted, the regression is estimated without an intercept term.
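As an illustration, the factor model could be entered in either of the two forms below. This is a sketch that assumes the excess-return series are named `ibm_ex` and `mkt_ex` (hypothetical names). The two specifications are alternatives for the Equation specification box, and the lines starting with an apostrophe are comments for the reader.

```eviews
' explicit form with coefficients C(1) to C(4)
ibm_ex = c(1) + c(2)*mkt_ex + c(3)*smb + c(4)*hml

' list form: dependent variable first, then C for the intercept, then the regressors
ibm_ex c mkt_ex smb hml
```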

**Figure 2.1: Regression estimation dialog window**

In the Estimation settings box make sure that **LS - Least Squares (NLS and ARMA)** is chosen. The Sample box allows the model to be estimated on a subsample, which is specified in the same way as in the Sample object. Press OK and the regression output appears on the screen.

Another way of estimating a linear regression model is through the command line. To create an equation object, use the declaration command equation followed by the name of the object and the estimation command (ls in our case, which stands for least squares), separated by a dot. The model is then specified in the same way as above.
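For instance, the command might look as follows. The equation name `ibm_eq` is the one used later in the text; the series names `ibm_ex` and `mkt_ex` are assumed here for the excess returns.

```eviews
' declare the equation object ibm_eq and estimate it by least squares
equation ibm_eq.ls ibm_ex c mkt_ex smb hml
```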

**Estimation Output** The regression estimation output is shown in Figure 2.2. The estimated coefficients of the model are given in the column Coefficient (the coefficient on C is the estimate of the intercept term). The slope coefficients measure the sensitivities of the stock's returns to the three factors and show the impact of the systematic factors on returns. The column t-Statistic reports the test statistic for the hypothesis $\beta_i = 0$. All the coefficients are highly statistically significant, as indicated by the low p-values (column Prob.).

**Figure 2.2: Regression estimation output**

The overall significance of the regression is reflected in the value of the F-statistic, which is high enough to reject the null hypothesis that all slope coefficients are jointly zero (the p-value is given in Prob(F-statistic)).
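For reference, the two statistics have the usual textbook form sketched below. The symbols $T$ (number of observations), $k$ (number of slope coefficients, here 3) and $\operatorname{se}(\cdot)$ (estimated standard error) are notation assumed for this display.

```latex
t_i = \frac{\hat\beta_i}{\operatorname{se}(\hat\beta_i)},
\qquad
F = \frac{R^2 / k}{(1 - R^2)/(T - k - 1)} .
```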

The proportion of the variance of $R_{it}$ explained by the variability in the factors is the usual regression $R^2$ statistic, and $1 - R^2$ is the proportion of the variability of $R_{it}$ that is due to firm-specific factors. Here the proportion of market-specific risk is $R^2 = 0.37$ and the proportion of firm-specific risk is $1 - R^2 = 0.63$.
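The decomposition follows from splitting the total variation of the dependent variable into an explained and a residual part. A short sketch, with $y_t$ denoting the dependent variable and $\hat\varepsilon_t$ the regression residuals (notation assumed here):

```latex
R^2 = 1 - \frac{\sum_t \hat\varepsilon_t^{\,2}}{\sum_t (y_t - \bar y)^2},
\qquad
\underbrace{0.37}_{R^2} + \underbrace{0.63}_{1 - R^2} = 1 .
```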

By estimating the regression model, EViews produces an Equation object, which can be saved and used later (press the Name button at the top of the equation window). Like every object in EViews, an Equation can be displayed in different views. View/Representation shows the equation specification of the model, and View/Estimation Output provides the familiar model output. View/Actual, Fitted, Residuals creates various plots of the estimated residual series, as well as of the fitted values of the dependent variable.

The residual series is automatically stored in the series object resid, which EViews creates in each workfile. Note that resid contains the residuals of the last estimated model and is overwritten once a model is re-estimated. Thus, the residual series has to be saved for further use if necessary. This can be done by copying it into a new series object, for example with the command `series resid_ibm = resid`.

Now the residuals from the factor model regression for IBM stock returns are stored in the new series object resid_ibm.

Besides the standard errors of the coefficient estimators, given in the output window, one can retrieve the whole variance-covariance matrix by clicking on View/Covariance Matrix.

**Residual Diagnostics** Before drawing any conclusions from the estimated regression, it is necessary to perform residual diagnostics to make sure that the assumptions of the classical linear regression model are satisfied. This can be done in the section View/Residual Tests. Correlogram - Q-statistic provides the values of the Box-Ljung statistics to test the significance of the autocorrelations of the residuals. The correlogram of the residuals from the factor model is given in Figure 2.3.

High p-values indicate the absence of serial correlation up to lag 10. Another way to test for serial correlation is the Breusch-Godfrey test, which in EViews is available as Serial Correlation LM Test. The upper panel of the Breusch-Godfrey test output contains two versions of the test statistic, which are asymptotically equivalent. Their p-values both confirm the absence of serial correlation up to the second order. The no-autocorrelation null hypothesis is also not rejected by the Durbin-Watson test: the test statistic given in the regression output equals 1.959, which lies in the acceptance region.
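For reference, the three statistics used here have the standard forms sketched below. The notation is assumed for this display: $\hat\varepsilon_t$ are the residuals, $\hat\rho_j$ is their sample autocorrelation at lag $j$, $T$ is the sample size, $m$ is the number of lags in the Q-statistic, and $R^2_{aux}$ is the $R^2$ of the Breusch-Godfrey auxiliary regression of $\hat\varepsilon_t$ on the original regressors and $p$ lagged residuals.

```latex
Q(m) = T (T + 2) \sum_{j=1}^{m} \frac{\hat\rho_j^{\,2}}{T - j} \;\sim\; \chi^2(m),
\qquad
LM = T \, R^2_{aux} \;\sim\; \chi^2(p),
\qquad
DW = \frac{\sum_{t=2}^{T} (\hat\varepsilon_t - \hat\varepsilon_{t-1})^2}{\sum_{t=1}^{T} \hat\varepsilon_t^{\,2}} \approx 2\,(1 - \hat\rho_1) .
```

The distributions hold asymptotically under the null of no serial correlation. A Durbin-Watson value of 1.959 therefore corresponds to a first-order residual autocorrelation of roughly 0.02.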

The option Histogram - Normality Test builds the histogram of the residuals, their descriptive statistics as well as the value of the Jarque-Bera statistic.

**Figure 2.3: Correlogram of residuals from the factor model for IBM stock returns**

Notice that the Jarque-Bera statistic indicates that the residuals from the factor model regression are not normally distributed. Even though the residuals are not normally distributed, inference remains valid asymptotically.
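The Jarque-Bera statistic combines the sample skewness $S$ and kurtosis $K$ of the residuals. Its textbook form with $T$ observations is sketched below; EViews may apply a small degrees-of-freedom adjustment, so the reported value need not match this expression exactly.

```latex
JB = \frac{T}{6} \left( S^2 + \frac{(K - 3)^2}{4} \right) \;\sim\; \chi^2(2)
\quad \text{under the null of normality.}
```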

EViews also provides a number of tests of the hypothesis of homoscedasticity in the regression. Under the option Heteroscedasticity Tests... one can choose among the Breusch-Pagan-Godfrey, Harvey, Glejser, ARCH and White tests.

Three of them (the Breusch-Pagan-Godfrey, Glejser and White tests) reject the hypothesis of homoscedasticity, while the Harvey and ARCH tests do not reject the null. The reason for using several tests is that heteroscedasticity can take many different forms, and each test is designed against a different alternative.

All the tests for autocorrelations and heteroscedasticity can be performed through the command line as well.

For the Breusch-Godfrey test for serial correlation we specify the name of the regression equation to be tested, followed by a dot and the command auto(lags), where *lags* is the order of autocorrelation being tested. For example, `ibm_eq.auto(2)` will perform the test for second-order autocorrelation in the factor model for IBM stock.

To perform heteroscedasticity tests we specify the equation name followed by a dot and the command hettest(*options*). In the *options* field we can specify the test to be performed as type=*keyword*, where *keyword* is either "BPG" (Breusch-Pagan-Godfrey, the default), "Harvey", "Glejser", "ARCH", or "White". Including c in the options adds cross-product terms to the auxiliary regression specification. Optionally, a list of variables may follow the command to include them in the auxiliary regression as well. For example, `ibm_eq.hettest(type=White)` will perform White's test for heteroscedasticity for the ibm_eq equation.
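A few further variants, as a sketch of how the options described above combine. They assume that the equation object `ibm_eq` has already been estimated; the particular selection of tests is illustrative.

```eviews
' Breusch-Pagan-Godfrey test (the default type)
ibm_eq.hettest(type=BPG)
' Glejser test
ibm_eq.hettest(type=Glejser)
' White test with cross-product terms in the auxiliary regression
ibm_eq.hettest(type=White, c)
```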

Since the exact form of heteroscedasticity is not known, it is not clear how to construct a GLS estimator in this case. Instead, EViews can compute heteroscedasticity-consistent as well as heteroscedasticity- and autocorrelation-consistent coefficient covariance matrices. To compute them, click on the Estimate button in the object menu and choose the Options tab. Tick the box in front of Heteroscedasticity consistent coefficient matrix to activate the option. Click OK to re-estimate the model.
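White's estimator replaces the usual OLS covariance formula with a "sandwich" form. Its basic version is sketched below, with $X$ the matrix of regressors, $x_t$ the vector of regressors at time $t$, and $\hat\varepsilon_t$ the OLS residuals (notation assumed here). EViews may additionally apply a degrees-of-freedom correction.

```latex
\widehat{\operatorname{Var}}(\hat\beta)
  = (X'X)^{-1} \left( \sum_{t=1}^{T} \hat\varepsilon_t^{\,2} \, x_t x_t' \right) (X'X)^{-1} .
```

The coefficient estimates themselves are unchanged; only the standard errors, t-statistics and p-values differ from the ordinary output.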

**Figure 2.4: Regression output with White's heteroscedasticity adjusted standard errors**

All coefficients remain statistically significant when White's heteroscedasticity-consistent standard errors are used.

**Stability tests** Finally, we can test the model for coefficient stability and structural breaks. In EViews this can be done under the option View/Stability Tests. With the Ramsey RESET test for model misspecification we cannot reject the null hypothesis of correct specification (p-value 0.3754).

We start the stability tests with the recursive residuals, as they can help us detect potential breakpoints visually. Click on Recursive Estimates (OLS only) and choose Recursive residuals. EViews will produce a plot of the recursively estimated residuals from the model together with their confidence intervals.

**Figure 2.5: Recursively estimated residuals and their confidence bounds**

The majority of the recursive residuals lie within their confidence bounds; however, several outliers fall outside them. These are potential dates of structural breaks in the model. The CUSUM test does not indicate any potential breakpoints, but the CUSUM of squares test suggests that there may be breaks in the sixties and at the beginning of 2000.

**Figure 2.6: CUSUM squared statistics and its confidence bounds**
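Candidate dates suggested by these plots can be checked formally with a Chow breakpoint test. In its standard form (a sketch; the notation is assumed here), the model is estimated on the full sample and on the two subsamples separated by the candidate date, giving residual sums of squares $RSS$, $RSS_1$ and $RSS_2$. With $k$ estimated parameters (here 4, including the intercept) and $T$ observations:

```latex
F = \frac{(RSS - RSS_1 - RSS_2)/k}{(RSS_1 + RSS_2)/(T - 2k)} \;\sim\; F(k,\, T - 2k)
\quad \text{under the null of no break.}
```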

We can go further and test whether there is a structural break at a specified date using the Chow test. With the breakpoint at January 1961, the p-value of the F-statistic for the Chow test is 0.3090, indicating no structural break at that date. However, if the breakpoint is specified at January 2000, we reject the null hypothesis of parameter constancy at the 1% significance level.

Structural breaks may appear in the model as a result of misspecification. For example, from January 2000 there may be a missing factor that plays an important role in explaining stock returns. The breakpoint date also corresponds to the dot-com bubble period, during which the structure of the classic factor model may have changed. To verify this hypothesis, we can include a dummy variable corresponding to the bubble period to remove its effect from the model. To create a dummy that is equal to 1 for the period from January 2000 to December 2001, we write

series dummy=0

smpl 2000M01 2001M12

series dummy=1

smpl @all

Since structural breaks may occur in all the parameters, we interact the dummy variable with all regressors. Thus, in the Estimate Equation box we specify a new model in which each regressor also enters multiplied by the dummy.
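A possible specification is sketched below. It assumes the excess-return series are named `ibm_ex` and `mkt_ex` (hypothetical names). Whether the dummy also enters on its own, shifting the intercept, is not stated in the text; it is included here as an assumption, since the intercept C is itself a regressor.

```eviews
' original regressors, then the dummy and its interaction with each factor
ibm_ex c mkt_ex smb hml dummy dummy*mkt_ex dummy*smb dummy*hml
```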

As a result, the coefficients on the interaction terms with the market portfolio return and the SMB factor are insignificant, whereas the interaction term with the HML factor is statistically significant at the 10% level.

**Figure 2.7: Output of the regression estimation with dummy variables**

Correcting the misspecification also helps to improve the properties of the residuals. After introducing the dummy variable, all tests for heteroscedasticity either indicate no heteroscedasticity or produce only marginally significant p-values.

**Testing linear restrictions** EViews makes it possible to test hypotheses on coefficient restrictions by means of the Wald test. Consider testing the joint null hypothesis $\beta_1 = 1$ and $\beta_2 = \beta_3$. This hypothesis imposes two linear restrictions on the parameter vector. In the View option of the object menu choose Coefficient Tests/Wald - Coefficient Restrictions.... Type the restrictions to be tested in the box. Note that the coefficients of the model are denoted by C(1), C(2), etc. To find out the exact notation of the parameters, go to View/Representation. The p-value of the Wald test statistic is higher than any conventional significance level, so we do not reject the null hypothesis that the restrictions are valid.

The Wald test can also be performed from the command line. One first specifies the name of the equation being tested, followed by a dot and the command wald. The restrictions then follow, separated by commas.
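As a sketch, the two restrictions above might be tested as follows. This assumes that the equation is stored as `ibm_eq` and was specified with the intercept first, so that C(2), C(3) and C(4) are the coefficients on the market, SMB and HML factors; it is worth checking View/Representation before relying on this mapping.

```eviews
' H0: beta_1 = 1 and beta_2 = beta_3
ibm_eq.wald c(2)=1, c(3)=c(4)
```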

**Predictions** After estimating the regression, our aim is often to construct a forecast of the dependent variable. EViews' forecast function can be invoked through the Forecast option in the menu of the equation object. In the box Forecast name, type the name of the variable in which the regression forecasts will be stored. EViews will automatically create a new series object with the specified name and plot the predicted series together with confidence bounds.

Alternatively, one can view the forecast by double-clicking on the forecast variable. In the menu option View choose Graph, where the required graph type can be selected.

To generate forecasts through the command line, use the equation name followed by a dot and the command fit, and then the name of the series in which the forecast values should be stored.

EViews can also generate standard errors of the forecasts along with the predictions themselves. To do this, simply add the name of another series at the end of the line.
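Both variants are sketched below for the equation `ibm_eq`. The output series names `ibm_fit` and `ibm_fit_se` are assumptions chosen for illustration.

```eviews
' store the fitted values in ibm_fit (name assumed)
ibm_eq.fit ibm_fit
' as above, and also store the forecast standard errors in ibm_fit_se (name assumed)
ibm_eq.fit ibm_fit ibm_fit_se
```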

# Programming Example

Note that the factor sensitivities of stock (portfolio) returns represented by the estimated coefficients vary through time. As the model is estimated for alternative sample periods, the estimated coefficients will change. A useful analogy is the value of a stock's beta that varies through time based on the sample period data used to estimate the security market line.

The estimated factor model (2.2.5) for IBM uses all of the data over the 57-year period from January 1950 to September 2007. It is generally thought that coefficients do not stay constant over such a long period. To take this into account when building return forecasts based on the factor model, we can perform a rolling window regression. We start by initializing the necessary variables (e.g., the number of observations in the workfile and the length of the window). For this purpose we first make sure that the current sample is set to the whole range of the data. The commands are typed in a new program window.

Next, we create the new objects that will be used in the program: a series for the forecasts and an equation object.

In the next lines we specify a loop where we reset the current sample to the estimation window and roll it across the data range. For each of the subsamples we estimate the factor model.

Once the model is estimated, we reset the sample to the subsample for which we want to forecast returns. Since we build a one-step-ahead forecast, the new subsample is just the one observation immediately following the estimation window.

We generate the forecast using the estimated model. To access the values of the estimated parameters we use EViews function @coefs. In parentheses we specify the order of the parameter - it corresponds to the order of respective variable in the regression model. Note that @coefs contains the values of the last estimated model. Once the equation is re-estimated, the new values of parameters are stored in @coefs.

Just to tidy up the workfile we delete the auxiliary variables window and n.

delete window n
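Putting the steps together, the whole program might look like the sketch below. Several details are assumptions: the excess-return series names `ibm_ex` and `mkt_ex`, the equation name `roll_eq`, the 60-month window length, and the helper program variables `!n` and `!w`. The listing has not been run against IBM1.xls and presumes that the series contain no missing values.

```eviews
' --- initialization: work on the full sample ---
smpl @all
scalar n = @obs(ibm_ex)     ' number of observations (assumes no missing values)
scalar window = 60          ' window length in months (assumed value)
!n = n                      ' copies used as loop bound and sample offsets
!w = window

' --- objects used by the program ---
series ibm_exf              ' one-step-ahead forecasts
equation roll_eq            ' equation re-estimated in every window (name assumed)

' --- rolling estimation and forecasting ---
for !i = 1 to !n - !w
  ' estimation window: observations !i to !i+!w-1
  smpl @first+!i-1 @first+!i+!w-2
  roll_eq.ls ibm_ex c mkt_ex smb hml
  ' forecast sample: the single observation after the window
  smpl @first+!i+!w-1 @first+!i+!w-1
  series ibm_exf = roll_eq.@coefs(1) + roll_eq.@coefs(2)*mkt_ex + roll_eq.@coefs(3)*smb + roll_eq.@coefs(4)*hml
next

' --- restore the full sample and tidy up ---
smpl @all
delete window n
```

The first `!w` observations of ibm_exf stay missing, because no estimation window precedes them.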

The series ibm_exf contains the generated forecast from the rolling window model. Similarly to the use of @coefs function, one can access other OLS statistics. The specifications are given in Table 2.2.

**Table 2.2: Equation Data Members**
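A few commonly used equation data members can be read into scalars as sketched below. The equation name `ibm_eq` follows the text; the scalar names on the left-hand side are illustrative assumptions.

```eviews
scalar eq_r2   = ibm_eq.@r2          ' R-squared
scalar eq_ser  = ibm_eq.@se          ' standard error of the regression
scalar eq_ssr  = ibm_eq.@ssr         ' sum of squared residuals
scalar eq_dw   = ibm_eq.@dw          ' Durbin-Watson statistic
scalar eq_nobs = ibm_eq.@regobs      ' number of observations used in estimation
scalar b2      = ibm_eq.@coefs(2)    ' second estimated coefficient
scalar se2     = ibm_eq.@stderrs(2)  ' its standard error
scalar t2      = ibm_eq.@tstats(2)   ' its t-statistic
```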

## Nonlinear Regression

In many cases the relation between variables is nonlinear. If such a model cannot be transformed into a linear one, it is called an intrinsically nonlinear regression model.

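We can represent such a model in the following way:

$$y_i = F(X_i, \theta) + u_i,$$

where $F$ is a non-linear function, $X_i = [1 \; X_{2i} \; \dots \; X_{ki}]'$ is a $k \times 1$ vector of explanatory variables, and $u_i$ is a random error term. The least squares estimation problem, to minimize

$$S(\theta) = \sum_{i=1}^{n} \left[ y_i - F(X_i, \theta) \right]^2, \tag{2.3.1}$$

becomes non-linear. The first order conditions are given by

$$\frac{\partial S(\theta)}{\partial \theta} = -2 \sum_{i=1}^{n} \left[ y_i - F(X_i, \theta) \right] \frac{\partial F(X_i, \theta)}{\partial \theta} = 0.$$

This gives a set of non-linear normal equations in $\theta$. The non-linear least squares (NLS) estimator $\hat\theta_{NLS}$ is defined as the value of $\theta$ that minimizes (2.3.1).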

In EViews, Nonlinear Least Squares is implemented in the same way as OLS. The only difference is that the model in the Equation specification box must be entered as a mathematical expression with explicit coefficients rather than as a list of variables.
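For example, a power-type specification could be entered as in the sketch below. The series names `y` and `x`, the equation name `nls_eq` and the functional form are hypothetical, chosen only to show a model that is nonlinear in its coefficients.

```eviews
' typed in the Equation specification box
y = c(1) + c(2)*x^c(3)

' equivalent command-line form (equation name nls_eq is assumed)
equation nls_eq.ls y = c(1) + c(2)*x^c(3)
```

Because NLS is iterative, the estimates can depend on the starting values, which EViews takes from the current contents of the coefficient vector C.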

Interpretation of the estimation output, residual diagnostic and inference can be performed in the same way as for the OLS regression.