# Identification

The structural equation coefficients can be estimated only if the equations are identified. So, what do we mean by that? To give an intuitive feeling for its meaning, we start with an example before going into any formal and mechanical tests.

Consider the following two-equation system:

$$Q = A_0 + A_1 P + A_2 X_1 + U_1 \quad \text{(Supply)} \tag{12.14}$$

$$Q = B_0 + B_1 P + U_2 \quad \text{(Demand)} \tag{12.15}$$

This system contains two endogenous variables (P and Q) and one exogenous variable (X1). The two equations represent the supply and demand sides of a given market. The question is whether either of these equations is identified, that is, whether its parameters can be estimated consistently.

It turns out that the demand function is identified while the supply function is not. To see this, consider Figure 12.1. In order to identify the demand function we need some exogenous variation that can help us trace out the function. The supply function provides it: it contains the exogenous variable X1, and the supply function takes a new position for each value of X1. In that process we identify the demand function. The demand function, however, contains nothing that does not also appear in the supply function, so it is impossible to move the demand function while holding the supply function fixed. Hence it is the presence of an exogenous variable in one equation that allows us to estimate the parameters of the other equation. If X1 had been included in both equations, there would have been no unique variation in either equation and neither equation would have been identified. However, if another exogenous variable, X2, had been introduced and placed in the demand function, we would have some exogenous variation that could help us identify the supply function. In that case both equations would have been identified.

**Figure 12.1 Demand and supply system**
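The argument can be illustrated with a small simulation. The sketch below works under stated assumptions: the parameter values, the normal distributions and the sample size are hypothetical and not taken from the text, and the instrumental-variable ratio used to recover the demand slope is a standard estimator chosen here for illustration, not a method the text has introduced. It generates equilibrium data from (12.14) and (12.15) and compares a naive regression of Q on P with an estimate that uses only the variation in P that comes from X1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # hypothetical sample size

# Hypothetical structural parameters (assumptions, not from the text)
a0, a1, a2 = 1.0, 0.5, 1.0   # supply: Q = a0 + a1*P + a2*X1 + U1
b0, b1 = 10.0, -1.0          # demand: Q = b0 + b1*P + U2

x1 = rng.normal(size=n)      # exogenous supply shifter
u1 = rng.normal(size=n)      # supply disturbance
u2 = rng.normal(size=n)      # demand disturbance

# Market equilibrium: set supply equal to demand and solve for P, then Q
p = (b0 - a0 - a2 * x1 + u2 - u1) / (a1 - b1)
q = b0 + b1 * p + u2

# Naive regression slope of Q on P (ignores that P is endogenous)
ols_slope = np.cov(p, q)[0, 1] / np.var(p, ddof=1)

# Slope traced out by X1 alone: cov(X1, Q) / cov(X1, P)
iv_slope = np.cov(x1, q)[0, 1] / np.cov(x1, p)[0, 1]

print(f"true demand slope: {b1}")
print(f"naive regression:  {ols_slope:.3f}")
print(f"using X1 only:     {iv_slope:.3f}")
```

With these assumed values the estimate based on X1 should land close to the demand slope, while the naive regression settles on a mixture of the supply and demand slopes. No comparable calculation is available for the supply equation, because the demand equation has no exogenous variable of its own.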

The process of identifying equations can be formalized in a decision rule that specifies the conditions that have to be fulfilled in order to identify one or several equations in a system. The literature describes two such rules, one of which is slightly easier to use than the other.

## The order condition of identification

The first decision rule for identification is the so-called order condition. This rule specifies the necessary conditions for identification and is the more popular of the two rules that will be discussed. Unfortunately it is not a sufficient rule, which means that an equation may fail to be identified even though the order condition says it is identified. However, in a system with only two equations, the order condition will work well and can be trusted.

Define the following variables:

*M* = The number of endogenous variables in the model

*K* = The number of variables (endogenous and exogenous) in the model that are **excluded** from the equation under consideration.

**The order condition states that:**

1) If *K* = *M* - 1 => The equation is exactly identified

2) If *K* > *M* - 1 => The equation is over identified

3) If *K* < *M* - 1 => The equation is under identified

When checking the order condition you have to do it for each equation in the system.
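The three cases translate directly into a small function. This is a minimal sketch; the function name `order_condition` is chosen here for illustration, and the arguments follow the definitions of M and K given above.

```python
def order_condition(m: int, k: int) -> str:
    """Classify one equation by the order condition.

    m: number of endogenous variables in the model
    k: number of variables (endogenous and exogenous) in the model
       that are excluded from the equation under consideration
    """
    if k == m - 1:
        return "exactly identified"
    if k > m - 1:
        return "over identified"
    return "under identified"


# Supply (12.14) excludes no variable, demand (12.15) excludes X1; M = 2
print("supply:", order_condition(m=2, k=0))
print("demand:", order_condition(m=2, k=1))
```

Applied to the supply and demand system in (12.14) and (12.15), the function reproduces the earlier conclusion: the supply equation is under identified and the demand equation is exactly identified.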

**Example 12.1**

Consider the following system:

Use the order condition to check whether the equations are identified. In order to do that, we need to determine the values of M and K. This system contains two endogenous variables, and the total number of variables, endogenous as well as exogenous, is four.

For the first equation we have M-1=1 and K=1, since X2 is excluded from (12.16). Since M-1=K, the first equation is **exactly identified.** For the second equation we have M-1=1 and K=1, since X1 is excluded from (12.17). Since M-1=K, the second equation is also **exactly identified.** When all the equations of the model are identified we say that the model is identified, since we are able to estimate all the structural parameters.

**Example 12.2**

Consider the following system:

In this example we have two endogenous variables and three exogenous variables, five variables in total. M-1 again equals 1, since we still have only two endogenous variables. Will the equations be identified in this case? The first equation contains four variables, which means that one variable has been excluded from the equation: X2 does not appear in equation 1, so K=1. Since M-1=K the equation is **exactly identified.**

The second equation includes three variables, which means that two variables have been excluded: X1 and X3 are not included in equation 2. Hence K=2, so M-1<K, and equation 2 is **over identified.**
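The counting in Example 12.2 can also be done mechanically from the list of variables each equation contains. The sketch below assumes the two endogenous variables are called Y1 and Y2 (the names are hypothetical); the inclusion pattern follows the description above, with X2 missing from equation 1 and X1 and X3 missing from equation 2. The function name is likewise chosen for illustration.

```python
def order_condition_from_sets(endogenous, exogenous, equations):
    """Return {equation: (excluded variables, K, verdict)} for each equation."""
    all_variables = set(endogenous) | set(exogenous)
    m = len(endogenous)  # M = number of endogenous variables
    report = {}
    for name, included in equations.items():
        excluded = all_variables - set(included)
        k = len(excluded)  # K = number of excluded variables
        if k == m - 1:
            verdict = "exactly identified"
        elif k > m - 1:
            verdict = "over identified"
        else:
            verdict = "under identified"
        report[name] = (sorted(excluded), k, verdict)
    return report


# Example 12.2; Y1 and Y2 are assumed names for the endogenous variables
example_12_2 = order_condition_from_sets(
    endogenous={"Y1", "Y2"},
    exogenous={"X1", "X2", "X3"},
    equations={
        "equation 1": {"Y1", "Y2", "X1", "X3"},  # four variables, X2 excluded
        "equation 2": {"Y1", "Y2", "X2"},        # three variables, X1 and X3 excluded
    },
)
for name, (excluded, k, verdict) in example_12_2.items():
    print(f"{name}: excluded={excluded}, K={k}, {verdict}")
```

Listing the included variables per equation, not K itself, makes it harder to miscount when the system grows.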

## The rank condition of identification

The rank condition is slightly more complicated when dealing with larger systems of equations, but with only two equations it is as easy to use as the order condition. The rank condition is a necessary and sufficient condition, which means that if an equation satisfies the rank condition we can be sure that it really is identified. The rank condition investigates whether two or more equations are linearly dependent on each other, which would be the case if, for example, the sum of two equations equaled a third equation in the model. If that is the case it is impossible to identify all structural parameters. The basic steps of this decision rule are best described by an example.

**Example 12.3**

Consider the following system of equations:

This system contains three endogenous variables (Y1, Y2, Y3) and three exogenous variables (X1, X2, X3), which means that we have six variables in total. The first step in checking the rank condition is to set up a matrix that, for each equation, marks which of the six variables are included (marked with 1) and which are excluded (marked with 0). For our system we obtain the following matrix:

**Matrix for the rank condition**

In order to check the rank condition for the first equation we proceed as follows: delete the first row and collect the columns of those variables that were marked with zero in the first equation. For equation 1, one of the endogenous *Y* variables and X2 were marked with zero, and if we collect those two columns we obtain:

If this matrix contains fewer than M-1 rows or columns with at least one non-zero element, equation 1 is not identified. M is the number of endogenous variables, and hence the number of equations, just as in the order condition, which means that M-1=2. Since we have two rows and two columns, and none of them contains only zeros, we conclude that equation 1 is identified.

For equation 2 we proceed in the same way. We delete the second row and collect those columns where the elements of the second row were marked with a zero. For equation 2 that was the case for Y3 and X1, which is to say that these two variables were not included in equation 2. The resulting matrix for this case then becomes:

It has the same form as the matrix for equation 1, which means that we have two rows and two columns, none of which contains only zeros. The same procedure should be carried out for the third equation, and if you do that you will see that it is identified as well.
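The procedure (delete the equation's own row, keep the columns marked with zero, then count the rows and columns that are not all zeros) can be sketched in a few lines. The marking matrix below is hypothetical: it is built only to be consistent with the description of this example (equation 2 excludes Y3 and X1, and equation 1 excludes X2 together with an endogenous variable, assumed here to be Y3), and it should not be read as the exact system of the example. The function name is also chosen for illustration.

```python
import numpy as np

variables = ["Y1", "Y2", "Y3", "X1", "X2", "X3"]

# Hypothetical marking matrix: 1 = included, 0 = excluded
marks = np.array([
    [1, 1, 0, 1, 0, 1],  # equation 1: Y3 (assumed) and X2 excluded
    [1, 1, 0, 0, 1, 1],  # equation 2: Y3 and X1 excluded
    [1, 0, 1, 1, 1, 0],  # equation 3: Y2 and X3 excluded (assumed)
])


def rank_check(marks, eq, m):
    """Apply the row/column check to equation number eq (0-based)."""
    others = np.delete(marks, eq, axis=0)          # delete the equation's own row
    zero_cols = np.flatnonzero(marks[eq] == 0)     # columns marked 0 in that row
    sub = others[:, zero_cols]
    nonzero_rows = int(np.any(sub != 0, axis=1).sum())
    nonzero_cols = int(np.any(sub != 0, axis=0).sum())
    identified = nonzero_rows >= m - 1 and nonzero_cols >= m - 1
    return sub, identified


m = marks.shape[0]  # three equations, three endogenous variables
for eq in range(m):
    sub, identified = rank_check(marks, eq, m)
    print(f"equation {eq + 1}: submatrix {sub.tolist()}, identified: {identified}")
```

The 0/1 marks only record which coefficients may be non-zero. A formal check would compute the rank of the same submatrix filled with the actual structural coefficients and compare it with M-1.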

In larger systems it is quite possible that the order condition says that a particular equation is identified even though the rank condition says it is not. When that happens it might still be possible to generate estimates, but those estimates will not have any economic meaning, since they will represent averages of those equations that are linear combinations of each other. Hence, you should not conclude that your results are identified just because the order condition says so and the econometric software generates results for you. When using systems of more than two equations you should also confirm the identification using the rank condition.
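A small numerical case shows how the two conditions can disagree. The coefficient matrix below is entirely hypothetical and is chosen so that equation 1 passes the order condition but fails the rank condition. The check uses `numpy.linalg.matrix_rank` on the coefficients of the excluded variables in the other equations; the helper names are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical structural coefficients; columns: Y1, Y2, Y3, X1, X2, X3
coefs = np.array([
    [1.0, -0.4, -0.3, -0.8, 0.0, 0.0],   # equation 1 excludes X2, X3
    [-0.5, 1.0, 0.0, 0.0, -0.6, -0.9],   # equation 2 excludes Y3, X1
    [-0.2, 0.0, 1.0, -0.7, 0.0, 0.0],    # equation 3 excludes Y2, X2, X3
])
m = coefs.shape[0]  # number of endogenous variables (and equations)


def excluded_count(coefs, eq):
    """K: number of variables with a zero coefficient in equation eq."""
    return int(np.sum(coefs[eq] == 0))


def excluded_rank(coefs, eq):
    """Rank of the other equations' coefficients on the variables excluded from eq."""
    others = np.delete(coefs, eq, axis=0)
    zero_cols = np.flatnonzero(coefs[eq] == 0)
    return int(np.linalg.matrix_rank(others[:, zero_cols]))


for eq in range(m):
    k = excluded_count(coefs, eq)
    r = excluded_rank(coefs, eq)
    print(f"equation {eq + 1}: K={k}, M-1={m - 1}, rank={r}, "
          f"order ok: {k >= m - 1}, rank ok: {r == m - 1}")
```

In this constructed system equation 1 excludes exactly two variables, so the order condition reports it as exactly identified. Equation 3, however, contains neither of the two excluded variables, so the submatrix has a row of zeros and rank 1. Any mixture of equations 1 and 3 contains the same variables as equation 1, which is why its parameters cannot be separated out.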