# Spatial regression models

Over recent decades, the French dairy sector has gone through tremendous structural change. Since the introduction of milk quotas, the number of dairy farms has declined sharply, while the average size of a dairy farm has increased substantially. One of the major justifications for the introduction of milk quotas in 1984 was to preserve the small family dairy farm throughout the European Community. Such farms needed high support prices to maintain their incomes; the only way to prevent the larger farms from increasing their output to take advantage of these prices, and thus from expanding, was to impose quantitative restrictions. However, this policy has done little to preserve the family farm structure of European dairying. Dairy farms have continued to grow larger, small farms have continued to go out of business, and the large operators have come to account for an even larger share of total output. In addition, quota restrictions sought to freeze the geographic distribution of dairy production by *département* as it was in 1983. As a consequence, dairy production has been sustained in mountainous zones and less competitive areas. At the same time, production has become concentrated in some productive regions, although to a lesser degree than in non-EU countries that do not have dairy quotas.

Following Cliff and Ord (1981), the spatial econometric literature has developed a large number of methods that can address spatial heterogeneity and dependence by specifying a spatially lagged dependent variable (spatial autoregressive model), or by modelling the error structure (autoregressive disturbance model). In the absence of spatial autocorrelation, different methods such as instrumental variables or maximum likelihood can be used to estimate models with endogenous variables. However, the presence of endogeneity in a spatial context has generally been ignored. “As a consequence, researchers have often been in the undesirable position of having to choose between modelling spatial interactions ignoring feedback simultaneity, or accounting for endogeneity but losing the advantages of a spatial econometric approach” (Rey and Boarnet, 2004).

We use the Kelejian and Prucha (2007) method, which enables us to account for both endogeneity and simultaneous spatial interactions. This method provides a non-parametric heteroskedasticity and autocorrelation consistent (HAC) estimator of the variance-covariance (VC) matrix of the parameter estimates in a spatial context, known as the spatial HAC (SHAC) estimator.

Consider the following model, in which we distinguish between exogenous ($X$) and endogenous ($Y$) explanatory variables and allow for spatial autocorrelation through the spatial lag $Wy$:

$$y = \rho W y + X\beta + Y\gamma + u$$

where $y$ is the $(n \times 1)$ vector of observations on the dependent variable; $X$ is an $(n \times k)$ matrix of observations on $k$ exogenous variables, with $\beta$ as the corresponding $(k \times 1)$ vector of parameters; $Y$ is an $(n \times r)$ matrix of endogenous variables, with $\gamma$ as the corresponding $(r \times 1)$ vector of parameters; $u$ is the $(n \times 1)$ vector of error terms; $\rho$ is the scalar spatial autoregressive parameter; and $W$ is an $(n \times n)$ spatial weight matrix of known constants with a zero diagonal. An element $w_{ij}$ of the matrix describes the link between an observation in location $i$ and an observation in location $j$, so the $W$ matrix represents the strength of spatial interaction between locations. We first use the first-order spatial contiguity matrix. However, contiguity matrices are restrictive in how they define spatial connections (Cliff and Ord, 1981). Therefore, we also use a geographical distance function, as most empirical studies have done (Fingleton, 1999, 2000; Le Gallo, 2002; Le Gallo et al., 2003; Rey and Boarnet, 2004), defined as:

where $d_{ij}$ is the great-circle distance between the centroids of locations $i$ and $j$, and $D$ is the critical cut-off. In our application, $D$ is equal to 115 km, since this is the minimum distance that guarantees connections between all *départements*. Each matrix is row-standardized so that it is relative, not absolute, distance that matters. The main advantage of geographical distance-based weights is that they can be considered exogenous to the model and a good proxy for transport costs (Arbia *et al.*, 2009). However, these matrices give equal importance to all *départements* located at the same geographical distance, without taking into account their economic potential or their accessibility (Keilbach, 2000; Dall’Erba, 2004; Virol, 2006). To better represent real spatial interactions, we also use road travel-time distance, for which the cut-off $D$ is 90 minutes. This weight matrix takes into account accessibility in terms of the time needed to travel from one location to another, and hence the road infrastructure between locations. Because infrastructure may change (e.g. new highways), the travel-time weight matrix can also change over time. Owing to a lack of data, however, the same weight matrix, that of the year 2007, is used for both the 2005 model and the 1995 model.
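As an illustration of the distance-based weights described above, the following Python sketch builds a row-standardized weight matrix with a critical cut-off and computes the spatial lag $Wy$. It is a minimal sketch, not the authors' code: the inverse-distance decay inside the cut-off is an assumption (the exact functional form is not recoverable from the text here), and the function names `great_circle_km` and `distance_weights`, as well as the coordinates in the usage lines, are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat, lon):
    """Pairwise great-circle (haversine) distances in km; inputs in degrees."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def distance_weights(dist, cutoff):
    """Row-standardized weights with a zero diagonal.

    ASSUMPTION: weights decay as 1 / d_ij inside the cut-off and are zero
    beyond it; the source text does not confirm this functional form.
    """
    dist = np.asarray(dist, dtype=float)
    w = np.zeros_like(dist)
    linked = (dist > 0.0) & (dist <= cutoff)  # excludes the diagonal
    w[linked] = 1.0 / dist[linked]
    row_sums = w.sum(axis=1, keepdims=True)
    # Units with no neighbour inside the cut-off keep an all-zero row.
    safe = np.where(row_sums > 0.0, row_sums, 1.0)
    return w / safe


# Hypothetical centroids (degrees); the last one lies beyond the cut-off.
lat = [48.0, 48.5, 48.9, 45.0]
lon = [2.0, 2.5, 2.3, 5.0]
W = distance_weights(great_circle_km(lat, lon), cutoff=115.0)

y = np.array([10.0, 12.0, 9.0, 20.0])  # illustrative values only
spatial_lag = W @ y                    # weighted average of neighbours' y
```

The isolated fourth point ends up with an all-zero row, which is why the cut-off has to be large enough to connect every unit, as with the 115 km threshold in the text. A travel-time matrix could be passed to `distance_weights` in the same way, with the cut-off expressed in minutes.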

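The asymptotic distribution of the instrumental variable estimators of the parameters in (5) depends critically on the quantity $\Psi = n^{-1} H' \Sigma H$, where $\Sigma = E(uu')$ denotes the variance-covariance matrix of $u$ and $H$ is the full matrix of instruments. Following the Kelejian and Prucha (2007) SHAC estimator, the $(r, s)$th estimated element of $\Psi$ is:

$$\hat{\psi}_{rs} = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n} h_{ir} h_{js} \hat{u}_i \hat{u}_j K\!\left(\frac{d_{ij}}{d_{\max}}\right)$$

where $h_{ir}$ is the $(i, r)$th element of $H$, $\hat{u}_i$ is the estimated residual for location $i$, and $K(x)$ is the kernel function. In this study, we use the Parzen kernel as given by Andrews (1991):

$$K(x) = \begin{cases} 1 - 6x^{2} + 6|x|^{3} & \text{for } 0 \le |x| \le 1/2, \\ 2\,(1 - |x|)^{3} & \text{for } 1/2 < |x| \le 1, \\ 0 & \text{otherwise,} \end{cases}$$

in which $x = d_{ij}/d_{\max}$; $d_{ij}$ is the distance between location $i$ and location $j$; and $d_{\max}$ is the bandwidth.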
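To make the kernel weighting concrete, the sketch below computes the Parzen kernel and a SHAC-type estimate of the matrix built from the instruments and residuals. It is a hedged illustration rather than the authors' implementation: it assumes the usual form in which each cross-product of residuals $\hat{u}_i \hat{u}_j$ is weighted by $K(d_{ij}/d_{\max})$, and the function names `parzen_kernel` and `shac_psi` and all input data are hypothetical.

```python
import numpy as np


def parzen_kernel(x):
    """Parzen kernel: 1 at x = 0, declining smoothly to 0 at |x| = 1."""
    x = np.abs(np.asarray(x, dtype=float))
    inner = 1.0 - 6.0 * x ** 2 + 6.0 * x ** 3   # 0 <= |x| <= 1/2
    outer = 2.0 * (1.0 - x) ** 3                # 1/2 < |x| <= 1
    return np.where(x <= 0.5, inner, np.where(x <= 1.0, outer, 0.0))


def shac_psi(H, resid, dist, bandwidth):
    """Kernel-weighted estimate n^{-1} H' (K * u u') H.

    ASSUMPTION: a single fixed bandwidth d_max for all locations.
    H: (n, p) instruments; resid: (n,) residuals; dist: (n, n) distances.
    """
    H = np.asarray(H, dtype=float)
    u = np.asarray(resid, dtype=float).ravel()
    n = H.shape[0]
    K = parzen_kernel(np.asarray(dist, dtype=float) / bandwidth)
    sigma_hat = K * np.outer(u, u)  # pairs beyond the bandwidth get zero weight
    return H.T @ sigma_hat @ H / n


# Illustrative inputs only: six locations on a line, random instruments.
rng = np.random.default_rng(0)
H_demo = rng.normal(size=(6, 2))
u_demo = rng.normal(size=6)
pts = np.arange(6.0)
dist_demo = np.abs(pts[:, None] - pts[None, :])
psi_hat = shac_psi(H_demo, u_demo, dist_demo, bandwidth=3.0)
```

When the bandwidth is smaller than every inter-location distance, only the diagonal terms survive and the estimate reduces to the familiar heteroskedasticity-only form; larger bandwidths let residual correlation between nearby locations enter.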