# Geospatial Effects

It is widely recognized that the presence of a network tie between two individuals may be conditionally dependent on the physical distance between their geographic locations. The spatial embedding of networks may, however, be more complex than simple pairwise physical distance between actors. For instance, a researcher might wish to consider a number of other effects: closure may be specific to a particular area, distance from centroids may be an alternative to dyadic physical distance, natural spatial barriers may matter, or only nearest neighbors may need to be considered.

To extend the ERGM to take into account the geographic arrangement of individuals, let us assume that each individual occupies a particular location in physical space. This location is fixed and defined in a particular coordinate system (e.g., latitude and longitude). The interactions between geographic and social space described above could be incorporated in the ERGM; for this initial treatment, however, we focus on dyadic distance. From the spatial locations, we can compute a continuous distance for every pair of individuals in the network. In this form, the geographic arrangement of individuals enters the model as a dyadic covariate of pairwise distances, but this approach raises a number of challenges.

First, a decision must be made about the distance measure. The distance between a pair of individuals can be calculated in a number of ways, and different measures may be useful in different contexts. Euclidean distance is commonly used when the focus is on distances in two-dimensional physical space. For large distances on a spherical surface (e.g., the Earth's surface), the corresponding (Riemannian) arc distance, that is, the great-circle distance, can be calculated instead.
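The two measures can be made concrete with a minimal sketch. It assumes planar coordinates for the Euclidean case and (latitude, longitude) pairs in degrees for the arc distance, which is computed here with the haversine formula on a sphere of mean radius 6371 km (an assumed constant). The function names are hypothetical helpers and do not come from any ERGM package.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the Earth is treated as a sphere


def euclidean(p, q):
    """Straight-line distance between two (x, y) points in a planar coordinate system."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def great_circle_km(p, q, radius=EARTH_RADIUS_KM):
    """Arc (great-circle) distance between two (lat, lon) points given in degrees,
    computed with the haversine formula."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # min() guards against h creeping just above 1 through rounding for antipodal points
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


def distance_matrix(points, metric):
    """Symmetric n x n matrix of pairwise distances d_ij, with zeros on the diagonal."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = metric(points[i], points[j])
    return d
```

For example, `distance_matrix(coords, great_circle_km)` yields the pairwise distances that would later serve as the basis of a dyadic covariate. Over small study areas the two measures nearly coincide once the coordinates are projected to a plane, so the choice matters mainly for large distances.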

Because the distance between the spatial locations of any nodes *i* and *j* is a continuous measure, the second challenge is the choice of a distance interaction function (Besag, 1974; Robins, Elliot & Pattison, 2001). The distance interaction function specifies the relationship between the probability of a network tie between two nodes and the distance between their geographic locations. A wide range of functions can be considered, and a simple curve-fitting exercise can be used to determine an appropriate functional form. In practice, it is useful to restrict attention to parametric functions with certain properties of interest. Butts (2012) defined four such properties: (1) monotonic versus nonmonotonic behavior; (2) behavior of the function when the distance between spatial locations equals zero, parameterized by the tie probability at the origin; (3) behavior of the function at small distances, parameterized by the curvature near the origin; and (4) behavior of the function at large distances, parameterized by the tail weight. Interested readers should consult Butts for a detailed description of these properties and the corresponding sets of parametric forms. The most commonly used families of spatial interaction functions are exponential decay, $e^{-\alpha d}$, and inverse power law functions, $1/d^{\gamma}$ (Butts, 2002; Daraganova et al., 2012; Kleinberg, 2000; Latane, 1996), where $d$ is distance and $\alpha, \gamma > 0$ control how quickly the function declines. Although these functions behave similarly in general, they differ mainly in tie probabilities at large distances, and hence in the form of the tail. For exponential decay functions, the tie probability becomes virtually zero beyond some distance, whereas for power law functions the tie probability at large distances remains nonnegligible.
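The difference in tails can be checked numerically with a short sketch. The parameter values $\alpha = 1$ and $\gamma = 2$ are arbitrary illustrative choices, not estimates from any data set, and the function names are hypothetical.

```python
import math


def exponential_decay(d, alpha):
    """Exponential decay interaction function e^(-alpha * d)."""
    return math.exp(-alpha * d)


def inverse_power_law(d, gamma):
    """Inverse power law interaction function 1 / d^gamma (defined for d > 0)."""
    return d ** (-gamma)


# Hypothetical parameter values chosen only to illustrate tail behavior.
alpha, gamma = 1.0, 2.0
for d in (1, 5, 10, 50):
    e = exponential_decay(d, alpha)
    p = inverse_power_law(d, gamma)
    print(f"d={d:>3}  exp={e:.2e}  power={p:.2e}  power/exp={p / e:.2e}")
```

At $d = 50$, the exponential value is of order $10^{-22}$, while the power law still gives $50^{-2} = 4 \times 10^{-4}$. This is the nonnegligible long-range tie probability described above.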

A further challenge in the ERGM approach is to transform these functions into exponential family form. Although exponential decay functions are simple to convert, an explicit transformation of decreasing power law functions into the ERGM framework leads to a curved exponential family graph model and requires estimation of nonlinear parameters.
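A minimal worked case shows why exponential decay converts easily. It assumes a dyad-independent model containing only an edge parameter $\theta$ and a single distance covariate with parameter $\lambda$; both symbols are introduced here for illustration. The conditional log-odds of a tie is then linear in the distance $d_{ij}$:

$$
\operatorname{logit} P(X_{ij} = 1) = \theta + \lambda d_{ij}
\quad\Longrightarrow\quad
\frac{P(X_{ij} = 1)}{P(X_{ij} = 0)} = e^{\theta}\, e^{\lambda d_{ij}} .
$$

Setting $\lambda = -\alpha$ gives exponential decay $e^{-\alpha d_{ij}}$ in the odds of a tie, and approximately in the tie probability when ties are rare. The decay rate therefore enters as an ordinary linear parameter on the dyadic covariate $d_{ij}$.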

Daraganova et al. (2012) showed that in a simple case, when the tie between nodes *i* and *j* is conditionally independent of other ties in the graph given the distance between *i* and *j*, the general inverse power law function can be expressed through the natural logarithm of distance, $\log d$. Including more complex dependence assumptions may change the relationship between tie probability and distance. For exploratory purposes, however, one can apply a logarithmic transformation to distance in all models and use it as a dyadic covariate in the ERGM formulation.
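The logarithmic form can be motivated with a short derivation. It assumes the dyad-independent simple case, with an edge parameter $\theta$ and a log-distance parameter $\lambda$ (symbols introduced here for illustration):

$$
\operatorname{logit} P(X_{ij} = 1) = \theta + \lambda \log d_{ij}
\quad\Longrightarrow\quad
\frac{P(X_{ij} = 1)}{P(X_{ij} = 0)} = e^{\theta}\, e^{\lambda \log d_{ij}} = e^{\theta}\, d_{ij}^{\lambda} .
$$

With $\lambda = -\gamma$, the odds of a tie decay as $1/d_{ij}^{\gamma}$. When tie probabilities are small, the odds are close to the probability itself, so $P(X_{ij} = 1) \approx e^{\theta} / d_{ij}^{\gamma}$. The power law exponent thus appears as a linear parameter on $\log d_{ij}$. Note that $\log d_{ij}$ is undefined when $d_{ij} = 0$, which is one reason zero distances need special handling.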

Particular care should be taken with both zero distances and very large distances, for two reasons. First, these distance values may make model fitting unstable. Second, from a theoretical perspective, different processes may operate at different scales, so geospatial effects may differ between very small, medium, and very long distances. There are two possible ways to deal with zero and very large distances. In the first approach, zero distances are set to the smallest positive distance in the data set (Preciado et al., 2012), and extremely large distances are disregarded (Daraganova et al., 2012). Daraganova et al. observed that beyond a threshold distance, ties occur only occasionally and at irregular intervals, at a rate that seems not to depend on distance. It therefore seems theoretically appropriate to fit the model only to observations at distances no greater than this threshold. In the second approach, two dummy variables are derived to indicate zero (or near-zero) distances and extreme distances.
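Both approaches can be sketched as a preprocessing step that turns a symmetric distance matrix into dyadic covariates. The sketch assumes the threshold is supplied by the researcher (Daraganova et al. identified theirs from the data), and that `near_zero` is a hypothetical tolerance for the near-zero dummy. All function and parameter names are illustrative, not part of any ERGM package.

```python
import math


def prepare_distance_covariates(d, threshold, near_zero=0.0):
    """Build dyadic covariates from a symmetric distance matrix d (hypothetical helper).

    Returns (log_d, keep, zero_dummy, far_dummy):
      log_d:      log distance, with zero distances first raised to the smallest
                  positive distance in the data (first approach)
      keep:       True where d_ij <= threshold; dyads beyond the threshold would be
                  excluded from fitting under the first approach
      zero_dummy: 1 where d_ij <= near_zero, else 0 (second approach)
      far_dummy:  1 where d_ij > threshold, else 0 (second approach)
    Raises ValueError if no off-diagonal distance is positive.
    """
    n = len(d)
    positive = [d[i][j] for i in range(n) for j in range(n) if i != j and d[i][j] > 0]
    d_min = min(positive)  # smallest positive distance in the data set
    log_d = [[0.0] * n for _ in range(n)]
    keep = [[False] * n for _ in range(n)]
    zero_dummy = [[0] * n for _ in range(n)]
    far_dummy = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-ties, so the diagonal is left at its defaults
            dij = d[i][j]
            log_d[i][j] = math.log(max(dij, d_min))
            keep[i][j] = dij <= threshold
            zero_dummy[i][j] = int(dij <= near_zero)
            far_dummy[i][j] = int(dij > threshold)
    return log_d, keep, zero_dummy, far_dummy


def covariate_statistic(x, cov):
    """Undirected dyadic covariate statistic: sum over i < j of x_ij * cov_ij."""
    n = len(x)
    return sum(x[i][j] * cov[i][j] for i in range(n) for j in range(i + 1, n))
```

Applied to an adjacency matrix `x`, `covariate_statistic(x, log_d)` gives the log-distance statistic whose parameter is the $\lambda$ of the log-odds form. The dummy matrices would enter as two further dyadic covariates, each with its own parameter.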

In summary, to take into account the geographic arrangement of individuals, a researcher must decide on a distance measure between spatial locations and on the distance interaction function that relates distance to marginal tie probability. These decisions are crucial because they determine how pairwise distances are represented as a dyadic covariate in the ERGM framework.

# Conclusion

The incorporation of actor-attribute and dyadic covariate parameters is a natural and important element in ERGMs, either as parameters of direct research interest or relating to effects that need to be controlled. When specifying an ERGM for empirical data, any available and relevant actor or dyadic information should be parameterized into the model.