ERGMs from a Generalized Linear Model Perspective
Although ERGMs have some similarities to generalized linear models (GLMs), especially standard log-linear models and logistic regressions, an ERGM does not (except in trivial cases) reduce to a logistic regression. Yet, because some fundamental concepts are common to GLMs and ERGMs, it is instructive to illustrate these in the familiar language of logistic regression. This also serves to emphasize how the main departure of ERGMs from logistic regression, namely the assumption of dependence between observations, plays out. Whereas logistic regression assumes independence of observations, as in Table 6.1, ERGMs do not make this assumption; rather, they assume the opposite.
Suppose that we are primarily interested in explaining observed ties as a function of a collection of covariates, or predictor variables. The covariates for the tie-variable $X_{ij}$ could, for example, relate to the individual characteristics of the two actors $i$ and $j$, such as the difference in age between $i$ and $j$, and a variable indicating whether $i$ and $j$ have the same gender. Denote these dyadic covariates by $w_{ij,1}, w_{ij,2}, \ldots, w_{ij,p}$, for $p$ covariates. For a GLM, we would try to find a function $\eta$ of $w$ and unknown parameters $\theta_1, \theta_2, \ldots, \theta_p$ that best describes the expected value $E(X_{ij}) = \eta(w, \theta)$ (the probability that $X_{ij} = 1$). For dichotomous response variables such as $X_{ij}$, a logistic regression estimates a set $\theta$ of unknown parameters $\theta_1, \theta_2, \ldots, \theta_p$ (logistic regression coefficients) that best predict the probability that the tie is present. The logistic regression function is
$$\Pr(X_{ij} = 1 \mid \theta) = \frac{\exp(\theta_1 w_{ij,1} + \theta_2 w_{ij,2} + \cdots + \theta_p w_{ij,p})}{1 + \exp(\theta_1 w_{ij,1} + \theta_2 w_{ij,2} + \cdots + \theta_p w_{ij,p})}.$$
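To make this concrete, here is a minimal Python sketch that evaluates the logistic regression probability for a single dyad. The two covariates (age difference and a same-gender indicator) follow the example above, but all numerical values and coefficients are purely illustrative, not taken from the book.

```python
import numpy as np

def tie_probability(w, theta):
    """Logistic regression probability that the tie X_ij is present,
    given dyadic covariates w and coefficients theta."""
    eta = np.dot(theta, w)                    # linear predictor
    return np.exp(eta) / (1.0 + np.exp(eta))  # inverse logit

# Hypothetical dyad: age difference of 5 years, same gender (coded 1),
# with illustrative coefficients
w = np.array([5.0, 1.0])
theta = np.array([-0.1, 0.8])
print(tie_probability(w, theta))              # approx. 0.574
```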
If a covariate, say, $w_{ij,2}$, indicated whether $i$ and $j$ were of the same gender, a positive value of the corresponding parameter $\theta_2$ indicates a higher probability of a tie between people of the same gender. It is usually easier to interpret the model in terms of the logit or log-odds, which is the natural logarithm of $\Pr(X_{ij} = 1 \mid \theta)/\Pr(X_{ij} = 0 \mid \theta)$:
$$\log\frac{\Pr(X_{ij} = 1 \mid \theta)}{\Pr(X_{ij} = 0 \mid \theta)} = \theta_1 w_{ij,1} + \theta_2 w_{ij,2} + \cdots + \theta_p w_{ij,p}.$$
Anyone familiar with linear and/or logistic regression will be comfortable with the expression on the right-hand side. The parameters ($\theta$) weight the relative importance of their corresponding predictors ($w$) for the probability of a tie. Positive parameters correspond to effects that increase the probability of a tie, whereas negative parameters relate to effects that decrease the probability of a tie.
The difference in the log-odds for two pairs $(i, j)$ and $(h, m)$, whose covariates differ only in that $i$ and $j$ are of the same gender ($w_{ij,2} = 1$) whereas $h$ and $m$ are of different genders ($w_{hm,2} = 0$), is
$$\log\frac{\Pr(X_{ij} = 1 \mid \theta)}{\Pr(X_{ij} = 0 \mid \theta)} - \log\frac{\Pr(X_{hm} = 1 \mid \theta)}{\Pr(X_{hm} = 0 \mid \theta)} = \theta_2,$$
so the odds for the two pairs differ by a factor of $e^{\theta_2}$.
This ratio (the odds of a tie for a same-gender pair relative to a different-gender pair) is the well-known odds ratio. The larger the value of $\theta_2$, the greater the probability of a tie for same-gender pairs compared to different-gender pairs, everything else being equal. We can think of $\theta_2$ as the change in log-odds in going from a different-gender pair to a same-gender pair, with everything else the same.
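As a quick worked example, using the same hypothetical coefficient as in the sketch above, the odds ratio is simply the exponential of the parameter:

```python
import math

theta2 = 0.8                 # hypothetical same-gender parameter
odds_ratio = math.exp(theta2)
print(round(odds_ratio, 2))  # 2.23: same-gender odds are about 2.2 times
                             # different-gender odds, all else being equal
```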
For ERGMs, in addition to the exogenous covariates used in logistic regression, such as the $w$ variables, we include as covariates counts of “network configurations” in the linear predictor. Configurations were introduced in Chapter 3, and examples include edges, 2-stars, and triangles (see Section 3.1.2). We provide details of other possible configurations later in this chapter. The interpretation of the parameters corresponding to these configurations is similar to that for exogenous covariates; for example, a positive parameter corresponding to the number of triangles means that a tie is more likely to occur if it closes a 2-path than if it does not. In the example of the triangle, as the reader will notice, whether a tie closes a 2-path depends on whether the other two ties of the triangle are present.
Consequently, the second departure from logistic regression is that we have to formulate the model for each tie-variable conditional on the rest of the graph: that is, in predicting a tie $X_{ij}$, we need to take into account the other ties that might be present. In other words, ERGMs predict the probability for $X_{ij}$ conditional on all other ties observed in the network (which we denote by $X_{-ij}$). This conditional probability is written as $\Pr(X_{ij} = 1 \mid X_{-ij} = x_{-ij}, \theta)$. Leaving aside the dyadic covariates (the $w$ variables previously mentioned) and concentrating only on the configuration counts as predictors, the (conditional) logit then becomes
$$\log\frac{\Pr(X_{ij} = 1 \mid X_{-ij} = x_{-ij}, \theta)}{\Pr(X_{ij} = 0 \mid X_{-ij} = x_{-ij}, \theta)} = \theta_1 \delta_{ij,1}(x) + \theta_2 \delta_{ij,2}(x) + \cdots + \theta_p \delta_{ij,p}(x). \tag{6.1}$$
The functions $\delta_{ij,k}(x)$ are called the “change statistics” for the $k$th configuration. They are not simply counts of the configurations in the graph (e.g., the number of triangles) but the change in going from a graph for which $X_{-ij} = x_{-ij}$ and $X_{ij} = 0$ to a graph for which $X_{-ij} = x_{-ij}$ and $X_{ij} = 1$. For example, if one covariate is the number of edges, then adding the edge $(i, j)$ to $x_{-ij}$ will increase the number of edges by one, so $\delta_{ij,\mathrm{edge}}(x) = 1$. Adding the edge $(i, j)$ to $x_{-ij}$ when $x_{ik} = x_{kj} = 1$ will increase the number of triangles by (at least) one, because this creates a new triangle $x_{ij} = x_{ik} = x_{kj} = 1$. If the parameter corresponding to the number of triangles is positive, then the fact that the triangle count increases contributes positively to the probability that $X_{ij} = 1$.
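The following Python sketch computes these two change statistics (edge and triangle) for a toy undirected graph; the graph and the helper function are illustrative only and not drawn from any ERGM software.

```python
import numpy as np

def change_statistics(x, i, j):
    """Change statistics for the edge and triangle counts when the tie
    (i, j) is toggled from 0 to 1 in the undirected graph x. The edge
    delta is always 1; the triangle delta counts the 2-paths i-k-j that
    the new tie would close."""
    n = x.shape[0]
    delta_edge = 1
    delta_triangle = sum(
        1 for k in range(n)
        if k not in (i, j) and x[i, k] == 1 and x[k, j] == 1
    )
    return delta_edge, delta_triangle

# Toy 4-node graph with ties 0-2, 2-1, and 0-3 (x is symmetric)
x = np.zeros((4, 4), dtype=int)
for a, b in [(0, 2), (2, 1), (0, 3)]:
    x[a, b] = x[b, a] = 1

print(change_statistics(x, 0, 1))  # (1, 1): adding 0-1 closes 0-2-1
```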
Note here the important fact that we need to know the rest of the graph, $X_{-ij} = x_{-ij}$, in order to calculate the $\delta_{ij,k}(x)$ and the conditional logits. This is a direct consequence of the assumption that ties may be interdependent: the probability of a tie depends on whether other ties are present. The probabilities (or probability distributions) presented in this chapter may be interpreted conditionally: an ERGM prescribes how likely it is to add or delete a tie for a pair of actors given everything else. These probabilities are based on the weighted contributions of changes in configurations that adding or deleting the tie in question would yield.
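A small sketch of this conditional interpretation: the function below inverts the conditional logit of Equation (6.1) to obtain the probability of the tie given the rest of the graph. The change statistics match the toy example above, and the parameter values are hypothetical.

```python
import math

def conditional_tie_probability(deltas, theta):
    """Pr(X_ij = 1 | X_-ij = x_-ij, theta), obtained by inverting the
    conditional logit of Equation (6.1)."""
    logit = sum(t * d for t, d in zip(theta, deltas))
    return math.exp(logit) / (1.0 + math.exp(logit))

# Edge and triangle change statistics (1, 1), as in the sketch above,
# with hypothetical parameters: edge -2.0, triangle 1.0
print(conditional_tie_probability((1, 1), (-2.0, 1.0)))  # approx. 0.269
```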
Why are the predictors the change statistics of configurations rather than the raw counts of configurations? In Equation (6.1), we have an expression for the log-odds of the presence of a tie $X_{ij}$ compared to its absence. In that case, the correct predictor is the change in going from the graph in which $x_{ij} = 0$ to the graph in which $x_{ij} = 1$.
There is an equivalent form of the model as a probability expression for all tie-variables simultaneously, where the predictors are then the counts of configurations. This is known as the joint form of the model:
$$P_\theta(x) = \frac{1}{\kappa(\theta)} \exp\{\theta_1 z_1(x) + \theta_2 z_2(x) + \cdots + \theta_p z_p(x)\}. \tag{6.2}$$
Equation (6.2) is the general form of the ERGM that we stick to throughout the book. The functions $z_k(x)$ are counts of configurations in the graph $x$, such that the corresponding change statistic for $z_k(x)$ is $\delta_{ij,k}(x) = z_k(\Delta^+_{ij}x) - z_k(\Delta^-_{ij}x)$, where $\Delta^+_{ij}x$ ($\Delta^-_{ij}x$) denotes the matrix $x$ in which $x_{ij}$ is constrained to equal one (zero). The parameters weight the relative importance of their respective configurations, and the normalizing term
$$\kappa(\theta) = \sum_{y \in \mathcal{X}} \exp\{\theta_1 z_1(y) + \theta_2 z_2(y) + \cdots + \theta_p z_p(y)\}$$
ensures that the probability mass function $P_\theta(x)$ sums to one over all graphs.
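Because $\kappa(\theta)$ sums over every graph in $\mathcal{X}$, it can be computed exactly only for very small $n$: there are $2^{n(n-1)/2}$ graphs. The Python sketch below enumerates all 64 undirected graphs on four nodes for a model with edge and triangle statistics; the parameter values are arbitrary, and the brute-force enumeration is for illustration only, not an estimation strategy.

```python
import itertools
import numpy as np

def ergm_distribution(n, theta):
    """Exact ERGM probabilities over all undirected graphs on n nodes,
    with statistics z(x) = (edge count, triangle count). Feasible only
    for tiny n."""
    pairs = list(itertools.combinations(range(n), 2))
    graphs, weights = [], []
    for ties in itertools.product([0, 1], repeat=len(pairs)):
        x = np.zeros((n, n), dtype=int)
        for (i, j), t in zip(pairs, ties):
            x[i, j] = x[j, i] = t
        edges = x.sum() // 2
        triangles = np.trace(np.linalg.matrix_power(x, 3)) // 6
        graphs.append(x)
        weights.append(np.exp(theta[0] * edges + theta[1] * triangles))
    kappa = sum(weights)               # the normalizing term
    return graphs, np.array(weights) / kappa

graphs, probs = ergm_distribution(4, theta=(-1.0, 0.5))
print(probs.sum())                     # 1.0: the distribution is proper
```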
Equation (6.2) describes a probability distribution for all graphs with $n$ nodes. Let us suppose that we have only one configuration represented in a model for the network: the number of edges. Then there will be a parameter $\theta_1$ for edges and a statistic $z_1(x)$ that is simply the count $L$ of the number of edges in the graph $x$. So, for any and every graph $x$ with $n$ nodes, Equation (6.2) with a given edge parameter $\theta_1$ will assign a probability to $x$ based on the number of edges. We can then think of a graph from this probability distribution as a random graph, and due to the form of Equation (6.2), we term it an “exponential (family) random graph distribution.” Because Equation (6.2) is based on certain network configurations, we can think of graphs in this distribution as built up by the presence and absence of those particular configurations, combining together in ways represented by the parameter values to create the total graph structure.
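In this edges-only special case the distribution simplifies further: $\exp(\theta_1 L)$ factorizes into a product over dyads, so every tie is independently present with probability $e^{\theta_1}/(1 + e^{\theta_1})$, giving a Bernoulli (Erdős–Rényi) random graph. A minimal sampler (illustrative code, not from the book):

```python
import numpy as np

def sample_edges_only_ergm(n, theta1, seed=0):
    """Draw one graph from the edges-only ERGM. With z_1(x) = L, the
    model factorizes over dyads: each tie is an independent Bernoulli
    variable with p = exp(theta1) / (1 + exp(theta1))."""
    rng = np.random.default_rng(seed)
    p = np.exp(theta1) / (1.0 + np.exp(theta1))
    x = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            x[i, j] = x[j, i] = int(rng.random() < p)
    return x

print(sample_edges_only_ergm(5, theta1=-1.0))
```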
As the ERGM gives us a distribution of graphs over $\mathcal{X}$, the model also implies a distribution of statistics. This offers a convenient way of studying various properties of a model by inspecting the various implied distributions of statistics (as is done in Chapter 4; the use of simulated distributions of statistics is described in more detail in Chapter 12 on simulation, estimation, and goodness of fit, and further illustrated in Chapter 13). As an example, we frequently make use of the expected values $E_\theta\{z(x)\} = \sum_{x \in \mathcal{X}} z(x) P_\theta(x)$ of these implied distributions.
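As an illustration of such an expected value, the sketch below computes the expected edge count under the edges-only model in two ways: by the defining sum over all graphs (grouping graphs by their edge count $L$) and by the closed form implied by the Bernoulli factorization noted above. The parameter values are again illustrative.

```python
import math

def expected_edges(n, theta1):
    """Expected edge count E_theta{L} under the edges-only ERGM,
    computed by full enumeration (grouping the 2^m graphs by L) and by
    the closed form m * p with p = exp(theta1) / (1 + exp(theta1))."""
    m = n * (n - 1) // 2               # number of tie-variables
    weights = [math.comb(m, L) * math.exp(theta1 * L) for L in range(m + 1)]
    kappa = sum(weights)
    e_enum = sum(L * w for L, w in enumerate(weights)) / kappa
    e_closed = m * math.exp(theta1) / (1 + math.exp(theta1))
    return e_enum, e_closed

print(expected_edges(5, -1.0))         # both approx. 2.69
```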
Of course, this description is still quite abstract. To obtain a particular model, we first need to decide which configurations are relevant. We are guided here by hypotheses about possible dependencies among tie-variables.