Simulation, Estimation, and Goodness of Fit
Johan Koskinen and Tom Snijders
Exploring and Relating Model to Data in Practice
Previous chapters concentrated on the formulation and specification of exponential random graph models (ERGMs) for different types of relational data. In Chapter 6, we saw that effects represented by configurations and corresponding parameters define a distribution of graphs where the probability of getting any particular graph depends on the configurations in the graph. Chapter 7 showed that configurations are sufficient information in the sense that the probability of a graph is completely determined by statistics that are the counts of relevant configurations.
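As a reminder of what "completely determined by statistics" means, the general form of the model can be written as follows. The notation is an assumption made here for illustration (z_A(x) for the count of configuration A in graph x, θ_A for its parameter, κ(θ) for the normalizing constant) and may differ in detail from the notation of the earlier chapters.

```latex
\Pr(X = x \mid \theta)
  = \frac{1}{\kappa(\theta)} \exp\Big\{ \sum_{A} \theta_A \, z_A(x) \Big\},
\qquad
\kappa(\theta) = \sum_{y} \exp\Big\{ \sum_{A} \theta_A \, z_A(y) \Big\}

% Relative probability of two graphs x and y on the same node set:
\frac{\Pr(X = x \mid \theta)}{\Pr(X = y \mid \theta)}
  = \exp\Big\{ \sum_{A} \theta_A \,\big( z_A(x) - z_A(y) \big) \Big\}
```

The ratio in the second line does not involve κ(θ), and it depends on the two graphs only through the differences in their configuration counts.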
If we increase the strength of a parameter for a given configuration, graphs with more of that configuration become more likely in the resulting distribution. This simple fact is used in the three methods presented in this chapter:
Simulation: For a given model with fixed parameter values, it is possible to examine the features of graphs in the distribution through simulation to gain insight into the outcomes of the model.
Estimation: Empirically, for a given model and a given dataset, it is possible to estimate the parameter values that are most likely to have generated the observed graph, the “maximum likelihood estimates” (MLEs). Furthermore, it can be shown that the observed graph is central in the distribution of graphs determined by these estimates. As we will see, however, because of the dependencies in the data, maximum likelihood estimation requires simulation procedures.
Heuristic goodness of fit (GOF): For a fitted model (i.e., with parameters estimated from data), it is then possible to simulate the distribution of graphs to see whether other features of the data (i.e., nonfitted effects) are central or extreme in the distribution. If a graph feature is not extreme, there is no evidence that it could not have arisen from the processes implicit in this model, and hence we can say that the model can explain that particular feature of the data; in other words, such a feature is well fitted by the model.
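The three steps can be sketched in the simplest possible setting. The example below is a minimal illustration under stated assumptions, not the procedure used in this chapter: it uses an edge-only model, in which ties are independent, so graphs can be drawn directly and the MLE has a closed form (the log-odds of the density). Models with dependence between ties require the simulation-based procedures described in this chapter. The observed graph (five disjoint cliques of four nodes) is hypothetical, and the standardized gap used to judge whether a statistic is central or extreme is one common heuristic, assumed here for illustration.

```python
import itertools
import math
import random
import statistics


def count_edges(adj):
    """Number of ties in an undirected graph stored as a symmetric 0/1 matrix."""
    return sum(map(sum, adj)) // 2


def count_triangles(adj):
    """Number of closed triads."""
    n = len(adj)
    return sum(1 for i, j, k in itertools.combinations(range(n), 3)
               if adj[i][j] and adj[j][k] and adj[i][k])


def simulate_edge_only(n, theta, rng):
    """Draw one graph from the edge-only model: ties are independent with
    probability exp(theta) / (1 + exp(theta))."""
    p = 1.0 / (1.0 + math.exp(-theta))
    adj = [[0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i][j] = adj[j][i] = 1
    return adj


def standardized_gap(observed_value, simulated_values):
    """(observed - simulated mean) / simulated standard deviation."""
    return ((observed_value - statistics.mean(simulated_values))
            / statistics.stdev(simulated_values))


# Hypothetical observed graph: five disjoint cliques of four nodes each.
n = 20
observed = [[0] * n for _ in range(n)]
for start in range(0, n, 4):
    for i, j in itertools.combinations(range(start, start + 4), 2):
        observed[i][j] = observed[j][i] = 1

# Estimation: for the edge-only model the MLE is the log-odds of the density.
density = count_edges(observed) / math.comb(n, 2)
theta_hat = math.log(density / (1.0 - density))

# Simulation: draw graphs from the fitted model.
rng = random.Random(1)
sims = [simulate_edge_only(n, theta_hat, rng) for _ in range(500)]

# Heuristic GOF: compare a fitted statistic (edges) and a nonfitted one (triangles).
edge_gap = standardized_gap(count_edges(observed), [count_edges(g) for g in sims])
triangle_gap = standardized_gap(count_triangles(observed),
                                [count_triangles(g) for g in sims])
print(f"theta_hat = {theta_hat:.3f}")
print(f"edges:     standardized gap = {edge_gap:.2f}")
print(f"triangles: standardized gap = {triangle_gap:.2f}")
```

In this toy case the edge count, which was fitted, should sit near the center of the simulated distribution, whereas the triangle count, which was not, should be extreme. This indicates that an edge-only model does not explain the clustering in the hypothetical data.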
To illustrate the basic principle of estimation and how it relies on simulation, consider an example. Suppose we want to estimate the parameters of a model with edge and alternating triangle effects (as defined in Equation (6.8) and Figure 6.9a; λ = 2) for Kapferer’s (1972) tailor shop data. The observed number of edges is 223, and the observed alternating triangle statistic is 406.4. In Figure 12.1, we try a few values for the edge and alternating triangle parameters. For the top left-hand chart, we simulated graphs with both parameters set to 0 and so obtained the distributions of the edge and alternating triangle statistics across those graphs. The observed statistics are far from these distributions, so it would be unlikely to observe the data if these parameter values were true. In the top right-hand chart, decreasing the edge parameter (to -0.84) centers the distribution of edges over the observed number of edges but does not produce enough alternating triangles. If we then increase the alternating triangle parameter (to 0.029), we get closer to the observed alternating triangle statistic but no longer reproduce the number of edges (bottom left-hand chart). For the parameters in the bottom right-hand chart (edge and alternating triangle parameters equal to -4.413 and 1.45, respectively), however, the distributions of edges and alternating triangles are both centered over the observed values. These parameter values determine a model that adequately represents the data, at least on these two statistics.
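The trial-and-error comparison described above can be sketched in code. What follows is a rough illustration under several assumptions, not the software or settings behind Figure 12.1. The network is taken to have 39 actors, which is not stated in this section. The alternating triangle statistic is computed in its shared-partner form, λ Σ x_ij {1 − (1 − 1/λ)^(number of shared partners of i and j)}, which is assumed to match Equation (6.8). The third trial keeps the edge parameter at -0.84. Graphs are simulated with a simple Metropolis sampler that toggles one tie at a time, started from a random graph with the observed density. The chain lengths and the seed are arbitrary. The helper names (`alt_triangle_gain`, `simulate_means`) are hypothetical.

```python
import math
import random

LAMBDA = 2.0               # damping constant, lambda = 2 as in the text
R = 1.0 - 1.0 / LAMBDA


def alt_triangle_gain(nbrs, i, j):
    """Increase in the alternating triangle statistic when the absent tie i-j is added."""
    shared = nbrs[i] & nbrs[j]
    gain = LAMBDA * (1.0 - R ** len(shared))      # term for the new tie itself
    for k in shared:                              # ties i-k and j-k gain one shared partner
        gain += R ** len(nbrs[i] & nbrs[k]) + R ** len(nbrs[j] & nbrs[k])
    return gain


def simulate_means(n, theta_edge, theta_at, start_density, steps, burn_in, seed):
    """Run a single-tie-toggle Metropolis chain and return the mean edge count
    and mean alternating triangle statistic after burn-in."""
    rng = random.Random(seed)
    nbrs = [set() for _ in range(n)]
    edges, alt_tri = 0, 0.0
    # Build a random starting graph tie by tie so both statistics stay exact.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < start_density:
                alt_tri += alt_triangle_gain(nbrs, i, j)
                nbrs[i].add(j)
                nbrs[j].add(i)
                edges += 1
    sum_edges = sum_at = 0.0
    for step in range(steps):
        i, j = rng.sample(range(n), 2)
        present = j in nbrs[i]
        if present:                               # evaluate the gain on the graph without i-j
            nbrs[i].discard(j)
            nbrs[j].discard(i)
        gain = alt_triangle_gain(nbrs, i, j)
        sign = -1 if present else 1               # -1: propose removal, +1: propose addition
        log_ratio = sign * (theta_edge + theta_at * gain)
        accept = log_ratio >= 0 or rng.random() < math.exp(log_ratio)
        if accept != present:                     # the tie exists in the next state
            nbrs[i].add(j)
            nbrs[j].add(i)
        if accept:
            edges += sign
            alt_tri += sign * gain
        if step >= burn_in:
            sum_edges += edges
            sum_at += alt_tri
    kept = steps - burn_in
    return sum_edges / kept, sum_at / kept


N_ACTORS = 39                        # assumption: network size is not given in this section
OBSERVED_EDGES, OBSERVED_AT = 223, 406.4
# Third pair assumes the edge parameter stays at -0.84 (not explicit in the text).
TRIAL_VALUES = [(0.0, 0.0), (-0.84, 0.0), (-0.84, 0.029), (-4.413, 1.45)]

results = {}
for theta_edge, theta_at in TRIAL_VALUES:
    results[(theta_edge, theta_at)] = simulate_means(
        N_ACTORS, theta_edge, theta_at,
        start_density=OBSERVED_EDGES / math.comb(N_ACTORS, 2),
        steps=30000, burn_in=10000, seed=12)
    mean_edges, mean_at = results[(theta_edge, theta_at)]
    print(f"edge={theta_edge:7.3f}  alt-tri={theta_at:6.3f}  "
          f"mean edges={mean_edges:6.1f} (obs {OBSERVED_EDGES})  "
          f"mean alt-tri={mean_at:6.1f} (obs {OBSERVED_AT})")
```

When the alternating triangle parameter is 0, ties are independent, so the expected edge count can be checked by hand. With 39 actors there are 741 dyads, and an edge parameter of -0.84 gives a tie probability of about 0.30 and hence about 223 expected edges, in line with the top right-hand chart. For the last pair of values, short chains such as these may not have converged, so the printed means should be read as indicative only.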