Simulation: Obtaining Distribution of Graphs for a Given ERGM
The charts in Figure 12.1 plot edges against alternating triangles for samples of graphs from different distributions. Because of the vast number of possible graphs, we cannot plot the exact distribution, nor can we calculate properties of these distributions analytically. We have to rely on a sample of graphs. Sampling graphs is very central to ERGM inference.
Sampling Graphs Using Markov Chain Monte Carlo
To sample a graph x, we rely on the well-known procedure Markov chain Monte Carlo (MCMC), which here consists of generating a sequence of M graphs where graphs are successively updated through small changes. For large M, the last graph in the sequence is a draw from the target
Figure 12.1. Distribution of edges and alternating triangles (mean represented by triangle) for different values of parameters compared to observed number of edges and alternating triangles (circle) for Kapferer’s (1972) data.
Figure 12.2. Number of edges in sequence of graphs in Markov chain starting with empty graph.
distribution - in our case, the ERGM. In particular, we may use the Metropolis-Hastings algorithm (Chib & Greenberg, 1995; Hastings, 1970; Metropolis et al., 1953; Tierney, 1994) to sample from the ERGM.
The updating rule in moving from the current (old) graph to a new graph in the simulation consists of choosing a pair of nodes at random and removing or adding a tie between them according to whether they are already tied. If the new graph with one changed tie has a higher probability than the old graph, the new graph is taken as the next graph in the sequence. If the new graph has a lower probability and is less likely than the old one, we only sometimes accept the proposed change to the graph, with a probability that depends on the ratio of how much more likely the old graph is than the new one. In the event that the new graph is not accepted, the next graph in the sequence remains the old graph.
Figure 12.2 illustrates the case of sampling from a model with edges and alternating triangles, with edge and alternating triangle parameters —2 and 0.3 respectively, on twenty nodes. Starting with the empty graph (with zero edges and zero alternating triangles), after 100 updates of the Markov chain, we have reached a graph with roughly ten edges. After around 500 iterations, graphs of between twenty and forty edges are being proposed. Between 1,000 and 10,000 iterations, the chain appears to have settled into a pattern, generating graphs with an average of about
Figure 12.3. Number of edges against alternating triangles for sequence of graphs in Markov chain starting in empty graph. Model with edge and alternating triangle parameters equal to —2 and 0.3 respectively, n = 20.
thirty edges. We may tentatively conclude that the chain has “converged” into drawing graphs from a stationary distribution, and we may take the last graph in the sequence as our sample point. For the two statistics in Figure 12.3, we see clearly how the chain settles and starts filling in the area it is exploring.
The rationale behind the MCMC sampler is that if the current state is a draw from the target distribution, so will be the next state. That is, once we have started producing graphs from the ERGM, we will continue to do so. Determining how many iterations are necessary before the simulation has settled into the target distribution has to be decided on a case-by-case basis. The number of such iterations is commonly referred to as the “burn-in.” From Figure 12.2, as far as edges are concerned, the simulation quickly burns in within a few hundred iterations. The burn-in is needed for the Markov chain to “forget” the state in which it started. In principle, as long as the burn-in is long enough, the choice of starting graph (empty, complete, or any arbitrary graph) does not matter.