Clustering the US counties with a spatial relational constraint

The 42 selected variables, which discriminate among the nations in terms of the counties placed in them, set a foundation for clustering with relational constraints. Some decisions were needed to obtain a clustering according to the selected variables and the contiguity relation. These were made in a straightforward fashion: 1) the variables were standardized; 2) the Euclidean distances between directly linked counties were computed; 3) the tolerant strategy (where each cluster induces a connected subgraph) was adopted; and 4) the maximum dissimilarity described in Section 9.3.2 was used for clustering. The first dendrogram for the constrained clustering is shown in Figure 9.11. It shows a very clear partition into two large clusters, which is mapped in Figure 9.12. We consider this partition first, in Section 9.5.1, as an illustration before turning to a partition with k = 8, the number of Garreau's nations in the USA. In drawing these images of partitioned counties, we fill the area of each county to match the layouts of Figures 9.2 and 9.3.

The first (height) dendrogram for clustering with a relational constraint.
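The four steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for the county data: the function name constrained_max_clustering and the toy data are hypothetical, standardization is taken to mean z-scores, the tolerant strategy is read as "the merged cluster inherits every link of its two parts", and the maximum dissimilarity between two clusters is assumed to be the largest distance over the links joining them (the exact definition in Section 9.3.2 is not reproduced here).

```python
import numpy as np

def constrained_max_clustering(X, edges):
    """Agglomerative clustering in which only linked clusters may merge.

    X     : (n, p) array-like of raw attribute values, one row per unit
    edges : iterable of (i, j) pairs giving the symmetric contiguity relation
    Returns a list of merges (a, b, level, new_id); new ids start at n.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: standardize each variable (z-scores, population std: an assumption).
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unchanged
    Z = (X - X.mean(axis=0)) / std
    n = len(Z)

    # Step 2: Euclidean distances, computed only for directly linked units.
    nbrs = {i: {} for i in range(n)}
    for i, j in edges:
        if i != j:
            d = float(np.linalg.norm(Z[i] - Z[j]))
            nbrs[i][j] = d
            nbrs[j][i] = d

    merges, next_id = [], n
    while True:
        # Find the closest pair among clusters that are still linked.
        best = None
        for a, row in nbrs.items():
            for b, d in row.items():
                if a < b and (best is None or d < best[2]):
                    best = (a, b, d)
        if best is None:  # no links left: one cluster per connected component
            break
        a, b, d = best

        # Step 3 (tolerant strategy, as assumed here): the new cluster is
        # linked to every cluster that was linked to a or to b.
        # Step 4 (maximum dissimilarity): keep the larger of the two values.
        row = {}
        for old in (a, b):
            for s, ds in nbrs[old].items():
                if s not in (a, b):
                    row[s] = max(row.get(s, ds), ds)
        for old in (a, b):
            for s in nbrs[old]:
                if s not in (a, b):
                    del nbrs[s][old]
            del nbrs[old]
        for s, ds in row.items():
            nbrs[s][next_id] = ds
        nbrs[next_id] = row
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges

# Toy data (invented for illustration): four units on a path 0-1-2-3.
X = [[0.0], [1.0], [10.0], [11.0]]
edges = [(0, 1), (1, 2), (2, 3)]
for a, b, level, new_id in constrained_max_clustering(X, edges):
    print(f"merge {a} + {b} -> {new_id} at level {level:.3f}")
```

Because distances exist only along links, two units with very similar attributes are never merged directly unless they are adjacent or have already been joined through a chain of adjacent units. The pairwise search is quadratic in the number of links and is meant for reading, not for a data set with thousands of counties.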

The eight Garreau nations in the USA

Unsurprisingly, the clear partition of the US counties into two clusters does not fully match Garreau's image of the USA. However, the two areas do correspond in part to his eight nations. The area shaded in light gray includes all of New England and virtually all of The Foundry. It also includes most of Dixie, the southern tip of Florida, and most of MexAmerica, together with some contiguous areas within The Breadbasket and some of The Empty Quarter. The area marked in dark gray contains all of Ecotopia, virtually all of The Empty Quarter, and most of The Breadbasket. There are also contiguous parts of the dark gray region in two areas within Garreau's Dixie. The dark gray region lies mainly in the northwest and along the west coast of the USA. In brief, the light gray area contains the east coast, most of the south, and a swath of counties stretching to the west along the border.

The spatial clustering with two clusters of counties.

To obtain a clearer image of a more detailed partition of counties than can be discerned from Figure 9.11, ranks can be used instead of heights as clustering levels to magnify differences between clusters. The resulting dendrogram is shown in Figure 9.13. There are choices regarding the number of clusters to select when reading a dendrogram. We opted to consider two partitions in detail: one with eight clusters and the other with 15. The clustering into the largest clusters (with smaller clusters all shaded in plum) is shown in Figure 9.14. There are also several clusters with only one county (singletons), which are shaded in white. This partition was chosen to facilitate a better comparison with Garreau's eight nations within the USA.
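As a small illustration of why ranks magnify the lower part of a dendrogram, the sketch below replaces each merge level by its rank. The function name heights_to_ranks and the example levels are hypothetical, and the treatment of ties (equal levels share a rank) is an assumption; the text does not say how the ranks for Figure 9.13 were computed.

```python
import numpy as np

def heights_to_ranks(heights):
    """Replace each merge level by its dense rank (1 = lowest level).

    Equal levels receive the same rank (an assumption about tie handling).
    """
    heights = np.asarray(heights, dtype=float)
    _, inverse = np.unique(heights, return_inverse=True)
    return inverse.ravel() + 1

# Invented merge levels: most merges happen at small values and one at a
# very large value, so a height-scaled dendrogram squeezes the early merges
# into a thin band at the bottom.
levels = [0.02, 0.05, 0.05, 0.90, 14.0]
ranks = heights_to_ranks(levels)
print(list(zip(levels, ranks.tolist())))
```

On the rank scale, consecutive distinct levels are one unit apart regardless of how close their original values were, so the detail hidden near the bottom of a height dendrogram becomes readable. The price is that the vertical axis no longer conveys how dissimilar the merged clusters are.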

While there are some correspondences between the partition shown in this figure and Garreau's image of the USA, the match is not a very good one. Eight clusters are shown in colors. Again, some counties, shown in white, do not belong to any cluster because they are highly distinctive in their immediate spatial context of adjacent counties. This introduces some patchiness consistent with Chinni and Gimpel's analysis but inconsistent with Garreau's claimed homogeneous regions.
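The comparison made in this section is read off the maps, but the same question can be put to a cross-tabulation of nations against clusters. The sketch below is one way to do that, not part of the original analysis: the function names, the county identifiers, and the labels are all invented, and counties missing from the clustering are counted as "unclassified" to mirror the counties shown in white.

```python
from collections import Counter

def cross_tabulate(nation_of, cluster_of):
    """Count counties for every (nation, cluster) pair."""
    table = Counter()
    for county, nation in nation_of.items():
        table[(nation, cluster_of.get(county, "unclassified"))] += 1
    return table

def share_in_dominant_cluster(table, nation):
    """Fraction of a nation's counties that fall in its most common cluster."""
    counts = [c for (n, _), c in table.items() if n == nation]
    return max(counts) / sum(counts)

# Invented toy assignment: six counties, two nations, two clusters.
nation_of = {"c1": "N1", "c2": "N1", "c3": "N2", "c4": "N2", "c5": "N2", "c6": "N2"}
cluster_of = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B"}  # c6 left out

table = cross_tabulate(nation_of, cluster_of)
for nation in ("N1", "N2"):
    print(nation, share_in_dominant_cluster(table, nation))
```

A nation that survives the clustering intact has a share close to 1, while a nation split across several clusters has a much lower one. The share alone does not show whether a cluster also absorbs counties from other nations, so the full table is still needed for that.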

The large green area on the right (east) of the map does contain New England and The Foundry, which is only partially consistent with Garreau. The Foundry is more diverse than Garreau's narrative suggests. Further, the area marked in green also contains a large area to the south and west, incorporating much of Dixie and part of MexAmerica. It also stretches further west into The Breadbasket. The pink area on the right contains parts of The Foundry and Dixie and has a subarea more akin to The Breadbasket.

While the large orange area contains much of The Breadbasket, it stretches west into The Empty Quarter and south into Dixie. The dark blue area in Alabama and Georgia is contained within Dixie. The disconnected regions marked in purple have parts in The Breadbasket, Dixie, and MexAmerica. Clearly, the Breadbasket and Dixie regions that Garreau depicts as monolithic are much more diverse than his account suggests.

A second (rank) dendrogram for clustering with a relational constraint.

There is a light blue area in the southwestern part of the USA. While it corresponds to the MexAmerica of Garreau's account, it does not include the green and purple areas in Texas and stretches further north into The Empty Quarter. Only part of Nevada falls in the light blue area. The scattered counties marked in white were not classified into any of the eight clusters. They can be viewed as residuals.

Ecotopia is largely intact (the exception is the area around San Francisco) inside the area shown in yellow in Figure 9.14. However, the counties marked in yellow stretch east into The Empty Quarter and also into the northwestern part of MexAmerica. The area called The Empty Quarter is better seen as having at least two distinct parts. Much of MexAmerica is intact. Of Garreau's nations, only New England in the northeast of the USA and Ecotopia (with the exception of the area around San Francisco) on the west coast remain intact under clustering with relational constraints.

These results suggest two things: 1) with the exception of New England and Ecotopia, the monolithic nations in Garreau's account are far more heterogeneous than his simple classification into nations suggests, and 2) combining systematic county attribute data with attention to geographic adjacency makes a large difference in the classification of counties. It is highly unlikely that the changes in the USA from 1980 to 2000 are sufficient to account for these differences. The results argue against both the large aggregations of counties into the nations of Garreau and the patchwork account of Chinni and Gimpel, by returning a partition between the two. Of course, this is not surprising. Garreau's account is rich in anecdotal detail and items of local history. However, it is unlikely that Garreau visited all 3111 counties of the USA. The result appears to be the inclusion of some counties into nations using details of other counties in roughly the same area.

One surprise with the discriminant analyses was how few of the 42 variables drove the seven discriminant functions. It seems that including geographic contiguity, in addition to playing a direct role in the partitioning, brought some of the other county attributes into play. In contrast to Chinni and Gimpel's analysis, the areas delimited here have greater geographic coherence. The result is fewer regions rather than more patches.

The spatial clustering with eight clusters of counties.

