# Tests for Clustering

Analysts searching for hot spots or high-crime areas can test for clusters of points, lines, or polygons. At least two methods are available to test for clustering: the nearest-neighbor index (NNI) and the test for spatial autocorrelation.

1. *Nearest-neighbor index:* A test that compares the actual distribution of crime data with a randomly distributed data set of the same sample size (Eck et al., 2005). For both the actual and the randomly distributed data sets, the distance between each point and its nearest neighbor is calculated. The average of these distances is then computed for each data set. The NNI is the ratio of the average distance for the actual data set to that for the random data set. Overall, the NNI test examines whether points are closer together than would be expected under spatial randomness (Eck et al., 2005). One limitation is that the test does not show where clusters are; it answers only the question of whether they exist.

   Some computer programs, such as CrimeStat III (Levine, 2010), allow users to perform hierarchical clustering, in which analysts search for clusters of clusters based on nearest neighbors. Analysts first identify initial clusters using ellipses (i.e., first-order clusters). Hierarchical clustering then attempts to identify clusters of those clusters (i.e., second-order clusters) (Eck et al., 2005; Paynich and Hill, 2010). The process continues until “all crime points fall into a single cluster or when the grouping criteria fails” (Eck et al., 2005, p. 22).

2. *Test for spatial autocorrelation:* Spatial autocorrelation is another term for spatial dependency (Chainey and Ratcliffe, 2005). Spatial autocorrelation techniques assume that “criminal events that occur in different locations (yet in close proximity) are related” (Paynich and Hill, 2010, p. 382). Positive spatial autocorrelation suggests that areas with high crime rates are clustered together, and areas with low crime rates are clustered together (Eck et al., 2005; Paynich and Hill, 2010).
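The nearest-neighbor index described in item 1 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular package: instead of generating a random data set, it assumes the standard analytic expectation for the mean nearest-neighbor distance under complete spatial randomness, 0.5 × √(area / n), and it assumes the analyst supplies the study area. The function names and the coordinates are hypothetical.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its nearest neighbor (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        total += nearest
    return total / len(points)


def nearest_neighbor_index(points, area):
    """Ratio of the observed mean nearest-neighbor distance to the mean
    distance expected for the same number of points placed at random
    in a study region of the given area (assumed analytic expectation)."""
    observed = mean_nearest_neighbor_distance(points)
    expected = 0.5 * math.sqrt(area / len(points))
    return observed / expected


# Hypothetical incident coordinates in a 10 x 10 study area (area = 100).
clustered = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
             (9.0, 9.0), (9.1, 9.0), (9.0, 9.1)]
evenly_spaced = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]

nni_clustered = nearest_neighbor_index(clustered, area=100.0)
nni_even = nearest_neighbor_index(evenly_spaced, area=100.0)
```

An index below 1 means points are closer together than expected under randomness, and an index above 1 means they are more evenly spread. As the text notes, the index says nothing about where the clusters are.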

There are several tests for spatial autocorrelation. Moran’s I and Geary’s C are spatial autocorrelation statistics that require aggregate data (Eck et al., 2005; Paynich and Hill, 2010).

- *Moran’s I:* This is a global statistic that shows whether the pattern is clustered, dispersed, or random (ESRI, 2009). An intensity value is assigned to each aggregate point, and the values must show some variation for the statistic to be computed. When points that are close together have similar values, the result is a high positive Moran’s I (Eck et al., 2005). A Moran’s I closer to +1 indicates clustering, while a Moran’s I closer to -1 reflects dispersion. Significance can be tested by comparing the statistic with a normal distribution.
- *Geary’s C:* This statistic is used for analyzing small neighborhoods and for describing the dispersion of hot spots (Eck et al., 2005). Computations for Geary’s C are similar to those of variance in non-spatial statistics (Paynich and Hill, 2010) in that it “is a measure of the deviations in intensity values of each point with one another” (Eck et al., 2005, p. 19). Like Moran’s I, the Geary’s C coefficient can be tested for significance, and the results indicate positive or negative spatial autocorrelation (Eck et al., 2005).
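Both statistics can be illustrated with a short Python sketch. It uses the textbook formulas for Moran’s I and Geary’s C with a binary contiguity weights matrix; the row-of-areas layout, the helper `chain_weights`, and the crime counts are hypothetical and chosen only to keep the example small. For Geary’s C, values below 1 indicate positive spatial autocorrelation and values above 1 indicate negative spatial autocorrelation.

```python
def chain_weights(n):
    """Binary contiguity weights for n areas arranged in a single row
    (hypothetical layout): each area neighbors the areas beside it."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


def _check_inputs(total_sq, s0):
    if total_sq == 0:
        raise ValueError("values must show some variation")
    if s0 == 0:
        raise ValueError("weights matrix has no neighbors")


def morans_i(values, weights):
    """Global Moran's I: weighted cross-products of deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)      # sum of all weights
    total_sq = sum(d * d for d in dev)
    _check_inputs(total_sq, s0)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * (cross / total_sq)


def gearys_c(values, weights):
    """Global Geary's C: weighted squared differences between paired values."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    total_sq = sum((v - mean) ** 2 for v in values)
    _check_inputs(total_sq, s0)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2 * s0)) * (sq_diff / total_sq)


# Hypothetical crime counts for six adjacent areas.
w = chain_weights(6)
clustered = [9, 10, 9, 1, 2, 1]       # high values sit next to high values
alternating = [10, 1, 10, 1, 10, 1]   # high values sit next to low values

i_clustered, c_clustered = morans_i(clustered, w), gearys_c(clustered, w)
i_alternating, c_alternating = morans_i(alternating, w), gearys_c(alternating, w)
```

The clustered counts give a positive Moran’s I and a Geary’s C below 1, while the alternating counts give a negative Moran’s I and a Geary’s C above 1. The sketch does not test significance, which would be a separate step.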

## Global Statistical Tests

A number of simple-to-use global statistical tests can help analysts understand general patterns in the crime data. These tests belong to the field of spatial statistics. Unlike traditional statistics, spatial statistics use distance, space, and spatial relationships directly in their computations. They serve as spatial distribution and pattern analysis tools, and analysts use them to answer such questions as “Where is the center?” and “How are features distributed around the center?”

Why use spatial statistics?

They help analysts assess patterns, trends, and relationships. In addition, they can lead to a better understanding of geographic phenomena while assisting in pinpointing causes of specific geographic patterns.

These statistical tests include the following:

- *Mean center:* The mean center is a point constructed from the average *x* and *y* values for the input feature centroids. The mean center point can be used as a relative measure to compare spatial distributions between different crime types or against the same crime type for different periods of time (Eck et al., 2005). A crime analyst might want to see if the mean center for burglaries shifts when evaluating daytime versus night-time incidents. This information could be used to make recommendations for reallocating resources.
- *Standard deviation distance:* Standard deviation distance measures how far features are spread around the mean center. It helps to explain the level and alignment of dispersion in the crime data. By comparing the standard deviation distances of different crime types, the analyst can determine which crimes are least dispersed and which are most dispersed.
- *Standard deviation ellipse:* The standard deviation ellipse is a way of presenting the information found by using the measures of standard deviation. The size and shape of the ellipse help to explain and illustrate the degree of dispersion of different crimes.
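The first two measures are simple enough to sketch directly. The example below is a minimal illustration that assumes each incident is already reduced to an (x, y) coordinate pair in projected units; the burglary coordinates and the function names are hypothetical. The standard deviation ellipse is omitted because its calculation varies between software packages.

```python
import math


def mean_center(points):
    """Average x and average y of a set of (x, y) coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n,
            sum(y for _, y in points) / n)


def standard_distance(points):
    """Root of the mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / len(points))


# Hypothetical burglary locations for two time periods.
daytime = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
night_time = [(4.0, 4.0), (6.0, 4.0), (4.0, 6.0), (6.0, 6.0)]

day_center = mean_center(daytime)
night_center = mean_center(night_time)

# Distance the mean center moves between daytime and night-time incidents.
shift = math.hypot(night_center[0] - day_center[0],
                   night_center[1] - day_center[1])

day_spread = standard_distance(daytime)
night_spread = standard_distance(night_time)
```

Here the mean center moves from (1, 1) to (5, 5) while the spread stays the same, which would suggest that the burglary pattern relocates at night without becoming more or less dispersed.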

Another group of spatial association statistics is the local indicators of spatial association (LISA) statistics. Two statistics from this group are Gi and Gi*, which perform computations on grid cell output, such as that of a density map, or at least on aggregate data. You may recall that the output of a density map is a grid of raster cells (a set of cells arranged in rows and columns, and a commonly used data set in GIS). These tests examine each cell in the grid and assume initially that the values within that cell and its surrounding neighbors are similar to the values anywhere else on the grid; that is, they differ no more than would be expected from random chance (Chainey and Ratcliffe, 2005). If, on the other hand, local spatial autocorrelation exists, we would see spatial clustering of high values with high values and low values with low values (Chainey and Ratcliffe, 2005).

One parameter that analysts need to calibrate when using LISA statistics is the distance between the target cell and its neighbors (Eck et al., 2005); in other words, how far away can a cell be and still be considered a “neighbor”? In sum, LISA statistics reflect the idea that a single block may have a high crime count, but it is not considered a hot spot unless other nearby blocks also have high crime counts (i.e., there is a significant positive spatial correlation) (Gorr and Kurland, 2012).
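The role of the neighbor distance can be seen in a small sketch of the Gi* statistic. This is an illustration under stated assumptions, not the implementation used by any software package: it applies the standard Getis-Ord Gi* formula with binary weights, treats every cell within `radius` rows and columns of the target cell (including the target cell itself) as a neighbor, and runs on a hypothetical 5 × 5 grid of crime counts. The function name `gi_star` and the `radius` parameter are illustrative.

```python
import math


def gi_star(grid, radius=1):
    """Gi* z-score for every cell of a rectangular grid of counts.

    Neighbors are all cells within `radius` rows and columns of the
    target cell, including the cell itself, each with weight 1.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if s == 0:
        raise ValueError("grid values must show some variation")

    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighborhood = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            w_sum = len(neighborhood)  # binary weights: sum(w) == sum(w^2)
            if w_sum == n:
                raise ValueError("radius covers the whole grid")
            spread = math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
            result[r][c] = (sum(neighborhood) - mean * w_sum) / (s * spread)
    return result


# Hypothetical crime counts: high values concentrated in the top-left corner.
counts = [
    [9, 8, 1, 0, 1],
    [8, 9, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
]
z = gi_star(counts, radius=1)
```

With these counts, the cells in the top-left block receive large positive values, while the isolated count of 1 in the bottom-right corner does not. A common convention, assumed here rather than taken from the source, is to compare each value against 1.96 as an approximate 95 percent threshold. Rerunning with a larger `radius` changes which cells count as neighbors and therefore which cells stand out.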

Analysts can use statistical software to determine whether an area with a high number of crimes is a hot spot or whether the clustering of those crimes is a random occurrence. CrimeStat III and GeoDa are two computer software programs for hot spot analysis.

1. *CrimeStat III:* CrimeStat is a spatial statistical program used to analyze the locations of crime incidents and identify hot spots.

2. *GeoDa:* The GeoDa Center for Geospatial Analysis and Computation at the University of Chicago develops state-of-the-art methods for geospatial analysis, geovisualization, geosimulation, and spatial process modeling, and implements them through software tools.