# CHOOSING THE NUMBER OF CLUSTERS USING THE AIC

To choose the “best” clustering and the optimal number of clusters for the Gaussian mixture, we could use a measure like the silhouette that we introduced in the previous chapter. However, one attractive feature of the Gaussian mixture model is that the objective function is the likelihood. At first, we might guess that we could choose the number of clusters that maximizes the likelihood. However, the maximized likelihood will always increase as the number of clusters increases, because adding clusters increases the number of parameters (see the previous section). So we can’t choose the number of clusters by maximum likelihood. Fortunately, under certain assumptions, it’s possible to predict how much the fit of a model would improve if the extra parameters were only fitting the noise. This means we can trade off the number of parameters in the model (the “complexity of the model”) against the likelihood (or “fit of the model”). For example, for a model with *k* parameters and maximum likelihood *L*, the AIC (Akaike Information Criterion) is defined as

$$\mathrm{AIC} = 2k - 2\log L$$

Information-theoretic considerations suggest that we choose the model that has the minimum AIC. Each additional parameter adds 2 to the AIC, and each unit of log-likelihood subtracts 2. So if a more complex model has one additional parameter, it should have a log-likelihood ratio of at least one (compared to the simpler model) before we accept it over the simpler model. Thus, the AIC says that each parameter translates to about 1 unit of log-likelihood ratio. The AIC assumes the sample size is very large, so in practice, slightly more complicated formulas with finite-sample corrections are often applied. To use these, you simply fit all the candidate models and calculate the (finite-sample corrected) AIC for each of them. You then choose the one that has the smallest value (Figure 6.5).
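The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the figure. The function names are hypothetical, the parameter counts assume a mixture whose clusters each have their own mean and covariance plus a set of mixing weights that sum to one, and the finite-sample correction shown (often called AICc) is one common choice that may differ from the one applied in the text. The log-likelihood values are made-up numbers used only to exercise the functions.

```python
def gmm_param_count(n_clusters, n_dims, covariance="full"):
    """Count free parameters in a Gaussian mixture (assumed parameterization)."""
    means = n_clusters * n_dims
    if covariance == "full":
        # A symmetric d x d covariance matrix has d(d+1)/2 free entries.
        cov = n_clusters * n_dims * (n_dims + 1) // 2
    elif covariance == "diag":
        # A diagonal covariance has one variance per dimension.
        cov = n_clusters * n_dims
    else:
        raise ValueError("covariance must be 'full' or 'diag'")
    weights = n_clusters - 1  # mixing weights sum to one
    return means + cov + weights


def aic(log_likelihood, k):
    """AIC = 2k - 2 log L, where log_likelihood is the maximized log L."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with a common finite-sample correction for n data points."""
    if n - k - 1 <= 0:
        raise ValueError("too few data points for this correction")
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


def best_cluster_number(log_likelihoods, n_dims, n_samples, covariance="full"):
    """Return the cluster number with the smallest corrected AIC, plus all scores."""
    scores = {
        n_clusters: aicc(ll, gmm_param_count(n_clusters, n_dims, covariance), n_samples)
        for n_clusters, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical maximized log-likelihoods for 3 to 6 clusters (illustration only).
example_log_likelihoods = {3: -5200.0, 4: -5100.0, 5: -5050.0, 6: -5047.0}
best, scores = best_cluster_number(example_log_likelihoods, n_dims=2, n_samples=1000)
```

With these made-up values, going from five to six clusters adds six parameters but improves the log-likelihood by only three units, so the criterion prefers five clusters. In practice the log-likelihoods would come from fitting each mixture model to the data.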

FIGURE 6.5 Choosing models (cluster number) using the AIC. For the CD4-CD8 data shown here, the AIC reaches a minimum with 5 clusters for both the Gaussian mixture model with full covariance (left) and the model with diagonal covariance (right). This agrees with the observation that the CD4+CD8+ cells were not captured by the models with only four clusters.