 Finite Gaussian Mixture Models

Here, we provide a detailed example to illustrate the concept of Bayesian modeling and posterior computation. We start with a finite Gaussian mixture model. Begin by representing data in a general form, with the following notations and definitions: let x denote the set of flow cytometry measurements from one sample, where x is a matrix of size n × p, n is the total number of cells measured in one sample, and p is the number of markers measured in the assay. Then x_i = (x_{i1}, ..., x_{ip}) is the p-dimensional vector representing the measured markers on the ith cell. We typically compensate, transform, and standardize the flow cytometry measurements before performing any statistical analysis; hence, x usually represents the preprocessed flow cytometry data.
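As a minimal sketch of this data representation, the following Python snippet builds an n × p matrix of hypothetical (already compensated and transformed) measurements and standardizes each marker column; the sample size, marker count, and simulated values are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: n = 500 cells, p = 4 markers, standing in for
# compensated and transformed flow cytometry measurements.
x = rng.normal(loc=3.0, scale=2.0, size=(500, 4))

# Standardize each marker (column) to zero mean and unit variance,
# so x_std plays the role of the preprocessed data used downstream.
x_std = (x - x.mean(axis=0)) / x.std(axis=0)
```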

The finite Gaussian mixture model with k components may be written as

p(x_i) = Σ_{j=1}^{k} π_j N(x_i | g_j, h_j),

where k is the number of mixture components, π_1, ..., π_k are the mixture probabilities that sum to 1, N is the multivariate normal distribution, g_j is the p-dimensional mean vector of mixture component j, and h_j is the corresponding p × p covariance matrix. The mixture model can be interpreted as arising from a clustering procedure depending on an underlying latent indicator z_i for x_i. That is, z_i = s indicates that x_i was generated from mixture component s, or x_i | z_i = s ~ N(g_s, h_s), with P(z_i = s) = π_s. Typically, a number of models with different k are run, and the number of mixture components is determined by a model selection criterion such as AIC, BIC, or other information approximations to the likelihood.
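The model-selection step described above can be sketched as follows, using scikit-learn's GaussianMixture as one convenient implementation (the original text does not prescribe any particular software); the synthetic two-component data and the range of candidate k values are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Synthetic data in p = 2 dimensions from two well-separated components.
x = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 2)),
    rng.normal(5.0, 1.0, size=(200, 2)),
])

# Fit mixture models over a range of k and select the one minimizing BIC.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(x).bic(x)
        for k in range(1, 6)}
best_k = min(bics, key=bics.get)
```

The same loop with `.aic(x)` in place of `.bic(x)` implements AIC-based selection; BIC penalizes model complexity more heavily and tends to choose fewer components.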

One way to understand the Gaussian mixture model, which reflects how the Bayesian model is coded and fitted to data in practice, is as a generative model: each x_i is sampled according to a k-component Gaussian mixture model, where {π_j, g_j, h_j} is the set of parameters associated with the jth mixture component, and (π_1, ..., π_k) are the mixing proportions, with π_j > 0 for j = 1, ..., k and Σ_{j=1}^{k} π_j = 1. The probability π_j also represents the prior probability that an event comes from component j. The conjugate prior for (π_1, ..., π_k) is the symmetric Dirichlet distribution Dir(α/k, ..., α/k), where α is a positive constant that may be fixed or have its own prior distribution. The mean vector g_j and covariance matrix h_j are jointly distributed according to a normal-inverse-Wishart distribution with hyperparameters m, λ, ν, Φ that may be fixed or have their own prior distributions. As discussed below, this basic probabilistic framework can be extended to incorporate additional structural information.