# Chapter 11 Nonlinear Classification

**I**n the previous chapter, we considered methods that draw a (high-dimensional) line through the feature space to separate classes. For some datasets, however, no line will do a good job; in practice, it doesn’t take much to produce data on which linear classification methods perform poorly. Once we accept that we aren’t going to be able to use a line (or hyperplane) in the feature space to divide the classes, there are many possible curves we could choose. Unfortunately, only a few probabilistic models yield useful nonlinear classification boundaries, and these tend to have a lot of parameters, so they are not easy to apply in practice.

## QDA: Nonlinear Generative Classification

Quadratic discriminant analysis (or QDA) is the generalization of linear discriminant analysis (or LDA) to the case where the classes are described by multivariate Gaussian distributions that are not assumed to have equal covariances. QDA does not yield a linear classification boundary, and in principle it would perform optimally if the data truly were drawn from multivariate Gaussian distributions. In practice, this is rarely (never?) the case in molecular biology. Perhaps more problematic is that we now need to estimate a covariance matrix for each class. When the number of positive training examples is small (as is typical for problems with rare positives), estimation of the covariance will be unreliable. Nevertheless, QDA is the lone well-studied example of a nonlinear generative classifier based on a standard probability model (at least it’s the only one I know about). The decision boundary turns out to be an ellipse (or quadratic form) in the feature space (Figure 11.1).

FIGURE 11.1 The decision boundary for QDA on the T-cell classification using CD8 gene expression is indicated by a solid line. The means for the two classes are indicated by black unfilled circles, and the datapoints are indicated as “+” for T-cells (positives) and “-” for other cells (negatives). In this case, the data doesn’t fit the Gaussian model very well, so even though the classification boundary is nonlinear, it doesn’t actually do any better than a linear boundary (it actually does a bit worse).
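To make the estimation step concrete, here is a minimal sketch of QDA in numpy (the function names and variables are my own, for illustration only): each class gets its own estimated mean, covariance, and prior, and a point is assigned to the class with the highest Gaussian log-posterior. Because each class carries its own covariance, the quadratic terms in the log-densities no longer cancel, which is exactly why the decision boundary is quadratic rather than linear.

```python
import numpy as np

def qda_fit(X, y):
    """Estimate per-class parameters for QDA.
    Unlike LDA, each class gets its own covariance matrix."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0),            # class mean
                     np.cov(Xc, rowvar=False),   # class covariance
                     len(Xc) / len(X))           # class prior
    return params

def qda_predict(X, params):
    """Assign each point to the class with the highest Gaussian
    log-posterior; the resulting decision boundary is quadratic."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mu, cov, prior = params[c]
        diff = X - mu
        inv = np.linalg.inv(cov)
        # Gaussian log-density plus log-prior (shared constants dropped):
        # -1/2 (x - mu)^T cov^{-1} (x - mu) - 1/2 log|cov| + log prior
        s = (-0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
             - 0.5 * np.log(np.linalg.det(cov))
             + np.log(prior))
        scores.append(s)
    return np.array(classes)[np.argmax(np.array(scores), axis=0)]
```

As a usage sketch on synthetic data: draw two Gaussian clouds with different covariances, fit with `qda_fit`, and classify with `qda_predict`; points should be assigned to the cloud under which they are more probable.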