TRAINING NAIVE BAYES CLASSIFIERS
As with any classifier, the training stage requires estimating the parameters of the naive Bayes model. For the class-conditional distributions (e.g., Gaussian or discrete), this is straightforward: simply take all of the known examples for each class and do ML estimation. Because naive Bayes assumes that the dimensions are independent given the class, this always reduces to a simple univariate estimation problem for the known positives and negatives in each dimension. In practice, for categorical distributions, there are some subtleties related to avoiding ML estimates of zero, but these are discussed elsewhere (e.g., Henikoff and Henikoff 1996).
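To make the per-class, per-dimension estimation concrete, here is a minimal sketch in Python, assuming Gaussian models for each dimension. The function name fit_gaussian_naive_bayes and the toy data are hypothetical and chosen only for illustration. The ML estimate of the variance divides by n rather than n - 1, which is what pvariance computes.

```python
from statistics import fmean, pvariance

def fit_gaussian_naive_bayes(examples, labels):
    """Return {class_label: [(mean, variance), ...]}, one pair per dimension.

    Each (mean, variance) pair is the univariate Gaussian ML estimate computed
    from the training examples of that class in that dimension.
    """
    params = {}
    for cls in set(labels):
        rows = [x for x, y in zip(examples, labels) if y == cls]
        # zip(*rows) regroups the examples of this class by dimension,
        # so each dimension is estimated on its own.
        params[cls] = [(fmean(col), pvariance(col)) for col in zip(*rows)]
    return params

# Made-up toy data: two dimensions, label 1 = positive, 0 = negative.
X = [[1.0, 5.0], [3.0, 7.0], [10.0, 0.0], [14.0, 2.0]]
y = [1, 1, 0, 0]
params = fit_gaussian_naive_bayes(X, y)
```

Because the dimensions are treated independently, no covariance terms are estimated: the model for each class is just a list of univariate parameters.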
We also need to estimate the prior parameters, which can turn out to be important. The ML estimate for the prior parameter of the positive class is simply the fraction of the training set in the positive class. However, using this estimate to classify new observations means assuming that the fraction of positives among the new examples is the same as it was in the training set. In molecular biology applications, this is often not true. For example, let’s say we were using a naive Bayes classifier to recognize a certain feature in the genome, such as a DNA sequence motif. Ideally, we would train the classifier on a set of positive examples and a set of examples that we know are not the DNA motif. In practice, we typically have a small number of positive examples, and no examples of sequences we know are not the motif. Moreover, when we search the genome, we expect very few of the positions we test to be new examples of the motif. Even in the ideal case, the prior estimated from the training data won’t reflect our expectations for the situation in which we make predictions. This strongly suggests that we should set the prior for classification of new examples to be much smaller than the fraction of positives in our training set.
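The effect of the prior can be illustrated with a short Python sketch. The function names, the training-set composition, the likelihood ratio of 100, and the genome-wide prior of 1e-4 are all made-up values used only for illustration. The posterior is computed from Bayes' theorem in odds form: posterior odds = likelihood ratio × prior odds.

```python
def ml_prior(labels, positive=1):
    """ML estimate of the prior: fraction of training examples in the positive class."""
    return sum(1 for y in labels if y == positive) / len(labels)

def posterior_positive(likelihood_ratio, prior):
    """P(positive | x), given p(x | positive) / p(x | negative) and P(positive)."""
    odds = likelihood_ratio * prior / (1.0 - prior)
    return odds / (1.0 + odds)

# Hypothetical training set: 20 positives and 80 negatives.
labels = [1] * 20 + [0] * 80
prior_train = ml_prior(labels)

# Hypothetical prior reflecting how rare motifs are expected to be in the genome.
prior_genome = 1e-4

# The same observation (same likelihood ratio) scored under both priors.
lr = 100.0
p_train = posterior_positive(lr, prior_train)
p_genome = posterior_positive(lr, prior_genome)
```

With these made-up numbers, an observation that is 100 times more likely under the positive model has a posterior probability of about 0.96 under the training-set prior, but only about 0.01 under the smaller prior. The same evidence therefore leads to opposite classifications depending on the prior.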