# EXERCISES

- 1. I said that the AUC of random guessing was 0.5. What is the AUC of the “sleeping sniffer dog” in the case where the data are truly 1% positives and 99% negatives? Why isn’t it 0.5?
- 2. What does the P-R plot look like for random guessing if there are equal numbers of positives and negatives in the data?
- 3. Show that LDA with variable cutoffs corresponds to using the MAP classification rule with different assumptions about the prior probabilities.
- 4. In cases of limited data, one might expect leave-one-out cross-validation to produce better classification performance than, say, sixfold cross-validation. Why?

- 5. I have an idea: Instead of training the parameters of a naive Bayes classifier by choosing the ML parameters for the Gaussian in each class, I will use the leave-one-out cross-validation AUC as my objective function and choose the parameters that maximize it. Am I cool?
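Exercise 1 takes as given that random guessing yields an AUC of 0.5. Before tackling the imbalanced sniffer-dog variant, it may help to see that claim numerically. The following is a minimal sketch (assuming NumPy is available; `auc` is a hypothetical helper, not from the text) using the rank-based definition of AUC, the probability that a random positive outscores a random negative:

```python
import numpy as np

def auc(labels, scores):
    """Rank-based AUC: P(score of a random positive > score of a random
    negative), counting ties as 1/2 (the Mann-Whitney U statistic)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

rng = np.random.default_rng(0)
# Balanced classes; scores drawn independently of the labels = pure guessing.
y = np.concatenate([np.ones(2000, dtype=int), np.zeros(2000, dtype=int)])
s = rng.random(4000)
print(f"AUC of random guessing: {auc(y, s):.3f}")  # close to 0.5
```

Because the scores carry no information about the labels, each positive-negative pair is a coin flip, and the pairwise average settles near 0.5 regardless of class balance; the exercise asks what changes when the classifier is degenerate rather than random.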

# REFERENCES AND FURTHER READING

Altschul S, Gish W, Miller W, Myers E, Lipman D. (1990). Basic local alignment search tool. *J. Mol. Biol.* 215(3):403-410.

Shimodaira H. (2000). Improving predictive inference under covariate shift by weighting the log-likelihood function. *J. Stat. Plan. Infer.* 90(2):227-244.

Yuan Y, Guo L, Shen L, Liu JS. (2007). Predicting gene expression from sequence: A reexamination. *PLoS Comput. Biol.* 3(11):e243.