BAYESIAN ESTIMATION AND PRIOR DISTRIBUTIONS
As we have seen, the MAP objective function and more generally penalized likelihood methods can be related to the ML objective function through the use of Bayes' theorem. For this reason, these methods sometimes are given names with the word "Bayesian" in them. However, as long as a method results in a single estimator for parameters, it is not really Bayesian in spirit. Truly Bayesian estimation means that you don't try to pin down a single value for your parameters. Instead, you embrace the fundamental uncertainty that any particular estimate for your parameters is just one possible estimate drawn from a pool. True Bayesian statistics mean that you consider the entire distribution of your parameters, given your data and your prior beliefs about what the parameters should be. In practice, Bayesian estimation is not usually used in biology, because biologists want to know the values of their parameters. We don't usually want to consider the whole distribution of expression levels of our gene that are compatible with the observed data: we want to know the level of the gene.
One interesting example where the Bayesian perspective of estimating probability distributions matches our biological intuition and has led to a widely used data analysis approach is in the problem of population structure inference. Population structure is the situation where the individuals from a species are separated into genetic groups such that recombination is much more common within the genetically separated groups than between them. Population structure is often caused by geographical constraints: western, central, and eastern chimpanzees are separated from each other by rivers. The individuals in each population rarely cross, and therefore there is little "gene flow" between populations. Understanding population structure is important for both evolutionary genetics and human medical genetics. For example, when a case-control study is performed to identify loci associated with a disease, if the population structure of the cases is different than that of the controls, spurious associations will be identified. In general, for a sample of humans, population history is not always known, but population structure can be inferred based on differences in allele frequencies between populations, and individuals can be assigned to populations based on their genotypes. In the simplest case, where there is no recombination at all between subpopulations, individuals should be assigned to exactly one subpopulation. However, when rare recombination events do occur, so- called "admixed" individuals may appear in a population sample. Admixture simply means that individuals have parents (or other more distant ancestors) from more than one of the subpopulations. Since the assortment of genetic material into offspring is random, these admixed individuals will truly be drawn from more than one subpopulation. In this case, the Bayesian perspective says, rather than trying to assign each individual to a population, estimate the probability distribution over the ancestral subpopulations. This Bayesian approach is implemented in the widely used software STRUCTURE
(Pritchard et al. 2000) and allows an individual to be partially assigned to several subpopulations.
Although Bayesian estimation is not used that often for molecular biology data analysis, the closely associated concepts of prior and posterior distributions are very powerful and widely used. Because the Bayesian perspective is to think of the parameters as random variables that need to be estimated, models are used to describe the distributions of the parameters both before (prior) and after considering the observations (posterior). Although it might not seem intuitive to think that the parameters have a distribution before we consider the data, in fact it makes a lot of sense: we might require the parameters to be between zero and infinity if we are using a Poisson model for numbers of sequence reads or ensure that they add up to one if we are using a multinomial or binomial model for allele frequencies in a population. The idea of prior distributions is that we can generalize this to quantitatively weigh the values of the parameters by how likely they might turn out to be. Of course, if we don't have any prior beliefs about the parameters, we can always use uniform distributions, so that all possible values of the parameters are equally likely (in Bayesian jargon, these are called uninformative priors). However, as we shall see, we will find it too convenient to resist putting biological knowledge into our models using priors.