# BIAS, CONSISTENCY, AND EFFICIENCY OF ESTIMATORS

To facilitate debates about which estimators are best, statisticians developed several objective criteria for comparing them. For example, it is commonly taught that the ML estimator of the standard deviation for the Gaussian is biased. This means that, for a finite sample of data, the standard deviation obtained using the ML formula will, on average, (in this case) underestimate the "true" standard deviation of the values in the pool. On the other hand, the estimator is consistent, meaning that as the size of the sample drawn from the pool approaches infinity, the estimate converges to the "true" value. Finally, the efficiency of an estimator describes how quickly the estimate approaches the truth as a function of the sample size. In modern molecular biology, these issues will almost always be taken care of by the statistics software used to do the calculations. Thus, although this topic is traditionally covered at length in introductory statistics courses, it's something we rarely have to worry about in computer-assisted science.
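The bias and consistency described above can be seen in a small simulation. The following is a minimal sketch, not a prescribed method: it assumes NumPy is available, and the helper names (`ml_std`, `mean_ml_std`), the true standard deviation of 1.0, the sample sizes, and the number of repeated trials are all illustrative choices. It draws many Gaussian samples of a given size, computes the ML estimate (dividing by n) for each, and averages the estimates.

```python
import numpy as np

def ml_std(samples):
    """ML estimate of the Gaussian standard deviation.

    Divides the summed squared deviations by n (ddof=0), not n - 1,
    and works along the last axis so many samples can be handled at once.
    """
    return np.std(samples, axis=-1, ddof=0)

def mean_ml_std(n, true_sigma=1.0, n_trials=5000, seed=0):
    """Average the ML estimate over n_trials independent samples of size n."""
    rng = np.random.default_rng(seed)  # fixed seed so the run is repeatable
    data = rng.normal(loc=0.0, scale=true_sigma, size=(n_trials, n))
    return float(ml_std(data).mean())

true_sigma = 1.0  # hypothetical "true" value in the pool
results = {n: mean_ml_std(n, true_sigma) for n in (5, 50, 500)}

for n, est in results.items():
    print(f"n={n:4d}  mean ML estimate={est:.3f}  (true sigma={true_sigma})")
```

Under these assumptions, the average estimate for small samples should fall noticeably below the true value (bias), and the gap should shrink as the sample size grows (consistency). Comparing how fast that gap and the spread of the estimates shrink for two different estimators would be one rough way to look at efficiency.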

So how do we choose an objective function? In practice, as biologists we usually choose the one that’s simplest to apply and that we can find a way to optimize reliably. We’re not usually interested in debating whether the likelihood of the model is more important than the likelihood of the data. Instead, we want to know something about the parameters being estimated so that we can test a hypothesis: whether this cell line yields greater protein abundance than another cell line, whether a sample is a breast tumor and not healthy tissue, whether there are two groups of interbreeding populations or three, or how well mRNA levels predict protein abundance. So as long as we use the same method of estimation on all of our data, it’s probably not that important which estimation method we use.