# POINT ESTIMATES, CONFIDENCE INTERVALS, AND P VALUES

We use confidence intervals because a point estimate, being a single value, cannot express the statistical variation, or random error, that underlies the estimate.

If a study is large, the estimation process can be comparatively precise, and there may be little random error in the estimation. A small study, however, has less precision, which means that the estimate is subject to more random error. A confidence interval indicates the amount of random error in the estimate.

A given confidence interval is tied to an arbitrarily set level of confidence. Commonly, the level of confidence is set at 95% or 90%, although any level between 0% and 100% is possible. The confidence interval is defined statistically as follows: if the level of confidence is set to 95%, and if the data collection and analysis could be replicated many times with the study free of bias, the confidence interval would include the correct value of the measure 95% of the time. This definition presumes that the only thing that would differ in these hypothetical replications of the study would be the statistical, or chance, element in the data. It also presumes that the variability in the data can be described adequately by a statistical model and that biases such as confounding are nonexistent or completely controlled.

These unrealistic conditions are typically not met even in carefully designed and conducted randomized trials. In nonexperimental epidemiologic studies, the formal definition of a confidence interval is a fiction that at best provides a rough estimate of the statistical variability in a set of data. It is better to consider a confidence interval not as a literal measure of statistical variability but as a general guide to the amount of random error in the data.
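The replication idea behind this definition can be illustrated with a small simulation. All numbers here are assumed for illustration: a hypothetical population with a known risk, a fixed study size, and repeated studies, each producing a 95% confidence interval for the risk.

```python
import random
import math

# Simulation sketch (all numbers invented for illustration): draw repeated
# samples from a population with a known risk, build a 95% confidence
# interval each time, and count how often the interval covers the truth.
random.seed(1)
true_risk = 0.3   # assumed true risk in the population
n = 500           # assumed study size per replication
reps = 2000       # number of hypothetical replications

covered = 0
for _ in range(reps):
    cases = sum(random.random() < true_risk for _ in range(n))
    p_hat = cases / n                            # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # its standard error
    lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
    covered += lower <= true_risk <= upper

print(f"coverage: {covered / reps:.3f}")
```

The printed coverage should come out close to 0.95, the nominal level. Increasing `n` narrows the intervals (a large study is more precise) without changing the coverage; only when the model's assumptions fail, as the text emphasizes, does this frequency interpretation break down.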

The confidence interval is calculated from the same equations that are used to generate another commonly reported statistical measure, the *P value,* which is the statistic used for statistical hypothesis testing. The *P* value is calculated in relation to a specific hypothesis, usually the *null hypothesis,* which states that there is no relation between exposure and disease. For the *RR* measure, the null hypothesis is *RR* = 1.0. The *P* value is the probability, assuming that the null hypothesis is true and the study is free of bias, that the data obtained in the study would demonstrate an association as far from the null hypothesis as, or farther than, what was actually observed. For example, suppose that a case-control study gives, as an estimate of the relative risk, *RR* = 2.5. The *P* value answers this question: What is the probability, if the true *RR* is 1.0, that a given study would give a result as far from 1.0 as this, or farther? In other words, the *P* value is the probability, conditional on the null hypothesis, of observing an association as strong as the one observed, or stronger.
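The point that the confidence interval and the *P* value come from the same equations can be made concrete. The sketch below uses hypothetical 2 × 2 case-control counts (invented here, not from the text) chosen so that the odds-ratio estimate of the relative risk is 2.5, and applies the standard normal approximation on the log scale:

```python
import math
from statistics import NormalDist

# Hypothetical 2x2 case-control counts (invented for illustration),
# chosen so the odds-ratio estimate of the relative risk is 2.5:
a, b = 50, 20   # exposed cases, exposed controls
c, d = 50, 50   # unexposed cases, unexposed controls

rr = (a / c) / (b / d)                        # estimate of RR: 2.5
se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)     # standard error of ln(RR)
z = math.log(rr) / se_log                     # distance from the null (RR = 1.0)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P value

# The same estimate and standard error give the 95% confidence interval:
lower = math.exp(math.log(rr) - 1.96 * se_log)
upper = math.exp(math.log(rr) + 1.96 * se_log)
print(f"RR = {rr:.1f}, 95% CI {lower:.2f}-{upper:.2f}, P = {p_value:.4f}")
```

Both outputs are built from the same two ingredients, ln(*RR*) and its standard error, which is why an interval that excludes 1.0 corresponds to a *P* value below .05.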

*P* values can be calculated using statistical models that correspond to the type of data that have been collected (see Chapter 9). In practice, the variability of collected data is unlikely to conform precisely to any given statistical model. For example, most statistical models assume that the observations are independent of one another. Many epidemiologic studies, however, are based on observations that are not independent. Data also may be influenced by systematic errors that increase variation beyond that expected from a simple statistical model. Because the theoretical requirements are seldom met, a *P* value usually cannot be taken as a meaningful probability value. Instead, it can be viewed as something less technical: a measure of relative consistency between the null hypothesis and the data in hand. A large *P* value indicates that the data are highly consistent with the null hypothesis, and a low *P* value indicates that the data are not very consistent with the null hypothesis. More specifically, if a *P* value were as small as .01, it would mean that the data were not very consistent with the null hypothesis, but a *P* value as large as .5 would indicate that the data were reasonably consistent with the null hypothesis. Neither of these *P* values should be interpreted as a strict probability. Neither tells us whether the null hypothesis is correct or not. The ultimate judgment about the correctness of the null hypothesis will depend on the existence of other data and the relative plausibility of the null hypothesis and its alternatives.
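The conditional nature of the *P* value can be shown by simulation. In the assumed ideal setting below, where the null hypothesis is true and the statistical model holds exactly, the *P* value behaves as a genuine probability: roughly 1% of null studies yield *P* ≤ .01, and about half yield *P* ≤ .5.

```python
import random
from statistics import NormalDist

# Simulation sketch (assumed ideal setting): the null hypothesis is true and
# the test statistic really is standard normal. Under these conditions the
# P value is a genuine probability, so small P values occur at their
# nominal rates.
random.seed(2)
norm = NormalDist()
reps = 5000

p_values = []
for _ in range(reps):
    z = random.gauss(0.0, 1.0)               # test statistic under the null
    p_values.append(2 * (1 - norm.cdf(abs(z))))

frac_01 = sum(p <= 0.01 for p in p_values) / reps
frac_50 = sum(p <= 0.50 for p in p_values) / reps
print(f"P <= .01: {frac_01:.3f}, P <= .50: {frac_50:.3f}")
```

When the assumptions fail, as with dependent observations or systematic error, these nominal rates no longer hold, which is why the text recommends reading the *P* value as a rough measure of consistency rather than a strict probability.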

## What is the Probability That the Null Hypothesis Is Correct?

Some people interpret a *P* value as a probability statement about the correctness of the null hypothesis, but that interpretation cannot be defended. First, the null hypothesis, like any hypothesis, should be regarded as true or false but not as having a probability of being true. A probability would not be assigned to the truth of any hypothesis except in a subjective sense, as in describing betting odds. Even in framing a subjective interpretation or in assigning betting odds, the *P* value should not be considered to be equivalent to the probability that the null hypothesis is correct.

It is true that the *P* value is a probability measure. When the data are highly inconsistent with the null hypothesis, the *P* value is small, and when the data are concordant with the null hypothesis, the *P* value is large. Nonetheless, the *P* value is not the probability that the null hypothesis is correct. It is calculated only after assuming that the null hypothesis is correct, and it refers to the probability that the observed association, divided by its standard error, would deviate from the null value as much as it did or more. It can thus be viewed as a measure of consistency between the data and the null hypothesis, but it does not address whether the null hypothesis is correct.

Suppose you buy a ticket for a lottery. Under the null hypothesis that the drawing is random, your chance of winning is slim. If you win, the *P* value evaluating the null hypothesis (that you won by chance) is tiny, because your winning is not a likely outcome in a random lottery with many tickets sold. Nevertheless, someone must win. If you did win, does that constitute evidence that the lottery was not random? Should you reject the null hypothesis because you calculated a very low *P* value? The answer is that even with a very low *P* value, the tenability of the null hypothesis depends on what alternative theories you have. One woman who twice won the New Jersey state lottery said she would stop buying lottery tickets to be fair to others. The more reasonable interpretation is that her two wins were chance events. The point is that the null hypothesis may be the most reasonable hypothesis for the data even if the *P* value is low. Similarly, the null hypothesis may be implausible or simply incorrect even if the *P* value is high.
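The lottery argument can be made quantitative with a back-of-the-envelope sketch. All of the round numbers below are invented assumptions, not figures from the text or from any actual lottery: the chance that one named player wins twice is minuscule, yet with many players each entering many drawings, the chance that *somebody*, somewhere, wins twice is not remarkable at all.

```python
# Hedged sketch with invented round numbers (none are from the text):
N = 5_000_000          # assumed tickets per drawing (hypothetical)
plays = 1_000          # assumed drawings entered per regular player
players = 10_000_000   # assumed number of regular players

p_win = 1 / N
# P(a given named player wins at least twice in `plays` drawings)
p_named_twice = (1 - (1 - p_win) ** plays
                 - plays * p_win * (1 - p_win) ** (plays - 1))
# P(at least one of all the players wins at least twice)
p_someone_twice = 1 - (1 - p_named_twice) ** players

print(f"named player: {p_named_twice:.2e}, someone: {p_someone_twice:.3f}")
```

Under these assumptions the per-person probability is on the order of 10⁻⁸, while the probability that some player wins twice is a perfectly ordinary number. A tiny *P* value attached to one person's wins is therefore weak grounds for rejecting the null hypothesis of a random drawing when the plausible alternatives are considered.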