# Hypothesis Testing, Confidence Intervals, and *p*-Values

Hypothesis testing is the process of carrying out statistical assessments of a sample and using the results to make inferences about population parameters. The true values of population parameters are often unknown and may be assessed either directly using point estimation techniques, or indirectly through hypothesis testing. The latter is the most common approach in inferential statistics, and it embodies the classic tradition of deductive reasoning or the *a priori* approach described in Chapter 2. Hypothesis testing begins by formulating a hypothetical statement or proposition about the true value of the population parameter. The proposed statement could be based on prior information about the population parameter generated from previous studies, observations from data exploration using visualization tools, results from a pilot project, or purely based on theoretical grounds. The analysis then proceeds with the statistical evaluation of the sample data for use in validating or denying the proposed statement. Three things are important when performing hypothesis testing: (1) the formulation of a hypothesis set consisting of both null and alternative hypotheses, (2) the decision regarding the test criteria and the level of statistical significance, and (3) the choice of the appropriate statistical test to evaluate the formulated hypothesis.

The hypothesis set consists of two competing claims that are made about the true value of the population parameter. The first claim, the null hypothesis designated as H_{0}, describes the hypothetical state of affairs. This null is the statement under statistical investigation; as the name implies, it is a negation and is often contrary to the research hypothesis or the opposite of what a data scientist believes to be true. The alternative hypothesis is a statement of the research hypothesis, or a conjecture of what a data scientist hopes to establish as true based on the empirical observations drawn from the sample. This alternative hypothesis is designated as H_{A}, and following the statistical analysis, it will be accepted as the true statement when the null is rejected. Both hypotheses must be formulated in such a way that they are mutually exclusive of each other but collectively exhaustive of all of the possible values of the true estimate of the population parameter.

**TASK 4.3 HYPOTHESIS TESTING USING STUDENT'S T-STATISTICS**

The Student's *t*-statistic can be used to test a single sample mean, or to test the difference in means obtained from two samples. Examples of research projects that require the test of two sample means include (1) comparison of physical or cultural characteristics of two regions, or two spatial units; (2) evaluation of the effectiveness of a new drug among the treatment (experimental study group) versus a control (placebo group); (3) before and after studies such as examining the effectiveness of a weight loss prevention program; (4) population health disparities between minority and nonminority groups; and (5) health impacts of anthropogenic versus natural hazards.

The test of two sample means may be based on independent samples or paired samples. An independent samples *t*-test allows for the comparison of means drawn from two samples in which the selection of the observations from the first sample has no bearing on the observations selected in the second sample. The samples are completely independent and unrelated to each other. For example, in Chicago, one could choose to compare the prevalence of lead poisoning among minority children and nonminority children. For a paired samples *t*-test, the two samples are related, say from sets of twins, married couples, or from measurements taken repeatedly (but under different scenarios) on the same observations to generate the data. For example, one could decide to examine water quality in the Susquehanna River before and after Hurricane Sandy. The same monitoring stations or sample points will be used to generate the data at different times.
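The structural difference between the two designs can be sketched in a few lines of Python using only the standard library. The data values below are hypothetical, purely for illustration (not real Susquehanna measurements): a paired test operates on the within-pair differences, one per monitoring station.

```python
import math
import statistics

def paired_t(before, after):
    """Paired t statistic: the mean of the pairwise differences
    divided by its standard error (df = n - 1)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sample std. dev. of the differences
    return d_bar / (s_d / math.sqrt(n))

# Hypothetical readings at the same 4 stations, before and after a storm.
before = [5.0, 6.0, 7.0, 8.0]
after = [6.0, 8.0, 7.0, 9.0]
t_paired = paired_t(before, after)
```

With real data one would normally call a library routine such as `scipy.stats.ttest_rel` (paired) or `scipy.stats.ttest_ind` (independent); the sketch above only shows the structure of the paired calculation.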

In the current task, let us explore the application of the independent samples *t*-test. We will use the obesity data generated for two states: New York and Mississippi. We will rely on the descriptive statistics reported earlier in Table 4.1.

Hypothesis testing also requires a data scientist to predetermine the level of statistical significance at which to evaluate the null hypothesis. To do so, it is vital to have some knowledge of the probability distribution that measures the likelihood of obtaining a certain value out of all possible outcomes. The significance level (denoted as α) represents a fixed probability of wrongly rejecting the null hypothesis when it is true. The most commonly selected probabilities are 0.01 or 0.05, respectively, signifying a 1% or 5% chance of making the inferential (or type I) error. The other kind of error (type II) occurs when we fail to reject a null hypothesis that is false; its probability is denoted by β, and 1 − β is the power of the test. Another relevant piece of information required to evaluate the hypothesis is deciding on the tails of the probability distribution, and whether one is working with a one-tailed or two-tailed test. Invariably, this depends on the overall objectives of the research and the formulation of the hypothesis sets. If a data scientist has a sense of the specific direction in which the true value of the parameter is likely to fall, a directional hypothesis set will be formulated. Such a hypothesis set will call for a one-tailed significance test that uses either the upper or lower tail of the probability distribution. A nondirectional hypothesis set in which the population estimate may fall within either the lower or upper tail of the probability distribution will call for a two-tailed significance test.

The third and perhaps the most critical decision to make in hypothesis testing is choosing the appropriate test to analyze the data. Several factors come into play here, including the nature of the research question, the sample size, the measurement scale of the variables, and whether or not the data conform to the key assumptions of the statistical test. There are statistical techniques for testing all population parameters, but most examples here are drawn from tests of sample means, proportions, and tests of association, including the use of Student's *t*-tests and chi-square (χ²) test statistics. Let us work through a few examples to illustrate the application.

*Step 1: Formulating the hypothesis set*

*H_{0}:* There are no statistical differences in mean obesity rates observed between New York and Mississippi. The observed means of the two states are not significantly different.

*H_{A}:* There are statistically significant differences in mean obesity rates observed between New York and Mississippi. The observed means of the two states are significantly different.

*Step 2: Establishing the level of significance*

The hypothesis set formulated earlier is nondirectional and, therefore, calls for a two-sided significance test that will enable us to work with both tails of the probability distribution. We will conduct the test based on a fixed probability of 0.05.

*Step 3: Applying the appropriate test*

The independent samples *t*-test is based on four assumptions: (1) The criterion variable should be measured on an interval/ratio scale. In the previous example, the criterion variable is percent obesity, measured on the ratio scale. (2) Data values drawn from the two groups are independent of each other. In the previous example, the data from New York are statistically independent from Mississippi. (3) The samples (locations 1 and 2) must be drawn from normally distributed populations. This is the normality assumption and can be validated during the data screening procedures. (4) The two sampled populations must have similar/equal variances. This is the homogeneity of variance assumption and can also be validated during data screening using Levene's test of equal variance.
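Levene's test itself is usually run in a statistics package (for example, `scipy.stats.levene`). As a rough standard-library screen for assumption (4), one can simply compare the two sample variances; a common rule of thumb treats a max/min variance ratio below about 4 as acceptable for the pooled test. This is a crude substitute, not a formal test, and the county values below are hypothetical.

```python
import statistics

def variance_ratio(sample1, sample2):
    """Ratio of the larger to the smaller sample variance --
    a crude screen for the equal-variance assumption."""
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    return max(v1, v2) / min(v1, v2)

# Hypothetical obesity rates (%) for a handful of counties in two states.
ny = [24.1, 25.3, 23.8, 26.0, 24.9, 25.5]
ms = [34.2, 36.1, 33.5, 35.8, 34.9, 36.4]
ratio = variance_ratio(ny, ms)
equal_variance_ok = ratio < 4.0    # rule-of-thumb threshold, not a formal test
```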

Assuming that the samples are approximately normal with equal variances, let us use the following equation to compute the *t*-test:

*t* = (x̄₁ − x̄₂) / √[s²ₚ(1/n₁ + 1/n₂)], with pooled variance s²ₚ = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)

where

x̄₁ is the sample mean for location 1, and x̄₂ is the sample mean for location 2

s₁² is the sample variance for location 1, and s₂² is the sample variance for location 2

n₁ and n₂ are the sample sizes for locations 1 and 2

The degrees of freedom (df) for this test is n₁ + n₂ − 2. For the previous example, the df is 142.
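The pooled formula can be coded directly from summary statistics. The means, standard deviations, and sample sizes below are hypothetical placeholders (chosen only so that n₁ + n₂ − 2 = 142, matching the df in the text), not the actual Table 4.1 values.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent samples t-test assuming equal variances.
    Returns the t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))              # std. error of the mean difference
    return (mean1 - mean2) / se, df

# Hypothetical summary statistics for two states (62 + 82 observations).
t_stat, df = pooled_t(25.0, 3.0, 62, 35.0, 4.0, 82)
```

A negative *t* simply means the first mean is smaller than the second; for a two-tailed test only the magnitude matters.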

As this is a two-tailed significance test, the critical value is determined by dividing the alpha value (α) of 0.05 by two. Therefore, the critical *t* value obtained from a *t* distribution table at 0.025 with 142 degrees of freedom is 1.977. So, we will reject the null hypothesis if the observed *t* is less than the critical *t* of −1.977, or greater than the critical *t* of +1.977.
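Python's standard library has no inverse *t* distribution, but with 142 degrees of freedom the *t* distribution is close to the normal, whose 0.975 quantile (≈1.96) sits just below the tabled critical *t* of 1.977. A minimal sketch of the two-tailed decision rule, using the tabled value quoted in the text:

```python
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # normal approximation, about 1.96
t_crit = 1.977                                 # tabled critical t for df = 142

def decide(t_observed, critical=t_crit):
    """Two-tailed decision: reject H0 when |t| exceeds the critical value."""
    return "reject H0" if abs(t_observed) > critical else "fail to reject H0"
```

With `scipy` available, the exact critical value would come from `scipy.stats.t.ppf(1 - alpha / 2, 142)` instead of the normal approximation.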