# PERMUTATION TESTS

What if you have a hypothesis that isn’t captured by any traditional hypothesis test? For example, say you wanted to test whether the mean is bigger *and* the standard deviation is smaller. You might want to use the test statistic *a(X) - sd(X)* (where I used *a()* to denote the mean and *sd()* for the standard deviation). Or maybe you think (for some reason) that the ratio of the maximum to the average is what’s important in some conditions. You might use the test statistic *max(X)/a(X)*. One option is to try to work out the distribution of your test statistic under the null hypothesis, assuming your data are Gaussian, but for a made-up statistic like these that can be difficult.
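To make these made-up statistics concrete, here is a minimal Python sketch. The function names are hypothetical. Using the sample standard deviation (with an *n* - 1 denominator, as `statistics.stdev` computes it) for *sd()* is an assumption, because the text does not say which version is meant.

```python
import statistics

def mean_minus_sd(x):
    """a(X) - sd(X): large when the mean is big AND the spread is small."""
    # assumption: sd() is the sample standard deviation (n - 1 denominator)
    return statistics.mean(x) - statistics.stdev(x)

def max_over_mean(x):
    """max(X)/a(X): ratio of the largest observation to the average."""
    return max(x) / statistics.mean(x)
```

Neither function has a textbook null distribution, which is exactly the situation where a permutation test is useful.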

However, another way to do it is to use your data to compute the null distribution using a permutation test. These tests are particularly useful when you are comparing two samples (such as T cells vs. other cells, or predicted “positives” vs. “negatives”). In all of these cases, the null hypothesis is that the two samples are actually *not* different, i.e., they are drawn from the same distribution. This means that it’s possible to construct an estimate of that distribution (the null distribution) by putting the observations from the two sets into one pool and then randomly drawing the observations for the two samples from that pool. In the case of the CD4 measurements in T cells versus other cells, we would mix all the T-cell measurements with the other cell measurements, and then randomly draw a sample of the same size as the T-cell sample from the combined set of measurements. Using this random “fake” T-cell sample, we can compute the test statistic, just as we did on the real T-cell data. If we do this over and over again, we obtain many values of the test statistic (Figure 2.6, left panel). The distribution of the test statistic in these random samples is exactly the null distribution to compare our test statistic to: these are the values of the test statistic that we *would* have found if our T-cell sample were truly drawn at random from the entire set of measurements. If we sample enough times, eventually we will randomly pull the exact set of values that make up the real T-cell data and get exactly the test statistic we observed. We will also, on occasion, observe test statistics that exceed the value we observed in the real data. Our estimate of the *P*-value is nothing more than the fraction of random samples where this happens (Figure 2.6, right panel). This approach is sometimes referred to as “permuting the labels” because, in a sense, we are ignoring the “label” (T-cell or not) when we create these random samples.
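The procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the author’s code. The function name, the default of 10,000 random draws, the fixed seed, and the one-sided “at least as large” comparison are all assumptions. The statistic can be any function of a list of numbers, such as `statistics.mean` or the mean-minus-standard-deviation statistic described earlier.

```python
import random
import statistics

def permutation_p_value(group, others, statistic, n_perm=10_000, seed=0):
    """Estimate a one-sided permutation P-value for `statistic` on `group`.

    Null hypothesis: `group` is a random draw, without replacement, from the
    pooled measurements (i.e., the group labels carry no information).
    """
    rng = random.Random(seed)            # seeded so results are reproducible
    pool = list(group) + list(others)    # forget the labels: one combined pool
    observed = statistic(group)          # statistic on the real labelled sample
    n_extreme = 0
    for _ in range(n_perm):
        # a "fake" group of the same size, drawn at random from the pool
        fake_group = rng.sample(pool, len(group))
        # count random draws at least as large as the observed value
        if statistic(fake_group) >= observed:
            n_extreme += 1
    # fraction of random samples that reached the observed value
    return n_extreme / n_perm
```

With hypothetical lists `t_cell_cd4` and `other_cd4`, a call would look like `permutation_p_value(t_cell_cd4, other_cd4, statistics.mean)`. The comparison uses `>=` so that ties with the observed value count as “at least as extreme”. A common variant returns `(n_extreme + 1) / (n_perm + 1)` so that the estimated *P*-value is never exactly zero.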

FIGURE 2.6 Illustration of how to obtain a *P*-value from a permutation test. The test statistic is just a function of a sample of numbers drawn from the pool. By choosing random samples from the pool according to the null hypothesis and computing the test statistic each time, you can calculate the null distribution of the test statistic. To calculate the *P*-value, you sum up the area under the null distribution that is more extreme than the value you observed for your real data.

The great power of permutation tests is that you can get the null distribution for any test statistic (or any function of the data) that you make up. Permutation tests also have several important drawbacks. First, they involve drawing large numbers of random samples from the data. This is only possible with a computer, and if the number of samples needed is large or the test statistic is complicated to compute, it could take a very long time. Probably more of an issue in practice is that, for an arbitrary test statistic, the permutation test needs to be “set up” using some kind of programmable statistics package. Finally, to implement a permutation test, you have to really understand your null hypothesis. This can become especially tricky if your data are correlated or have complicated heterogeneity or structure.
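To see why the computational cost matters, it helps to contrast random sampling with enumerating *every* possible relabelling. Exhaustive enumeration is a standard alternative for tiny samples, but the text does not describe it, and the function name here is hypothetical. The sketch computes the exact one-sided *P*-value and prints how quickly the number of relabellings grows.

```python
import itertools
import math

def exact_permutation_p_value(group, others, statistic):
    """Exact one-sided permutation P-value by enumerating every relabelling.

    Every subset of len(group) positions in the pool is treated as a
    possible "fake" group, so the null distribution is computed exactly
    rather than estimated from random draws.
    """
    pool = list(group) + list(others)
    observed = statistic(group)
    n_extreme = 0
    n_total = 0
    for idx in itertools.combinations(range(len(pool)), len(group)):
        fake_group = [pool[i] for i in idx]
        n_total += 1
        if statistic(fake_group) >= observed:
            n_extreme += 1
    return n_extreme / n_total

# The number of relabellings for two groups of size n is math.comb(2n, n);
# it grows so fast that enumeration is only feasible for tiny samples.
for n in (5, 10, 20, 50):
    print(f"two groups of {n}: {math.comb(2 * n, n)} relabellings")
```

For two groups of 5 there are only 252 relabellings. For two groups of 20 there are already about 1.4 × 10^11, which is why random sampling is used in practice.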

## KEY STATISTICAL TESTS FOR COMPARING TWO LISTS OF NUMBERS

- *t-Test*: Finds differences in means between two lists of numbers (think mutant vs. wt) when data are approximately Gaussian. Works for small sample sizes.
- *Wilcoxon rank-sum/WMW test*: Nonparametric version of the t-test. Works on any data distribution, but has less power than a t-test, especially for small sample sizes.
- *KS-test*: Nonparametric test for differences in the distributions of two lists of numbers. Again, has less power for small sample sizes. Most powerful when there is a difference in means, but also pretty good in general.
- *Fisher's exact test*: Tests for differences in ratios or fractions (data in a 2 × 2 table). Works on any sample size. Can be used on continuous data if a cutoff is chosen to define classes.
- *Permutation test*: Roll your own test statistic and estimate the null distribution by resampling your data. Since it's custom, you usually have to write your own code.
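The Fisher's exact test entry mentions 2 × 2 tables and using a cutoff on continuous data. Here is a minimal sketch of both steps. It computes only the one-sided (“group 1 has more of class 1”) *P*-value from hypergeometric probabilities. The function names and the cutoff are hypothetical. For real analyses, SciPy provides `scipy.stats.fisher_exact`, which also offers a two-sided version.

```python
import math

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P-value for the 2 x 2 table

                 class 1   class 2
        group 1     a         b
        group 2     c         d

    Null hypothesis: both groups are drawn from the same pool, so the count
    `a` follows a hypergeometric distribution given the row/column totals.
    """
    row1 = a + b                 # size of group 1
    col1 = a + c                 # total number of class-1 observations
    n = a + b + c + d
    total = math.comb(n, row1)   # ways to choose group 1 from the pool
    # add up tables at least as extreme as the observed one (x >= a)
    tail = sum(math.comb(col1, x) * math.comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / total

def table_from_cutoff(group1, group2, cutoff):
    """Turn continuous data into a 2 x 2 table: class 1 means value > cutoff."""
    a = sum(v > cutoff for v in group1)
    c = sum(v > cutoff for v in group2)
    return a, len(group1) - a, c, len(group2) - c
```

The choice of cutoff is up to you, and different cutoffs can give different *P*-values, so it is best chosen before looking at the comparison.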