 # Comparing Tests

For fixed level, alternative, and power, the test with a smaller sample size is better. Consider two families of one-sided tests, indexed by sample size, using statistics $T_1$ and $T_2$, both with test level $\alpha$, and determine the sample sizes required to give power $1-\beta$ for the same alternative. Compare the tests by taking the ratio of these two sample sizes. The ratio is called relative efficiency; the notation dates back at least as far as Noether (1950), citing Pitman (1948).

Let $t^\circ_{j,n_j}$ represent the critical value for test $j$ based on $n_j$ observations; that is, the test based on statistic $T_j$, using $n_j$ observations, rejects the null hypothesis if $T_j \ge t^\circ_{j,n_j}$. Hence $t^\circ_{j,n_j}$ satisfies $P_{\theta^0}[T_j \ge t^\circ_{j,n_j}] = \alpha$. Let $\varpi_{j,n_j}(\theta^A)$ represent the power for test $T_j$ using $n_j$ observations, under the alternative $\theta^A$: $\varpi_{j,n_j}(\theta^A) = P_{\theta^A}[T_j \ge t^\circ_{j,n_j}]$. Assume that the power functions $\varpi_{j,n_j}$ satisfy the regularity conditions (2.14).

Two tests, tests 1 and 2, involving hypotheses about a parameter $\theta$, taking the value $\theta^0$ under the null hypothesis, and with a simple alternative hypothesis of form $\{\theta^A\}$, for some $\theta^A > \theta^0$, with similar level and power, will be compared. Pick a test level $\alpha$ and a power $1-\beta$, and the sample size $n_1$ for test 1. The power and level conditions on $T_1$ imply a value for $\theta^A$ under the alternative hypothesis; that is, $\theta^A$ solves $P_{\theta^A}[T_1 \ge t^\circ_{1,n_1}] = 1-\beta$. Note that $\theta^A$ is a function of $n_1$, $\alpha$, and $\beta$. Under conditions (2.14), one can determine the minimal value of $n_2$ so that test 2 has power at least $1-\beta$ under the alternative given by $\theta^A$. Report $n_1/n_2$ as the relative efficiency of test 2 to test 1; this depends on $n_1$, $\alpha$, and $\beta$.

Define the asymptotic relative efficiency $\mathrm{ARE}_{\alpha,\beta}[T_1, T_2]$ as $\lim_{n_1 \to \infty} n_1/n_2$ when this limit exists. Considering this quantity removes dependence on $n_1$.

This measure comparing efficiencies of two tests takes on a particularly easy form in a special, yet common, case, in which both statistics are asymptotically Gaussian. In this case, the relative efficiency can be approximated in terms of standard deviations and derivatives of means under alternative hypotheses. General approximations for sample size, power, and effect sizes are investigated first; these are applied to relative efficiency later.

## Power, Sample Size, and Effect Size

This subsection presents formulas for power, sample size, and effect size that may be used for efficiency comparisons, but are also useful on their own. Gaussian approximations earlier in this chapter often applied a continuity correction; this correction will not be applied for large-sample power and sample size calculations, as its effect quickly becomes negligible as the sample size increases. Without loss of generality, take $\theta^0 = 0$.

### Power

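Consider test statistics satisfying

$$T_j \sim \mathfrak{G}\bigl(\mu_j(\theta), \sigma_j^2(\theta)\bigr). \tag{2.15}$$

The Gaussian distribution in (2.15) does not need to hold exactly; holding approximately is sufficient. In this case, one can find the critical values for the two tests, $t^\circ_{j,n_j}$, such that $P_{\theta^0}[T_j \ge t^\circ_{j,n_j}] = \alpha$. Since $(T_j - \mu_j(\theta^0))/\sigma_j(\theta^0)$ is approximately standard Gaussian under the null hypothesis, then $(t^\circ_{j,n_j} - \mu_j(\theta^0))/\sigma_j(\theta^0) = z_\alpha$. Hence

$$t^\circ_{j,n_j} = \mu_j(\theta^0) + \sigma_j(\theta^0) z_\alpha. \tag{2.16}$$

The power for test $j$ is approximately

$$\varpi_{j,n_j}(\theta^A) = P_{\theta^A}\bigl[T_j \ge t^\circ_{j,n_j}\bigr] \approx 1 - \Phi\left(\frac{\mu_j(\theta^0) + \sigma_j(\theta^0) z_\alpha - \mu_j(\theta^A)}{\sigma_j(\theta^A)}\right). \tag{2.17}$$

Often the variance of the test statistic changes slowly as one moves away from the null hypothesis; in this case, the power for test $j$ is approximately

$$\varpi_{j,n_j}(\theta^A) \approx 1 - \Phi\left(\frac{\mu_j(\theta^0) - \mu_j(\theta^A)}{\sigma_j(\theta^0)} + z_\alpha\right). \tag{2.18}$$

### Sample and Effect Sizes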

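When the test statistic variance decreases in a regular way with sample size, one can invert the power relationship to determine the sample size needed for a given power and effect size. Consider tests satisfying, in addition to (2.15),

$$\sigma_j(\theta) = \varsigma_j(\theta)/\sqrt{n_j}. \tag{2.19}$$

Then

$$\varpi_{j,n_j}(\theta^A) \approx 1 - \Phi\left(\frac{\sqrt{n_j}\,\bigl(\mu_j(\theta^0) - \mu_j(\theta^A)\bigr)}{\varsigma_j(\theta^A)} + z_\alpha \frac{\varsigma_j(\theta^0)}{\varsigma_j(\theta^A)}\right). \tag{2.20}$$

As sample sizes increase, power increases for a fixed alternative, and calculations will consider a class of alternatives moving towards the null. Calculations below will consider behavior of the expectation of the test statistic near the null hypothesis (which is taken as $\theta = 0$). Suppose that $\mu_j(\theta)$ and $\varsigma_j(\theta)$ satisfy the regularity conditions (2.21) near the null; the calculations below use differentiability of $\mu_j$ at 0 and continuity of $\varsigma_j$ at 0, with $\varsigma_j(0) > 0$. (These conditions are somewhat simpler than those considered by Noether (1950); in particular, note that (2.15), (2.19), and (2.21) together are not enough to demonstrate the second condition of (2.14).)

Without loss of generality, continue to take $\theta^0 = 0$. In this case, critical values for the two tests are given in (2.16). The power expression (2.17) may be simplified by approximating the variances at the alternative hypothesis by quantities at the null. For large $n_j$, alternatives with power less than 1 will have alternative hypotheses near the null, and so $\varsigma_j(\theta^A) \approx \varsigma_j(0)$. Hence

$$\varpi_{j,n_j}(\theta^A) \approx 1 - \Phi\left(\frac{\sqrt{n_j}\,\bigl(\mu_j(0) - \mu_j(\theta^A)\bigr)}{\varsigma_j(0)} + z_\alpha\right). \tag{2.22}$$

This expression for approximate power may be solved for sample size, by noting that if

$$\varpi_{j,n_j}(\theta^A) = 1 - \beta, \tag{2.23}$$

then $\varpi_{j,n_j}(\theta^A) = \Phi(z_\beta)$, and (2.22) holds if $\sqrt{n_j}\,(\mu_j(\theta^A) - \mu_j(0))/\varsigma_j(0) - z_\alpha = z_\beta$, or

$$n_j = \frac{\varsigma_j(0)^2 (z_\alpha + z_\beta)^2}{\bigl(\mu_j(\theta^A) - \mu_j(0)\bigr)^2}. \tag{2.24}$$

Common values for $\alpha$ and $\beta$ are 0.025 and 0.2, giving upper Gaussian quantiles of $z_\alpha = 1.96$ and $z_\beta = 0.84$. Recall that $z$ with a subscript strictly between 0 and 1 indicates that value for which a standard Gaussian random variable has that probability above it.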
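The sample-size approximation (2.24) can be sketched in a few lines of R. The function name `sample_size` and its argument names are illustrative choices, not part of any package; `vs0` stands for the null standard deviation of the statistic after multiplying by the square root of the sample size. The usage line assumes, purely for illustration, a statistic equal to the proportion of positive observations from unit-variance Gaussian data.

```r
# Sketch of the sample-size approximation (2.24).
# mu0, muA: expectation of the test statistic at the null and at the alternative.
# vs0: null standard deviation of the statistic, multiplied by sqrt(n).
sample_size <- function(mu0, muA, vs0, alpha = 0.025, beta = 0.2) {
  z_alpha <- qnorm(1 - alpha)  # upper alpha quantile of the standard Gaussian
  z_beta <- qnorm(1 - beta)    # upper beta quantile of the standard Gaussian
  vs0^2 * (z_alpha + z_beta)^2 / (muA - mu0)^2
}

# Hypothetical use: proportion of positive observations, Gaussian data with
# unit variance, alternative location 0.5, one-sided level 0.025, power 0.8.
n_needed <- sample_size(mu0 = 0.5, muA = pnorm(0.5), vs0 = 0.5)
ceiling(n_needed)  # round up to a whole number of observations
```

Because (2.24) uses the null standard deviation throughout, the result is only as good as the approximation $\varsigma_j(\theta^A) \approx \varsigma_j(0)$.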

It may be of use in practice, and will be essential in the efficiency calculations below, to approximate which member of the alternative hypothesis corresponds with a test of a given power, with sample size held fixed. Solving (2.24) exactly for $\theta^A$ is difficult, since the function $\mu_j$ is generally non-linear. Approximate this function using a one-term Taylor approximation,

$$\mu_j(\theta^A) \approx \mu_j(0) + \mu_j'(0)\,\theta^A.$$

(Contrast this with approximation of the alternative standard deviation by the null standard deviation, as in the transition from (2.20) to (2.22). Approximation by the leading term alone cannot be applied to the expectation $\mu_j(\theta)$, since it would remove all of the effect of the difference between null and alternative.) The power for test $j$ is approximately

$$\varpi_{j,n_j}(\theta^A) \approx 1 - \Phi\bigl(z_\alpha - \sqrt{n_j}\, e_j \theta^A\bigr), \quad \text{for } e_j = \mu_j'(0)/\varsigma_j(0).$$

The quantity $e_j$ is called the efficacy of test $j$. Setting this power to $1-\beta$ gives $z_\alpha - \sqrt{n_j}\, e_j \theta^A = z_{1-\beta}$. Solving this equation for $\theta^A$,

$$\theta^A = \frac{z_\alpha - z_{1-\beta}}{\sqrt{n_j}\, e_j}, \tag{2.25}$$

verifying the requirement that $\theta^A$ get close to zero as $n_j$ increases. This expression can be used to approximate the effect size needed to obtain a certain power with a certain sample size and test level, and will be used in the context of asymptotic relative efficiency.
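As a rough illustration of (2.25), the following R sketch computes the approximate detectable effect for a given sample size and efficacy. The function name `effect_size` is hypothetical, and the efficacy value of 1 in the usage line is an arbitrary choice made only to show the scaling.

```r
# Sketch of the effect-size approximation (2.25).
# efficacy: e_j, the derivative of the statistic's expectation at the null
# divided by its (sqrt(n)-scaled) null standard deviation.
effect_size <- function(n, efficacy, alpha = 0.025, beta = 0.2) {
  # z_alpha - z_{1-beta} equals z_alpha + z_beta by symmetry of the Gaussian
  (qnorm(1 - alpha) + qnorm(1 - beta)) / (sqrt(n) * efficacy)
}

# Quadrupling the sample size halves the detectable effect.
effects <- effect_size(n = c(25, 100, 400), efficacy = 1)
effects
```

The $1/\sqrt{n_j}$ rate visible here is what drives the alternative towards the null in the efficiency calculations.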

## Efficiency Calculations

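Equating the alternative hypothesis parameter values (2.25) corresponding to power $1-\beta$ gives $(z_\alpha - z_{1-\beta})/(\sqrt{n_1}\, e_1) = (z_\alpha - z_{1-\beta})/(\sqrt{n_2}\, e_2)$, or

$$\frac{n_1}{n_2} = \frac{e_2^2}{e_1^2}.$$

Note that this relative efficiency doesn't depend on $\alpha$ or $\beta$, or on $n_1$.

As an example, suppose $X_1, \ldots, X_n$ are independent observations from a symmetric distribution with finite variance $\rho^2$ and mean $\theta$. Then $\theta$ is also the median of these observations. Compare tests $T_1$, the $t$-test, and $T_2$, the sign test. Then $T_1$ has a distribution depending on the distribution of $X_j$, and $T_2$ has a binomial distribution, up to division by $n_2$. Note that $T_1$ has approximately a Gaussian distribution for large $n_1$. That is, $T_1 \sim \mathfrak{G}(\theta/\rho, 1/n_1)$, and

$$\mu_1'(0) = 1/\rho, \quad \varsigma_1(0) = 1, \quad e_1 = 1/\rho. \tag{2.26}$$

On the other hand, $T_2 \sim \mathfrak{G}(\mu_2(\theta), \varsigma_2(\theta)^2/n_2)$ for

$$\mu_2(\theta) = F(\theta), \quad \varsigma_2(\theta) = \sqrt{F(\theta)\,(1 - F(\theta))}, \tag{2.27}$$

where $F$ is the distribution function of $X_j - \theta$ and $f$ is its density. Hence $\mu_2'(0) = f(0)$, $\varsigma_2(0) = 1/2$, and $e_2 = 2f(0)$.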

TABLE 2.4: Empirical powers for one-sample location tests with sample size ratios indicated by asymptotic relative efficiency

| Larger sample size | $t$ test | sign test |
|---|---|---|
| 20 | 0.5623 | 0.2241 |
| 100 | 0.5647 | 0.3897 |
| 1000 | 0.5594 | 0.5040 |
| 10000 | 0.5621 | 0.5438 |

The asymptotic relative efficiency of these statistics depends on the distribution that generates the data. If data come from $\mathfrak{G}(0, \rho^2)$, then $\mu_2'(0) = 1/(\sqrt{2\pi}\rho)$, $\varsigma_2(0) = 1/2$, and $e_2 = \sqrt{2/\pi}/\rho$. Then $n_1/n_2 \approx (2/\sqrt{2\pi})^2 = 2/\pi$. Hence, as expected, the $t$-test is more powerful; the sign test requires more than 50% more observations to obtain the same power against the same alternative, for large samples.

If the data come from a Laplace distribution, then $\rho = 1$, since the Laplace distribution as scaled here has variance 1. Substituting into (2.26), $\mu_1'(0) = 1$, $\varsigma_1(0) = 1$, and $e_1 = 1$. Also $\mu_2'(0) = 1/\sqrt{2}$, $\varsigma_2(0) = 1/2$, and $e_2 = \sqrt{2}$. Hence $n_1/n_2 \approx (\sqrt{2})^2 = 2$; in this case, the sign test is more powerful, requiring roughly half the sample size that the $t$-test does.

Table 2.4 contains results of a simulation checking the actual powers of two tests whose powers, according to the asymptotic relative efficiency calculations, should be approximately equal. The table shows powers of the level 0.05 two-sided $t$ and exact sign tests for Laplace data sets, of size $n_1$ and $n_2 = n_1/2$ respectively, shifted to have expectation $3/\sqrt{n_1}$. Data sets need to be quite large in order for sample sizes in the ratio of the asymptotic relative efficiency to give equal power.
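A simulation in the spirit of Table 2.4 might be sketched as follows, using base R only. Several details here are assumptions rather than the settings behind the table: the seed, the number of replications, the use of independent samples for the two tests, and the generation of Laplace variates as a difference of two unit exponentials (which gives variance 2). The helper name `rlaplace_std` is hypothetical.

```r
# Simulation sketch: two-sided level 0.05 t test on n1 observations versus
# the exact sign test on n2 = n1/2 observations, for shifted Laplace data.
set.seed(1)            # arbitrary seed, for repeatability only
n1 <- 100
n2 <- n1 / 2
shift <- 3 / sqrt(n1)  # location under the alternative
reps <- 2000           # assumed number of Monte Carlo replications

# Assumption: Laplace draws as a difference of two unit exponentials.
rlaplace_std <- function(n) rexp(n) - rexp(n)

reject_t <- replicate(reps, {
  x <- rlaplace_std(n1) + shift
  t.test(x, mu = 0)$p.value <= 0.05
})
reject_sign <- replicate(reps, {
  x <- rlaplace_std(n2) + shift
  # Exact sign test: binomial test on the count of positive observations
  binom.test(sum(x > 0), n2, p = 0.5)$p.value <= 0.05
})

power_t <- mean(reject_t)
power_sign <- mean(reject_sign)
c(t = power_t, sign = power_sign)
```

Increasing `n1` while keeping the ratio of sample sizes fixed should show the two estimated powers drawing together slowly, which is the point the table makes.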

Now suppose these data come from a Cauchy distribution shifted to have point of symmetry $\theta$. In this case, the expectation of the distribution does not exist, the standard deviation $\rho$ is infinite, and the sample mean is not approximately Gaussian even in large samples. In fact, the mean of Cauchy random variables is again a Cauchy random variable, with no change in the spread of the distribution. Plugging into the definition of efficacy, without worrying about regularity conditions, gives $\mu_1'(0) = 1$, $\varsigma_1(0) = \infty$, and $e_1 = 0$. On the other hand, the quantities for the sign test are $\mu_2'(0) = 1/\pi$, $\varsigma_2(0) = 1/2$, and $e_2 = 2/\pi$. Hence, for Cauchy responses, the efficiency of the sign test relative to the $t$-test is $n_1/n_2 \approx \infty$. This abuse of notation retains the interpretation that the sign test is infinitely more efficient for Cauchy observations.

Table 2.5 summarizes these calculations.

TABLE 2.5: Efficacies for one-sample location tests

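| Distribution | Quantity | $t$-test | Sign test | Relative |
|---|---|---|---|---|
| Gaussian | $\mu'(0)$ | $1/\rho$ | $1/(\sqrt{2\pi}\rho)$ | |
| | $\varsigma(0)$ | $1$ | $1/2$ | |
| | $e$ | $1/\rho$ | $\sqrt{2/\pi}/\rho$ | $\sqrt{\pi/2}$ |
| Laplace | $\mu'(0)$ | $1$ | $1/\sqrt{2}$ | |
| | $\varsigma(0)$ | $1$ | $1/2$ | |
| | $e$ | $1$ | $\sqrt{2}$ | $1/\sqrt{2}$ |
| Cauchy | $\mu'(0)$ | $1$ | $\pi^{-1}$ | |
| | $\varsigma(0)$ | $\infty$ | $1/2$ | |
| | $e$ | $0$ | $2/\pi$ | $0$ |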
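The efficacy calculations for the three distributions can be checked numerically with a short R sketch. The function `efficacies` and its argument names are illustrative; it relies only on the relations $e_1 = 1/\rho$ for the $t$-test and $e_2 = 2f(0)$ for the sign test, and reports the ratio of sample sizes $n_1/n_2 = e_2^2/e_1^2$. The Laplace density value assumes the unit-variance scaling.

```r
# Efficacies of the t test and sign test, and the implied ratio n1/n2.
# f0: density of the observations at the point of symmetry under the null.
# rho: standard deviation of the observations (Inf if it does not exist).
efficacies <- function(f0, rho) {
  e_t <- 1 / rho     # efficacy of the t test
  e_sign <- 2 * f0   # efficacy of the sign test
  c(e_t = e_t, e_sign = e_sign, n1_over_n2 = (e_sign / e_t)^2)
}

gaussian <- efficacies(f0 = dnorm(0), rho = 1)
laplace <- efficacies(f0 = 1 / sqrt(2), rho = 1)  # unit-variance Laplace
cauchy <- efficacies(f0 = dcauchy(0), rho = Inf)  # no finite variance

rbind(gaussian, laplace, cauchy)
```

For the Cauchy row the division by a zero efficacy yields `Inf` in R, mirroring the abuse of notation in the text.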

## Examples of Power Calculations

The Gaussian approximations to power (2.17), to sample size (2.24), and to effect size (2.25), may be used to assist in planning an experiment.

Example 2.4.1 In this example I calculate power for a sign test applied to 49 observations from a Gaussian distribution with unit variance. Suppose $X_1, \ldots, X_{49} \sim \mathfrak{G}(\theta, 1)$, with null hypothesis $\theta = 0$ and alternative hypothesis $\theta = 1/2$. The sign test statistic, divided by $n$, approximately satisfies (2.15) and (2.19), with $\mu$ and $\varsigma$ given by (2.27). Then $\mu_1(0) = 0.5$, $\varsigma_1(0) = \sqrt{0.5 \times 0.5} = 0.5$, $\mu_1(0.5) = 0.691$, and $\varsigma_1(0.5) = \sqrt{0.691 \times 0.309} = 0.462$. Power for a one-sided test of level 0.025, or a two-sided test of level 0.05, is approximated by (2.17):

$$1 - \Phi\bigl(7 \times (0.5 + 0.5 \times 1.96/7 - 0.691)/0.462\bigr) = 1 - \Phi(-0.772) = 0.780.$$

The null and alternative standard deviations are close enough to motivate the use of the simpler approximation (2.22), approximating power as

$$1 - \Phi\bigl(7 \times (0.5 - 0.691)/0.5 + 1.96\bigr) = 1 - \Phi(-0.714) = 0.762.$$

If, instead, a test of power 0.85 were desired for alternative expectation 1/2, with a one-sided test of level 0.025, then $z_\alpha = 1.96$ and $z_\beta = 1.036$. Applying (2.24) gives the minimum number of observations; choose 53.

Finally, one might determine how large an effect one might detect using the original 49 observations with a test of level 0.025 and power 0.85. One could use $e = \sqrt{2/\pi} = 0.797$, from the box in Table 2.5 specific to the sign test and the Gaussian distribution. Expression (2.25) gives this number as $(1.96 + 1.036)/(7 \times 0.797) = 0.537$.
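The power and effect-size calculations of this example can be retraced in R. This is a sketch rather than a definitive implementation: the helper names `mu` and `vs` are hypothetical, and because quantiles and probabilities are not rounded before use, results may differ from the hand calculations in the third decimal place.

```r
# Retracing Example 2.4.1: sign test, 49 unit-variance Gaussian observations.
n <- 49
z_alpha <- qnorm(0.975)  # one-sided level 0.025
theta_a <- 0.5           # alternative location

# Expectation and scaled standard deviation of the proportion of positive
# observations, as functions of the location theta (see (2.27)).
mu <- function(theta) pnorm(theta)
vs <- function(theta) sqrt(pnorm(theta) * (1 - pnorm(theta)))

# Power from (2.17), using the standard deviation at the alternative.
crit <- mu(0) + vs(0) * z_alpha / sqrt(n)
power_full <- 1 - pnorm(sqrt(n) * (crit - mu(theta_a)) / vs(theta_a))

# Power from (2.22), using the null standard deviation throughout.
power_simple <- 1 - pnorm(sqrt(n) * (mu(0) - mu(theta_a)) / vs(0) + z_alpha)

# Detectable effect from (2.25) for power 0.85, with efficacy sqrt(2/pi).
effect <- (z_alpha + qnorm(0.85)) / (sqrt(n) * sqrt(2 / pi))

round(c(power_full = power_full, power_simple = power_simple,
        effect = effect), 3)
```

Comparing `power_full` with `power_simple` gives a quick sense of how much is lost by replacing the alternative standard deviation with the null one.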

# Distribution Function Estimation

Suppose one wishes to estimate a common distribution function $F$ of independent variables $X_1, \ldots, X_n$. For $x$ in the range of $X_j$, let $\hat{F}(x)$ be the number of data points less than or equal to $x$, divided by $n$. Since the observations are independent, $\hat{F}(x) \sim n^{-1}\,\mathrm{Bin}(n, F(x))$. A confidence interval for $F(x)$ is

$$\hat{F}(x) \pm z_{\alpha/2} \sqrt{\hat{F}(x)\,[1 - \hat{F}(x)]/n}. \tag{2.28}$$

The above intervals may extend outside $[0,1]$, which is not reasonable; this can be circumvented by transforming the probability scale.

Figure 2.4 represents the bounds from (2.28), without any rescaling, to be discussed further in the next example. Confidence bounds in Figure 2.4 exhibit two defects. First, larger estimates are sometimes associated with smaller upper confidence bounds (e.g., in Figure 2.4, the region between the second-to-largest and the largest observations). Second, in the region where the cumulative distribution function is estimated at zero or one (that is, below the smallest observed value and above the largest observed value), confidence limits lie on top of the estimates, indicating no uncertainty. Both of these phenomena are unrealistic. The first phenomenon, that of nonmonotonic confidence bounds, cannot be reliably avoided through rescaling; the second, with confidence bounds that coincide with the estimate outside the range of the data, can never be repaired through rescaling. A preferred solution is to substitute the intervals of Clopper and Pearson (1934), described in §1.2.2.1, to avoid all three of these problems (viz., bounds outside $(0,1)$, bounds ordered differently than the estimate, and bounds with zero variability). Such intervals are exhibited in Figure 2.5.
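The contrast between the two kinds of pointwise bounds can be sketched in base R. The function name `ecdf_bounds` is hypothetical, the Clopper and Pearson limits are computed through the usual beta-quantile representation, and the data are simulated stand-ins rather than the arsenic measurements.

```r
# Pointwise confidence bounds for an empirical CDF at the observed values:
# Gaussian-approximation bounds as in (2.28), and Clopper-Pearson bounds.
ecdf_bounds <- function(x, level = 0.95) {
  n <- length(x)
  xs <- sort(unique(x))
  k <- sapply(xs, function(v) sum(x <= v))  # counts at or below each value
  fhat <- k / n
  alpha <- 1 - level
  half <- qnorm(1 - alpha / 2) * sqrt(fhat * (1 - fhat) / n)
  # Clopper-Pearson limits from beta quantiles; the cases k = 0 and k = n
  # are handled explicitly, with limits 0 and 1 respectively.
  cp_lower <- rep(0, length(k))
  cp_upper <- rep(1, length(k))
  pos <- k > 0
  cp_lower[pos] <- qbeta(alpha / 2, k[pos], n - k[pos] + 1)
  below <- k < n
  cp_upper[below] <- qbeta(1 - alpha / 2, k[below] + 1, n - k[below])
  data.frame(x = xs, fhat = fhat,
             wald_lower = fhat - half, wald_upper = fhat + half,
             cp_lower = cp_lower, cp_upper = cp_upper)
}

set.seed(2)     # arbitrary seed
y <- rnorm(21)  # hypothetical data of the same size as the arsenic sample
b <- ecdf_bounds(y)
tail(b, 3)
```

In the last rows the Gaussian-approximation upper bound exceeds 1 and then collapses onto the estimate at the largest observation, while the Clopper and Pearson bounds stay inside $[0,1]$ and remain ordered.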

Finally, the confidence associated with these bounds is point-wise, and not simultaneous. That is, if $(L_1, U_1)$ and $(L_2, U_2)$ are $1-\alpha$ confidence bounds associated with two ordinates $x_1$ and $x_2$, then $P[L_1 \le F(x_1) \le U_1] \ge 1-\alpha$ and $P[L_2 \le F(x_2) \le U_2] \ge 1-\alpha$, at least approximately, but the preceding argument does not bound $P[L_1 \le F(x_1) \le U_1 \text{ and } L_2 \le F(x_2) \le U_2]$ any higher than $1-2\alpha$.

Example 2.5.1 Consider the arsenic data of Example 2.3.2. For every real $x$, one counts the number of data points less than or equal to this $x$. For any $x$ less than the smallest value 0.073, this estimate is $\hat{F}(x) = 0$. For $x$ greater than or equal to this smallest value and smaller than the next smallest value 0.080, the estimate is $\hat{F}(x) = 1/21$. This data set contains one duplicate value, 0.118. For values below, but close to, 0.118 (for example, $x = 0.1179$), $\hat{F}(x) = 4/21$, since 4 of the 21 observations are less than $x$. However, $\hat{F}(0.118) = 6/21$; the jump here is twice what it is at other data values, since there are two observations here. This estimate is sketched in both Figures 2.4 and 2.5, and may be constructed in R using ecdf(arsenic$nails), presuming the data of Example 2.3.2 is still present to R. The command ecdf(arsenic$nails) does not produce confidence intervals; use

library(MultNonParam); ecdfcis(arsenic$nails, exact=FALSE)

to add confidence bounds, and changing exact to TRUE forces exact intervals.

FIGURE 2.4: Empirical CDF and Confidence Bounds for Arsenic in Nails

# Exercises

1. Calculate the asymptotic relative efficiency for the sign statistic relative to the one-sample $t$-test (which you should approximate using the one-sample $z$-test). Do this for observations from the

a. uniform distribution on $[-1/2, 1/2]$, with variance $1/12$ and mean under the null hypothesis of 0, and

b. logistic distribution, symmetric about 0, with variance $\pi^2/3$ and density $\exp(x)/(1 + \exp(x))^2$.

FIGURE 2.5: Empirical CDF and Confidence Bounds for Arsenic in Nails

... and last 10 lines are data set description, and should be deleted. (Line 117 is blank, and should also be deleted).

4. Suppose 49 observations are drawn from a Cauchy distribution, displaced to have location parameter 1.

a. What is the power of the sign test at level 0.05 to test the null hypothesis of expectation zero for these observations?

b. What size sample is needed to distinguish between a null hypothesis of median 0 and an alternative hypothesis of median 1, for independent Cauchy variables with a one-sided level 0.025 sign test to give 80% power?