Comparing Tests
For fixed level, alternative, and power, the test with the smaller sample size is better. Consider two families of one-sided tests, indexed by sample size, using statistics $T_1$ and $T_2$, both with test level $\alpha$, and determine the sample sizes required to give power $1-\beta$ for the same alternative. Compare the tests by taking the ratio of these two sample sizes. The ratio is called relative efficiency; the notation dates back at least as far as Noether (1950), citing Pitman (1948).
Let $t^0_{j,n}$ represent the critical value for test $j$ based on $n$ observations; that is, the test based on statistic $T_j$ and using $n$ observations rejects the null hypothesis if $T_j > t^0_{j,n}$. Hence $t^0_{j,n}$ satisfies $P_{\theta^0}[T_j > t^0_{j,n}] = \alpha$. Let $\varpi_{j,n}(\theta^A)$ represent the power for test $T_j$ using $n$ observations, under the alternative $\theta^A$:
Assume that
Two tests, tests 1 and 2, involving hypotheses about a parameter $\theta$, taking the value $\theta^0$ under the null hypothesis, and with a simple alternative hypothesis of the form $\{\theta^A\}$ for some $\theta^A > \theta^0$, with similar level and power, will be compared. Pick a test level $\alpha$, a power $1-\beta$, and the sample size $n_1$ for test 1. The power and level conditions on $T_1$ imply a value for $\theta^A$ under the alternative hypothesis; that is, $\theta^A$ solves $P_{\theta^A}[T_1 > t^0_{1,n_1}] = 1-\beta$. Note that $\theta^A$ is a function of $n_1$, $\alpha$, and $\beta$. Under conditions (2.14), one can determine the minimal value of $n_2$ so that test 2 has power at least $1-\beta$ under the alternative given by $\theta^A$. Report $n_1/n_2$ as the relative efficiency of test 2 to test 1; this depends on $n_1$, $\alpha$, and $\beta$.
Define the asymptotic relative efficiency $\mathrm{ARE}_{\alpha,\beta}[T_2, T_1]$ as
$$\mathrm{ARE}_{\alpha,\beta}[T_2, T_1] = \lim_{n_1\to\infty} n_1/n_2,$$
when this limit exists. Considering this quantity removes dependence on $n_1$.
This measure comparing efficiencies of two tests takes on a particularly easy form in a special, yet common, case, in which both statistics are asymptotically Gaussian. In this case, the relative efficiency can be approximated in terms of standard deviations and derivatives of means under alternative hypotheses. General approximations for sample size, power, and effect sizes are investigated first; these are applied to relative efficiency later.
Power, Sample Size, and Effect Size
This subsection presents formulas for power, sample size, and effect size that may be used for efficiency comparisons, but are also useful on their own. Gaussian approximations earlier in this chapter often applied a continuity correction; this correction will not be applied for large-sample power and sample size calculations, as the effect of this correction quickly becomes negligible as the sample size increases. Without loss of generality, take $\theta^0 = 0$.
Power
Consider test statistics satisfying
The Gaussian distribution in (2.15) does not need to hold exactly; holding approximately is sufficient. In this case, one can find critical values for the two tests, $t^0_{j,n_j}$, such that $P_0[T_j > t^0_{j,n_j}] = \alpha$. Since $(T_j - \mu_j(0))/\sigma_j(0)$ is approximately standard Gaussian under the null hypothesis, then
Hence
$$t^0_{j,n_j} = \mu_j(0) + z_\alpha\,\sigma_j(0). \quad (2.16)$$
The power for test $j$ is approximately
$$\varpi_{j,n_j}(\theta^A) \approx \Phi\!\left(\frac{\mu_j(\theta^A) - \mu_j(0) - z_\alpha\,\sigma_j(0)}{\sigma_j(\theta^A)}\right). \quad (2.17)$$
Often the variance of the test statistic changes slowly as one moves away from the null hypothesis; in this case, the power for test $j$ is approximately
$$\varpi_{j,n_j}(\theta^A) \approx \Phi\!\left(\frac{\mu_j(\theta^A) - \mu_j(0)}{\sigma_j(0)} - z_\alpha\right).$$
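The two power approximations above can be sketched numerically. The following Python fragment is an illustration only (not code from the text, whose computations use R); it evaluates the simpler form, in which the standard deviation under the alternative is replaced by its null value:

```python
from statistics import NormalDist

def approx_power(mu0, mu_a, sigma0, alpha):
    # One-sided test rejecting when T > mu0 + z_alpha * sigma0, with the
    # standard deviation under the alternative approximated by sigma0.
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    return z.cdf((mu_a - mu0) / sigma0 - z_alpha)

# With a standardized mean shift of 2.8 and level 0.025, power is near 0.80.
power = approx_power(0.0, 2.8, 1.0, 0.025)
```

The standardized shift of 2.8 is chosen because it is close to $z_{0.025} + z_{0.2}$, so the resulting power is near 0.8.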
Sample and Effect Sizes
When the test statistic variance decreases in a regular way with sample size, one can invert the power relationship to determine the sample size needed for a given power and effect size. Consider tests satisfying, in addition to (2.15),
Then
As sample sizes increase, power increases for a fixed alternative, and calculations will consider a class of alternatives moving towards the null. Calculations below will consider the behavior of the expectation of the test statistic near the null hypothesis (which is taken as $\theta = 0$). Suppose that
(These conditions are somewhat simpler than those considered by Noether (1950); in particular, note that (2.15), (2.19), and (2.21) together are not enough to demonstrate the second condition of (2.14).) Without loss of generality, continue to take $\theta^0 = 0$. In this case, critical values for the two tests are given in (2.16). The power expression (2.17) may be simplified by approximating the variances at the alternative hypothesis by quantities at the null. For large $n_j$, alternatives with power less than 1 will have alternative hypotheses near the null, and so
This expression for approximate power may be solved for sample size, by noting that if
then $\varpi_{j,n_j}(\theta^A) = \Phi(z_\beta)$, and (2.22) holds if $(\mu_j(\theta^A) - \mu_j(0))/\sigma_j(0) - z_\alpha = z_\beta$, or
Common values for $\alpha$ and $\beta$ are 0.025 and 0.2, giving upper Gaussian quantiles $z_\alpha = 1.96$ and $z_\beta = 0.84$. Recall that $z$ with a subscript strictly between 0 and 1 indicates the value that a standard Gaussian random variable exceeds with that probability.
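These quantile conventions are easy to check numerically; a small Python sketch (illustrative only):

```python
from statistics import NormalDist

z = NormalDist()
# z_p is the point that a standard Gaussian exceeds with probability p.
z_alpha = z.inv_cdf(1 - 0.025)  # upper 0.025 quantile, about 1.96
z_beta = z.inv_cdf(1 - 0.2)     # upper 0.2 quantile, about 0.84
```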
It may be of use in practice, and will be essential in the efficiency calculations below, to approximate which member of the alternative hypothesis corresponds to a test of a given power, with sample size held fixed. Solving (2.24) exactly for $\theta^A$ is difficult, since the function $\mu_j$ is generally nonlinear. Approximating this function using a one-term Taylor approximation,
(Contrast this with the approximation of the alternative standard deviation by the null standard deviation, as in the transition from (2.20) to (2.22). Approximation by the leading term alone cannot be applied to the expectation $\mu_j(\theta)$, since it would remove all of the effect of the difference between null and alternative.) The power for test $j$ is approximately
for
$$e_j = \frac{\mu_j'(0)}{\sqrt{n_j}\,\sigma_j(0)}.$$
The quantity $e_j$ is called the efficacy of test $j$. Setting this power to $1-\beta$, $z_\alpha - \sqrt{n_j}\,e_j\,\theta^A = z_{1-\beta}$. Solving this equation for $\theta^A$,
verifying the requirement that $\theta^A$ approach zero as $n_j$ increases. This expression can be used to approximate an effect size needed to obtain a certain power with a certain sample size and test level, and will be used in the context of asymptotic relative efficiency.
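The effect-size approximation $\theta^A = (z_\alpha + z_\beta)/(\sqrt{n_j}\,e_j)$ can be sketched in Python (the function name is hypothetical, and the efficacy is supplied by the user):

```python
from math import sqrt
from statistics import NormalDist

def detectable_effect(n, efficacy, alpha, beta):
    # Approximate alternative theta^A giving power 1 - beta at level alpha:
    # theta^A = (z_alpha + z_beta) / (sqrt(n) * e_j).
    z = NormalDist()
    return (z.inv_cdf(1 - alpha) + z.inv_cdf(1 - beta)) / (sqrt(n) * efficacy)

# With n = 100, efficacy 1, level 0.025, and power 0.8,
# theta^A is approximately (1.96 + 0.84) / 10 = 0.28.
theta_a = detectable_effect(100, 1.0, 0.025, 0.2)
```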
Efficiency Calculations
Equating the alternative hypothesis parameter values (2.25) corresponding to power $1-\beta$, $(z_\alpha - z_{1-\beta})/(\sqrt{n_1}\,e_1) = (z_\alpha - z_{1-\beta})/(\sqrt{n_2}\,e_2)$, or
$$n_1/n_2 = (e_2/e_1)^2. \quad (2.26)$$
Note that this relative efficiency depends on neither $\alpha$ nor $\beta$, nor on $n_1$. As an example, suppose $X_1,\ldots,X_n$ are independent observations from a symmetric distribution with finite variance $\rho^2$ and mean $\theta$. Then $\theta$ is also the median of these observations. Compare tests $T_1$, the $t$-test, and $T_2$, the sign test. Then $T_1$ has a distribution depending on the distribution of the $X_i$, and $T_2$ has a binomial distribution. Note that $T_1$ has approximately a Gaussian distribution for large $n_1$; that is, $T_1$ is approximately Gaussian with mean $\theta/\rho$ and variance $1/n_1$, and so $\mu_1(\theta) = \theta/\rho$, $\sqrt{n_1}\,\sigma_1(0) = 1$, and $e_1 = 1/\rho$.
On the other hand, $T_2$, scaled by $n_2$, is approximately Gaussian with mean $\mu_2(\theta) = P_\theta[X_i > 0]$ and variance $\mu_2(\theta)(1-\mu_2(\theta))/n_2$. (2.27)
Hence $\mu_2'(0) = f(0)$, $\sqrt{n_2}\,\sigma_2(0) = 1/2$, and $e_2 = 2f(0)$.
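The sign test efficacy therefore depends only on the density at the point of symmetry; a quick Python check for Gaussian data with unit variance (an illustration, not the text's code):

```python
from math import sqrt, pi

def sign_test_efficacy(f0):
    # e_2 = mu_2'(0) / (sqrt(n) sigma_2(0)) = f(0) / (1/2) = 2 f(0).
    return 2.0 * f0

# The Gaussian density with variance 1 at its center is 1/sqrt(2*pi),
# so e_2 = sqrt(2/pi), about 0.798.
e2_gauss = sign_test_efficacy(1.0 / sqrt(2.0 * pi))
```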
TABLE 2.4: Empirical powers for one-sample location tests with sample size ratios indicated by asymptotic relative efficiency

Larger sample size   t test    sign test
                20   0.5623    0.2241
               100   0.5647    0.3897
              1000   0.5594    0.5040
             10000   0.5621    0.5438
The asymptotic relative efficiencies of these statistics depend on the distribution that generates the data. If the data come from a Gaussian distribution with mean 0 and variance $\rho^2$, then $\mu_2'(0) = 1/(\sqrt{2\pi}\,\rho)$, and so $e_2 = \sqrt{2/\pi}/\rho$ and $n_1/n_2 = (e_2/e_1)^2 = 2/\pi \approx 0.64$; here the $t$-test is more efficient.
If the data come from a Laplace distribution scaled to have variance 1, then $\rho = 1$. Substituting into (2.26), $\mu_1'(0) = 1$, $\sqrt{n_1}\,\sigma_1(0) = 1$, and $e_1 = 1$. Also $\mu_2'(0) = 1/\sqrt{2}$, $\sqrt{n_2}\,\sigma_2(0) = 1/2$, and $e_2 = \sqrt{2}$. Hence $n_1/n_2 \approx (\sqrt{2})^2 = 2$; in this case, the sign test is more powerful, requiring roughly half the sample size of the $t$-test.
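The Laplace arithmetic can be verified directly (a trivial Python check of the efficacy ratio, illustrative only):

```python
from math import sqrt

# Efficacies for variance-1 Laplace data, as computed in the text.
e1 = 1.0              # t test
e2 = sqrt(2.0)        # sign test: 2 * f(0) = 2 / sqrt(2)
are = (e2 / e1) ** 2  # relative efficiency n1 / n2, which equals 2
```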
Table 2.4 contains results of a simulation to check the actual powers that the asymptotic relative efficiency calculations show should be approximately the same. The table shows powers of the level 0.05 two-sided $t$ and exact sign tests for Laplace data sets, of sizes $n_1$ and $n_2 = n_1/2$ respectively, shifted to have expectation $3/\sqrt{n_1}$. Data sets need to be quite large in order for sample sizes in the ratio of the asymptotic relative efficiency to give equal power.
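A Monte Carlo sketch of this comparison can be written in Python (the text's simulations use R; the sample sizes, replication count, and the local shift $3/\sqrt{n_1}$ used here are illustrative assumptions, and the $t$ critical value is approximated by the Gaussian quantile):

```python
import random
from math import comb, sqrt
from statistics import NormalDist, mean, stdev

random.seed(2)
z975 = NormalDist().inv_cdf(0.975)  # Gaussian stand-in for the t critical value

def laplace(shift):
    # The difference of two independent standard exponentials is standard
    # Laplace (density exp(-|x|)/2).
    return shift + random.expovariate(1) - random.expovariate(1)

def sign_test_p(xs):
    # Exact two-sided sign test of the null hypothesis of median 0.
    n = len(xs)
    k = sum(x > 0 for x in xs)
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(p, 1.0)

n1 = 200              # t test sample size (illustrative)
n2 = n1 // 2          # sign test uses half the observations
shift = 3 / sqrt(n1)  # local alternative
reps = 400
t_rej = s_rej = 0
for _ in range(reps):
    xs = [laplace(shift) for _ in range(n1)]
    t_stat = mean(xs) / (stdev(xs) / sqrt(n1))
    t_rej += abs(t_stat) > z975
    ys = [laplace(shift) for _ in range(n2)]
    s_rej += sign_test_p(ys) < 0.05
t_power = t_rej / reps
s_power = s_rej / reps
```

As in Table 2.4, the sign test's empirical power at half the sample size lags the $t$ test's at moderate sample sizes, approaching it only for much larger data sets.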
Now suppose these data come from a Cauchy distribution shifted to have point of symmetry $\theta$. In this case, the expectation of the distribution does not exist, the standard deviation $\rho$ is infinite, and the distribution of the sample mean is not approximately Gaussian even in large samples. In fact, the mean of Cauchy random variables is again a Cauchy random variable, with no change in the spread of the distribution. Plugging into the definition of efficacy, without worrying about regularity conditions, gives $\mu_1'(0) = 1$, $\sqrt{n_1}\,\sigma_1(0) = \infty$, and $e_1 = 0$. On the other hand, the quantities for the sign test are $\mu_2'(0) = f(0) = 1/\pi$, $\sqrt{n_2}\,\sigma_2(0) = 1/2$, and $e_2 = 2/\pi$.
Hence, for Cauchy responses, the efficiency of the sign test relative to the $t$ test is $n_1/n_2 \to \infty$. This abuse of notation retains the interpretation that the sign test is infinitely more efficient for Cauchy observations.
Table 2.5 summarizes these calculations.
TABLE 2.5: Efficacies for one-sample location tests

                        t-test     Sign test      Relative
Gaussian   μ′(0)        1/ρ        1/(√(2π)ρ)
           √n σ(0)      1          1/2
           e            1/ρ        √(2/π)/ρ       √(π/2)
Laplace    μ′(0)        1          1/√2
           √n σ(0)      1          1/2
           e            1          √2             1/√2
Cauchy     μ′(0)        1          1/π
           √n σ(0)      ∞          1/2
           e            0          2/π            0
Examples of Power Calculations
The Gaussian approximations to power (2.17), to sample size (2.24), and to effect size (2.25) may be used to assist in planning an experiment.
Example 2.4.1 In this example I calculate power for a sign test applied to 49 observations from a Gaussian distribution with unit variance. Suppose $X_1,\ldots,X_{49}$ are Gaussian with mean $\theta$ and unit variance, with null hypothesis $\theta = 0$ and alternative hypothesis $\theta = 1/2$. The sign test statistic, divided by $n$, approximately satisfies (2.15) and (2.19), with $\mu$ and $\sigma$ given by (2.27). Then $\mu_2(0) = 0.5$, $\sqrt{n}\,\sigma_2(0) = \sqrt{0.5 \times 0.5} = 0.5$, $\mu_2(0.5) = \Phi(0.5) = 0.691$,
If, instead, a test of power 0.85 were desired for alternative expectation 1/2, with a one-sided test of level 0.025, $z_\alpha = 1.96$ and $z_\beta = 1.036$. From (2.24), one needs at least
observations; choose 53.
Finally, one might determine how large an effect one might detect using the original 49 observations with a test of level 0.025 and power 0.85. One could use $e = \sqrt{2/\pi} = 0.797$, from the cell in Table 2.5 specific to the sign test and the Gaussian distribution. Expression (2.25) gives this number as $(1.96 + 1.036)/(7 \times 0.797) = 0.537$.
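This last calculation is direct arithmetic; in Python (using the rounded efficacy 0.797, as in the text):

```python
from math import sqrt

# Detectable effect for the sign test with 49 Gaussian observations,
# one-sided level 0.025 and power 0.85, via expression (2.25).
e = 0.797                                # sign test efficacy for Gaussian data
theta = (1.96 + 1.036) / (sqrt(49) * e)  # about 0.537
```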
Distribution Function Estimation
Suppose one wishes to estimate the common distribution function $F$ of independent variables $X_1,\ldots,X_n$. For $x$ in the range of the $X_i$, let $\hat F(x)$ be the number of data points less than or equal to $x$, divided by $n$. Since the observations are independent, $\hat F(x) \sim n^{-1}\mathrm{Bin}(n, F(x))$. A confidence interval for $F(x)$ is
$$\hat F(x) \pm z_{\alpha/2}\sqrt{\hat F(x)(1-\hat F(x))/n}. \quad (2.28)$$
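A small Python sketch of this pointwise interval (illustrative only; the data values are made up):

```python
from math import sqrt
from statistics import NormalDist

def ecdf_wald_ci(data, x, alpha=0.05):
    # Pointwise Wald interval for F(x):
    # Fhat(x) +/- z_{alpha/2} * sqrt(Fhat (1 - Fhat) / n).
    # Note it can escape [0, 1], and has zero width when Fhat is 0 or 1.
    n = len(data)
    fhat = sum(v <= x for v in data) / n
    half = NormalDist().inv_cdf(1 - alpha / 2) * sqrt(fhat * (1 - fhat) / n)
    return fhat - half, fhat + half

# With four observations and Fhat(0.25) = 1/2, the interval nearly spans (0, 1).
lo, hi = ecdf_wald_ci([0.1, 0.2, 0.3, 0.4], 0.25)
```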
The above intervals will extend outside [0,1], which is not reasonable; this can be circumvented by transforming the probability scale.
Figure 2.4 presents the bounds from (2.28), without any rescaling, to be discussed further in the next example. Confidence bounds in Figure 2.4 exhibit occurrences of larger estimates being associated with smaller upper confidence bounds (e.g., in Figure 2.4, the region between the second-largest and the largest observations), and in the region with the cumulative distribution function estimated at zero or one (that is, the region below the smallest observed value and the region above the largest observed value), confidence limits lie on top of the estimates, indicating no uncertainty. Both of these phenomena are unrealistic. The first phenomenon, that of non-monotonic confidence bounds, cannot be reliably avoided through rescaling; the second, with degenerate confidence bounds outside the range of the data, can never be repaired through rescaling. A preferred solution is to substitute the intervals of Clopper and Pearson (1934), described in §1.2.2.1, to avoid all three of these problems (viz., bounds outside (0,1), bounds ordered differently than the estimates, and bounds with zero width). Such intervals are exhibited in Figure 2.5.
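Clopper–Pearson bounds can be sketched with a simple bisection on the binomial tail probabilities. The Python below is an illustration of the idea, not the text's implementation (the text's R function ecdfcis computes such intervals when exact=TRUE):

```python
from math import comb

def binom_cdf(k, n, p):
    # P[X <= k] for X ~ Binomial(n, p)
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    # Exact (Clopper-Pearson) 1 - alpha interval for a binomial proportion,
    # found by bisection on the tail probabilities.
    def solve(increase_when):
        lo_p, hi_p = 0.0, 1.0
        for _ in range(60):  # bisection to high precision
            mid = (lo_p + hi_p) / 2
            if increase_when(mid):
                lo_p = mid
            else:
                hi_p = mid
        return (lo_p + hi_p) / 2
    # lower bound solves P[X >= k | p] = alpha / 2
    lower = 0.0 if k == 0 else solve(lambda p: 1 - binom_cdf(k - 1, n, p) < alpha / 2)
    # upper bound solves P[X <= k | p] = alpha / 2
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper

# For instance, an estimate of 0 successes out of n = 21 observations:
lo21, hi21 = clopper_pearson(0, 21)
```

For $\hat F(x) = 0$ with $n = 21$, the upper bound is about 0.16 rather than 0, removing the unrealistic zero-width limits of the Wald interval.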
Finally, the confidence associated with these bounds is pointwise, and not simultaneous. That is, if $(L_1, U_1)$ and $(L_2, U_2)$ are $1-\alpha$ confidence bounds associated with two ordinates $x_1$ and $x_2$, then $P[L_1 \le F(x_1) \le U_1] \ge 1-\alpha$ and $P[L_2 \le F(x_2) \le U_2] \ge 1-\alpha$, at least approximately, but the preceding argument does not bound $P[L_1 \le F(x_1) \le U_1 \text{ and } L_2 \le F(x_2) \le U_2]$ any higher than $1-2\alpha$.
Example 2.5.1 Consider the arsenic data of Example 2.3.2. For every real $x$, one counts the number of data points less than or equal to this $x$. For any $x$ less than the smallest value 0.073, this estimate is $\hat F(x) = 0$. For $x$ greater than or equal to this smallest value and smaller than the next smallest value 0.080, the estimate is $\hat F(x) = 1/21$. This data set contains one duplicated value, 0.118. For values below, but close to, 0.118 (for example, $x = 0.1179$), $\hat F(x) = 4/21$, since four of the observations are less than or equal to $x$. However, $\hat F(0.118) = 6/21$; the jump here is twice what it is at other data values, since there are two observations here. This estimate is sketched in both Figures 2.4 and 2.5, and may be constructed in R using ecdf(arsenic$nails), presuming the data of Example 2.3.2 are still available in R. The command ecdf(arsenic$nails) does not produce confidence intervals; use
library(MultNonParam); ecdfcis(arsenic$nails,exact=FALSE)
to add confidence bounds, and changing exact to TRUE forces exact intervals.
FIGURE 2.4: Empirical CDF and Confidence Bounds for Arsenic in Nails
Exercises
1. Calculate the asymptotic relative efficiency for the sign statistic relative to the one-sample $t$-test (which you should approximate using the one-sample $z$-test). Do this for observations from the
a. uniform distribution on $[-1/2, 1/2]$, with variance 1/12 and mean under the null hypothesis of 0, and
b. logistic distribution, symmetric about 0, with variance $\pi^2/3$ and density $\exp(-x)/(1+\exp(-x))^2$.
FIGURE 2.5: Empirical CDF and Confidence Bounds for Arsenic in Nails
and last 10 lines are data set description, and should be deleted. (Line 117 is blank, and should also be deleted).
4. Suppose 49 observations are drawn from a Cauchy distribution, displaced to have location parameter 1.
a. What is the power of the sign test at level 0.05 to test the null hypothesis of expectation zero for these observations?
b. What size sample is needed to distinguish between a null hypothesis of median 0 and an alternative hypothesis of median 1, for independent Cauchy variables with a onesided level 0.025 sign test to give 80% power?