One-Sample Nonparametric Inference
 Parametric Inference on Means
 Estimation Using Averages
 One-Sample Testing for Gaussian Observations
 The Need for Distribution-Free Tests
 One-Sample Median Methods
 Estimates of the Population Median
 Hypothesis Tests Concerning the Population Median
 Confidence Intervals for the Median
 Inference for Other Quantiles
This chapter first reviews standard Gaussian-theory inference for one-sample location models. It then motivates the need for a distribution-free approach to location testing, and presents nonparametric techniques for inference on quantiles. Later in this chapter, techniques for comparing the efficiencies of tests are introduced and applied to various parametric and nonparametric tests. Finally, techniques for estimating a single cumulative distribution function are discussed.
Parametric Inference on Means
Suppose one wants to learn about θ = E[X_j], from a sample X_1, ..., X_n of independent and identically distributed random variables. When one knows the parametric family generating this set of independent data, this information may be used to construct testing and estimation methods tailored to the particular distribution. The variety of such techniques is so large that only those presuming an approximately Gaussian model will be reviewed in this volume, and in what follows, parametric analyses used for comparison will be taken to assume approximately Gaussian distributions.
Estimation Using Averages
Practitioners often estimate the location of a distribution using the sample average
X̄ = (X_1 + ... + X_n)/n.   (2.1)
If a new data set is created using an affine transformation Y_j = a + bX_j, then Ȳ = a + bX̄; that is, the sample average is equivariant under affine transformations. For example, the average temperature in degrees Fahrenheit Ȳ may be calculated from the average temperature in degrees Celsius X̄ using Ȳ = 32 + 1.8X̄, without needing access to the original measurements.
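This equivariance is easy to check numerically. The sketch below is in Python (the worked examples later in this chapter use R), with invented Celsius readings:

```python
# Affine equivariance of the sample average: for Y_j = a + b * X_j,
# averaging the transformed data equals transforming the average.
celsius = [20.0, 25.0, 30.0, 15.0]   # invented measurements
a, b = 32.0, 1.8                     # Celsius-to-Fahrenheit map

fahrenheit = [a + b * c for c in celsius]
mean_f = sum(fahrenheit) / len(fahrenheit)
mean_c = sum(celsius) / len(celsius)

print(mean_f, a + b * mean_c)  # both approximately 72.5
```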
If these variables have a finite variance σ², then the central limit theorem (CLT) ensures that X̄ is approximately Gaussian, with expectation θ and variance σ²/n; however, many common techniques designed for data with a Gaussian distribution require consequences of this distribution beyond the marginal distribution of the sample average.
OneSample Testing for Gaussian Observations
To test the null hypothesis θ = θ⁰ versus the alternative θ > θ⁰, when σ is known, reject the null hypothesis if X̄ > θ⁰ + z_α σ/√n, for z_α the 1 − α quantile of the standard Gaussian distribution. To test the null hypothesis θ = θ⁰ versus the two-sided alternative θ ≠ θ⁰, reject the null hypothesis if X̄ > θ⁰ + z_{α/2} σ/√n, or if X̄ < θ⁰ − z_{α/2} σ/√n. When σ is unknown, replace it by the sample standard deviation s, with s² = Σ_{j=1}^n (X_j − X̄)²/(n − 1), and compare the resulting Studentized statistic to the t distribution with n − 1 degrees of freedom.
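A minimal sketch of the one-sided test with known σ, in Python rather than the R used later in this chapter; the data and the null value θ⁰ = 0 below are invented for illustration:

```python
# One-sided z-test of H0: theta = theta0 vs HA: theta > theta0,
# for data with known standard deviation sigma, following the rule
# "reject when xbar > theta0 + z_alpha * sigma / sqrt(n)".
import math
from statistics import NormalDist

def z_test_one_sided(x, theta0, sigma, alpha=0.05):
    n = len(x)
    xbar = sum(x) / n
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # 1 - alpha standard Gaussian quantile
    return xbar > theta0 + z_alpha * sigma / math.sqrt(n)

# Invented data centered far above theta0 = 0: the test rejects.
print(z_test_one_sided([2.1, 1.8, 2.4, 2.0, 1.9], 0.0, 1.0))  # True
```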
The Need for DistributionFree Tests
Table 2.1 contains actual test levels for some tests of location parameters for four of the families described in §1.1.1. True levels were determined via simulation: a large number of samples was drawn from each of the distributions under the null hypothesis, the specified test statistic was calculated, the test of §2.1.2 was performed for each simulated data set, and the proportion of times the null hypothesis was rejected was tabulated. For now, restrict attention to the first line in each subtable, corresponding to the t-test. Null hypotheses in Table 2.1 are in terms of the distribution median. The t-test, however, is appropriate for hypotheses involving the expectation. In the Gaussian, Laplace, and uniform cases, the median coincides with the expectation, and so standard asymptotic theory justifies the use of the t-test. In the Cauchy example, as noted before, even though the distribution is symmetric, no expectation exists, and the t-test is inappropriate. Generally, however, data analysts do not have sufficient information to distinguish the Cauchy example from the set of distributions having enough moments to justify the t-test, and so it is important to study the implications of such an inappropriate use of methodology.
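The simulation procedure just described can be sketched as follows, here in Python rather than R, for the t-test applied to Cauchy samples of size 10. The critical value 2.262 is the 0.975 quantile of the t distribution with 9 degrees of freedom; the seed and replication count are arbitrary choices.

```python
# Monte Carlo estimate of the true level of the nominal-0.05 two-sided
# one-sample t-test when the observations are Cauchy rather than Gaussian.
import math
import random

def t_statistic(x):
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    return xbar / math.sqrt(s2 / n)

random.seed(1)
n, reps, tcrit = 10, 20000, 2.262  # 2.262 = t quantile, 9 df, 0.975
rejections = 0
for _ in range(reps):
    # standard Cauchy variates via the inverse-CDF method
    x = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]
    if abs(t_statistic(x)) > tcrit:
        rejections += 1

print(rejections / reps)  # roughly 0.02, well below the nominal 0.05
```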
For both sample sizes, observations from a Gaussian distribution give the targeted level, as expected. Observations from the Laplace distribution give a level close to the targeted level. Observations from the Cauchy distribution give a level much smaller than the targeted level, which is paradoxical, because one might expect heavy tails to make the test anti-conservative. Figure 2.1 shows the density resulting from Studentizing the average of independent Cauchy variables. The resulting density is bimodal, with tails lighter than one would otherwise expect. This shows that larger values of the sample standard deviation in the denominator of the Studentized statistic act more strongly than larger values of components of the average in the numerator.
TABLE 2.1: True levels for the t-test, approximate sign test, and exact sign test, nominal level 0.05
(a) Sample size 10, Two-Sided

                     Gaussian   Cauchy    Laplace   Uniform
  T                  0.05028    0.01879   0.04098   0.05382
  Approximate Sign   0.02127    0.02166   0.02165   0.02060
  Exact Sign         0.02127    0.02166   0.02165   0.02060

(b) Sample size 17, Two-Sided

                     Gaussian   Cauchy    Laplace   Uniform
  T                  0.05017    0.02003   0.04593   0.05247
  Approximate Sign   0.01234    0.01299   0.01274   0.01310
  Exact Sign         0.04847    0.04860   0.04871   0.04898

(c) Sample size 40, Two-Sided

                     Gaussian   Cauchy    Laplace   Uniform
  T                  0.04938    0.02023   0.04722   0.05029
  Approximate Sign   0.03915    0.03952   0.03892   0.03904
  Exact Sign         0.03915    0.03952   0.03892   0.03904
In all cases above, the t-test succeeds in providing a test level not much larger than the nominal target. On the other hand, in some cases the true level is substantially below that expected. This effect decreases as the sample size increases.
OneSample Median Methods
For moderate sample sizes, then, the standard one-sample t-test fails to control the test level as the distribution of the summands changes. Techniques that avoid this problem are developed in this section. These methods apply in broad generality, including in cases in which the expectation of the individual observations does not exist. Because of this, inference about the population median rather than the expectation is pursued. Recall that the median θ of a random variable X_j is defined so that
P[X_j ≤ θ] ≥ 1/2 and P[X_j ≥ θ] ≥ 1/2.   (2.2)
Below, the term median refers to the population version, unless otherwise specified.
FIGURE 2.1: Density of Studentized Cauchy made Symmetric, Sample Size 10
Estimates of the Population Median
An estimator smed[X_1, ..., X_n] of the population median may be constructed by applying (2.2) to the empirical distribution formed by putting mass 1/n on each of the n values X_j. In this case, with n odd, the median is the middle value, and, with n even, (2.2) fails to uniquely define the estimator. In this case, the estimator is conventionally defined to be the average of the middle two values. By this convention, with X_{(1)}, ..., X_{(n)} the ordered values in the sample,
smed[X_1, ..., X_n] = X_{((n+1)/2)} for n odd, and (X_{(n/2)} + X_{(n/2+1)})/2 for n even.   (2.3)
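A sketch of this convention, in Python rather than the chapter's R:

```python
def sample_median(x):
    # Middle order statistic for odd n; conventional average of the two
    # middle order statistics for even n.
    xs = sorted(x)
    n = len(xs)
    if n % 2 == 1:
        return xs[n // 2]
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

print(sample_median([3.0, 1.0, 2.0]))        # 2.0
print(sample_median([4.0, 1.0, 3.0, 2.0]))   # 2.5
```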
Alternatively, one might define the sample median as a minimizer:

smed[X_1, ..., X_n] = argmin_θ Σ_{j=1}^n |X_j − θ|;   (2.4)

that is, the estimate minimizes the sum of distances from data points to the potential median value, with distance measured by absolute value. This definition (2.4) coincides exactly with the earlier definition (2.3) for n odd, shares the earlier definition's lack of uniqueness for even sample sizes, and typically shares the same resolution (averaging the middle two observations) of this non-uniqueness. In contrast, the sample mean X̄ of (2.1) satisfies

X̄ = argmin_θ Σ_{j=1}^n (X_j − θ)².   (2.5)
Under certain circumstances, the sample median is approximately Gaussian. Central limit theorems for the sample median generally require only that the density of the raw observations be positive in a neighborhood of the population median.
In §2.1.1 it was claimed that the sample average is equivariant under affine transformations. A stronger property holds for medians: if Y_j = h(X_j), for h monotonic, then for n odd, smed[Y_1, ..., Y_n] = h(smed[X_1, ..., X_n]). For n even, this holds only approximately, because the averaging of the middle two observations undermines exact equivariance for non-affine transformations. Both (2.4) and (2.5) are special cases of an estimator defined by

θ̂ = argmin_θ Σ_{j=1}^n g(X_j − θ)   (2.6)
for some convex function g: the sample mean uses g(z) = z²/2 and the sample median uses g(z) = |z|. Huber (1964) suggests an alternative estimator combining the behavior of the mean and the median, taking g quadratic for small arguments and continuing linearly for larger ones, thus balancing the increased efficiency of the mean against the smaller dependence on outliers of the median; he suggests

g(z) = z²/2 for |z| ≤ k, and g(z) = k|z| − k²/2 for |z| > k,   (2.7)

and recommends a value of the tuning parameter k between 1 and 2.
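One common way to compute Huber's estimate, though not the only one, is iteratively reweighted averaging, which solves Σ_j ψ(X_j − θ) = 0 for ψ(z) = max(−k, min(k, z)). The Python sketch below (the chapter's examples use R) uses that scheme; the data are invented.

```python
def huber_estimate(x, k, tol=1e-10):
    # Iteratively reweighted average: at a fixed point,
    # sum(w_i * (x_i - theta)) = sum(psi(x_i - theta)) = 0,
    # since w_i = min(1, k / |x_i - theta|).
    theta = sorted(x)[len(x) // 2]  # start near the median
    for _ in range(200):
        w = [1.0 if abs(xi - theta) <= k else k / abs(xi - theta) for xi in x]
        new = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        if abs(new - theta) < tol:
            break
        theta = new
    return theta

# With no observation more than k from the center, the estimate is the mean;
# a gross outlier moves it far less than it moves the mean (26.5 here).
print(huber_estimate([1.0, 2.0, 3.0], k=10.0))        # 2.0
print(huber_estimate([1.0, 2.0, 3.0, 100.0], k=1.5))  # 2.5
```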
Hypothesis Tests Concerning the Population Median
Techniques in this section can be traced to Arbuthnott (1712), as described in the example below. Fisher (1930) treats this test as too obvious to require comment.
Consider independent identically distributed random variables X_j for j = 1, ..., n. To test whether a putative median value θ⁰ is the true value, define new random variables

Y_j = 1 if X_j ≤ θ⁰, and Y_j = 0 otherwise.   (2.8)

Then under H₀ : θ = θ⁰, Y_j ~ Bin(1, 1/2). This logic only works if

P[X_j = θ⁰] = 0;   (2.9)

assume this. It is usually easier to assess this continuity assumption than it is to assess distributional assumptions. The median inference problem then reduces to one of binomial testing. Let T(θ⁰) = Σ_{j=1}^n Y_j be the number of observations less than or equal to θ⁰. Pick t_l and t_u so that Σ_{j=t_l}^{t_u−1} (1/2)^n (n choose j) ≥ 1 − α. One might choose t_l and t_u symmetrically, so that t_l is the largest value such that

Σ_{j=0}^{t_l−1} (1/2)^n (n choose j) ≤ α/2.   (2.10)
That is, t_l is that potential value for T such that not more than α/2 probability sits below it. The largest such t_l has probability at least 1 − α/2 equal to or larger than it, and at least α/2 equal to or smaller than it; hence t_l is the α/2 quantile of the Bin(n, 1/2) distribution. Generally, the inequality in (2.10) is strict; that is, ≤ is actually <. For combinations of n and α for which this inequality holds with equality, the quantile is not uniquely defined; take the quantile to be the lowest candidate. Symmetrically, one might choose the smallest t_u so that

Σ_{j=t_u}^{n} (1/2)^n (n choose j) ≤ α/2;   (2.11)
n + 1 − t_u is the α/2 quantile of the Bin(n, 1/2) distribution, with the opposite convention used in case the quantile is not uniquely defined.
Then, reject the null hypothesis if T ≤ t°_L or T ≥ t°_U, for t°_L = t_l − 1 and t°_U = t_u. This test is called the (exact) sign test, or the binomial test (Higgins, 2004). An approximate version of the sign test may be created by selecting critical values from the Gaussian approximation to the distribution of T(θ⁰).
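The choice of t_l and t_u from (2.10) and (2.11) can be sketched directly, here in Python rather than R; the returned lower value t_l is one more than the critical value t_l − 1 tabulated in Table 2.2.

```python
import math

def binom_cdf(t, n, p=0.5):
    # P[T <= t] for T ~ Bin(n, p)
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(t + 1))

def sign_test_critical(n, alpha=0.05):
    # Largest t_l with P[T <= t_l - 1] <= alpha/2 under Bin(n, 1/2),
    # and by symmetry t_u = n + 1 - t_l; reject if T < t_l or T >= t_u.
    t_l = 0
    while binom_cdf(t_l, n) <= alpha / 2:
        t_l += 1
    return t_l, n + 1 - t_l

print(sign_test_critical(10))  # (2, 9): t_l - 1 = 1, as in Table 2.2
print(sign_test_critical(21))  # (6, 16)
```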
Again, direct attention to Table 2.1. Both variants of the sign test succeed in keeping the test level no larger than the nominal value. However, the sign test variants, because of the discreteness of the binomial distribution, in some cases achieve levels much smaller than the nominal target. Subtable (a), for sample size 10, is the most extreme example of this; subtable (b), for sample size 17, represents the smallest reduction in actual test level, and subtable (c), for sample size 40, is intermediate. Note further that, while the asymptotic sign test, based on the Gaussian approximation, is not identical to the exact version, for subtables (a) and (c) the levels coincide exactly, since for all simulated data sets, the p-values either exceed 0.05 or fail to exceed 0.05 for both tests. Subtable (b) exhibits a case in which, for one data value, the exact and approximate sign tests disagree on whether the p-values exceed 0.05.
Table 2.2 presents characteristics of the exact two-sided binomial test of the null hypothesis that the probability of success is half, with level α = 0.05, applied to small samples. In this case, the two-sided p-value is obtained by doubling the one-sided p-value.
For small samples (n < 6), the smallest one-sided p-value, 1/2^n, is greater than 0.025, and the null hypothesis is never rejected. Such small samples are omitted from Table 2.2. This table consists of two subtables side by side, for
TABLE 2.2: Exact levels and exact and asymptotic lower critical values for symmetric two-sided binomial tests of nominal level 0.05

       Critical value t_l − 1                    Critical value t_l − 1
   n   Exact  Asymptotic  Exact level  |   n    Exact  Asymptotic  Exact level
   6     0        0         0.0313     |  24      6        6         0.0227
   7     0        0         0.0156     |  25      7        7         0.0433
   8     0        0         0.0078     |  26      7        7         0.0290
   9     1        1         0.0391     |  27      7        7         0.0192
  10     1        1         0.0215     |  28      8        8         0.0357
  11     1        1         0.0117     |  29      8        8         0.0241
  12     2        2         0.0386     |  30      9        9         0.0428
  13     2        2         0.0225     |  31      9        9         0.0294
  14     2        2         0.0129     |  32      9        9         0.0201
  15     3        3         0.0352     |  33     10       10         0.0351
  16     3        3         0.0213     |  34     10       10         0.0243
  17     4        3         0.0490     |  35     11       11         0.0410
  18     4        4         0.0309     |  36     11       11         0.0288
  19     4        4         0.0192     |  37     12       12         0.0470
  20     5        5         0.0414     |  38     12       12         0.0336
  21     5        5         0.0266     |  39     12       12         0.0237
  22     5        5         0.0169     |  40     13       13         0.0385
  23     6        6         0.0347     |  41     13       13         0.0275
n ≤ 23, and for n > 23. The first column of each subtable is the sample size. The second is t_l − 1 from (2.10). The third is the value obtained from performing the same operation on the Gaussian approximation; that is, it is the largest a such that

Φ((a + 1/2 − n/2)/(√n/2)) ≤ α/2.   (2.12)
The fourth is the observed test level; that is, it is double the probability on the left-hand side of (2.10). Observations here agree with those from Table 2.1: for sample size 10, the level of the binomial test is severely too small; for sample size 17, the binomial test has a level close to the nominal one; and for sample size 40, the level of the binomial test is moderately too small.
A complication (or, to an optimist, an opportunity for improved approximation) arises when approximating a discrete distribution by a continuous distribution. Consider the case with n = 10, exhibited in Figure 2.2. Bar areas represent the probability under the null hypothesis of observing each number of successes. Table 2.2 indicates that the one-sided test of level 0.05 rejects the null hypothesis for T ≤ 1. The actual one-sided test size is 0.0107 (half of the 0.0215 tabulated for the two-sided test), which is graphically represented as the sum of the area of the bar centered at 1 and the very small area of the neighboring bar centered at 0. Expression (2.12) approximates the sum of these two bar areas by the area under the dotted curve, representing the Gaussian density with the appropriate expectation n/2 = 5 and standard deviation √n/2 ≈ 1.58. In order to align the areas of the bars most closely with the area under the curve, the Gaussian area should be taken to extend to the upper end of the bar containing 1; that is, evaluate the Gaussian distribution function at 1.5, explaining the 0.5 in (2.12). More generally, for a discrete distribution with potential values Δ units apart, the ordinate is shifted by Δ/2 before applying a Gaussian approximation; this adjustment is called a correction for continuity.
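The effect of the correction for continuity can be checked numerically. In this Python sketch (the chapter's own examples use R), the corrected approximation to P[T ≤ 1] for n = 10 is visibly closer to the exact binomial probability than the uncorrected one:

```python
# Exact P[T <= 1] for T ~ Bin(10, 1/2), versus Gaussian approximations
# with and without the correction for continuity.
import math
from statistics import NormalDist

n, t = 10, 1
exact = sum(math.comb(n, j) for j in range(t + 1)) / 2 ** n  # 11/1024
mu, sd = n / 2, math.sqrt(n) / 2
corrected = NormalDist(mu, sd).cdf(t + 0.5)  # evaluate at 1.5
plain = NormalDist(mu, sd).cdf(t)            # evaluate at 1

print(exact, corrected, plain)
```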
FIGURE 2.2: Approximate Probability Calculation for Sign Test, Sample Size 10, Target Level 0.05, and from Table 2.2, t_l − 1 = 1
The power of the sign test is determined by P_{θ^A}[X_j ≤ θ⁰] for values of θ^A ≠ θ⁰. Since θ^A > θ⁰ exactly when P_{θ^A}[X_j ≤ θ⁰] < 1/2, alternatives θ^A > θ⁰ correspond to one-sided alternatives P[Y_j = 1] < 1/2.
If θ⁰ is the true population median of the X_j, and if there exists a set of the form (θ⁰ − ε, θ⁰ + ε), with ε > 0, such that P[X_j ∈ (θ⁰ − ε, θ⁰ + ε)] = 0, then any other θ in this set is also a population median for the X_j, and hence the test will have power against such alternatives no larger than the test level. Such occurrences are rare.
Table 2.3 presents powers for these various tests for various sample sizes. The alternative is chosen to make the t-test have power approximately 0.80 for the Gaussian and Laplace distributions, using (1.9). In this case both statistics have
TABLE 2.3: Power for the t-test, approximate sign test, and exact sign test, nominal level 0.05
(a) Sample size 10, Two-Sided

                     Gaussian   Cauchy    Laplace
  T                  0.70593    0.14345   0.73700
  Approximate Sign   0.41772    0.20506   0.57222
  Exact Sign         0.41772    0.20506   0.57222

(b) Sample size 17, Two-Sided

                     Gaussian   Cauchy    Laplace
  T                  0.74886    0.10946   0.76456
  Approximate Sign   0.35747    0.17954   0.58984
  Exact Sign         0.57893    0.35759   0.79011

(c) Sample size 40, Two-Sided

                     Gaussian   Cauchy    Laplace
  T                  0.78152    0.06307   0.78562
  Approximate Sign   0.55462    0.35561   0.84331
  Exact Sign         0.55462    0.35561   0.84331
a distribution that is approximately Gaussian. For the Cauchy distribution, the same alternative as for the Gaussian and Laplace distributions is used.
Results in Table 2.3 show that for a sample size for which the sign test level approximates the nominal level (n = 17), use of the sign test for Gaussian data results in a moderate loss in power relative to the ttest, while use of the sign test results in a moderate gain in power for Laplace observations, and in a substantial gain in power for Cauchy observations.
Example 2.3.1 An early (and very simple) application of this test was to test whether the proportion of boys born in a given year is the same as the proportion of girls born that year (Arbuthnott, 1712). The number of births was determined for a period of 82 years. Let X_j represent the number of births of girls, minus the number of births of boys, in year j. The parameter θ represents the median amount by which the number of girls exceeds the number of boys; its null value is 0. Let Y_j take the value 0 for years in which more girls than boys are born, and 1 otherwise. Note that in this case, (2.9) is violated, but P[X_j = 0] is small, and this violation is not important. Test at level 0.05.
Values in (2.10) and (2.11) are t_l = 32 and t_u = 51, obtained as the 0.025 and 0.975 quantiles of the binomial distribution with 82 trials and success probability 0.5. Reject the null hypothesis if T < 32 or if T ≥ 51. (The asymmetry in the treatment of the lower and upper critical values is intentional, and is the result of the asymmetry in the definition of the distribution function for discrete variables.)
In each of these years X_j < 0, and so Y_j = 1, and T = 82. Reject the null hypothesis of equal proportions of births. The original analysis of these data presented what is now considered the p-value; the one-sided value of (1.14) is trivially P[T ≥ 82] = (1/2)^82, which is tiny. The two-sided p-value of (1.15) is 2 × (1/2)^82 = (1/2)^81, which is still tiny.
Confidence Intervals for the Median
Apply the test inversion approach of §1.2 to the sign test that rejects H₀ : θ = θ⁰ if fewer than t_l or at least t_u data points are less than or equal to θ⁰. Let X_{(·)} refer to the data values after ordering. When θ⁰ < X_{(1)}, then T(θ⁰) = 0. For θ⁰ ∈ (X_{(1)}, X_{(2)}], T(θ⁰) = 1. For θ⁰ ∈ (X_{(2)}, X_{(3)}], T(θ⁰) = 2. In each case, the ( at the beginning of the interval and the ] at the end arise from (2.8), because observations exactly equal to θ⁰ are coded as one. Hence the test rejects H₀ if θ⁰ < X_{(t_l)} or θ⁰ ≥ X_{(t_u)}, and, for any θ⁰,
This relation leads to the confidence interval [X_{(t_l)}, X_{(t_u)}]. However, since the data have a continuous distribution, X_{(t_u)} also has a continuous distribution, and P[X_{(t_u)} = θ⁰] = 0 for any θ⁰. Hence P[X_{(t_l)} ≤ θ⁰ < X_{(t_u)}] ≥ 1 − α, and one might exclude the upper end point, to obtain the interval (X_{(t_l)}, X_{(t_u)}).
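Putting the pieces together, here is a Python sketch of the resulting interval (the chapter's own examples use R), applied to the sorted nail arsenic values of Example 2.3.2 below:

```python
# Confidence interval for the median by inverting the sign test:
# the interval runs from the t_l-th to the t_u-th order statistic, with
# t_l the alpha/2 quantile of Bin(n, 1/2) and t_u = n + 1 - t_l.
import math

def median_ci(x, alpha=0.05):
    n = len(x)
    def cdf(t):  # P[T <= t] for T ~ Bin(n, 1/2)
        return sum(math.comb(n, j) for j in range(t + 1)) / 2 ** n
    t_l = 0
    while cdf(t_l) <= alpha / 2:
        t_l += 1
    t_u = n + 1 - t_l
    xs = sorted(x)
    return xs[t_l - 1], xs[t_u - 1]  # order statistics, 1-indexed

# The 21 sorted nail arsenic values from Example 2.3.2:
nails = [0.073, 0.080, 0.099, 0.105, 0.118, 0.118, 0.119, 0.135, 0.141,
         0.158, 0.175, 0.269, 0.275, 0.277, 0.310, 0.358, 0.433, 0.517,
         0.832, 0.851, 2.252]
print(median_ci(nails))  # (0.118, 0.358)
```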
Example 2.3.2 Consider data from
http://lib.stat.cmu.edu/datasets/Arsenic
from a pilot study on the uptake of arsenic from drinking water. Column six of this file gives arsenic concentrations in toenail clippings, in parts per million. The link above is to a Word file; the file
http://stat.rutgers.edu/home/kolassa/Data/arsenic.dat
contains a plain text version. Sorted nail arsenic values are
0.073, 0.080, 0.099, 0.105, 0.118, 0.118, 0.119, 0.135, 0.141, 0.158, 0.175, 0.269, 0.275, 0.277, 0.310, 0.358, 0.433, 0.517, 0.832, 0.851, 2.252.
We construct a confidence interval for the (natural) log of toenail arsenic. The sign test statistic has a Bin(21, 0.5) distribution under the null hypothesis. We choose the largest t_l such that P₀[T < t_l] ≤ α/2. The first few terms in (2.10) are
and cumulative probabilities are
The largest of these cumulative sums not exceeding 0.025 is the sixth, corresponding to T ≤ 5. Hence t_l = 6. Similarly, t_u = 16. Reject the null hypothesis that the median is 0.26 if T < 6 or if T ≥ 16. Since 11 of the observations are less than or equal to the null median 0.26, T = 11. Do not reject the null hypothesis.
Alternatively, one might calculate a p-value. Using (1.15), the p-value is 2 min(P₀[T ≥ 11], P₀[T ≤ 11]) = 1.
Furthermore, the confidence interval for the median is (X_{(t_l)}, X_{(t_u)}) = (0.118, 0.358).
The values t_l and t_u may be calculated in R by
a<-qbinom(0.025,21,.5); b<-21+1-qbinom(0.025,21,.5)
and the ensemble of calculations might also have been performed in R using
arsenic<-as.data.frame(scan('arsenic.dat',
   what=list(age=0,sex=0,drink=0,cook=0,water=0,nails=0)))
library(BSDA)#Gives sign test.
SIGN.test(arsenic$nails,md=0.26)#Argument md gives null hyp.
Graphical construction of a confidence interval for the median is performed by
library(NonparametricHeuristic)
invertsigntest(log(arsenic$nails),maint="Log Nail Arsenic")
and is given in Figure 2.3. Instructions for installing this last library are given in the introduction, and in Appendix B.
Figure 2.3 exhibits construction of the confidence interval in the previous example; I apply these techniques on the log scale. The confidence interval is the set of log medians that yield a test statistic for which the null hypothesis is not rejected. Values of the statistic for which the null hypothesis is not rejected lie between the horizontal lines; log medians in the confidence interval are those yielding values of the test statistic within this region.
FIGURE 2.3: Construction of Cl for Log Nail Arsenic Location
In this construction, order statistics (that is, the ordered values) are first plotted on the horizontal axis, with the place in the ordered data set on the vertical axis. These points are represented by the points in Figure 2.3 where the step function transitions from vertical to horizontal, as one moves from lower left to upper right. Next, draw horizontal lines at the values t_l and t_u, given by (2.10) and (2.11) respectively. Finally, draw vertical lines through the data points that these horizontal lines hit.
For this particular example, the exact one-sided binomial test of level 0.025 rejects the null hypothesis that the event probability is half if the sum of event indicators is 0, 1, 2, 3, 4, or 5; t_l = 6. For Y_j of (2.8), the sum is less than 6 for all θ to the left of the point marked X_{(t_l)}. Similarly, the one-sided level 0.025 test in the other direction rejects the null hypothesis if the sum of event indicators is at least t_u = 16. The sum of the Y_j exceeds 15 for θ to the right of the point marked X_{(t_u)}.
By symmetry, one might expect t_l = n − t_u, but this is not the case. The asymmetry in definitions (2.10) and (2.11) arises because construction of the confidence interval requires counting not the data points, but the n − 1 spaces between them, plus the regions below the minimum and above the maximum, for a total of n + 1 ranges. Then t_l = n + 1 − t_u.
This interval is not of the usual form θ̂ ± z_{α/2} σ̂, for a σ̂ with a factor of 1/√n. Cramér (1946, pp. 368f.) shows that if X_1, ..., X_n is a set of independent random variables, each having density f, then Var[smed[X_1, ..., X_n]] ≈ 1/(4f(θ)²n), for θ the population median. Chapter 8 investigates estimation of this density; that estimate can be used to estimate the variance of the sample median, but density estimation is harder than the earlier confidence interval construction.
Inference for Other Quantiles
The quantile θ corresponding to probability γ is defined by P_θ[X_j ≤ θ] = γ. Suppose that θ is the quantile for γ ∈ (0, 1) of the distribution of independent and identically distributed continuous random variables X_1, ..., X_n. Then one can produce a generalized sign test. Define the null and alternative hypotheses H₀ : θ = θ⁰ and H_A : θ ≠ θ⁰. As before, T(θ⁰) is the number of observations smaller than or equal to θ⁰. For the true value θ of the quantile, T ~ Bin(n, γ). Choose t_l and t_u so that Σ_{j=t_l}^{t_u−1} γ^j (1 − γ)^{n−j} (n choose j) ≥ 1 − α. Often, one chooses the largest t_l and the smallest t_u so that

Σ_{j=0}^{t_l−1} γ^j (1 − γ)^{n−j} (n choose j) ≤ α/2 and Σ_{j=t_u}^{n} γ^j (1 − γ)^{n−j} (n choose j) ≤ α/2;   (2.13)

this t_l is the α/2 quantile of the Bin(n, γ) distribution, and n + 1 − t_u is the α/2 quantile of the Bin(n, 1 − γ) distribution. Hence t_l ≈ nγ − √(nγ(1 − γ)) z_{α/2} and t_u ≈ nγ + √(nγ(1 − γ)) z_{α/2}. One then rejects H₀ if T < t_l or T ≥ t_u.
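These choices of t_l and t_u can be sketched in Python (the chapter's own examples use R); the exact computation below reproduces the critical values used in Example 2.3.3.

```python
import math

def quantile_test_critical(n, gamma, alpha=0.05):
    # t_l is the largest value with P[T <= t_l - 1] <= alpha/2, and t_u the
    # smallest with P[T >= t_u] <= alpha/2, for T ~ Bin(n, gamma); the
    # generalized sign test rejects H0 when T < t_l or T >= t_u.
    def cdf(t):  # P[T <= t]
        return sum(math.comb(n, j) * gamma ** j * (1 - gamma) ** (n - j)
                   for j in range(t + 1))
    t_l = 0
    while cdf(t_l) <= alpha / 2:
        t_l += 1
    t_u = 0
    while 1 - cdf(t_u - 1) > alpha / 2:  # 1 - cdf(t_u - 1) = P[T >= t_u]
        t_u += 1
    return t_l, t_u

print(quantile_test_critical(21, 0.75))  # (12, 20), as in Example 2.3.3
print(quantile_test_critical(21, 0.50))  # (6, 16), the median case
```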
This test is then inverted to obtain (X_{(t_l)}, X_{(t_u)}) as the confidence interval for θ. Note that the confidence level is conservative:

P[X_{(t_l)} ≤ θ < X_{(t_u)}] = 1 − P[X_{(t_l)} > θ] − P[θ ≥ X_{(t_u)}] ≥ 1 − α.

For any given θ, the inequality is generally strict.
Example 2.3.3 Test the null hypothesis that the upper quartile (that is, the 0.75 quantile) of the arsenic nail data from Example 2.3.2 is the reference value 0.26, and give a confidence interval for this quantile. The analysis is the same as before, except that t_l and t_u are different. We determine t_l and t_u in (2.13). Direct calculation, or the R commands
a<-qbinom(0.025,21,.75); b<-21+1-qbinom(0.025,21,1-.75)
shows t_l = 12 and t_u = 20. Since T = 10 < t_l, reject the null hypothesis that the upper quartile is 0.26. Furthermore, the confidence interval is the region between the twelfth and twentieth ordered values, (X_{(12)}, X_{(20)}) = (0.269, 0.851). With the data present in the R workspace, one calculates the confidence interval as
sort(arsenic$nails)[c(a,b)]
and the p-value as
tt<-10
2*min(c(pbinom(tt,21,.75), pbinom(21-tt,21,1-.75)))
to give 0.0128.
Dependence of the test statistic T(θ) on θ is relatively simple. Later inversions of more complicated statistics will make use of the simplifying device of first shifting all or part of the data by subtracting θ⁰, and then testing the null hypothesis that the location parameter of the shifted variable is zero.