 # One-Sample Nonparametric Inference

This chapter first reviews standard Gaussian-theory inference for one-sample location models. It then explains why a distribution-free approach to location testing is needed, and presents nonparametric techniques for inference on quantiles. Later in this chapter, techniques for comparing the efficiencies of tests are introduced and applied to various parametric and nonparametric tests. Finally, techniques for estimating a single cumulative distribution function are discussed.

## Parametric Inference on Means

Suppose one wants to learn about $\theta = E[X_j]$ from a sample $X_1, \ldots, X_j, \ldots, X_n$ of independent and identically distributed random variables. When one knows the parametric family generating this set of independent data, this information may be used to construct testing and estimation methods tailored to the individual distribution. The variety of such techniques is so large that only those presuming an approximately Gaussian model will be reviewed in this volume, and in what follows, parametric analyses for comparison purposes will be taken to assume approximately Gaussian distributions.

### Estimation Using Averages

Practitioners often estimate the location of a distribution using the sample average
$$\bar X = \frac{1}{n}\sum_{j=1}^n X_j. \tag{2.1}$$
If a new data set is created using an affine transformation $Y_j = a + bX_j$, then $\bar Y = a + b\bar X$; that is, the sample average is equivariant under affine transformations. For example, the average temperature in degrees Fahrenheit $\bar Y$ may be calculated from the average temperature in degrees Celsius $\bar X$ using $\bar Y = 32 + 1.8\bar X$, without needing access to the original measurements.

If these variables have a finite variance $\sigma^2$, then the central limit theorem (CLT) ensures that $\bar X$ is approximately Gaussian with expectation $\theta$ and variance $\sigma^2/n$; however, many common techniques designed for data with a Gaussian distribution rely on consequences of that distribution beyond the marginal distribution of the sample average.

### One-Sample Testing for Gaussian Observations

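To test the null hypothesis $\theta = \theta^0$ versus the alternative $\theta > \theta^0$, reject the null hypothesis if $\bar X > \theta^0 + z_\alpha\sigma/\sqrt n$, where $z_\alpha$ is the $1 - \alpha$ quantile of the standard Gaussian distribution. To test the null hypothesis $\theta = \theta^0$ versus the two-sided alternative $\theta \ne \theta^0$, reject the null hypothesis if $\bar X > \theta^0 + z_{\alpha/2}\sigma/\sqrt n$ or if $\bar X < \theta^0 - z_{\alpha/2}\sigma/\sqrt n$. If $\sigma$ is not known, substitute the estimate
$$s = \sqrt{\sum_{j=1}^n (X_j - \bar X)^2/(n - 1)},$$
and compare the resulting Studentized statistic $(\bar X - \theta^0)/(s/\sqrt n)$ to the t distribution with $n - 1$ degrees of freedom.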
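As a minimal sketch of the Gaussian-theory test with $\sigma$ unknown, the Studentized statistic can be computed directly from its definition and checked against base R's `t.test`. The helper names `t_stat` and `t_pvalue`, the simulated data, and the seed are hypothetical illustrations, not part of the text.

```r
# Studentized statistic for H0: theta = theta0 (hypothetical helper name).
t_stat <- function(x, theta0) {
  n <- length(x)
  s <- sqrt(sum((x - mean(x))^2) / (n - 1))   # the estimate s; equals sd(x)
  (mean(x) - theta0) / (s / sqrt(n))
}

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
t_pvalue <- function(x, theta0) {
  2 * pt(-abs(t_stat(x, theta0)), df = length(x) - 1)
}

set.seed(1)
x <- rnorm(15, mean = 0.3)                   # hypothetical illustrative data
alpha <- 0.05
reject <- abs(t_stat(x, 0)) > qt(1 - alpha / 2, df = length(x) - 1)
c(t = t_stat(x, 0), p = t_pvalue(x, 0), reject = reject)
t.test(x, mu = 0)$p.value                    # agrees with t_pvalue(x, 0)
```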

## The Need for Distribution-Free Tests

Table 2.1 contains actual test levels for some tests of location parameters for four of the families described in §1.1.1. True levels were determined via simulation: a large number of samples were drawn from each of the distributions under the null hypothesis, the specified test statistic was calculated, the test of §2.1.2 was performed for each simulated data set, and the proportion of times the null hypothesis was rejected was tabulated. For now, restrict attention to the first line in each subtable, corresponding to the t-test. Null hypotheses in Table 2.1 are in terms of the distribution median. The t-test, however, is appropriate for hypotheses involving the expectation. In the Gaussian, Laplace, and uniform cases, the median coincides with the expectation, and so standard asymptotic theory justifies the use of the t-test. In the Cauchy example, as noted before, even though the distribution is symmetric, no expectation exists, and the t-test is inappropriate. However, data analysts generally do not have sufficient information to distinguish the Cauchy example from the set of distributions having enough moments to justify the t-test, and so it is important to study the implications of such an inappropriate use of methodology.

For both sample sizes, observations from a Gaussian distribution give the targeted level, as expected. Observations from the Laplace distribution give a level close to the targeted level. Observations from the Cauchy distribution give a level much smaller than the targeted level, which is paradoxical, because one might expect heavy tails to make it anti-conservative. Figure 2.1 shows the density resulting from Studentizing the average of independent Cauchy variables. The resulting density is bimodal, with tails lighter than one would otherwise expect. This shows that larger values of the sample standard deviation in the denominator of the Studentized statistic act more strongly than larger values of components of the average in the numerator.

TABLE 2.1: True levels for the t-test, approximate sign test, and exact sign test, two-sided, nominal level 0.05

(a) Sample size 10

| Test | Gaussian | Cauchy | Laplace | Uniform |
|---|---|---|---|---|
| t | 0.05028 | 0.01879 | 0.04098 | 0.05382 |
| Approximate sign | 0.02127 | 0.02166 | 0.02165 | 0.02060 |
| Exact sign | 0.02127 | 0.02166 | 0.02165 | 0.02060 |

(b) Sample size 17

| Test | Gaussian | Cauchy | Laplace | Uniform |
|---|---|---|---|---|
| t | 0.05017 | 0.02003 | 0.04593 | 0.05247 |
| Approximate sign | 0.01234 | 0.01299 | 0.01274 | 0.01310 |
| Exact sign | 0.04847 | 0.04860 | 0.04871 | 0.04898 |

(c) Sample size 40

| Test | Gaussian | Cauchy | Laplace | Uniform |
|---|---|---|---|---|
| t | 0.04938 | 0.02023 | 0.04722 | 0.05029 |
| Approximate sign | 0.03915 | 0.03952 | 0.03892 | 0.03904 |
| Exact sign | 0.03915 | 0.03952 | 0.03892 | 0.03904 |

In all cases above, the t-test succeeds in providing a test level not much larger than the target nominal level. On the other hand, in some cases, most notably for Cauchy data, the true level is substantially below the nominal level. This effect diminishes somewhat as the sample size increases.
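The simulation described above can be sketched as follows. This is an illustration under assumptions of ours (the helper name `sim_level`, 5,000 replications, a fixed seed, and the use of `t.test` and `binom.test` for the two tests), not the code that produced Table 2.1, so its estimates will differ from the table in the later decimal places.

```r
# Monte Carlo estimate of the true level of the t-test and the exact sign
# test when the population median equals the null value 0.
sim_level <- function(rdist, n, nrep = 5000, alpha = 0.05) {
  reject_t <- reject_sign <- logical(nrep)
  for (i in seq_len(nrep)) {
    x <- rdist(n)
    reject_t[i] <- t.test(x, mu = 0)$p.value < alpha
    # Sign test: the number of observations <= 0 is Bin(n, 1/2) under H0.
    reject_sign[i] <- binom.test(sum(x <= 0), n, p = 0.5)$p.value < alpha
  }
  c(t = mean(reject_t), sign = mean(reject_sign))
}

set.seed(2)
sim_level(rnorm, 10)     # t level near 0.05; sign level near 0.0215
sim_level(rcauchy, 10)   # t level well below 0.05
```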

## One-Sample Median Methods

For moderate sample sizes, then, the standard one-sample t-test fails to maintain its nominal level as the distribution of the summands changes. Techniques that avoid this problem are developed in this section. These methods apply in broad generality, including in cases when the expectation of the individual observations does not exist. Because of this, inference about the population median rather than the expectation is pursued. Recall that the median $\theta$ of random variable $X_j$ is defined so that
$$P[X_j \ge \theta] \ge 1/2 \quad\text{and}\quad P[X_j \le \theta] \ge 1/2. \tag{2.2}$$
Below, the term median refers to the population version, unless otherwise specified.

FIGURE 2.1: Density of Studentized Cauchy made Symmetric, Sample Size 10

### Estimates of the Population Median

An estimator $\mathrm{smed}[X_1, \ldots, X_n]$ of the population median may be constructed by applying (2.2) to the empirical distribution of the $X_j$, formed by putting point mass $1/n$ on each of the $n$ values. In this case, with $n$ odd, the median is the middle value, and, with $n$ even, (2.2) fails to uniquely define the estimator, since any value between the two middle values satisfies it. In this case, the estimator is conventionally defined to be the average of the middle two values. By this convention, with $X_{(1)}, \ldots, X_{(n)}$ the ordered values in the sample,
$$\mathrm{smed}[X_1, \ldots, X_n] = \begin{cases} X_{((n+1)/2)} & n \text{ odd},\\ \left(X_{(n/2)} + X_{(n/2+1)}\right)/2 & n \text{ even}. \end{cases} \tag{2.3}$$

Alternatively, one might define the sample median to minimize the sum of distances from the data points to the potential median value, with distance measured by absolute value:
$$\mathrm{smed}[X_1, \ldots, X_n] = \operatorname{argmin}_\theta \sum_{j=1}^n |X_j - \theta|. \tag{2.4}$$
This definition (2.4) exactly coincides with the earlier definition (2.3) for $n$ odd, shares the earlier definition's lack of uniqueness for even sample sizes, and is typically resolved the same way, by averaging the middle two observations. In contrast, the sample mean $\bar X$ of (2.1) satisfies
$$\bar X = \operatorname{argmin}_\theta \sum_{j=1}^n (X_j - \theta)^2. \tag{2.5}$$

Under certain circumstances, the sample median is approximately Gaussian. Central limit theorems for the sample median generally require only that the density of the raw observations be positive in a neighborhood of the population median.
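The two definitions (2.3) and (2.4) can be compared numerically in a few lines. In this sketch the helper names `smed` and `abs_loss` and the two small data sets are hypothetical; `optimize` minimizes the absolute-distance criterion over the range of the data.

```r
# Sample median by the order-statistic rule (2.3).
smed <- function(x) {
  xs <- sort(x)
  n <- length(xs)
  if (n %% 2 == 1) xs[(n + 1) / 2] else (xs[n / 2] + xs[n / 2 + 1]) / 2
}

# Criterion of (2.4): total absolute distance from the data to theta.
abs_loss <- function(theta, x) sum(abs(x - theta))

x_odd <- c(3.1, 0.4, 2.2, 5.9, 1.7)                  # hypothetical data, n odd
smed(x_odd)                                          # 2.2
optimize(abs_loss, range(x_odd), x = x_odd)$minimum  # numerically close to 2.2

x_even <- c(1, 2, 4, 10)                             # hypothetical data, n even
smed(x_even)                                         # 3, the average of 2 and 4
# Every value between the middle two observations minimizes (2.4):
c(abs_loss(2, x_even), abs_loss(3, x_even), abs_loss(4, x_even))  # all 11
```

The last line illustrates the non-uniqueness for even $n$: the criterion is flat between the two middle observations.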

In §2.1.1 it was claimed that the sample average is equivariant under affine transformations. A stronger property holds for medians: if $Y_j = h(X_j)$ for $h$ monotonic, then for $n$ odd, $\mathrm{smed}[Y_1, \ldots, Y_n] = h(\mathrm{smed}[X_1, \ldots, X_n])$. For $n$ even, this is approximately true, except that averaging the middle two observations undermines exact equivariance for non-affine transformations. Both (2.4) and (2.5) are special cases of an estimator defined by
$$\hat\theta = \operatorname{argmin}_\theta \sum_{j=1}^n \rho(X_j - \theta) \tag{2.6}$$
for some convex function $\rho$: the sample mean uses $\rho(z) = z^2/2$ and the sample median uses $\rho(z) = |z|$. Huber (1964) suggests an alternative estimator combining the behavior of the mean and median, by taking $\rho$ quadratic for small values and continuing linearly for larger values, thus balancing the greater efficiency of the mean for Gaussian data against the median's smaller dependence on outliers; he suggests
$$\rho(z) = \begin{cases} z^2/2 & |z| \le k,\\ k|z| - k^2/2 & |z| > k, \end{cases} \tag{2.7}$$
and recommends a value of the tuning parameter $k$ between 1 and 2.
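A minimal sketch of the M-estimator (2.6) with Huber's $\rho$ of (2.7) follows. It assumes the data are already on a scale where $k$ is meaningful (in practice residuals are usually divided by a scale estimate first; that step is omitted here), and the names `huber_rho` and `huber_est` and the data are hypothetical.

```r
# Huber's rho from (2.7): quadratic for |z| <= k, linear beyond.
huber_rho <- function(z, k) ifelse(abs(z) <= k, z^2 / 2, k * abs(z) - k^2 / 2)

# M-estimate (2.6) with Huber's rho; the scale is taken to be 1 (an assumption).
huber_est <- function(x, k = 1.5) {
  objective <- function(theta) sum(huber_rho(x - theta, k))
  optimize(objective, interval = range(x), tol = 1e-10)$minimum
}

x <- c(-1.2, -0.3, 0.1, 0.4, 0.8, 1.1, 9.0)   # hypothetical data, one outlier
c(mean = mean(x), median = median(x), huber = huber_est(x, k = 1.5))
# mean is about 1.414, median is 0.4, Huber estimate is about 0.42
```

With a very large $k$ the criterion is quadratic over the whole data range and the estimate reduces to the sample mean; the outlier at 9.0 pulls the mean far more than it pulls the Huber estimate.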

### Hypothesis Tests Concerning the Population Median

Techniques in this section can be traced to Arbuthnott (1712), as described in the example below. Fisher (1930) treats this test as too obvious to require comment.

Consider independent identically distributed random variables $X_j$ for $j = 1, \ldots, n$. To test whether a putative median value $\theta^0$ is the true value, define new random variables
$$Y_j = \begin{cases} 1 & \text{if } X_j \le \theta^0,\\ 0 & \text{if } X_j > \theta^0. \end{cases} \tag{2.8}$$
Then under $H_0: \theta = \theta^0$, $Y_j \sim \mathrm{Bin}(1, 1/2)$. This logic only works if
$$P[X_j = \theta^0] = 0; \tag{2.9}$$
assume this. It is usually easier to assess this continuity assumption than a full distributional assumption. Then the median inference problem reduces to one of binomial testing. Let $T(\theta^0) = \sum_{j=1}^n Y_j$ be the number of observations less than or equal to $\theta^0$. Pick $t_l$ and $t_u$ so that
$$\sum_{j=t_l}^{t_u - 1} \left(\frac12\right)^n \binom{n}{j} \ge 1 - \alpha.$$

One might choose $t_l$ and $t_u$ symmetrically, so that $t_l$ is the largest value such that
$$\frac{\alpha}{2} \ge \sum_{j=0}^{t_l - 1} \left(\frac12\right)^n \binom{n}{j}. \tag{2.10}$$
That is, $t_l$ is that potential value for $T$ such that not more than $\alpha/2$ probability sits below it. The largest such $t_l$ has probability at least $1 - \alpha/2$ equal to or larger than it, and at least $\alpha/2$ equal to or smaller than it; hence $t_l$ is the $\alpha/2$ quantile of the $\mathrm{Bin}(n, 1/2)$ distribution. Generally, the inequality in (2.10) is strict; that is, $\ge$ is actually $>$. For combinations of $n$ and $\alpha$ for which this inequality holds with equality, the quantile is not uniquely defined; take the quantile to be the lowest candidate. Symmetrically, one might choose the smallest $t_u$ so that
$$\frac{\alpha}{2} \ge \sum_{j=t_u}^{n} \left(\frac12\right)^n \binom{n}{j}, \tag{2.11}$$
so that $n + 1 - t_u$ is the $\alpha/2$ quantile of the $\mathrm{Bin}(n, 1/2)$ distribution, with the opposite convention used in case the quantile is not uniquely defined.

Then, reject the null hypothesis if $T \le t_L^\circ$ or $T \ge t_U^\circ$, for $t_L^\circ = t_l - 1$ and $t_U^\circ = t_u$. This test is called the (exact) sign test, or the binomial test (Higgins, 2004). An approximate version of the sign test might be created by selecting critical values from the Gaussian approximation to the distribution of $T(\theta^0)$.
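The critical values (2.10) and (2.11) and the resulting exact sign test can be sketched in R as below. The helper names `sign_crit` and `sign_test` are hypothetical; the loop searches directly for the largest $t_l$ satisfying (2.10) and then sets $t_u = n + 1 - t_l$.

```r
# Critical values of the symmetric exact sign test, following (2.10)-(2.11):
# t_l is the largest value with P[T <= t_l - 1] <= alpha/2 under Bin(n, 1/2),
# and t_u = n + 1 - t_l.
sign_crit <- function(n, alpha = 0.05) {
  tl <- 0
  while (pbinom(tl, n, 0.5) <= alpha / 2) tl <- tl + 1
  c(tl = tl, tu = n + 1 - tl)
}

# Exact sign test of H0: median = theta0; reject if T < t_l or T >= t_u.
sign_test <- function(x, theta0, alpha = 0.05) {
  tstat <- sum(x <= theta0)            # sum of the Y_j of (2.8)
  cr <- sign_crit(length(x), alpha)
  list(T = tstat, crit = cr,
       reject = tstat < cr[["tl"]] || tstat >= cr[["tu"]])
}

sign_crit(21)   # tl = 6, tu = 16
sign_crit(82)   # tl = 32, tu = 51
sign_crit(5)    # tl = 0: the test can never reject when n < 6
```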

Again, direct attention to Table 2.1. Both variants of the sign test succeed in keeping the test level no larger than the nominal value. However, because of the discreteness of the binomial distribution, the sign test variants in some cases achieve levels much smaller than the nominal target. Subtable (a), for sample size 10, is the most extreme example of this; subtable (b), for sample size 17, represents the smallest reduction in actual test level for the exact test, and subtable (c), for sample size 40, is intermediate. Note further that, while the asymptotic sign test, based on the Gaussian approximation, is not identical to the exact version, for subtables (a) and (c) the levels coincide exactly, since for every simulated data set the two tests agree on whether the p-value exceeds 0.05. Subtable (b) exhibits a case in which, for one possible value of the test statistic in each tail, the exact and approximate sign tests disagree on whether the p-value exceeds 0.05.

Table 2.2 presents characteristics of the exact two-sided binomial test of the null hypothesis that the probability of success is half, with level $\alpha = 0.05$, applied to small samples. In this case, the two-sided p-value is obtained by doubling the one-sided p-value.

For small samples ($n < 6$), the smallest one-sided p-value, $1/2^n$, is greater than 0.025, and the null hypothesis is never rejected. Such small samples are omitted from Table 2.2. This table consists of two subtables side by side, for $n \le 23$ and for $n > 23$.

TABLE 2.2: Exact levels and exact and asymptotic lower critical values for symmetric two-sided binomial tests of nominal level 0.05

| n | Exact critical value $t_l - 1$ | Asymptotic critical value | Exact level | n | Exact critical value $t_l - 1$ | Asymptotic critical value | Exact level |
|---|---|---|---|---|---|---|---|
| 6 | 0 | 0 | 0.0313 | 24 | 6 | 6 | 0.0227 |
| 7 | 0 | 0 | 0.0156 | 25 | 7 | 7 | 0.0433 |
| 8 | 0 | 0 | 0.0078 | 26 | 7 | 7 | 0.0290 |
| 9 | 1 | 1 | 0.0391 | 27 | 7 | 7 | 0.0192 |
| 10 | 1 | 1 | 0.0215 | 28 | 8 | 8 | 0.0357 |
| 11 | 1 | 1 | 0.0117 | 29 | 8 | 8 | 0.0241 |
| 12 | 2 | 2 | 0.0386 | 30 | 9 | 9 | 0.0428 |
| 13 | 2 | 2 | 0.0225 | 31 | 9 | 9 | 0.0294 |
| 14 | 2 | 2 | 0.0129 | 32 | 9 | 9 | 0.0201 |
| 15 | 3 | 3 | 0.0352 | 33 | 10 | 10 | 0.0351 |
| 16 | 3 | 3 | 0.0213 | 34 | 10 | 10 | 0.0243 |
| 17 | 4 | 3 | 0.0490 | 35 | 11 | 11 | 0.0410 |
| 18 | 4 | 4 | 0.0309 | 36 | 11 | 11 | 0.0288 |
| 19 | 4 | 4 | 0.0192 | 37 | 12 | 12 | 0.0470 |
| 20 | 5 | 5 | 0.0414 | 38 | 12 | 12 | 0.0336 |
| 21 | 5 | 5 | 0.0266 | 39 | 12 | 12 | 0.0237 |
| 22 | 5 | 5 | 0.0169 | 40 | 13 | 13 | 0.0385 |
| 23 | 6 | 6 | 0.0347 | 41 | 13 | 13 | 0.0275 |

The first column of each subtable is the sample size. The second is $t_l - 1$ from (2.10). The third is the value obtained by performing the same operation on the Gaussian approximation; that is, it is the largest $a$ such that
$$\Phi\left(\frac{a + 0.5 - n/2}{\sqrt n/2}\right) \le \frac{\alpha}{2}. \tag{2.12}$$
The fourth is the exact test level; that is, it is twice the sum in (2.10). These observations agree with those from Table 2.1: for sample size 10, the level of the binomial test is severely too small; for sample size 17, the binomial test has close to the nominal level; and for sample size 40, the level of the binomial test is moderately too small.
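The rows of Table 2.2 can be recomputed directly. This sketch (hypothetical helper name `table22_row`) finds the largest $a$ with $P[T \le a] \le \alpha/2$ for the exact column, solves (2.12) in closed form for the asymptotic column, and doubles the exact lower-tail probability for the level.

```r
# One row of Table 2.2: exact lower critical value t_l - 1 from (2.10),
# its continuity-corrected Gaussian counterpart from (2.12), and the exact
# two-sided level.
table22_row <- function(n, alpha = 0.05) {
  a_exact <- -1
  while (pbinom(a_exact + 1, n, 0.5) <= alpha / 2) a_exact <- a_exact + 1
  # largest a with pnorm((a + 0.5 - n/2) / (sqrt(n)/2)) <= alpha/2
  a_asym <- floor(n / 2 + qnorm(alpha / 2) * sqrt(n) / 2 - 0.5)
  c(n = n, exact = a_exact, asymptotic = a_asym,
    level = 2 * pbinom(a_exact, n, 0.5))
}

tab <- t(sapply(6:41, table22_row))
tab[tab[, "n"] %in% c(10, 17, 40), ]   # the sample sizes used in Table 2.1
```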

A complication (or, to an optimist, an opportunity for improved approximation) arises when approximating a discrete distribution by a continuous distribution. Consider the case with $n = 10$, exhibited in Figure 2.2. Bar areas represent the probability under the null hypothesis of observing each number of successes. Table 2.2 indicates that the two-sided test of level 0.05 rejects the null hypothesis in the lower tail for $T \le 1$. The lower-tail probability $P[T \le 1] = 11/1024 \approx 0.0107$ is represented graphically as the sum of the area of the bar centered at 1 and the very small area of the neighboring bar centered at 0; doubling it gives the exact test size 0.0215 of Table 2.2. Expression (2.12) approximates the sum of these two bar areas by the area under the dotted curve, representing the Gaussian density with the appropriate expectation $n/2 = 5$ and standard deviation $\sqrt n/2 = 1.58$. In order to align the areas of the bars most closely with the area under the curve, the Gaussian area should be taken to extend to the upper end of the bar containing 1; that is, evaluate the Gaussian distribution function at 1.5, explaining the 0.5 in (2.12). More generally, for a discrete distribution with potential values $\Delta$ units apart, the point at which the distribution function is evaluated is shifted by $\Delta/2$ before applying a Gaussian approximation; this adjustment is called a correction for continuity.

FIGURE 2.2: Approximate Probability Calculation for Sign Test, Sample Size 10, Target Level 0.05, and from Table 2.2 $t_l - 1 = 1$
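The effect of the correction for continuity in this example can be checked numerically. This sketch compares the exact lower-tail probability $P[T \le 1]$ for $T \sim \mathrm{Bin}(10, 1/2)$ with Gaussian approximations evaluated at 1.5 (corrected, as in (2.12)) and at 1 (uncorrected); the variable names are illustrative only.

```r
n <- 10
mu <- n / 2                                          # expectation 5
sigma <- sqrt(n) / 2                                 # standard deviation, about 1.58
exact <- pbinom(1, n, 0.5)                           # 11/1024, about 0.0107
corrected <- pnorm(1 + 0.5, mean = mu, sd = sigma)   # evaluated at 1.5
uncorrected <- pnorm(1, mean = mu, sd = sigma)
round(c(exact = exact, corrected = corrected, uncorrected = uncorrected), 4)
```

The corrected value lies noticeably closer to the exact probability than the uncorrected one.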

The power of the sign test is determined by $P_{\theta^A}[X_j \le \theta^0]$ for values $\theta^A \ne \theta^0$. Since $\theta^A > \theta^0$ if $P_{\theta^A}[X_j \le \theta^0] < 1/2$, alternatives $\theta^A > \theta^0$ correspond to one-sided alternatives $P[Y_j = 1] < 1/2$.
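Because $T(\theta^0) \sim \mathrm{Bin}(n, p)$ with $p = P_{\theta^A}[X_j \le \theta^0]$, the exact power of the two-sided sign test is a binomial tail sum. In this sketch the name `sign_power` is hypothetical, and the Gaussian alternative (unit variance, median shifted up by a hypothetical `delta = 0.7`) is only an illustration, not the alternative used for Table 2.3.

```r
# Exact power of the two-sided sign test when p = P[X_j <= theta0], so that
# T(theta0) ~ Bin(n, p); the test rejects if T < t_l or T >= t_u.
sign_power <- function(n, p, alpha = 0.05) {
  tl <- qbinom(alpha / 2, n, 0.5)   # t_l as the alpha/2 quantile of Bin(n, 1/2)
  tu <- n + 1 - tl
  pbinom(tl - 1, n, p) + pbinom(tu - 1, n, p, lower.tail = FALSE)
}

sign_power(10, 0.5)   # at p = 1/2 this is the exact level, 22/1024
# Hypothetical Gaussian alternative: unit variance, median shifted up by delta,
# so p = P[X_j <= theta0] = pnorm(-delta) < 1/2.
delta <- 0.7
sign_power(17, pnorm(-delta))
```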

If $\theta^0$ is the true population median of the $X_j$, and if there exists a set of the form $(\theta^0 - \epsilon, \theta^0 + \epsilon)$, with $\epsilon > 0$, such that $P[X_j \in (\theta^0 - \epsilon, \theta^0 + \epsilon)] = 0$, then any other $\theta$ in this set is also a population median for $X_j$, and hence the test will have power against such alternatives no larger than the test level. Such occurrences are rare.

Table 2.3 presents powers of these various tests for various sample sizes. The alternative is chosen to make the t-test have power approximately 0.80 for the Gaussian and Laplace distributions, using (1.9). In this case, $\sigma_A$ for both the Gaussian and Laplace distributions is $1/\sqrt n$. Formula (1.9) is inappropriate for the Cauchy distribution, since in this case $\bar X$ does not have a distribution that is approximately Gaussian. For the Cauchy distribution, the same alternative as for the Gaussian and Laplace distributions is used.

TABLE 2.3: Power for the t-test, approximate sign test, and exact sign test, two-sided, nominal level 0.05

(a) Sample size 10

| Test | Gaussian | Cauchy | Laplace |
|---|---|---|---|
| t | 0.70593 | 0.14345 | 0.73700 |
| Approximate sign | 0.41772 | 0.20506 | 0.57222 |
| Exact sign | 0.41772 | 0.20506 | 0.57222 |

(b) Sample size 17

| Test | Gaussian | Cauchy | Laplace |
|---|---|---|---|
| t | 0.74886 | 0.10946 | 0.76456 |
| Approximate sign | 0.35747 | 0.17954 | 0.58984 |
| Exact sign | 0.57893 | 0.35759 | 0.79011 |

(c) Sample size 40

| Test | Gaussian | Cauchy | Laplace |
|---|---|---|---|
| t | 0.78152 | 0.06307 | 0.78562 |
| Approximate sign | 0.55462 | 0.35561 | 0.84331 |
| Exact sign | 0.55462 | 0.35561 | 0.84331 |

Results in Table 2.3 show that for a sample size for which the sign test level approximates the nominal level (n = 17), use of the sign test for Gaussian data results in a moderate loss in power relative to the t-test, while use of the sign test results in a moderate gain in power for Laplace observations, and in a substantial gain in power for Cauchy observations.

Example 2.3.1 An early (and very simple) application of this test was to test whether the proportion of boys born in a given year is the same as the proportion of girls born that year (Arbuthnott, 1712). The number of births was determined for a period of 82 years. Let $X_j$ represent the number of births of girls, minus the number of births of boys, in year $j$. The parameter $\theta$ represents the median amount by which the number of girls exceeds the number of boys; its null value is 0. Let $Y_j$ take the value 0 for years in which more girls than boys are born, and 1 otherwise. Note that in this case, (2.9) is violated, but $P[X_j = 0]$ is small, and this violation is not important. Test at level 0.05.

Values in (2.10) and (2.11) are $t_l = 32$ and $t_u = 51$: $t_l$ is the 0.025 quantile of the binomial distribution with 82 trials and success probability 0.5, and $t_u = 82 + 1 - t_l$ is one more than the 0.975 quantile of that distribution. Reject the null hypothesis if $T < 32$ or if $T \ge 51$. (The asymmetry in the treatment of the lower and upper critical values is intentional, and is the result of the asymmetry in the definition of the distribution function for discrete variables.)

In each of these years $X_j < 0$, and so $Y_j = 1$, and $T = 82$. Reject the null hypothesis of equal proportions of births. The original analysis of these data presented what is now considered the p-value; the one-sided p-value is trivially $P[T \ge 82] = (1/2)^{82}$, which is tiny. The two-sided p-value of (1.15) is $2 \times (1/2)^{82} = (1/2)^{81}$, which is still tiny.
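Arbuthnott's test can be reproduced from the summary in the example alone ($n = 82$ years, $T = 82$). This sketch uses only base R; `binom.test` supplies the two-sided p-value, which for this symmetric null distribution equals the doubled one-sided value.

```r
# Example 2.3.1 from its summary: n = 82 years, T = 82.
n <- 82
tstat <- 82
tl <- qbinom(0.025, n, 0.5)     # 32
tu <- n + 1 - tl                # 51
reject <- tstat < tl || tstat >= tu
p_one <- pbinom(tstat - 1, n, 0.5, lower.tail = FALSE)   # P[T >= 82] = (1/2)^82
p_two <- binom.test(tstat, n, p = 0.5)$p.value            # (1/2)^81
c(tl = tl, tu = tu, reject = reject, one_sided = p_one, two_sided = p_two)
```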

### Confidence Intervals for the Median

Apply the test inversion approach of §1.2 to the sign test that rejects $H_0: \theta = \theta^0$ if fewer than $t_l$ or at least $t_u$ data points are less than or equal to $\theta^0$. Let $X_{(1)} \le \cdots \le X_{(n)}$ refer to the data values after ordering. When $\theta^0 < X_{(1)}$, then $T(\theta^0) = 0$. For $\theta^0 \in [X_{(1)}, X_{(2)})$, $T(\theta^0) = 1$. For $\theta^0 \in [X_{(2)}, X_{(3)})$, $T(\theta^0) = 2$. In each case, the `[` at the beginning of the interval and the `)` at the end of the interval arise from (2.8), because observations that are exactly equal to $\theta^0$ are coded as one. Hence the test rejects $H_0$ if $\theta^0 < X_{(t_l)}$ or $\theta^0 \ge X_{(t_u)}$, and, for any $\theta^0$,
$$P_{\theta^0}\left[X_{(t_l)} \le \theta^0 < X_{(t_u)}\right] \ge 1 - \alpha.$$
This relation leads to the confidence interval $[X_{(t_l)}, X_{(t_u)})$. However, since the data have a continuous distribution, $X_{(t_l)}$ also has a continuous distribution, and $P[X_{(t_l)} = \theta^0] = 0$ for any $\theta^0$. Hence $P[X_{(t_l)} < \theta^0 < X_{(t_u)}] \ge 1 - \alpha$, and one might exclude the lower end point, to obtain the interval $(X_{(t_l)}, X_{(t_u)})$.
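The inversion above amounts to reading off two order statistics. This sketch (hypothetical name `sign_ci`) returns $(X_{(t_l)}, X_{(t_u)})$ together with the exact coverage $1 - 2P[T \le t_l - 1]$, and checks that coverage by simulation with standard exponential data, an arbitrary choice whose median is $\log 2$.

```r
# Confidence interval (X_(t_l), X_(t_u)) for the median from inverting the
# exact sign test, with its exact coverage 1 - 2 P[T <= t_l - 1].
sign_ci <- function(x, alpha = 0.05) {
  n <- length(x)
  tl <- qbinom(alpha / 2, n, 0.5)      # t_l as the alpha/2 quantile of Bin(n, 1/2)
  if (tl < 1) {
    # n < 6 at alpha = 0.05: the test never rejects, so the interval is the whole line
    return(c(lower = -Inf, upper = Inf, coverage = 1))
  }
  tu <- n + 1 - tl
  xs <- sort(x)
  c(lower = xs[tl], upper = xs[tu], coverage = 1 - 2 * pbinom(tl - 1, n, 0.5))
}

# Monte Carlo check of coverage with standard exponential data (median log(2)).
set.seed(6)
covered <- replicate(4000, {
  ci <- sign_ci(rexp(21))
  ci[["lower"]] < log(2) && log(2) < ci[["upper"]]
})
mean(covered)   # compare with the exact coverage, about 0.973 for n = 21
```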

Example 2.3.2 Consider data from

http://lib.stat.cmu.edu/datasets/Arsenic

from a pilot study on the uptake of arsenic from drinking water. Column six of this file gives arsenic concentrations in toenail clippings, in parts per million. The link above is to a Word file; the file

http://stat.rutgers.edu/home/kolassa/Data/arsenic.dat

contains a plain text version. Sorted nail arsenic values are

0.073, 0.080, 0.099, 0.105, 0.118, 0.118, 0.119, 0.135, 0.141, 0.158, 0.175, 0.269, 0.275, 0.277, 0.310, 0.358, 0.433, 0.517, 0.832, 0.851, 2.252.

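We construct a confidence interval for the (natural) log of toenail arsenic. Because the logarithm is monotonic, the sign test statistic and the ordering of the observations are unchanged by taking logs, so the calculations below may be carried out on either scale. The sign test statistic has a $\mathrm{Bin}(21, 1/2)$ distribution under the null hypothesis. We choose the largest $t_l$ such that $P_0[T < t_l] \le \alpha/2$. The first few terms $\binom{21}{j}(1/2)^{21}$ in (2.10), for $j = 0, \ldots, 6$, are 0.0000005, 0.0000100, 0.0001001, 0.0006342, 0.0028539, 0.0097032, and 0.0258751, and the cumulative probabilities are 0.0000005, 0.0000105, 0.0001106, 0.0007448, 0.0035987, 0.0133018, and 0.0391769. The largest of these cumulative sums smaller than 0.025 is the sixth, corresponding to $T < 6$. Hence $t_l = 6$. Similarly, $t_u = 16$. Reject the null hypothesis that the median is 0.26 if $T < 6$ or if $T \ge 16$. Since 11 of the observations are less than or equal to the null median 0.26, $T = 11$. Do not reject the null hypothesis.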

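Alternatively, one might calculate a p-value. Using (1.15), the p-value is $2\min(P_0[T \ge 11], P_0[T \le 11]) = 2 \times 1/2 = 1$, since $P_0[T \ge 11] = 1/2$ by the symmetry of the $\mathrm{Bin}(21, 1/2)$ distribution.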

Furthermore, the confidence interval for the median is $(X_{(6)}, X_{(16)}) = (0.118, 0.358)$.

The values $t_l$ and $t_u$ may be calculated in R by

```r
a <- qbinom(0.025, 21, 0.5); b <- 21 + 1 - qbinom(0.025, 21, 0.5)
```

and the ensemble of calculations might also have been performed in R using

```r
arsenic <- as.data.frame(scan("arsenic.dat",
  what = list(age = 0, sex = 0, drink = 0, cook = 0, water = 0, nails = 0)))
library(BSDA) # Gives sign test.
SIGN.test(arsenic$nails, md = 0.26) # Argument md gives null hyp.
```
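For readers without the data file or the BSDA package, the quantities in this example can be checked in base R from the sorted values listed above. This is a sketch with illustrative variable names; it reproduces the terms and cumulative sums of (2.10), the critical values, $T$, the p-value, and the interval on both scales.

```r
# Example 2.3.2 in base R, from the sorted nail arsenic values listed above.
nails <- c(0.073, 0.080, 0.099, 0.105, 0.118, 0.118, 0.119, 0.135, 0.141,
           0.158, 0.175, 0.269, 0.275, 0.277, 0.310, 0.358, 0.433, 0.517,
           0.832, 0.851, 2.252)
n <- length(nails)                              # 21
# Terms and cumulative sums of (2.10) for j = 0, ..., 6
print(cbind(j = 0:6, term = dbinom(0:6, n, 0.5), cumulative = pbinom(0:6, n, 0.5)))
tl <- qbinom(0.025, n, 0.5)                     # 6
tu <- n + 1 - tl                                # 16
tstat <- sum(nails <= 0.26)                     # 11 observations at or below 0.26
pval <- min(1, 2 * min(pbinom(tstat, n, 0.5),
                       pbinom(tstat - 1, n, 0.5, lower.tail = FALSE)))   # 1
ci <- sort(nails)[c(tl, tu)]                    # 0.118 0.358
ci_log <- sort(log(nails))[c(tl, tu)]           # the same interval on the log scale
c(tl = tl, tu = tu, T = tstat, p = pval)
```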

Graphical construction of a confidence interval for the median is calculated by

```r
library(NonparametricHeuristic)
invertsigntest(log(arsenic$nails), maint = "Log Nail Arsenic")
```

and is given in Figure 2.3. Instructions for installing this last library are given in the introduction, and in Appendix B.

Figure 2.3 exhibits construction of the confidence interval in the previous example; I apply these techniques on the log scale. The confidence interval is the set of log medians that yield a test statistic for which the null hypothesis is not rejected. Values of the statistic for which the null hypothesis is not rejected lie between the horizontal lines; log medians in the confidence interval are those whose test statistic values fall within this region.

FIGURE 2.3: Construction of CI for Log Nail Arsenic Location

In this construction, order statistics (that is, the ordered values) are first plotted on the horizontal axis, with the place in the ordered data set on the vertical axis. These points are represented by the points in Figure 2.3 where the step function transitions from vertical to horizontal, as one moves from lower left to upper right. Next, draw horizontal lines at the values $t_l$ and $t_u$, given by (2.10) and (2.11) respectively. Finally, draw vertical lines through the data points that these horizontal lines hit.

For this particular example, the exact one-sided binomial test of level 0.025 rejects the null hypothesis that the event probability is half if the sum of event indicators is 0, 1, 2, 3, 4, or 5; $t_l = 6$. For $Y_j$ of (2.8), the sum is less than 6 for all $\theta$ to the left of the point marked $X_{(t_l)}$. Similarly, the one-sided level 0.025 test in the other direction rejects the null hypothesis if the sum of event indicators is at least $t_u = 16$. The sum of the $Y_j$ exceeds 15 for $\theta$ at or to the right of the point marked $X_{(t_u)}$.

By symmetry, one might expect $t_l = n - t_u$, but this is not the case. The asymmetry in definitions (2.10) and (2.11) arises because construction of the confidence interval requires counting not the data points, but the $n - 1$ spaces between them, plus the regions below the minimum and above the maximum, for a total of $n + 1$ ranges. Then $t_l = n + 1 - t_u$.
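The construction of Figure 2.3 can be sketched in base R graphics. This is not the `invertsigntest` function from the NonparametricHeuristic package; the name `plot_sign_ci` and the plotting choices are hypothetical. The step plot of $T(\theta)$ jumps to height $i$ at $X_{(i)}$, horizontal lines mark $t_l$ and $t_u$, and vertical lines pass through $X_{(t_l)}$ and $X_{(t_u)}$.

```r
# Base-R sketch of the graphical construction (hypothetical helper name).
plot_sign_ci <- function(x, alpha = 0.05, main = "") {
  n <- length(x)
  xs <- sort(x)
  tl <- qbinom(alpha / 2, n, 0.5)
  tu <- n + 1 - tl
  # Step function T(theta) = #{X_j <= theta}: jumps to height i at X_(i).
  plot(xs, seq_len(n), type = "s", xlab = "Candidate median",
       ylab = "Number of observations <= candidate", main = main)
  abline(h = c(tl, tu), lty = 2)       # horizontal lines at t_l and t_u
  abline(v = xs[c(tl, tu)], lty = 3)   # vertical lines through X_(t_l), X_(t_u)
  invisible(xs[c(tl, tu)])
}

nails <- c(0.073, 0.080, 0.099, 0.105, 0.118, 0.118, 0.119, 0.135, 0.141,
           0.158, 0.175, 0.269, 0.275, 0.277, 0.310, 0.358, 0.433, 0.517,
           0.832, 0.851, 2.252)
ci_log <- plot_sign_ci(log(nails), main = "Log Nail Arsenic")
exp(ci_log)   # 0.118 0.358
```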

This interval is not of the usual form $\hat\theta \pm 2\hat\sigma$, for $\hat\sigma$ with a factor of $1/\sqrt n$. Cramér (1946, pp. 368f.) shows that if $X_1, \ldots, X_n$ is a set of independent random variables, each having density $f$, then $\mathrm{Var}[\mathrm{smed}[X_1, \ldots, X_n]] \approx 1/(4f(\theta)^2 n)$, where $\theta$ is the population median. Chapter 8 investigates estimation of this density; such an estimate can be used to estimate the variance of the median, but density estimation is harder than the earlier confidence interval rule.
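Cramér's approximation can be checked by simulation. For standard Gaussian data, $\theta = 0$ and $f(0) = 1/\sqrt{2\pi}$, so $1/(4f(0)^2 n) = \pi/(2n)$, compared with $1/n$ for the sample mean. The sample size, replication count, and seed below are arbitrary choices for this sketch.

```r
# Monte Carlo check of Var[smed] ~ 1 / (4 f(theta)^2 n) for standard Gaussian
# data, where theta = 0 and f(0) = dnorm(0), so the approximation is pi / (2n).
set.seed(4)
n <- 101
meds <- replicate(20000, median(rnorm(n)))
approx_var <- 1 / (4 * dnorm(0)^2 * n)
c(simulated = var(meds), approximation = approx_var, mean_variance = 1 / n)
```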

### Inference for Other Quantiles

The quantile $\theta$ corresponding to probability $\gamma$ is defined by $P_\theta[X_j \le \theta] = \gamma$. Suppose that $\theta$ is quantile $\gamma \in (0, 1)$ of the distribution of independent and identically distributed continuous random variables $X_1, \ldots, X_n$. Then one can produce a generalized sign test. Define the null and alternative hypotheses $H_0: \theta = \theta^0$ and $H_A: \theta \ne \theta^0$. As before, $T(\theta)$ is the number of observations smaller than or equal to $\theta$. For the true value $\theta$ of the quantile, $T \sim \mathrm{Bin}(n, \gamma)$. Choose $t_l$ and $t_u$ so that
$$\sum_{j=t_l}^{t_u - 1} \binom{n}{j}\gamma^j(1 - \gamma)^{n-j} \ge 1 - \alpha. \tag{2.13}$$
Often, one chooses the largest $t_l$ and smallest $t_u$ so that $t_l$ is the $\alpha/2$ quantile of the $\mathrm{Bin}(n, \gamma)$ distribution, and $n + 1 - t_u$ is the $\alpha/2$ quantile of the $\mathrm{Bin}(n, 1 - \gamma)$ distribution. Hence
$$t_l \approx n\gamma - \sqrt{n\gamma(1 - \gamma)}\,z_{\alpha/2} \quad\text{and}\quad t_u \approx n\gamma + \sqrt{n\gamma(1 - \gamma)}\,z_{\alpha/2}.$$
One then rejects $H_0$ if $T < t_l$ or $T \ge t_u$.
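The critical values of the generalized sign test follow directly from binomial quantiles. This sketch (hypothetical name `quantile_crit`) returns both the exact values and the Gaussian approximations given above; with $\gamma = 1/2$ it reduces to the ordinary sign test.

```r
# Critical values for the generalized sign test of H0: (gamma quantile) = theta0:
# exact binomial quantiles as in (2.13), and the Gaussian approximations.
quantile_crit <- function(n, gamma, alpha = 0.05) {
  tl <- qbinom(alpha / 2, n, gamma)               # alpha/2 quantile of Bin(n, gamma)
  tu <- n + 1 - qbinom(alpha / 2, n, 1 - gamma)   # n + 1 - t_u: alpha/2 quantile of Bin(n, 1 - gamma)
  z <- qnorm(1 - alpha / 2)
  half <- sqrt(n * gamma * (1 - gamma)) * z
  list(exact = c(tl = tl, tu = tu),
       gaussian = c(tl = n * gamma - half, tu = n * gamma + half))
}

quantile_crit(21, 0.75)   # exact t_l = 12, t_u = 20
quantile_crit(21, 0.5)    # reduces to the sign test: 6 and 16
```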

This test is then inverted to obtain $(X_{(t_l)}, X_{(t_u)})$ as the confidence interval for $\theta$. Note that the confidence level is conservative:
$$P\left[X_{(t_l)} < \theta < X_{(t_u)}\right] = 1 - P\left[X_{(t_l)} \ge \theta\right] - P\left[\theta \ge X_{(t_u)}\right] \ge 1 - \alpha.$$
For any given $\theta$, the inequality is generally strict.
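The conservativeness can be quantified exactly: for continuous data the coverage is $P[t_l \le T \le t_u - 1]$ with $T \sim \mathrm{Bin}(n, \gamma)$. This sketch (hypothetical name `quantile_ci_coverage`) computes it and checks it by simulation with standard exponential data, an arbitrary choice of continuous distribution.

```r
# Exact coverage P[t_l <= T <= t_u - 1] of the interval (X_(t_l), X_(t_u)) for
# the gamma quantile, where T ~ Bin(n, gamma) for continuous data.
quantile_ci_coverage <- function(n, gamma, alpha = 0.05) {
  tl <- qbinom(alpha / 2, n, gamma)
  tu <- n + 1 - qbinom(alpha / 2, n, 1 - gamma)
  pbinom(tu - 1, n, gamma) - pbinom(tl - 1, n, gamma)
}

quantile_ci_coverage(21, 0.75)   # about 0.960, above the nominal 0.95

# Monte Carlo check; for n = 21 and gamma = 0.75, t_l = 12 and t_u = 20.
set.seed(5)
hits <- replicate(5000, {
  xs <- sort(rexp(21))
  xs[12] < qexp(0.75) && qexp(0.75) < xs[20]
})
mean(hits)
```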

Example 2.3.3 Test the null hypothesis that the upper quartile (that is, the 0.75 quantile) of the arsenic nail data from Example 2.3.2 is the reference value 0.26, and give a confidence interval for this quantile. The analysis is the same as before, except that $t_l$ and $t_u$ are different. We determine $t_l$ and $t_u$ in (2.13). Direct calculation, or using the R commands

```r
a <- qbinom(0.025, 21, 0.75); b <- 21 + 1 - qbinom(0.025, 21, 1 - 0.75)
```

shows $t_l = 12$ and $t_u = 20$. Since $T = 11 < t_l$, reject the null hypothesis that the upper quartile is 0.26. Furthermore, the confidence interval is the region between the twelfth and twentieth ordered values, $(X_{(12)}, X_{(20)}) = (0.269, 0.851)$. With data present in the R workspace, one calculates a confidence interval as

```r
sort(arsenic$nails)[c(a, b)]
```

and the p-value as

```r
tt <- 11
2 * min(c(pbinom(tt, 21, 0.75), pbinom(21 - tt, 21, 1 - 0.75)))
```

to give approximately 0.0413.

Dependence of the test statistic $T(\theta)$ on $\theta$ is relatively simple. Later inversions of more complicated statistics will make use of the simplifying device of first shifting all or part of the data by subtracting $\theta$, and then testing the null hypothesis that the location parameter for this shifted variable is zero.