The Jackknife

Consider a related technique, the jackknife (Quenouille, 1956; Miller, 1974), which estimates the bias of an estimator using values of the estimator evaluated on subsets of the observed sample, under certain conditions on the form of the bias. Suppose that the bias of an estimator $T_n$ of $\theta$ based on $n$ observations is

$$E[T_n] - \theta = a/n + O(n^{-3/2}) \qquad (10.10)$$

for some unknown $a$. Here $O(n^{-3/2})$ denotes a quantity which, after multiplication by $n^{3/2}$, is bounded for all $n$.

Examples of Biases of the Proper Order

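Quenouille (1956) suggested this technique in situations where (10.10) held with at least one more term of the form $b/n^2$, and with a correspondingly smaller error. That is, (10.10) is replaced with a condition, (10.11), in which the bias is $a/n + b/n^2$ plus a remainder of correspondingly smaller order.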

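For example, if $X_1, \ldots, X_n$ are independent and identically distributed Gaussian random variables with mean $\mu$ and variance $\sigma^2$, the maximum likelihood estimator for $\sigma^2$ is $\hat\sigma^2 = \sum_{j=1}^n (X_j - \bar X)^2/n$. Note that $\tilde\sigma^2 = n\hat\sigma^2/(n-1)$ is an unbiased estimator of $\sigma^2$, since $E[\hat\sigma^2] = (n-1)\sigma^2/n = \sigma^2 - \sigma^2/n$. Then (10.10) holds with $a = -\sigma^2$, and (10.11) holds with $b = 0$.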

Restriction (10.11) seems unnecessarily strict, and cases in which it holds are less common. For example, let $W_n$ be an unbiased estimate of a parameter $\omega$, and let $T_n = g(W_n)$ for some smooth function $g$. Let $\theta = g(\omega)$, and assume that $\mathrm{Var}[W_n] \approx a/n$. Then

and

Then (10.10) holds, and (10.11) holds if the skewness of $W_n$ times $\sqrt{n}$ converges to 0.
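One way to see where these statements come from is a Taylor expansion of $g$ about $\omega$. The following is only a sketch, assuming that $g$ has enough bounded derivatives and that the moments of $W_n$ behave as indicated; it is not necessarily the form of the argument in the original displays.

```latex
\begin{align*}
T_n = g(W_n) &\approx g(\omega) + g'(\omega)(W_n - \omega)
      + \tfrac{1}{2} g''(\omega)(W_n - \omega)^2
      + \tfrac{1}{6} g'''(\omega)(W_n - \omega)^3, \\
E[T_n] - \theta &\approx \tfrac{1}{2} g''(\omega)\,\mathrm{Var}[W_n]
      + \tfrac{1}{6} g'''(\omega)\,E[(W_n - \omega)^3]
      \approx \frac{g''(\omega)\,a}{2n}
      + \tfrac{1}{6} g'''(\omega)\,E[(W_n - \omega)^3].
\end{align*}
```

The linear term drops out because $W_n$ is unbiased. The leading term has the $1/n$ form required by (10.10), with $g''(\omega)a/2$ playing the role of the constant there. The third central moment equals the skewness of $W_n$ times $\mathrm{Var}[W_n]^{3/2} \approx (a/n)^{3/2}$, so it is this term that a condition on the skewness controls.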

Bias Correction

Let $T^*_{n-1,i}$ be the estimator based on the sample of size $n-1$ with observation $i$ omitted. Let $T^* = \sum_{i=1}^n T^*_{n-1,i}/n$. Then the bias of $T^*$ is approximately $a/(n-1)$. Under (10.10), $B = (n-1)(T^* - T_n)$ estimates the bias, since

$$E[B] = (n-1)(E[T^*] - E[T_n]) \approx (n-1)\left(\frac{a}{n-1} - \frac{a}{n}\right) = \frac{a}{n}.$$

Then the bias of $T_n - B$ is $O(n^{-3/2})$. Furthermore, under the more stringent requirement (10.11), the bias is $O(n^{-2})$.
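The recipe can be sketched in a few lines of base R. The helper name `jackknife_by_hand` and the simulated sample are illustrative assumptions, not part of the text. The sketch computes the leave-one-out estimates, their average, and the bias estimate, and applies them to the maximum likelihood variance estimator, whose bias is exactly of the form $a/n$.

```r
# Jackknife bias estimate B = (n - 1) * (T_star - T_n), computed by hand.
# `jackknife_by_hand` is an illustrative helper name, not a function from the text.
jackknife_by_hand <- function(x, stat, ...) {
  n <- length(x)
  t_n <- stat(x, ...)
  # Leave-one-out estimates: stat applied with observation i omitted
  t_loo <- vapply(seq_len(n), function(i) stat(x[-i], ...), numeric(1))
  t_star <- mean(t_loo)
  list(estimate = t_n, bias = (n - 1) * (t_star - t_n), loo = t_loo)
}

# Maximum likelihood variance estimator (divisor n rather than n - 1)
var_mle <- function(x) mean((x - mean(x))^2)

set.seed(1)                          # arbitrary seed; simulated data only
x <- rnorm(15, mean = 2, sd = 3)
jk <- jackknife_by_hand(x, var_mle)
corrected <- jk$estimate - jk$bias   # bias-corrected estimate T_n - B

# For this estimator the correction recovers the unbiased sample variance
all.equal(corrected, var(x))
```

For this particular estimator the agreement is algebraic rather than approximate, because the bias has no terms beyond $a/n$; for other statistics the correction only removes the leading term.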

Correcting the Bias in Mean Estimators

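Sample means are always unbiased estimators of the population expectation. Consider application of the jackknife to the sample mean. In this case, $T_n = \sum_{j=1}^n X_j/n$, $T^*_{n-1,i} = \sum_{j \ne i} X_j/(n-1)$, and

$$T^* = \frac{1}{n}\sum_{i=1}^n \sum_{j \ne i} \frac{X_j}{n-1} = \frac{1}{n}\sum_{j=1}^n X_j = T_n,$$

and hence the bias estimate is zero.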

Correcting the Bias in Quantile Estimators

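Consider the jackknife bias estimate for the median for continuous data, and, for the sake of defining the $T^*_{n-1,i}$, let $i$ index the order statistic $X_{(i)}$ that is omitted.

When the sample size $n$ is even, $T_n = (X_{(n/2)} + X_{(n/2+1)})/2$, and

$$T^*_{n-1,i} = \begin{cases} X_{(n/2+1)} & i \le n/2, \\ X_{(n/2)} & i > n/2. \end{cases}$$

Then $T^* = (X_{(n/2)} + X_{(n/2+1)})/2 = T_n$, and the bias estimate is always 0. For $n$ odd, $T_n = X_{((n+1)/2)}$, and, writing $m = (n+1)/2$,

$$T^*_{n-1,i} = \begin{cases} (X_{(m)} + X_{(m+1)})/2 & i < m, \\ (X_{(m-1)} + X_{(m+1)})/2 & i = m, \\ (X_{(m-1)} + X_{(m)})/2 & i > m, \end{cases}$$

and the average of results from the smaller sample is

$$T^* = \frac{n-1}{2n} X_{(m)} + \frac{n+1}{4n}\left(X_{(m-1)} + X_{(m+1)}\right).$$

Hence

$$T^* - T_n = \frac{n+1}{2n}\left(\frac{X_{(m-1)} + X_{(m+1)}}{2} - X_{(m)}\right),$$

and the bias estimate is

$$B = \frac{n^2-1}{2n}\left(\frac{X_{(m-1)} + X_{(m+1)}}{2} - X_{(m)}\right). \qquad (10.12)$$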

Example 10.4.1 Consider again the nail arsenic data of Example 2.3.2. Calculate the jackknife estimate of bias for the mean, median, and trimmed mean for these data. Jackknifing is done using the function jackknife from the bootstrap package.

library(bootstrap) # gives jackknife
jackknife(arsenic$nails,median)

This function produces the 21 values, each with one observation omitted:

$jack.values
 [1] 0.2220 0.2220 0.2220 0.2220 0.1665 0.1665 0.2220 0.2220
 [9] 0.1665 0.2220 0.2220 0.1665 0.1665 0.1665 0.1665 0.1665
[17] 0.1665 0.2220 0.1665 0.2220 0.2135

Each of these values is the median of 20 observations. Ten of them, corresponding to the omission of the lowest ten values, are the averages of $X_{(11)}$ and $X_{(12)}$. Ten of them, corresponding to the omission of the highest ten values, are the averages of $X_{(10)}$ and $X_{(11)}$. The last, corresponding to the omission of the middle value, is the average of $X_{(10)}$ and $X_{(12)}$. The mean of the jackknife observations is 0.1952. The sample median is 0.1750, and the bias adjustment is 20 × (0.1952 − 0.1750) = 20 × 0.0202 = 0.404, which agrees, up to rounding, with the value given by R:

$jack.bias

[1] 0.4033333

This bias estimate for the median seems remarkably large. From (10.12), the jackknife bias estimate for the median is governed by the difference between the middle value and the average of its neighbors. This data set features an unusually large gap between the middle observation and the one above it.
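The arithmetic behind this bias estimate can be re-checked from the printed jackknife values alone, without access to the arsenic data. The following sketch uses only the 21 values and the sample median reported in the example; the variable names are illustrative.

```r
# Jackknife values as printed in the example (21 leave-one-out medians)
jack_values <- c(0.2220, 0.2220, 0.2220, 0.2220, 0.1665, 0.1665, 0.2220, 0.2220,
                 0.1665, 0.2220, 0.2220, 0.1665, 0.1665, 0.1665, 0.1665, 0.1665,
                 0.1665, 0.2220, 0.1665, 0.2220, 0.2135)
t_n <- 0.1750                      # sample median reported in the example
n <- length(jack_values)           # number of observations, 21
t_star <- mean(jack_values)        # average of the leave-one-out medians
bias <- (n - 1) * (t_star - t_n)   # B = (n - 1) * (T_star - T_n)
round(c(t_star = t_star, bias = bias), 4)
```

The unrounded product is about 0.4033; the figure 0.404 in the hand calculation arises from rounding the difference to 0.0202 before multiplying by 20.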

Applying the jackknife to the mean via

jackknife(arsenic$nails,mean)

gives the bias correction 0:

$jack.bias

[1] 0

as predicted above. One can also jackknife the trimmed mean:

jackknife(arsenic$nails,mean,trim=0.25)

The trimmed mean here is the mean of the middle half of the data, with a proportion 0.25 trimmed from each end. Although the conventional mean is unbiased, this unbiasedness does not extend to the trimmed mean:

$jack.bias
[1] -0.02450216
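To make the trimming concrete, the following sketch compares R's built-in trimmed mean with a by-hand version that sorts the data and drops the same number of observations from each end. The data are simulated for illustration only and are not the arsenic measurements.

```r
# What trim = 0.25 does, on simulated data (illustrative only)
set.seed(2)                            # arbitrary seed
x <- rexp(21)
n <- length(x)
k <- floor(n * 0.25)                   # observations dropped from each end
middle <- sort(x)[(k + 1):(n - k)]     # the order statistics that are kept
by_hand <- mean(middle)
built_in <- mean(x, trim = 0.25)
all.equal(by_hand, built_in)
```

With 21 observations, five are dropped from each end and eleven are averaged, so "the middle half" is only approximate when the sample size is not a multiple of four.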

In contrast to the arsenic example with an odd number of data points, consider applying the jackknife to the median of the ten brain volume differences, from Example 5.2.1:

attach(brainpairs);jackknife(diff,median)

to give 0 as the bias correction.
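The contrast between odd and even sample sizes can be checked directly in base R. In the sketch below, the function names and the simulated data are illustrative assumptions. The closed form used for odd $n$ is what results from averaging the leave-one-out medians: with $m = (n+1)/2$, the bias estimate is $(n^2-1)/(2n)$ times the difference between the average of $X_{(m-1)}$ and $X_{(m+1)}$ and the middle value $X_{(m)}$.

```r
# Leave-one-out jackknife bias estimate for the median, in base R
median_jack_bias <- function(x) {
  n <- length(x)
  loo <- vapply(seq_len(n), function(i) median(x[-i]), numeric(1))
  (n - 1) * (mean(loo) - median(x))
}

# Closed form for odd n, with m = (n + 1) / 2 indexing the middle order statistic
median_closed_form <- function(x) {
  n <- length(x)
  stopifnot(n %% 2 == 1, n >= 3)
  s <- sort(x)
  m <- (n + 1) / 2
  (n^2 - 1) / (2 * n) * ((s[m - 1] + s[m + 1]) / 2 - s[m])
}

set.seed(3)                 # arbitrary seed; simulated data only
x_odd <- rexp(21)
x_even <- rexp(10)

# The two calculations agree for odd n
all.equal(median_jack_bias(x_odd), median_closed_form(x_odd))
# For even n the estimate is zero, up to floating-point rounding
median_jack_bias(x_even)
```

For even $n$, half of the leave-one-out medians equal one of the two middle order statistics and half equal the other, so their average reproduces the sample median regardless of the data.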

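The average effects of such corrections may be investigated via simulation. Table 10.2 contains the results of a simulation based on 100,000 random samples from an exponential distribution, each of size 11. In this exponential case, the jackknife bias estimate for the median (0.091) is about twice the actual bias (0.738 − 0.693 = 0.045), and so the correction over-corrects. For the trimmed mean, the estimate (−0.060) is close in magnitude to the actual bias (0.810 − 0.738 = 0.072) but, as tabulated, has the opposite sign.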

TABLE 10.2: Expectations of statistic and jackknife bias estimate

Distribution   Statistic T          E[T]    E[B]     Parameter
Exponential    mean                 0.998   0.000    1
Exponential    median               0.738   0.091    0.693
Exponential    0.25 trimmed mean    0.810   -0.060   0.738
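A small-scale version of such a simulation might look as follows. This is a sketch rather than the code behind Table 10.2: it uses unit-rate exponential samples of size 11, 5,000 replications instead of 100,000, an arbitrary seed, and an illustrative helper `jack_bias`, so the results should match the table only up to Monte Carlo error.

```r
# Jackknife bias estimate (n - 1) * (T_star - T_n) for a generic statistic.
# `jack_bias` is an illustrative helper, not a function from the text.
jack_bias <- function(x, stat) {
  n <- length(x)
  loo <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  (n - 1) * (mean(loo) - stat(x))
}

set.seed(4)                  # arbitrary seed
n_rep <- 5000                # far fewer replications than in the text
n <- 11
stats <- list(
  mean    = function(x) mean(x),
  median  = function(x) median(x),
  trimmed = function(x) mean(x, trim = 0.25)
)

# Each replication returns T and B for the three statistics, in order
sims <- replicate(n_rep, {
  x <- rexp(n)
  unlist(lapply(stats, function(f) c(T = f(x), B = jack_bias(x, f))))
})

# Monte Carlo averages of each statistic T and its jackknife bias estimate B
result <- matrix(rowMeans(sims), ncol = 2, byrow = TRUE,
                 dimnames = list(names(stats), c("E_T", "E_B")))
round(result, 3)
```

The row for the mean gives a bias estimate of zero up to rounding in every replication, while the other rows vary with the seed and the number of replications.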

Under some more restrictive conditions, one can also use this idea to estimate the variance of T.
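The text does not state the variance estimate. A commonly used form, given here as an assumption about what is intended, is $\frac{n-1}{n}\sum_{i=1}^n (T^*_{n-1,i} - T^*)^2$. The sketch below implements it with an illustrative helper `jack_var` on simulated data and checks that, for the sample mean, it reduces to the familiar $s^2/n$.

```r
# Usual jackknife variance estimate: (n - 1) / n * sum((T_i - T_star)^2).
# `jack_var` is an illustrative helper; the formula is not given in the text.
jack_var <- function(x, stat, ...) {
  n <- length(x)
  loo <- vapply(seq_len(n), function(i) stat(x[-i], ...), numeric(1))
  (n - 1) / n * sum((loo - mean(loo))^2)
}

set.seed(5)                   # arbitrary seed; simulated data only
x <- rexp(11)
v_mean <- jack_var(x, mean)
# For the sample mean this equals the sample variance divided by n
all.equal(v_mean, var(x) / length(x))
```

The same helper accepts other statistics, for example `jack_var(x, mean, trim = 0.25)`, although for non-smooth statistics such as the median the resulting estimate is not generally reliable.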

Exercises

1. The data set

http://ftp.uni-bayreuth.de/math/statlib/datasets/lupus

gives data on 87 lupus patients. The fourth column gives transformed disease duration.

a. Give a 90% bootstrap confidence interval for the mean transformed disease duration, using the basic, Studentized, and BCa approaches.

b. Give a jackknife estimate of the bias of the mean and of the 0.25 trimmed mean transformed disease duration (that is, the sample average of the middle half of the transformed disease duration).

2. The data set

http://ftp.uni-bayreuth.de/math/statlib/datasets/federalistpapers.txt

gives data from an analysis of a series of documents. The first column gives document number, the second gives the name of a text file, the third gives a group to which the text is assigned, the fourth represents a measure of the use of first person in the text, and the fifth presents a measure of inner thinking. There are other columns that you can ignore. (The version at Statlib, above, has odd line breaks. A reformatted version can be found at

stat.rutgers.edu/home/kolassa/Data/federalistpapers.txt).

a. Calculate a bootstrap confidence interval, with confidence level .95, for the regression coefficient of inner thinking regressed on first person. Test at $\alpha = .05$. Provide basic, Studentized, and BCa intervals. Do the fixed-X bootstrap.

b. Calculate a bootstrap confidence interval, with confidence level .95, for the regression coefficient of inner thinking regressed on first person. Provide basic, Studentized, and BCa intervals. Do not do the fixed-X bootstrap; re-sample pairs of data.

c. Calculate a bootstrap confidence interval, with confidence level .95, for the $R^2$ statistic for inner thinking regressed on first person. Provide basic and BCa intervals. Do not do the fixed-X bootstrap; re-sample pairs of data.
