# Two-Way Non-Parametric Analysis of Variance

Consider the same data description as in §5.1.2: one observes random variable *XkH,* with *к* € {1, • • • *,K}, l* € {1, • ■ ■ *,L},* and *i* € {I,--- ,A/a/}. Suppose the distribution of *Xku* are independent, with a distribution that is allowed to depend on *к* and *l:* that is, *Xku* ~ Fa/. We wish to test the null hypothesis that treatment has no effect, while allowing different blocks to behave differently, versus the alternative that the distribution depends on treatment as well. That is, the null hypothesis is that Fa/ does not depend on *k.* although it may depend on *L* and the alternative hypothesis is that for some *l,* some *j *and *k.* and some *x, Fki(x) ф Fji(x).* However, the test to be constructed will not have reasonable power unless this difference is more systematic; that is, consider alternatives for which for each pair of treatments *j* and *к.* either *Fji(x) < Fji(x)* for all *x* and all *l,* or *Fji(x) > Fji(x)* for all *x* and all *l.* The direction of the effect must be constant across blocks. Heuristically, treat *к* as indexing treatment and / as indexing a blocking factor.

In order to build a statistic respecting treatment order across blocks, rank the observations within blocks: that is, let *RkH* represent the rank of *X^u *within *Хц*,..., *X KIM к **1*1 and sum them separately by block and treatment.

## Distribution of Rank Sums

In order to use these ranks to detect treatment differences that are in a consistent direction across blocks, one should consolidate rank sums over blocks before comparing across treatments. Let Fa.. = ]Cf=Y Fau and

Fa.. = Fa../ *J2h=* 1 Л/а/- Consolidation via averaging, rather than via summing, implies a choice in weighting the contributions of the various blocks.

Moments of these rank sums may be calculated as an extension of (3.22). Under the null hypothesis of no treatment effect, the expectation of one rank is

and hence

Variances of rank sums depend on the covariance structure of the ranks. Ranks that make up each sum within a block are independent, but sums of ranks across blocks are dependent . Within a treatment-block cell, Var [/?/,.; ] is the same as for the Mann-Whitney-Wilcoxon statistic:

By independence across blocks,

Covariances may be calculated by comparing the variance of the sum to the sum of the variances, as in (4.12), to obtain

Since blocks are ranked independently, variances and covariances for rank sums add across blocks:

# A Generalization of the Test of Friedman

From these rank sums one might calculate separate Kruskal-Wallis statistics for each block, and, since statistics from separate blocks are independent, one might add these Kruskal-Wallis statistics to get Xl(K-i) degree of freedom statistic under the null hypothesis. Such a statistic, however, is equally sensitive to deviations from the null hypothesis in opposite directions in different blocks, and so has low power against all alternatives. Interesting alternative hypotheses involve treatment distributions in different blocks ordered in the same way.

As with the Kruskal-Wallis test, a statistic is constructed by setting *Y = *№.. - Eo №..], • • •,Rk-i.. ~ Eo №'-i..])^{T}, using (5.6), calculating Var_{0} [У] from (5.7) and (5.8), and using

as the test statistic.

## The Balanced Case

The inverse Varo [T]^{_1} is tractable in closed form under (5.1), including the special case with balanced replicates.

Friedman (1937) addresses the case in which *Мы* = 1 V/c, *l.* The generalization to the balanced case with replicates is trivial, and this section assumes *Мы* = *M* for all *к, l.* Here E = *(KM* + 1 )/2, and

Covariances between sets of rank means are

for *к ф m.* In this balanced case, the test statistic (5.9) can be shown to equal the sum of squares of ranks away from their average value per treatment group:

This demonstration is as for the Kruskal-Wallis test in §4.2.2. The limiting distribution of *Wp* is also demonstrated by observing the parallelism between this covariance structure and the covariance structure of rank sums in §4.2.2, since the covariance structure of *Y* = *(R..,*..., Rk-i..)//K has the same form as (4.12), for some vector *v* and constant *w.* A relationship like that of §4.2.2 can then be used to demonstrate that *Wp* is approximately Xk-i • Again, some condition on the sample sizes being large enough to ensure that the rank averages are approximately multivariate Gaussian is required; for example, in the case with the same number of replicates per treatment-block combination, having the number of blocks going to infinity suffices, and in the balanced case with a fixed number of blocks, having the number of observations in each block-treatment combination go to infinity suffices. This requirement is in addition to (5.1) needed to find a generalized inverse for the variance matrix in closed form, and hence to exhibit the test statistic as a sum of squares.

Example 5.4.1 *Friedman (1937) provides data on variability of expenses in various categories for families of different income levels. This data is provided in Table 5.2, and at*

*http://stat.rutgers.edu/home/kolassa/Data/friedman.dat .*

*This may be read into R using*

expensesdc-as.data.frame(scan("friedman.dat",

what=list(cat="",gl=0,g2=0,g3=0,g4=0,g5=0,g6=0,g7=0)))

*Here L =* 14, *A1 =* 1, *and К* = 7. *Ranking within groups (for example, in R, using*

expenserank<-t(apply(as.matrix(expensesd[,-1]),1.rank)) rownames(expenserank)<-expensesd[[1] ]

*gives the table of ranks*

*Rank sums are* 23,36,53,57,70,70,83, *and group means (for example, via*

apply(expenserank,2,mean)

*gives* 1.643,2.571,3.786,4.071,5.000,5.000,5.929. *Subtracting the overall mean, 4, squaring, and adding, gives 13.367. Multiplying by* 12*L/(K(K +*

1)) = 12 x 14/(7x 8) = 3 *gives the statistic value Wp* = 40.10. *Comparing to the Xo distribution gives a p-value of* 4.35 x 10^{-7}. *This might also have been done entirely in R using*

friedman.test(as.matrix(expensesd[,-1]))

The development of Friedman’s test in the two-way layout remarkably preceded analogous testing in one-way layouts. When *К =* 2 and *Мы* are all 1, Friedman’s test is equivalent to the sign test applied to differences within a block.

## The Unbalanced Case

This analysis fails in the unbalanced case, because no expression for the variance as simple as (4.12) holds, and so the variance matrix cannot be inverted in closed form. Benard and Elteren (1953) present a numerical method that uses the rule of Cramer to produce the matrix product involved in *T.* PrenTABLE 5.2: Expense variability by income group, from Friedman (1937)

Category |
Gr. 1 |
Gr. 2 |
Gr. 3 |
Gr. 4 |
Gr. 5 |
Gr. 6 |
Gr. 7 |

Housing |
103.30 |
68.42 |
89.53 |
77.94 |
100.00 |
108.20 |
184.90 |

Operations |
42.19 |
44.31 |
60.91 |
73.90 |
43.87 |
61.74 |
102.30 |

Food |
71.27 |
81.88 |
100.71 |
86.52 |
100.30 |
90.75 |
100.60 |

Clothing |
37.59 |
60.05 |
56.97 |
60.79 |
71.82 |
83.04 |
117.10 |

Furnishings |
58.31 |
52.73 |
96.04 |
60.42 |
104.33 |
89.78 |
85.77 |

Transport |
46.27 |
82.18 |
129.80 |
181.00 |
172.33 |
164.80 |
246.80 |

Recreation |
19.00 |
23.07 |
38.70 |
45.81 |
59.03 |
50.69 |
55.18 |

Personal |
8.31 |
8.43 |
9.16 |
14.28 |
10.63 |
15.84 |
12.50 |

Medical |
20.15 |
33.48 |
60.08 |
69.35 |
114.34 |
45.28 |
101.60 |

Education |
3.16 |
4.12 |
12.73 |
18.95 |
8.89 |
41.52 |
66.33 |

Community |
4.12 |
18.87 |
8.54 |
12.92 |
25.30 |
19.85 |
16.76 |

Vocation |
7.68 |
11.18 |
10.44 |
10.95 |
10.54 |
13.96 |
14.39 |

Gifts |
5.29 |
10.91 |
11.22 |
25.26 |
42.25 |
48.80 |
69.38 |

Other |
6.00 |
5.57 |
22.23 |
2.45 |
6.24 |
1.00 |
4.00 |

tice (1979) performs these calculations in somewhat more generality, and the general unbalanced two-way test is generally called the Prentice test. Skillings and Mack (1981) address this question using explicit numerical matrix inversion.

Example 5.4.2 *Consider data on weight gain among chickens, given by Cox and Snell (1981, Example K), and at*

*http://stat.rutgers.edu/home/kolassa/Data/chicken.dat .*

*Weight gain after 16 weeks, protein level, protein source, and fish soluble level. The dependence of weight gain on protein source might be graphically depicted using box plots (Figure 5.2):*

templ<-temp<-as.data.frame(scan("chicken.dat",what= list(source="", lev=0,fish=0,weight=0,w2=0))) templ$weight<-templ$w2;temp$h<-0;templ$h<-l temp$w2<-NULL;templ$w2<-NULL chickenc-rbind(temp,tempi) attach(chicken)

boxplot(split(weight,source),horizontal=TRUE,ylab="g.", main="Weight gain for Chickens", xlab="Protein Source") detach(chicken)

*Hence blocking on source is justified. Test for dependence of weight gain on protein level, blocking on source. Perform these calculations in R using*

library(muStat)#Need for Prentice test.

attach(chicken)

prentice.test(weight,source,blocks=lev) detach(chicken)

*to obtain the p-value 0.1336.*

FIGURE 5.2: Weight Gain for Chickens

# Multiple Comparisons and Scoring

One might adjust for multiple comparisons in the presence of blocking in the same way as was done without blocking, as discussed in §4.5. That is, one might use the variances and covariances to show that under the null hypothesis, *(ft, — Rj *)/*JK(LK* + l)/6 is approximately Gaussian, and employ the method of Bonferroni, Fisher’s LSD, or Tukey’s HSD.

One might also use scoring methods, including normal and Savage scores, as before. Recall also that treating ranks as general scores, with tied observations given average ranks, makes tie handling automatic.

One might also use scores that are data values. In this case, the test statistic is similar to that of the Gaussian-theory *F* test, but the reference distribution is generated from all possible permutations of the data within blocks and among treatments.

Example 5.5.1 *Refer again to Example 5.4-2. Test dependence of weight gain on protein source, blocking on level. There are eight observations at each protein level. Cycle through all* (®) = 34300 *ways to associate source labels within block, and compute the between sum of squares each time. The p-value is the number of rearrangements having this statistic meet or exceed the observed.*

library(MultNonParam)

#aov.P requires data sorted by block. Put block ends as the #third argument.

chickenc-chicken[order(chickenllev),]

aov.P(chicken$weight,as.numeric(as.factor(chicken$source)), c(8,16,24))

*The p-value is 0.182.*

# Tests for a Putative Ordering in Two-Way Layouts

Recall Friedman’s statistic of (5.9), *Wp* oc [Rfc.. ^{—} 1/2*(MK* + 1 )]^{2}, where

*Rk* are mean rankings for treatment *к* when observations are ranked within blocks. The statistic *Wp* treats deviation of any treatment in any direction similarly. If one wants to look specifically for alternatives suspected *a priori *of being ordered according to the index *k,* in the balanced case use instead

This test was proposed by Page (1963) in the balanced case with one replicate per block-treatment pair, and is called Page’s test.

For more general replication patterns, the null expectation and variance for the statistic may be calculated, using (5.6), (5.7), and (5.8), and

In the balanced case, with *Ми = M /k,l*, moments simplify to
using (3.20).

Example 5.6.1 *Refer again to the expense variability data of Example **5**.**4**.I. Test for an ordered effect of income, treating categories as blocks. Rank sums are* 23,36,53,57,70,70,83. *and the statistic is* 23 x 1 + 36x2 + 53x3 + 57x4 + 70x5 + 70x6 + 83x7 = 1833, *with expected value 1568 and variance under the null hypothesis 1829.333. The two- sided p-value is* 2.90 x 10^{-10}. *Page’s test may be performed in R using*

library(crank); page.trend.test(expensesd[,-l].FALSE)

Page (1963) presents only the case with *Мы* all 1, and provides a table of the distribution of *T _{L}* for small values of

*К*and

*L.*For larger values of

*К*and

*L*than appear in the table, Page (1963) suggests a Gaussian approximation.

The scores in (5.10) are equally spaced. This is often a reasonable choice in practice. When the *Мы* are not all the same, imbalance among numbers of ranks summed may change the interpretation of these scores, and a preferred statistic definition to replace (5.10) is

with moments given by and

Example 5.6.2 *Refer again to the chicken weight gain data of Example 5.4-2. This data set is balanced, but one could ignore the balance and numerically invert the variance matrix of all rank sums except the last. In this case, test for an ordered effect of protein level on weight gain, blocking on protein source. Perform the test using*

library(MultNonParam) attach(chicken)

cat(’ Page test with replicates ’) page.test.unbalanced(weight,lev,source) detach(chicken)

*In this balanced case, rank mean expectations are all* 6.5, *variances are *1.083, *covariances are* —0.542. *The rank means by treatment level are *7.625,7.250,4.625, *giving an overall statistic value of 36, compared with a null expectation of 39 and null variance of 3.25; the p-value is 0.096. Do not reject the null hypothesis.*

# Exercises

1. The data set

http://ftp.uni-bayreuth.de/math/statlib/datasets/schizo

reflects an experiment using measurements used to detect schizophrenia, in both schizophrenic and non-schizophrenic patients. Various tests are given. Pick those data points with CS in the second column and compare the first and second gain rations, in the third and fourth columns, respectively. For this question, select only non-schizophrenic patients.

a. Test the null hypothesis that first and second gain ratios (in the third and fourth columns, respectively) have the same median using the sign test.

b. Test the null hypothesis that first and second gain ratios (in the third and fourth columns, respectively) have the same median using the signed rank test.

2. The data set

http://ftp.uni-bayreuth.de/math/statlib/datasets/schizo

contains data from an experiment using measurements used to detect schizophrenia, on non-schizophrenic patients. Various tests are given. Pick those data points with CS in the second column and compare the first and second gain ratios, in the third and fourth columns, by taking their difference. Test the null hypothesis of zero median difference using the Wilcoxon signed rank test, for the nonschizophrenic patients.

3. Calculate the asymptotic relative efficiency for the Wilcoxon signed rank statistic relative to the one-sample t-test (which you should approximate using the one-sample z-test). Do this for observations from

a. the distribution uniform on the interval [0 — 1/2,0 + 1/2],

b. the logistic distribution, symmetric about *0,* with variance 7r^{2}/3 and density схр(ж — (9)/( 1 + ехр(ж — *в)) ^{2},*

c. and the standard normal distribution, with variance 1 and expectation *в.*

4. The data set at

http: / /stat.rutgers.edu/home/kolassa/Data/yarn.dat

represents strengths of yarn of two types from six bobbins. This file has three fields: strength, bobbin, and type. Apply the balanced variant of Friedman’s test to determine whether strength varies by type, controlling for bobbin.

5. The data at

http://lib.stat.cmu.edu/datasets/CPS_85_Wages

reflects wages from 1985. The first 42 lines of this file contain a description of the data set, and an explanation of variables; delete these lines first, or skip them when you read the file. All fields are numeric. The tenth field is sector, the fifth field is union membership, and the sixth field is hourly wage; you can skip everything else. Test for a difference in wage between union members and nonmembers, blocking on sector.