Null Hypothesis Testing with Multiple Outcome Variables
The risk of a type I error, i.e., of finding a difference from a zero effect where there is none, is by convention fixed at 0.05 (5%). Assume that in a study the null hypothesis of no difference is true, and suppose that k comparisons/tests, instead of a single one, were performed. For each comparison the type I error is 0.05. With multiple endpoints in a single study, the overall type I error then increases rapidly:
- with k = 1, α = 0.05
- with k = 2, α = 0.10
- with k = 3, α = 0.15, etcetera.
Mathematically slightly more precise than the above additive computations, which follow Boole's inequality and only give an upper bound (kα), is the computation below, which assumes the k tests to be independent. The risk of at least one statistical test with p-value < 0.05 is then:
- Pr (probability) of at least one significant test = 1 - (1 - α)^k = 1 - 0.95^k
- with k = 1, Pr = 0.05
- with k = 2, Pr = 0.0975
- with k = 3, Pr = 0.143.
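These numbers are easy to verify; the sketch below (my own illustration, not part of the original text) compares the additive bound kα with the exact probability 1 - (1 - α)^k under independence:

```python
# Compare the additive (Boole) upper bound k*alpha with the exact
# probability 1 - (1 - alpha)^k of at least one false-positive test,
# assuming k independent tests of true null hypotheses.
alpha = 0.05
for k in (1, 2, 3):
    additive = k * alpha              # Boole's inequality: upper bound
    exact = 1 - (1 - alpha) ** k      # exact under independence
    print(f"k = {k}: additive bound = {additive:.3f}, exact = {exact:.4f}")
# prints 0.050/0.0500, 0.100/0.0975, 0.150/0.1426
```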
Consider two independent hypothesis tests (two H0s), and suppose both H0s are correct:
Pr (probability) (both decisions are wrong) = α · α
Pr (exactly one decision is wrong) = 2 · α · (1 - α)
Pr (both decisions are correct) = (1 - α) · (1 - α)
Pr (at least one decision is wrong) = 1 - (1 - α)^2
= 0.098 if α = 0.05.
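The same enumeration can be written out as a quick check (a minimal sketch, not from the text):

```python
# Enumerate the four possible outcomes of two independent tests of true
# null hypotheses and verify the probabilities given above.
alpha = 0.05
both_wrong = alpha * alpha                    # both tests falsely reject
exactly_one_wrong = 2 * alpha * (1 - alpha)   # exactly one falsely rejects
both_correct = (1 - alpha) * (1 - alpha)
at_least_one_wrong = 1 - both_correct         # = 1 - (1 - alpha)^2

assert abs(both_wrong + exactly_one_wrong + both_correct - 1.0) < 1e-12
print(round(at_least_one_wrong, 4))           # 0.0975, i.e., about 0.098
```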
Now, if you consider testing many more H0s in one and the same study, the overall type I error will keep increasing according to the above pattern, which is, of course, an absolutely unacceptable situation.

Procedures like the one below are adequate to solve the problem of inflated type I errors. Suppose a study includes a family of H0s, with a familywise type I error α* = 0.05. A simple method to find the appropriate alphas of your separate endpoint tests is to solve 1 - (1 - α)^k = α* for α, which gives for each of the k tests:

α = 1 - (1 - α*)^(1/k) = 1 - 0.95^(1/k),

e.g., α = 0.0253 for k = 2 and α = 0.0170 for k = 3.
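A short sketch of this computation (assuming k independent tests; the chosen k values are arbitrary):

```python
# Per-test alpha from the familywise level alpha* = 0.05, assuming
# k independent endpoint tests: solve 1 - (1 - alpha)^k = alpha*.
alpha_star = 0.05
for k in (2, 3, 5):
    alpha_per_test = 1 - (1 - alpha_star) ** (1 / k)
    print(f"k = {k}: per-test alpha = {alpha_per_test:.4f}")
# prints 0.0253 for k = 2, 0.0170 for k = 3, 0.0102 for k = 5
```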
However, the above method assumes that the multiple tests are entirely independent of one another. In practice this is virtually never so. If multiple tests in a study are dependent on one another, there is mostly a positive correlation between the outcomes, which makes the problem of multiple testing less dramatic. With a zero correlation, two tests are entirely independent: as computed above, already with two outcomes the overall type I error equals 0.098, close to 0.10, i.e., almost double the nominal 0.05. If the correlation between the two tests is strong, whether positive or negative, the overall type I error remains close to 0.05 despite the two tests. This is because a second test with a correlation of 100% with the first one can be predicted from the first with 100% certainty, and therefore offers no additional opportunity for a false-positive result.
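This effect of correlation is easy to demonstrate by simulation. The sketch below (my own illustration, assuming two two-sided z-tests with jointly normal test statistics) shows the familywise type I error shrinking from about 0.098 toward 0.05 as the correlation grows:

```python
# Monte Carlo illustration: familywise type I error of two two-sided
# z-tests (both H0s true) as a function of the correlation between
# the two test statistics.
import numpy as np

rng = np.random.default_rng(42)
n_sim = 200_000
z_crit = 1.96                              # two-sided 5% critical value

for rho in (0.0, 0.5, 0.9, 1.0):
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n_sim)
    # familywise error: at least one of the two tests rejects
    fwer = np.mean(np.any(np.abs(z) > z_crit, axis=1))
    print(f"rho = {rho:3.1f}: familywise type I error ~ {fwer:.3f}")
# decreases from about 0.098 (rho = 0) toward 0.050 (rho = 1)
```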

All of the above reasonings indicate that we need a cleverer multiple testing method, meaning cleverer ways of preventing α from increasing above its conventionally fixed level of 0.05 (5%). Nowadays the "closure principle" is often used for this purpose. In multiple testing, the closure principle states that an individual null hypothesis may be rejected only if every intersection hypothesis containing it is rejected as well; informally, if someone knows everything about p, and q is a consequence of p, then he also knows everything about q.
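As a concrete illustration, here is a minimal sketch (not the text's own implementation) that assumes a Bonferroni test for each intersection hypothesis, in which case closed testing reduces to the well-known Holm procedure; the p-values are hypothetical:

```python
from itertools import combinations

def closed_test(p_values, alpha=0.05):
    """Closed testing: reject H_i only if every intersection hypothesis
    containing H_i is rejected by a level-alpha (Bonferroni) local test."""
    k = len(p_values)
    rejected = []
    for i in range(k):
        reject_i = True
        # examine every subset of hypotheses that contains H_i
        for size in range(1, k + 1):
            for subset in combinations(range(k), size):
                if i not in subset:
                    continue
                # Bonferroni test of the intersection hypothesis:
                # reject if the smallest p-value <= alpha / |subset|
                if min(p_values[j] for j in subset) > alpha / len(subset):
                    reject_i = False
        if reject_i:
            rejected.append(i)
    return rejected

p = [0.011, 0.020, 0.048]      # hypothetical p-values for three endpoints
print(closed_test(p))          # [0, 1, 2]: all three H0s rejected at 0.05
```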