Increased Risk of Type I Error
Each look at the data increases the type-I error rate and may introduce bias, particularly the bias associated with random high outcomes. Suppose your null hypothesis of no effect is correct, and k analyses will be performed, with α=0.05 for each analysis. Then the true significance level after k analyses will equal

$$1 - (1 - \alpha)^k = 1 - 0.95^k$$

with k=2, a p-value < 0.098; with k=3, a p-value < 0.143.
This is much larger than the traditional overall p-value of <0.05. With k=3 we would have an up to 14.3 % chance of finding a significant difference from zero if there were no difference from zero. This chance is so big that it is no longer appropriate to reject the null hypothesis of no effect. This would mean a negative study that would have been positive at p < 0.05 if no interim analysis had been performed. The problem gets larger the more interim analyses are performed. See the graph underneath.
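The arithmetic behind these figures can be sketched in a few lines of Python. The function name `familywise_error` is our own, and the formula assumes that the k analyses are independent. Interim looks at accumulating data are positively correlated, which is why the text speaks of a chance of "up to" 14.3 %.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one false positive in k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


# Upper bound on the overall type-I error for one to five analyses at alpha = 0.05.
for k in range(1, 6):
    print(f"k={k}: overall type-I error up to {familywise_error(0.05, k):.3f}")
```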
Methods for Lowering the Type I Error (α), the Armitage/Pocock Group Sequential Method
Interim analyses can be classified as follows:
- (1) group sequential design
- • Armitage et al/Pocock method for lowering the type I error
- • alpha spending function method for lowering the type I error
- (2) continuous sequential design.
We will first address the Armitage/Pocock group sequential method for lowering the type I error. The first question is how many interim analyses we have. A crucial point in the design phase of such studies is the question of how to correct for the multiple testing problem. The Bonferroni procedure (see Chap. 9) is much too conservative for most interim analyses, meaning that the nominal significance level will soon be much smaller than α=0.05, and loss of power will accordingly occur. Instead, we have to lower the significance level α of 0.05 to a new significance level indicated as “α*”. This novel level will be used at all interim analyses and at the final analysis. It is a simple method that generally does the job pretty well. An important issue here, however, is how the α-lowering action affects the power of the trial, because lowering α also means lowering the power of the study. An example will be given of the effects of α-lowering on the power of the trial. First, let us design a clinical trial:
- two equally sized parallel groups of patients
- • 6n patients in total, thus 3n per group
- outcome: it is a normally distributed variable
- • with known standard deviation σ per group
- three analyses: two interims and the final analysis
- • after 2n, 4n and 6n patients have been completed.
As the test statistic, an unpaired Student's t-test will be applied. Let the test statistics at the first and the final analyses be named $t_1$ and $t_t$, respectively, and let $t_2$ and $t_3$ denote the test statistics of the second and third batches of 2n patients. Then

$$t_t = \frac{1}{\sqrt{3}}\,(t_1 + t_2 + t_3)$$

$$t_t = 0.58\,(t_1 + t_2 + t_3)$$

$$t_t = 0.58\,t_1 + 0.58\,(t_2 + t_3)$$

The correlation between $t_t$ and $t_1$ thus meets a linear regression model with a regression coefficient of 0.58 between $t_t$ and $t_1$.
Check for yourself that with three interims the regression coefficient will equal 0.5, and with a single interim it will equal 0.7.
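The check can be sketched as follows, assuming equally sized stages whose test statistics are independent with unit variance. Under that assumption the coefficient reduces to one over the square root of the total number of analyses (interims plus the final analysis). The function name `first_final_correlation` is our own.

```python
import math


def first_final_correlation(n_interims: int) -> float:
    """Correlation between the first-look and final test statistics.

    Assumes equally sized stages with independent, unit-variance stage statistics.
    """
    n_analyses = n_interims + 1  # the interims plus the final analysis
    return 1.0 / math.sqrt(n_analyses)


# One, two and three interim analyses.
for n_interims in (1, 2, 3):
    r = first_final_correlation(n_interims)
    print(f"{n_interims} interim(s): r = {r:.2f}")
```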
The above graph, with numbers of interim analyses on the x-axis and linear correlation coefficient (r) on the y-axis, gives an overview of the relationship between the number of interim analyses and the magnitude of the linear correlation coefficient between the test statistics of the first and the final analyses. The above calculations thus demonstrate that, with two interim analyses, you can predict the final analysis from your first interim analysis with only r=0.58 (58 %) certainty, with three interim analyses with only r=0.50 (50 %), and with a single interim analysis with r=0.70 (70 %) certainty.
With many interim analyses your level of certainty rapidly falls further. So, the first advice would be to perform only very few interim analyses. An example is given of a clinical trial with two parallel treatment groups:
- - 3n patients per group (6n patients in total)
- - three analyses: two interim analyses+final analysis
- - tested with Student’s t-test
- - required power=80 %
- - desired overall significance level: α=0.05
- - hypothesized effect size=0.4.
If the above trial had no interim analyses, the sample size required to meet the requested power would work out at approximately 99 patients per group (3n = 99, thus n = 33).
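A minimal sketch of such a sample size calculation is given underneath. It uses the standard normal approximation for a two-sided, two-sample comparison, and it assumes that the hypothesized effect size of 0.4 is a standardized difference (difference in means divided by the standard deviation). The function name `n_per_group` is our own, and an exact t-based calculation may differ by a patient or so.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per group for a two-sided two-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the requested power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)                 # round up to whole patients


print(n_per_group(0.4))
```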
With interim analyses that have a traditional significance level α=0.05, power at the interims reduces from 80 % to:
- - after 33 patients/group: power=37 %
- - after 66 patients/group: power=63 %.
And the total type-I error rate may rise to as much as 0.143, which is much larger than 0.05,
- - thus α per analysis must be lowered
- - and the sample size must be increased, if power has to be maintained.
The group sequential method (Pocock) recommends that α is adjusted to α* ≈ 0.022 (at each analysis). Without further adjustment of the sample size, the power of such a trial will be:
- - interim analysis 1 with 33 patients/group: 25 %
- - interim analysis 2 with 66 patients/group: 50 %
- - at the final analysis with 99 patients/group: 70 %
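The adjusted level α* ≈ 0.022 can be checked with a small Monte Carlo simulation under the null hypothesis. The sketch underneath is our own illustration, not the method used to derive the published value. It assumes three equally sized stages and uses standard normal z-statistics as a stand-in for the t-test, and the function name and its parameters are hypothetical. The simulated overall error for a nominal α of 0.05 tends to come out below the 0.143 upper bound, because successive looks at accumulating data are correlated.

```python
import random
from statistics import NormalDist


def overall_type1_error(nominal_alpha: float, n_looks: int = 3,
                        n_trials: int = 100_000, seed: int = 1) -> float:
    """Simulated chance of rejecting a true null hypothesis at any of the looks."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - nominal_alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_trials):
        cumulative = 0.0
        for look in range(1, n_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)   # z-statistic of the new stage
            z_look = cumulative / look ** 0.5   # statistic on all data so far
            if abs(z_look) > crit:
                rejections += 1                 # trial stops at first rejection
                break
    return rejections / n_trials


print("nominal 0.05 at each look :", overall_type1_error(0.05))
print("nominal 0.022 at each look:", overall_type1_error(0.022))
```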
If, however, we increase the sample size to 123 patients per group, then we will end up at the final analysis with a power of 80 %, which is generally accepted as adequate:
- - interim analysis 1 with 41 patients/group: 32 %
- - interim analysis 2 with 82 patients/group: 61 %
- - at the final analysis with 123 patients/group: 80 %.
We have to keep in mind, though, that the sample size needed to be increased by (123 - 99)/99 = 24 %.
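The power figures quoted above can be approximated with the sketch underneath. It again uses the normal approximation, with the effect size of 0.4 taken as a standardized difference, and it gives the power of each analysis considered on its own at the stated nominal level. The function name `power_at_look` is our own, and the negligible chance of rejecting in the wrong direction is ignored.

```python
from statistics import NormalDist


def power_at_look(n_per_group: int, effect_size: float, alpha: float) -> float:
    """Approximate power of a two-sided two-sample test at a single analysis."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)                  # two-sided critical value
    shift = effect_size * (n_per_group / 2) ** 0.5   # expected test statistic
    return z.cdf(shift - crit)


scenarios = [
    ("alpha 0.05, no adjustment", 0.05, (33, 66, 99)),
    ("alpha* 0.022, 99 per group", 0.022, (33, 66, 99)),
    ("alpha* 0.022, 123 per group", 0.022, (41, 82, 123)),
]
for label, alpha, sizes in scenarios:
    powers = [f"{power_at_look(n, 0.4, alpha):.0%}" for n in sizes]
    print(label, dict(zip(sizes, powers)))

# Relative increase in sample size needed to restore 80 % power.
print(f"increase: {(123 - 99) / 99:.0%}")
```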
You may wonder: if the sample size has to be increased, why, then, perform interim analyses in the first place? The answer is:
- - to be on the safe side,
- - for ethical reasons,
- - for economic reasons.
In order to obtain an overall significance level of <0.05, 123 patients per group will have to be included. However, in practice, the number of patients at the completion of the trial will often be less than planned due to patient loss. And so, the overall significance level of <0.05 is likely not to be obtained. This means that including an additional 10 % of patients as a safety factor is recommended.