Analysis of Case Studies: The Age-Related Macular Degeneration Trial

The use of the PE to assess surrogacy will be illustrated using the ARMD data. In the analyses, both S and T are considered to be continuous, normally distributed endpoints, so $\beta$ and $\beta_S$ can be estimated using models (3.6) and (3.9), respectively. As was found in Section 3.2.2, $\hat{\beta} = -1.4562$ (s.e. 1.1771) and $\hat{\beta}_S = -0.6362$ (s.e. 0.7894). Thus, PE $= 1 - (-0.6362 / -1.4562) = 0.5631$ (95% delta method-based confidence limits [-0.3533, 1.2271]). The point estimate for PE thus indicates that 56.31% of the effect of Z on T can be explained by S, but notice that the 95% confidence interval covers the entire [0, 1] interval and thus no useful information is conveyed.
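
The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the analysis: the function name `pe_with_delta_ci` is hypothetical, and the covariance between the two coefficient estimates, which the delta method requires, is not reported in the text. It is therefore exposed as the argument `cov` with a placeholder default of zero, so only the point estimate should be expected to match the value quoted above.

```python
import math


def pe_with_delta_ci(beta, beta_s, se_beta, se_beta_s, cov=0.0, z=1.96):
    """Return (PE, s.e., (lower, upper)) for PE = 1 - beta_s / beta.

    cov is the covariance between the two estimates; it is not reported
    in the text, so the default of 0.0 is only a placeholder assumption.
    """
    pe = 1.0 - beta_s / beta
    # Partial derivatives of PE with respect to beta and beta_s.
    d_beta = beta_s / beta**2
    d_beta_s = -1.0 / beta
    # First-order (delta method) variance approximation.
    var = (d_beta**2 * se_beta**2
           + d_beta_s**2 * se_beta_s**2
           + 2.0 * d_beta * d_beta_s * cov)
    se = math.sqrt(var)
    return pe, se, (pe - z * se, pe + z * se)


# ARMD estimates quoted in the text; cov=0.0 is assumed, not reported.
pe, se, (lower, upper) = pe_with_delta_ci(-1.4562, -0.6362, 1.1771, 0.7894)
print(f"PE = {pe:.4f}")  # PE = 0.5631
print(f"95% limits under the assumed covariance: [{lower:.4f}, {upper:.4f}]")
```

For these estimates, a positive covariance between the two coefficients narrows the interval, so the limits printed under the zero-covariance placeholder will generally differ from those reported in the text.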

An Appraisal of the Proportion Explained

As was also the case with the Prentice criteria, there are some severe issues with the PE.

First, the intuition behind the PE is that PE = 1 when all treatment effect is mediated by S (i.e., if $\beta_S = 0$) and PE = 0 when there is no mediation (i.e., $\beta = \beta_S$). Unfortunately, this intuitively appealing reasoning is flawed because $\beta_S$ is not necessarily zero when there is full mediation, and $\beta$ and $\beta_S$ are not necessarily equal when there is no mediation. As a result, the PE is not confined to the unit interval and is thus not truly a proportion in the mathematical sense (i.e., it does not always hold that $0 \leq \text{PE} \leq 1$).
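
A small numerical sketch may make the last point concrete. The coefficient pairs below are hypothetical values chosen for illustration, not estimates from any trial, and the helper name `proportion_explained` is likewise an assumption. The sketch only uses the definition PE = 1 - beta_S / beta, with beta the unadjusted and beta_S the S-adjusted effect of Z on T.

```python
def proportion_explained(beta, beta_s):
    """PE = 1 - beta_s / beta."""
    return 1.0 - beta_s / beta


# Hypothetical (beta, beta_s) pairs, chosen only to illustrate the range of PE.
cases = {
    "adjusted effect shrinks": (-2.0, -0.5),
    "adjusted effect changes sign": (-2.0, 0.5),
    "adjusted effect grows": (-2.0, -3.0),
}
results = {label: proportion_explained(b, bs) for label, (b, bs) in cases.items()}
for label, value in results.items():
    print(f"{label}: PE = {value:.2f}")
# adjusted effect shrinks: PE = 0.75
# adjusted effect changes sign: PE = 1.25
# adjusted effect grows: PE = -0.50
```

Whenever the adjusted and unadjusted effects have opposite signs, PE exceeds 1, and whenever the adjusted effect is larger in magnitude than the unadjusted one (with the same sign), PE is negative.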

Second, to be useful in practice, a surrogate endpoint should allow for the prediction of the effect of the treatment Z on T based on the effect of Z on S (in a future clinical trial). It is not clear how such a prediction can be made within the PE framework.

Third, the confidence interval of the PE tends to be wide. This was also the case in the analysis of the ARMD dataset, where the 95% confidence interval for PE spanned the entire [0, 1] interval and thus conveyed no useful information. Note also that Freedman (2001) found that the ratio $\beta / \text{s.e.}(\beta)$ should exceed 5 (indicative of a very strong treatment effect on T) to achieve 80% power for a test of the hypothesis that S explains more than 50% of the effect of Z on T. Arguably, such a strong requirement makes the use of the PE infeasible in practice.
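
As a rough check, and assuming that the threshold can be compared informally with the estimated ratio, the ARMD estimate of the effect of Z on T reported earlier (-1.4562, s.e. 1.1771) gives the following (the `\text` macro assumes the amsmath package):

```latex
\left| \frac{\hat{\beta}}{\text{s.e.}(\hat{\beta})} \right|
  = \frac{1.4562}{1.1771} \approx 1.24 < 5
```

On this informal reading, the ARMD trial falls well short of the strength of treatment effect that the power requirement calls for.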

Fourth, the PE approach assumes that (3.9) is the correct model (when S and T are continuous normally distributed endpoints). If this assumption is not correct (e.g., if the association between S and T depends on Z), the PE ceases to have a simple interpretation and the validation process cannot be continued (Freedman et al., 1992).

Finally, Frangakis and Rubin (2002) strongly criticized the conceptual foundation of the PE (and the related fourth Prentice criterion), because the treatment effect on T is obtained after conditioning on the postrandomization variable S. Consequently, it cannot be considered a causal effect.