# Analysis of Case Studies: The Age-Related Macular Degeneration Trial

The use of the *PE* to assess surrogacy will be illustrated using the ARMD data. In the analyses, both *S* and *T* are considered to be continuous, normally distributed endpoints, so $\beta$ and $\beta_S$ can be estimated using models (3.6) and (3.9), respectively. As was found in Section 3.2.2, $\hat\beta = -1.4562$ (s.e. 1.1771) and $\hat\beta_S = -0.6362$ (s.e. 0.7894). Thus, $PE = 1 - (-0.6362 / -1.4562) = 0.5631$ (95% delta method-based confidence limits [-0.3533, 1.2271]). The point estimate for *PE* thus indicates that 56.31% of the effect of *Z* on *T* can be explained by *S*, but note that the 95% confidence interval contains the entire unit interval [0, 1], so it conveys no useful information.
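The calculation can be sketched in Python. The helper below is illustrative (the name `proportion_explained` is hypothetical): it computes $PE = 1 - \beta_S/\beta$ and a first-order delta-method interval from the two estimated treatment effects, their variances, and their covariance. That covariance is not reported in this section, so it is left as a parameter; with the placeholder value 0 the sketch reproduces the point estimate 0.5631 but should not be expected to reproduce the confidence limits quoted above.

```python
import math
from statistics import NormalDist


def proportion_explained(beta, beta_s, var_beta, var_beta_s, cov=0.0, level=0.95):
    """Return PE = 1 - beta_s / beta and a delta-method confidence interval.

    cov is Cov(beta_hat, beta_s_hat). It must come from the fitted models;
    the default of 0 is only a placeholder assumption.
    """
    pe = 1.0 - beta_s / beta
    # Partial derivatives of PE with respect to beta and beta_s
    d_beta = beta_s / beta ** 2
    d_beta_s = -1.0 / beta
    # First-order (delta-method) variance of PE
    var_pe = (d_beta ** 2 * var_beta
              + d_beta_s ** 2 * var_beta_s
              + 2.0 * d_beta * d_beta_s * cov)
    se = math.sqrt(var_pe)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # 1.96 for a 95% interval
    return pe, (pe - z * se, pe + z * se)


# ARMD estimates from the text; the covariance is unknown here (placeholder 0)
pe, (lo, hi) = proportion_explained(-1.4562, -0.6362, 1.1771 ** 2, 0.7894 ** 2)
print(f"PE = {pe:.4f}")
```

Supplying the covariance estimated from the fitted models in place of the placeholder is what would make the interval comparable to the one reported for the ARMD data.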

# An Appraisal of the Proportion Explained

As was also the case with the Prentice criteria, the *PE* suffers from several serious problems.

First, the intuition behind the *PE* is that *PE* = 1 when the entire treatment effect is mediated by *S* (i.e., if $\beta_S = 0$) and *PE* = 0 when there is no mediation (i.e., $\beta = \beta_S$). Unfortunately, this intuitively appealing reasoning is flawed, because $\beta_S$ is not necessarily zero when there is full mediation, and $\beta$ and $\beta_S$ are not necessarily equal when there is no mediation. As a result, the *PE* is not confined to the unit interval, and it is thus *not* truly a proportion in the mathematical sense (i.e., it does not always hold that $0 \le PE \le 1$).
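A small simulation can make the first point concrete. It is a sketch under assumptions that are not part of the ARMD analysis: treatment *Z* is randomized, *Z* affects *T* only through *S* (full mediation), and a hypothetical unmeasured variable *U* influences both *S* and *T*. With $S = Z + U + e_S$ and $T = S + U + e_T$ (all errors standard normal), $E(U \mid Z, S) = (S - Z)/2$, so $E(T \mid Z, S) = 1.5S - 0.5Z$. The population values are therefore $\beta = 1$ and $\beta_S = -0.5$, giving $PE = 1.5$ even though the mediation is complete. The helper names are illustrative, and the regression of *T* on *Z* and *S* is computed through the usual ANCOVA identity for a binary treatment so that only the standard library is needed.

```python
import random
from statistics import fmean


def simulate_full_mediation(n=200_000, seed=1):
    """Hypothetical data: Z -> S -> T, plus unmeasured U affecting S and T."""
    rng = random.Random(seed)
    z, s, t = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0   # randomized treatment
        u = rng.gauss(0.0, 1.0)               # unmeasured common cause of S and T
        si = zi + u + rng.gauss(0.0, 1.0)     # surrogate depends on Z and U
        ti = si + u + rng.gauss(0.0, 1.0)     # Z reaches T only through S
        z.append(zi)
        s.append(si)
        t.append(ti)
    return z, s, t


def treatment_effects(z, s, t):
    """OLS coefficients of a binary Z.

    beta   : coefficient of Z in T ~ 1 + Z (difference in arm means)
    beta_s : coefficient of Z in T ~ 1 + Z + S (ANCOVA)
    """
    arms = {0: ([], []), 1: ([], [])}
    for zi, si, ti in zip(z, s, t):
        arms[zi][0].append(si)
        arms[zi][1].append(ti)
    means = {a: (fmean(ss), fmean(tt)) for a, (ss, tt) in arms.items()}
    # Pooled within-arm slope of T on S equals the OLS coefficient of S
    sxy = sxx = 0.0
    for a, (ss, tt) in arms.items():
        ms, mt = means[a]
        for si, ti in zip(ss, tt):
            sxy += (si - ms) * (ti - mt)
            sxx += (si - ms) ** 2
    gamma = sxy / sxx
    beta = means[1][1] - means[0][1]
    # Arm-mean difference in T not accounted for by the arm-mean difference in S
    beta_s = beta - gamma * (means[1][0] - means[0][0])
    return beta, beta_s


z, s, t = simulate_full_mediation()
beta, beta_s = treatment_effects(z, s, t)
pe = 1.0 - beta_s / beta
print(f"beta = {beta:.3f}, beta_s = {beta_s:.3f}, PE = {pe:.3f}")
```

The estimates should land close to the population values derived above; the point is that conditioning on *S* lets *U* induce a nonzero $\beta_S$, pushing *PE* above 1.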

Second, to be useful in practice, a surrogate endpoint should allow for the prediction of the effect of the treatment *Z* on *T* based on the effect of *Z* on *S* (in a future clinical trial). It is not clear how such a prediction can be made within the *PE* framework.

Third, the confidence interval of the *PE* tends to be wide. This was also the case in the analysis of the ARMD dataset, where the 95% confidence interval for *PE* spanned the entire [0, 1] interval and thus conveyed no useful information. Note also that Freedman (2001) found that the ratio $\beta/\text{s.e.}(\beta)$ should exceed 5 (indicative of a very strong treatment effect on *T*) to achieve 80% power for a test of the hypothesis that *S* explains more than 50% of the effect of *Z* on *T*. Arguably, such a strong requirement makes the use of the *PE* infeasible in practice.
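A first-order (delta-method) expansion suggests why the interval is so wide. This is a standard approximation offered as a sketch, not the derivation used by Freedman (2001). Writing $r = \beta_S/\beta$ (so that $PE = 1 - r$), $c = \operatorname{Var}(\hat\beta_S)/\operatorname{Var}(\hat\beta)$, and $\rho = \operatorname{Corr}(\hat\beta, \hat\beta_S)$:

$$
\operatorname{Var}(\widehat{PE}) \approx \frac{1}{\beta^{2}}\Big[r^{2}\operatorname{Var}(\hat\beta) + \operatorname{Var}(\hat\beta_S) - 2r\operatorname{Cov}(\hat\beta,\hat\beta_S)\Big]
= \left(\frac{\text{s.e.}(\hat\beta)}{\beta}\right)^{2}\Big[r^{2} + c - 2r\rho\sqrt{c}\Big].
$$

For fixed $r$, $c$, and $\rho$, the standard error of *PE* is thus proportional to $\text{s.e.}(\hat\beta)/|\beta|$, the inverse of the ratio discussed by Freedman (2001). For the ARMD estimates, $|\hat\beta|/\text{s.e.}(\hat\beta) = 1.4562/1.1771 \approx 1.24$, far below 5.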

Fourth, the *PE* approach assumes that model (3.9) is correctly specified (when *S* and *T* are continuous, normally distributed endpoints). If this assumption does not hold (e.g., if the association between *S* and *T* depends on *Z*), the *PE* ceases to have a simple interpretation and the validation process cannot proceed (Freedman et al., 1992).

Finally, Frangakis and Rubin (2002) strongly criticized the conceptual foundation of the *PE* (and of the related fourth Prentice criterion), because the treatment effect on *T* is obtained *after* conditioning on the post-randomization variable *S*. Consequently, this adjusted effect cannot be interpreted as a causal effect.