# An Appraisal of Prentice’s Approach

The Prentice criteria are intuitively appealing and straightforward to test, but there are some fundamental problems that surround this approach.

First, the fourth Prentice criterion (3.4) requires that the statistical test for the *β*_{S} parameter be non-significant (see (3.9) for the case where both *S* and *T* are continuous, normally distributed endpoints). This criterion is useful to reject a poor surrogate endpoint (i.e., a surrogate for which *β*_{S} ≠ 0), but it is not suitable to validate a good surrogate (i.e., a surrogate for which *β*_{S} = 0). Indeed, validating a good surrogate in the Prentice framework would require the *acceptance* of the null hypothesis H_{0}: *β*_{S} = 0, which is obviously not possible (Freedman et al., 1992). For example, a non-significant hypothesis test may always be the result of a lack of statistical power due to an insufficient number of patients in the trial.
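The power argument above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original analysis: it generates data from a model of the form T = μ + β_S Z + γ_Z S + ε with a deliberately nonzero β_S (a poor surrogate) and counts how often the test of H_{0}: β_S = 0 rejects. All numerical values (effect sizes, sample sizes, number of simulations) and all function names are hypothetical choices made for illustration, and the p-value uses a large-sample normal approximation rather than the exact t distribution.

```python
import math
import random
from statistics import NormalDist, fmean


def residualize(y, x):
    """Residuals from the least-squares regression of y on x (with intercept)."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [(yi - ybar) - slope * (xi - xbar) for xi, yi in zip(x, y)]


def beta_s_test(z, s, t):
    """Estimate beta_S in T = mu + beta_S*Z + gamma_Z*S + error.

    Uses the Frisch-Waugh-Lovell result: the coefficient of Z equals the slope
    of (T adjusted for S) on (Z adjusted for S). Returns (estimate, p-value),
    where the p-value is a normal approximation (an assumption of this sketch).
    """
    rz = residualize(z, s)
    rt = residualize(t, s)
    szz = sum(r * r for r in rz)
    beta = sum(a * b for a, b in zip(rz, rt)) / szz
    rss = sum((b - beta * a) ** 2 for a, b in zip(rz, rt))
    se = math.sqrt(rss / (len(z) - 3) / szz)  # 3 estimated mean parameters
    p_value = 2 * (1 - NormalDist().cdf(abs(beta / se)))
    return beta, p_value


def rejection_rate(n, beta_s, n_sim=400, alpha=0.05, seed=1):
    """Fraction of simulated trials in which H0: beta_S = 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        z = [i % 2 for i in range(n)]  # balanced two-arm trial
        s = [0.5 * zi + rng.gauss(0, 1) for zi in z]  # hypothetical effect on S
        t = [beta_s * zi + si + rng.gauss(0, 1) for zi, si in zip(z, s)]
        _, p_value = beta_s_test(z, s, t)
        rejections += p_value < alpha
    return rejections / n_sim


# Same poor surrogate (beta_S = 0.5) in a small and in a large trial.
small_trial = rejection_rate(n=30, beta_s=0.5)
large_trial = rejection_rate(n=500, beta_s=0.5)
print(f"rejection rate, n=30:  {small_trial:.2f}")
print(f"rejection rate, n=500: {large_trial:.2f}")
```

Under these assumed settings the small trial fails to reject H_{0} in most replications even though β_S is not zero, whereas the large trial rejects almost always. A non-significant test in a small trial therefore says little about whether the surrogate is good.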

Second, even when lack of statistical power would not be an issue, the result of the statistical test to evaluate the fourth Prentice criterion (i.e., H_{0}: *β*_{S} = 0) cannot prove that the effect of the treatment *Z* on T is *fully* captured by S (Burzykowski, Molenberghs, and Buyse, 2005; Frangakis and Rubin, 2002). Moreover, in any practical setting, it would be more realistic to expect that a surrogate explains part of the treatment effect on the true endpoint, rather than the full effect. These considerations led Freedman et al. (1992) to the proposal that attention should be shifted from the hypothesis-testing framework of Prentice (1989), i.e., a yes/no all-or-nothing qualitative judgment of the appropriateness of a candidate S, to an estimation framework, i.e., a quantitative rating of the appropriateness of a candidate S. Their proposal (the so-called Proportion Explained) is detailed in Section 3.3.

Third, it can be shown that Prentice’s operational criteria to validate a candidate S are only equivalent to his definition of a surrogate when both S and T are binary variables (in the latter case, models (3.5)-(3.6), (3.8), and (3.9) are replaced by their logistic regression counterparts). This implies that verifying the operational criteria does not guarantee that the surrogate truly fulfills the definition, except when all members of the triplet (Z, T, S) are binary. For details, the reader is referred to Buyse and Molenberghs (1998).

Fourth, a candidate S can only be validated when the treatment Z significantly affects both S and T (see (3.1)-(3.2)). Thus, the data of a clinical trial in which the treatment has no significant effect on S and T (which was the case for the ARMD trial, see Section 3.2.2) cannot be used to validate a surrogate endpoint in Prentice’s approach.
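The prerequisite described above amounts to a simple gate that must be passed before the remaining criteria can be examined at all. The sketch below is illustrative only: the function names are hypothetical, the data are made-up toy values (not the ARMD trial), and a large-sample two-sample z-test with unequal variances stands in for the tests of the treatment effects in models (3.1)-(3.2).

```python
import math
from statistics import NormalDist, fmean, variance


def treatment_effect_p_value(outcome, z):
    """Two-sided p-value for the difference in arm means (normal approximation)."""
    treated = [y for y, zi in zip(outcome, z) if zi == 1]
    control = [y for y, zi in zip(outcome, z) if zi == 0]
    diff = fmean(treated) - fmean(control)
    se = math.sqrt(variance(treated) / len(treated)
                   + variance(control) / len(control))
    return 2 * (1 - NormalDist().cdf(abs(diff / se)))


def prentice_prerequisites_met(s, t, z, alpha=0.05):
    """True only if Z significantly affects both S and T."""
    return (treatment_effect_p_value(s, z) < alpha
            and treatment_effect_p_value(t, z) < alpha)


# Toy data: 20 patients, alternating allocation to control (0) and treatment (1).
z = [i % 2 for i in range(20)]
no_effect = [float(i // 2) for i in range(20)]  # identical values in both arms
shifted = [y + 10.0 * zi for y, zi in zip(no_effect, z)]  # large treatment effect

print(prentice_prerequisites_met(shifted, shifted, z))      # effect on S and T
print(prentice_prerequisites_met(shifted, no_effect, z))    # effect on S only
print(prentice_prerequisites_met(no_effect, no_effect, z))  # no effect at all
```

Only the first call passes the gate. In the other two cases the trial data cannot be used for validation in Prentice's approach, whatever the relationship between S and T may be.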