# Estimating $R^2_{ht}$

Finding an estimator for $R^2_{ht}$ can essentially be reduced to finding one for $I(\alpha, \beta)$. The estimation of the mutual information between two random variables is a complex problem that has received a lot of attention in the literature; see, for example, Brillinger (2004), Kent (1983), and the references therein. However, there are some general results that allow the mutual information to be estimated using the likelihood function. To that end, let us consider the random variables $X$ and $Y$ with density function $f(x, y|\theta)$ and suppose that the realizations $(x_i, y_i)$, $i = 1, 2, \ldots, n$, are available. Suppose further that the parameter $\theta$ has the form $\theta = (\theta_0, \theta_1)$ and let $\hat\theta_0^0$ denote the maximum likelihood estimator (mle) of $\theta_0$ under the null hypothesis of independence ($\theta_1 = 0$). Consider the estimate

$$\hat{I}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{f(y_i|x_i, \hat\theta)}{f(y_i|\hat\theta_0^0)}, \qquad (9.3)$$

with $\hat\theta$ the full model maximum likelihood estimator. Notice that (9.3) is just the classical log-likelihood ratio test statistic for the hypothesis of independence divided by $n$, i.e., $G^2/n$ with $G^2$ the log-likelihood ratio statistic comparing the models $f(y_i|x_i, \theta)$ and $f(y_i|\theta_0)$ or, equivalently, testing $H_0: \theta_1 = 0$. If $\hat\theta_0^0 \to \theta_0$ in probability, then, under general regularity conditions, the statistic (9.3) will tend to $I(X, Y)$, i.e., (9.3) provides a consistent estimator for the mutual information (Kent, 1983; Brillinger, 2004). The use of (9.3) has two important advantages. First, no integrals need to be evaluated, and second, no joint models need to be considered for $X$ and $Y$.
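As an illustration of the average log-likelihood ratio estimate, the following sketch assumes a simple Gaussian linear regression of $Y$ on $X$ as the full model and an intercept-only Gaussian model under independence. The model choice, the simulated data, and the function names `gaussian_loglik_terms` and `mutual_information_lr` are assumptions made for this example only; they are not prescribed by the text.

```python
import numpy as np


def gaussian_loglik_terms(y, mean, sigma2):
    """Per-observation Gaussian log-density log f(y_i | mean_i, sigma2)."""
    return -0.5 * np.log(2.0 * np.pi * sigma2) - 0.5 * (y - mean) ** 2 / sigma2


def mutual_information_lr(x, y):
    """Average log-likelihood ratio of the full model versus the null model."""
    n = len(y)
    # Full model (assumed): y_i | x_i ~ N(b0 + b1 * x_i, s2), fitted by ML.
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    s2_full = np.mean((y - fitted) ** 2)  # ML variance estimate (divides by n)
    # Null model (independence): y_i ~ N(mu, s2_0), fitted by ML.
    mu0 = np.mean(y)
    s2_null = np.mean((y - mu0) ** 2)
    log_ratio = (gaussian_loglik_terms(y, fitted, s2_full)
                 - gaussian_loglik_terms(y, mu0, s2_null))
    return np.mean(log_ratio)


# Simulated data (illustrative assumption): linear dependence with Gaussian noise.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = 0.8 * x + rng.normal(scale=0.6, size=5000)

i_hat = mutual_information_lr(x, y)
r2 = np.corrcoef(x, y)[0, 1] ** 2  # squared sample correlation
print(i_hat, -0.5 * np.log(1.0 - r2))
```

In this Gaussian special case the estimate coincides with $-\frac{1}{2}\log(1 - r^2)$, where $r$ is the sample correlation, which is why the two printed values agree. No integral is evaluated and no joint model for $(X, Y)$ is specified; only the two conditional fits are needed.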

The aforementioned properties make it possible to consider more complex dependencies at the trial level at no additional cost. Indeed, although elegant, the hierarchical models considered by Buyse et al. (2000) often pose a considerable computational challenge (Burzykowski, Molenberghs, and Buyse, 2005). To address this problem, Tibaldi et al. (2003) suggested several simplifications, like treating the trial-specific parameters $(\alpha_i, \beta_i)$ as fixed effects in a two-stage approach. In the first stage, the vectors $(\alpha_i, \beta_i)$ are estimated within each trial using a bivariate linear model for the surrogate and true endpoints, and at the second stage, the estimated treatment effect on the true endpoint is regressed on the estimated treatment effect on the surrogate using a linear regression model. Essentially, the trial-level surrogacy $R^2_{\text{trial}}$ is assessed by regressing $\hat\beta_i$ on $\hat\alpha_i$.
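The two-stage procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Tibaldi et al. (2003): the simulated trials, the parameter values, and all variable names are hypothetical, and the first stage fits the two endpoint regressions jointly by least squares, which gives the same treatment-effect estimates as a bivariate linear model when both endpoints share the same covariates.

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_patients = 30, 200  # hypothetical study dimensions

alpha_hat = np.empty(n_trials)
beta_hat = np.empty(n_trials)
for i in range(n_trials):
    # Hypothetical trial-specific treatment effects (simulation assumption).
    alpha_i = rng.normal(1.0, 0.5)
    beta_i = 0.5 + 1.5 * alpha_i + rng.normal(0.0, 0.3)
    z = np.repeat([0.0, 1.0], n_patients // 2)  # balanced treatment indicator
    errors = rng.multivariate_normal(
        [0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n_patients
    )
    s = 2.0 + alpha_i * z + errors[:, 0]  # surrogate endpoint
    t = 1.0 + beta_i * z + errors[:, 1]   # true endpoint
    # Stage 1: within-trial fit of both endpoints on intercept and treatment.
    design = np.column_stack([np.ones(n_patients), z])
    coef, *_ = np.linalg.lstsq(design, np.column_stack([s, t]), rcond=None)
    alpha_hat[i], beta_hat[i] = coef[1, 0], coef[1, 1]

# Stage 2: regress the estimated effect on the true endpoint on the
# estimated effect on the surrogate; R^2 of this fit is the trial-level measure.
design2 = np.column_stack([np.ones(n_trials), alpha_hat])
coef2, *_ = np.linalg.lstsq(design2, beta_hat, rcond=None)
resid = beta_hat - design2 @ coef2
r2_trial = 1.0 - np.sum(resid ** 2) / np.sum((beta_hat - beta_hat.mean()) ** 2)
print(r2_trial)
```

Because the second stage is a simple linear regression, `r2_trial` equals the squared sample correlation between the first-stage estimates.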

The information-theoretic approach allows more general regression models to be considered at the trial level without substantially increasing the computational burden. For instance, one could now use two general regression models $f(\beta_i|\alpha_i, \theta)$ and $f(\beta_i|\theta_0)$, not necessarily linear. The parameters $\theta$ and $\theta_0$ can then be estimated using maximum likelihood, based on the estimates $(\hat\alpha_i, \hat\beta_i)$ obtained at the first stage of the previously described two-stage procedure. Finally, (9.3) can be used to obtain an estimate of $I(\alpha, \beta)$ and $R^2_{ht}$.
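A hedged sketch of this last step is given below. It assumes Gaussian errors with a quadratic mean as one possible non-linear trial-level model, simulates the first-stage estimates directly, and converts the estimated mutual information through $R^2_{ht} = 1 - e^{-2I(\alpha, \beta)}$. That conversion is taken from the wider information-theoretic framework and is not stated in this section, so it should be read as an assumption here, as should the helper name `avg_log_lr_gaussian` and all simulated values.

```python
import numpy as np


def avg_log_lr_gaussian(design_full, y):
    """Average log-likelihood ratio: Gaussian regression vs. intercept-only model.

    design_full must contain an intercept column so the models are nested.
    """
    coef, *_ = np.linalg.lstsq(design_full, y, rcond=None)
    s2_full = np.mean((y - design_full @ coef) ** 2)  # ML variance, full model
    s2_null = np.mean((y - y.mean()) ** 2)            # ML variance, null model
    # With both variances at their ML values, the average of the
    # per-observation log-density ratios reduces to this closed form.
    return 0.5 * np.log(s2_null / s2_full)


# Hypothetical first-stage estimates with a non-linear trial-level relation.
rng = np.random.default_rng(2)
n_trials = 200
alpha_hat = rng.normal(0.0, 1.0, size=n_trials)
beta_hat = 0.5 + alpha_hat ** 2 + rng.normal(0.0, 0.3, size=n_trials)

linear = np.column_stack([np.ones(n_trials), alpha_hat])
quadratic = np.column_stack([np.ones(n_trials), alpha_hat, alpha_hat ** 2])

i_lin = avg_log_lr_gaussian(linear, beta_hat)
i_quad = avg_log_lr_gaussian(quadratic, beta_hat)

# Assumed conversion from mutual information to the trial-level measure.
r2_ht_lin = 1.0 - np.exp(-2.0 * i_lin)
r2_ht_quad = 1.0 - np.exp(-2.0 * i_quad)
print(r2_ht_lin, r2_ht_quad)
```

With a linear Gaussian model the resulting value is the ordinary regression $R^2$ of the second stage, whereas the quadratic model can capture an association that the linear fit misses. The extra cost is one additional least-squares fit.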