# Information-Theoretic Approach: Trial Level

As in Chapter 4, let us assume that data from $i = 1, \ldots, N$ clinical trials are available, in the $i$th of which $j = 1, \ldots, n_i$ subjects are enrolled, and let $\beta_i$ and $\alpha_i$ denote the trial-specific expected causal treatment effects on the true and surrogate endpoints, respectively. Furthermore, let us assume that $(\alpha_i, \beta_i)$ follows a distribution characterized by the density function $f(\alpha, \beta)$.

The mutual information between both expected causal treatment effects, $I(\alpha, \beta)$, quantifies the amount of uncertainty in $\beta$ expected to be removed if the value of $\alpha$ becomes known and, hence, it seems sensible to use this measure to quantitatively assess the previous definition of surrogacy. However, the absence of an upper bound for $I(\alpha, \beta)$ hinders its interpretation. To solve this problem, Alonso and Molenberghs (2007) proposed to use instead a normalized version of the mutual information, the so-called squared informational coefficient of correlation (SICC) introduced by Linfoot (1957) and Joe (1989):

$$R^2_{ht} = 1 - e^{-2 I(\alpha, \beta)}.$$

If $f(\alpha, \beta)$ is a bivariate normal distribution, then $I(\alpha, \beta) = -\tfrac{1}{2}\log\left(1 - \rho^2_{\alpha\beta}\right)$, where $\rho_{\alpha\beta} = \operatorname{corr}(\alpha, \beta)$ and, therefore, $R^2_{ht} = R^2_{\text{trial}}$ in this scenario. The SICC is always in the interval $[0, 1]$, is invariant under bijective transformations, and takes value zero if and only if $\alpha$ and $\beta$ are independent. As previously mentioned, the mutual information approaches infinity when the distribution of $(\alpha, \beta)$ approaches a singular distribution, i.e., $R^2_{ht} \approx 1$ if and only if there exists an approximate functional relationship between $\alpha$ and $\beta$ (Joe, 1989). In addition, the randomness of $\beta$ can be defined using the so-called entropy power:

$$\operatorname{EP}(\beta) = \frac{1}{2\pi e}\, e^{2 h(\beta)}. \tag{9.1}$$

The previous definition is motivated by the functional form of the normal distribution. Indeed, the differential entropy of a continuous normal random variable $X$ is $h(X) = \tfrac{1}{2}\log\left(2\pi e \sigma^2\right)$ and, thus, for the normal distribution the differential entropy is just a function of the variance. Measuring information in nats (using the natural logarithm) leads to $\operatorname{EP}(X) = \sigma^2$, i.e., the larger the variability, the larger the uncertainty or "randomness." Although valid for the normal distribution, this equivalence between variability and uncertainty does not hold in other scenarios.
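The two claims above can be checked numerically. The following sketch (parameter values are illustrative, not from the text) computes the entropy power from the differential entropy: for a normal variable it recovers the variance exactly, while for an exponential variable, whose entropy in nats is $h(X) = 1 - \log\lambda$, the entropy power differs from the variance.

```python
import math

def entropy_power(h):
    """Entropy power in nats: EP(X) = exp(2*h(X)) / (2*pi*e)."""
    return math.exp(2 * h) / (2 * math.pi * math.e)

# Normal case: h(X) = 0.5*log(2*pi*e*sigma^2), so EP(X) recovers sigma^2.
sigma2 = 2.5  # illustrative variance
h_normal = 0.5 * math.log(2 * math.pi * math.e * sigma2)
print(entropy_power(h_normal))  # 2.5, up to floating point

# Exponential(rate = 1): h(X) = 1 nat and Var(X) = 1, yet EP(X) != Var(X).
h_exp = 1.0
print(entropy_power(h_exp))  # about 0.433, well below the variance of 1
```

The exponential case makes the closing caveat concrete: outside the normal family, "randomness" as measured by entropy power is not interchangeable with variance.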

The residual randomness of $\beta$ given $\alpha$ can be quantified using $\operatorname{EP}(\beta \mid \alpha)$, obtained by substituting $h(\beta)$ by $h(\beta \mid \alpha)$ in (9.1), and $R^2_{ht}$ can then be interpreted as the proportion of the randomness in $\beta$ that is explained by $\alpha$ (Kent, 1983):

$$R^2_{ht} = \frac{\operatorname{EP}(\beta) - \operatorname{EP}(\beta \mid \alpha)}{\operatorname{EP}(\beta)}.$$
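A minimal numerical sketch of this interpretation, assuming a bivariate normal $(\alpha, \beta)$ with illustrative (hypothetical) parameter values: using the closed-form normal entropies $h(\beta) = \tfrac{1}{2}\log(2\pi e \sigma^2_\beta)$ and $h(\beta \mid \alpha) = \tfrac{1}{2}\log\bigl(2\pi e \sigma^2_\beta (1 - \rho^2_{\alpha\beta})\bigr)$, the proportion of explained randomness reduces to $\rho^2_{\alpha\beta}$, in line with $R^2_{ht} = R^2_{\text{trial}}$ in the normal scenario.

```python
import math

def entropy_power(h):
    """Entropy power in nats: EP(X) = exp(2*h(X)) / (2*pi*e)."""
    return math.exp(2 * h) / (2 * math.pi * math.e)

# Hypothetical bivariate-normal trial-effect parameters (illustrative only).
sigma2_beta = 4.0  # Var(beta)
rho = 0.8          # corr(alpha, beta)

# Closed-form normal entropies for beta and for beta given alpha.
h_beta = 0.5 * math.log(2 * math.pi * math.e * sigma2_beta)
h_beta_given_alpha = 0.5 * math.log(2 * math.pi * math.e * sigma2_beta * (1 - rho**2))

# Proportion of the randomness in beta explained by alpha (Kent, 1983).
ep_beta = entropy_power(h_beta)
ep_cond = entropy_power(h_beta_given_alpha)
r2_ht = (ep_beta - ep_cond) / ep_beta
print(r2_ht)  # equals rho**2 = 0.64 in the normal case, up to floating point
```

Here $\operatorname{EP}(\beta) = 4$ and $\operatorname{EP}(\beta \mid \alpha) = 4 \times 0.36 = 1.44$, so the ratio is $0.64 = \rho^2_{\alpha\beta}$, as expected.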