# An Information-Theoretic Unification

In spirit and concept, information theory has its mathematical roots in the notion of disorder, or entropy, used in thermodynamics and statistical mechanics. An early attempt to formalize the theory was made by Nyquist (1924), who recognized the logarithmic nature of information. Claude Shannon, together with Warren Weaver, wrote the definitive, classic work in information theory, The Mathematical Theory of Communication (Shannon and Weaver, 1949). Divided into separate treatments for continuous-time and discrete-time signals, systems, and channels, this book laid out all the key concepts and relationships that define the field today.

R.A. Fisher's well-known measure of the amount of information supplied by data about an unknown parameter was the first use of information in statistics. Kullback and Leibler (1951) studied another statistical information measure involving two probability distributions, the so-called Kullback-Leibler information.

The concept of entropy lies at the center of information theory and can be interpreted as a measure of the uncertainty associated with a random variable. If $Y$ is a discrete random variable taking values $\{k_1, k_2, \ldots, k_m\}$ with probability function $P(Y = k_i) = p_i$, then the entropy of $Y$ is defined as

$$H(Y) = -\sum_{i=1}^{m} p_i \log p_i = -E_Y\left[\log P(Y)\right].$$
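As a small numerical illustration (a minimal sketch in Python; the function name and the example probabilities are not from the text):

```python
import math

def entropy(probs, base=2):
    """Shannon entropy H(Y) = -sum_i p_i log p_i; terms with p_i = 0 contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries maximal uncertainty for two outcomes (1 bit);
# a biased coin carries strictly less.
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # about 0.469
```

With base 2 the unit is bits; using the natural logarithm instead gives nats, a convention choice that recurs throughout information theory.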

The joint and conditional entropies are defined in an analogous fashion as $H(X, Y) = -E_{X,Y}\left[\log P(X, Y)\right]$ and $H(Y \mid X) = -E_X\left[E_{Y \mid X}\left(\log P(Y \mid X)\right)\right]$, with $P(x, y)$ and $P(y \mid x)$ denoting the joint and conditional probability functions, respectively. Entropy is always non-negative and satisfies $H(Y \mid X) \le H(Y)$ for any pair of random variables $(X, Y)$, with equality holding under independence. In essence, this inequality states that, on average, uncertainty about $Y$ can only decrease when additional information ($X$) becomes available. Furthermore, entropy is invariant under bijective transformations.
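The inequality $H(Y \mid X) \le H(Y)$ can be verified numerically for any joint probability table (a sketch; the $2 \times 2$ joint distribution below is purely illustrative):

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(X, Y): rows index x, columns index y.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

p_x = [sum(row) for row in joint]           # marginal of X
p_y = [sum(col) for col in zip(*joint)]     # marginal of Y

H_XY = H([p for row in joint for p in row])  # joint entropy H(X, Y)
H_Y_given_X = H_XY - H(p_x)                  # chain rule: H(Y|X) = H(X, Y) - H(X)

# Conditioning never increases entropy: H(Y|X) <= H(Y),
# with equality exactly when X and Y are independent.
print(H_Y_given_X, H(p_y))
```

For this table $H(Y \mid X) \approx 0.72$ bits against $H(Y) = 1$ bit, so observing $X$ removes part of the uncertainty about $Y$.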

Similarly, the so-called differential entropy $h_d(Y)$ of a continuous random variable $Y$ with density $f(y)$ and support $S_Y$ is defined as

$$h_d(Y) = -\int_{S_Y} f(y) \log f(y)\, dy = -E_Y\left[\log f(Y)\right].$$

Differential entropy enjoys some, but not all, of the properties of entropy: it can be infinitely large, negative, or positive, and it is coordinate dependent. For a bijective transformation $W = v(Y)$, it follows that

$$h_d(W) = h_d(Y) - E_W\left(\log\left|\frac{d v^{-1}(W)}{dW}\right|\right).$$

The amount of uncertainty in $Y$ expected to be removed if the value of $X$ were known is quantified by the so-called mutual information $I(X, Y) = h(Y) - h(Y \mid X)$, where $h = H$ in the discrete case and $h = h_d$ for continuous random variables. Mutual information is always non-negative, zero if and only if $X$ and $Y$ are independent, symmetric, invariant under bijective transformations of $X$ and $Y$, and satisfies $I(X, X) = h(X)$. Moreover, mutual information approaches infinity as the distribution of $(X, Y)$ approaches a singular distribution; that is, $I(X, Y)$ is large if there is an approximate functional relationship between $X$ and $Y$ (Joe, 1989; Cover and Thomas, 1991).
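In the discrete case, mutual information can be computed directly from a joint probability table via $I(X, Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)P(y)}$, an expression equivalent to $H(Y) - H(Y \mid X)$ (a sketch; the two example tables are illustrative):

```python
import math

def mutual_information(joint):
    """I(X, Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) p(y)) ) for a joint table."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (p_x[i] * p_y[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

dependent   = [[0.4, 0.1], [0.1, 0.4]]    # X and Y tend to match
independent = [[0.25, 0.25], [0.25, 0.25]]  # product of uniform marginals

print(mutual_information(dependent))     # positive: knowing X reduces uncertainty about Y
print(mutual_information(independent))   # zero: no uncertainty is removed
```

As the off-diagonal mass of `dependent` shrinks toward zero, the table approaches a singular (deterministic) relationship and the mutual information grows toward its maximum $H(Y)$.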

Alonso and Molenberghs (2007) proposed to assess the validity of a surrogate endpoint in terms of uncertainty reduction. In fact, these authors stated that S is a valid surrogate for T at the individual (trial) level if uncertainty about T (the expected causal treatment effect on T) is reduced by a “large” amount when S (the expected causal treatment effect on S) is known. This definition conceptualizes, in a simple yet formal way, what is intuitively expected from a valid surrogate endpoint. Indeed, in practice, surrogate endpoints are often used at the individual level to gain information about the outcome on the true endpoint. For instance, a treating physician may measure cholesterol level to gain information about the risk of heart attack for a given patient. Similarly, at the trial level, it is expected that studying the treatment effect on the surrogate may provide information on the effect of the treatment on the true endpoint. For instance, a trialist may study the impact of a treatment on progression-free survival hoping to gain information on its impact on overall survival. In the following sections, the use of this definition at the trial level will be illustrated.