# Analysis of Case Studies

In this section, we illustrate the application of the methods described in the previous section by using data from the individual-patient-data meta-analysis of patients with advanced or recurrent gastric cancer (Section 2.2.6). The analysis includes individual data on 4,069 patients with documented OS and PFS who were randomized in 20 trials (Paoletti et al., 2013). The goal of the analysis is to evaluate PFS as a surrogate for OS.

## Using SAS

### Copula-Based Models

We first conduct the analysis by applying the two-stage approach with a copula model (5.1) used in the first stage. In particular, we consider models defined by using the Clayton (5.4), Hougaard (5.5), and Plackett (5.6)-(5.7) copulas. The marginal models (5.2)-(5.3) are specified by assuming Weibull hazard functions.

We fit the three models with the help of the SAS macros (%COPULA, Section 12.3.4). The obtained maximum log-likelihood values are equal to -2989.5, -2438.8, and -2549.4 for the Clayton, Hougaard, and Plackett copulas, respectively. Given that all models include the same number of parameters (121), we can select as the best-fitting model the one with the largest maximum log-likelihood value, i.e., the model based on the Hougaard copula.
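Because the three models have the same number of parameters, ranking them by log-likelihood is equivalent to ranking them by an information criterion. As a brief check, assuming the usual definition of Akaike's information criterion (AIC is not used in the text and appears here only for illustration), with $p = 121$ parameters and $\ell_{\max}$ denoting the maximum log-likelihood:

```latex
\begin{align*}
\mathrm{AIC} &= -2\,\ell_{\max} + 2p, \qquad 2p = 242,\\
\mathrm{AIC}_{\text{Clayton}}  &= -2(-2989.5) + 242 = 6221.0,\\
\mathrm{AIC}_{\text{Hougaard}} &= -2(-2438.8) + 242 = 5119.6,\\
\mathrm{AIC}_{\text{Plackett}} &= -2(-2549.4) + 242 = 5340.8.
\end{align*}
```

The penalty term is identical across the models, so the smallest AIC corresponds to the largest log-likelihood, i.e., to the Hougaard copula.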

For this model, the estimated value of the copula parameter $\theta$ is equal to 0.328 (95% CI: [0.317, 0.339]). From (5.11) it follows that the value of Kendall's $\tau$ can be estimated to be equal to 0.672 (95% CI: [0.661, 0.683]). Spearman’s rank correlation coefficient, computed from (5.9) by numerical integration, is equal to 0.853 (95% CI: [0.842, 0.865]). It indicates substantial correlation between PFS and OS at the individual-patient level.
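The reported value of $\tau$ and its confidence limits are consistent with the relationship $\tau = 1 - \theta$ that holds for the Hougaard copula; we assume here that this is the form of (5.11). Since the transformation is decreasing in $\theta$, the confidence limits swap places:

```latex
\begin{align*}
\hat\tau &= 1 - \hat\theta = 1 - 0.328 = 0.672,\\
\text{95\% CI for } \tau &= [\,1 - 0.339,\; 1 - 0.317\,] = [\,0.661,\; 0.683\,].
\end{align*}
```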

Figure 5.1 presents the estimated trial-specific treatment effects, i.e., logarithms of hazard ratios, on PFS and on OS; each trial is represented by a circle, the size of which is proportional to the trial sample size. The association between the effects is only moderate. A simple linear regression model, fitted without any adjustment for the estimation error present in the estimated treatment effects, yields the following regression equation:

with the standard errors of the intercept and slope estimated to be equal to 0.050 and 0.138, respectively. The corresponding value of $R^2_{\text{trial(r)}}$ is equal to 0.621 (95% CI: [0.359, 0.883]). The value is based on the following estimate of the variance-covariance matrix D of the (random) treatment effects:

for which the condition number (the ratio of the largest to the smallest eigenvalue) is equal to 8.5. This value is small enough to regard the obtained estimate as numerically stable.
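The naive analysis can be sketched in SAS as follows. This is only an illustration under stated assumptions: the dataset effects_wide and its variables eff_pfs and eff_os (one record per trial, holding the estimated log hazard ratios) are hypothetical names, and the unweighted sample covariance matrix is used as a stand-in for the naive estimate of D, because the text does not specify how that estimate was computed.

```sas
/* Hypothetical input: effects_wide, one record per trial,            */
/* with eff_pfs and eff_os = estimated log hazard ratios.             */

/* Unadjusted (naive) regression of the effect on OS on the effect on PFS */
proc reg data=effects_wide;
  model eff_os = eff_pfs;
run;
quit;

/* Condition number of the 2 x 2 sample covariance matrix */
proc iml;
  use effects_wide;
  read all var {eff_pfs eff_os} into X;
  close effects_wide;
  D    = cov(X);            /* unweighted sample covariance (assumption) */
  ev   = eigval(D);         /* eigenvalues of the symmetric matrix, largest first */
  cond = ev[1] / ev[2];     /* ratio of the largest to the smallest eigenvalue */
  print D, ev, cond;
quit;
```

A condition number close to 1 indicates a well-conditioned matrix, whereas very large values signal that the matrix is close to singular.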

To adjust the analysis for the estimation error in the treatment effects by using model (5.13)-(5.14), we need the data in the “long” format, with two records per trial, one providing the estimated effect for PFS and the other for OS. The first few records of such a dataset are:

    trial     effect   endp
        1   -0.09382   MAIN
        1   -0.46722   SURR
        2    0.03852   MAIN
        2   -0.16112   SURR
        3   -0.32230   MAIN
        3   -0.29295   SURR

FIGURE 5.1

*Advanced Gastric Cancer Data. Trial-level association between copula-model-based treatment effects on PFS and OS (both axes are on a log scale). The circle surfaces are proportional to trial size.*
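A long-format dataset of this kind could be built from a one-record-per-trial dataset with a short DATA step. The sketch below rests on several assumptions: effects_wide, eff_os, and eff_pfs are hypothetical names; MAIN is taken to denote the true endpoint (OS) and SURR the surrogate (PFS); and the 0/1 indicators main and surr are created on the assumption that these are the variables referenced in the PROC MIXED call that follows.

```sas
/* Hypothetical input: effects_wide with variables trial, eff_os, eff_pfs */
data both;
  set effects_wide;
  length endp $4;

  /* Record for the true endpoint (assumed to be OS) */
  effect = eff_os;
  endp   = 'MAIN';
  main   = 1;
  surr   = 0;
  output;

  /* Record for the surrogate endpoint (assumed to be PFS) */
  effect = eff_pfs;
  endp   = 'SURR';
  main   = 0;
  surr   = 1;
  output;

  keep trial effect endp main surr;
run;
```

Writing the MAIN record before the SURR record within each trial reproduces the record order of the listing above, which matters when order=data is specified.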

To fit model (5.13)-(5.14), we can use PROC MIXED:

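    PROC MIXED data=both order=data method=reml covtest asycov;
      /* Fixed effects: mean treatment effects on the two endpoints (no intercept). */
      /* surr and main are presumably 0/1 endpoint indicators derived from endp.    */
      MODEL effect=surr main / s covb noint;
      /* Random trial-specific treatment effects with unstructured covariance matrix D. */
      RANDOM surr main / subject=trial type=un;
      /* Separate residual (estimation-error) covariance matrix for each trial. */
      REPEATED / subject=trial group=trial type=un;
      /* Starting values are read from dataset parms; parameters 4 to &nobs are held fixed. */
      PARMS / parmsdata=parms eqcons=4 to &nobs;
    RUN;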

Note the use of the PARMS statement. It uses the dataset indicated in the parmsdata option to provide the starting values for the variance-covariance parameters of the model. In particular, the values starting from the fourth one are held fixed, as indicated by the eqcons=4 to &nobs option, where &nobs is a SAS macro variable containing the number of observations in the parmsdata dataset. The macro variable has to be created and initialized, or replaced by the concrete number for the data at hand, which equals 3 × (number of trials + 1). The first three values in the parmsdata dataset become the starting values for the estimation of matrix D.

Thus, the parmsdata dataset should include just one variable, called Est, containing the elements of matrix D and of the estimated variance-covariance matrices $\Omega_i$ for all trials. The first three values should be the starting values for the elements of D, given in the order $d_{bb}$, $d_{ab}$, and $d_{aa}$. The fourth and subsequent values should be the values of $\omega_{bb,i}$, $\omega_{ab,i}$, and $\omega_{aa,i}$ (see equation (5.14)) for all trials, estimated by using the first-stage model. Note that the selection of the starting values for D is very important, as it may influence the convergence of the optimization algorithm. However, even with reasonable starting values, the optimization algorithm is not guaranteed to converge, especially if the magnitude of the estimation error (elements of $\Omega_i$) is considerable as compared to the between-trial variability (elements of D).
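One possible way to assemble the parmsdata dataset and the &nobs macro variable is sketched below. The input datasets d_start (a single record with the starting values d_bb, d_ab, d_aa) and omega (one record per trial, in the same trial order as the analysis dataset, with variables w_bb, w_ab, w_aa) are hypothetical; only the dataset name parms, the variable name Est, and the ordering of the values are taken from the text.

```sas
/* Hypothetical inputs:                                               */
/*   d_start : one record with starting values d_bb, d_ab, d_aa       */
/*   omega   : one record per trial with w_bb, w_ab, w_aa             */
data parms;
  set d_start(in=isD) omega;
  if isD then do;
    /* First three records: starting values for the elements of D */
    Est = d_bb; output;
    Est = d_ab; output;
    Est = d_aa; output;
  end;
  else do;
    /* Three records per trial: first-stage estimation-error (co)variances */
    Est = w_bb; output;
    Est = w_ab; output;
    Est = w_aa; output;
  end;
  keep Est;
run;

/* Store the number of records in parms in the macro variable nobs */
data _null_;
  if 0 then set parms nobs=n;
  call symputx('nobs', n);
  stop;
run;
```

With 20 trials, parms would contain 3 × (20 + 1) = 63 records, so &nobs would resolve to 63.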

Note that alternative variance-covariance parameterizations could be used in the type option of the RANDOM statement. In particular, one can use the type=unr or type=fa0(2) options. The former has the advantage of directly providing the estimate of $R_{\text{trial(r)}} = \sqrt{R^2_{\text{trial(r)}}}$ with its standard error, while the latter explicitly constrains the estimate of D to be non-negative definite.
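To see why type=fa0(2) keeps the estimate of D within the admissible range, recall that this structure parameterizes a 2 × 2 covariance matrix through a lower-triangular factor. A short derivation follows, with the symbols $\Lambda$, $\lambda_{11}$, $\lambda_{21}$, and $\lambda_{22}$ introduced here only for illustration and $d_{11}$, $d_{12}$, $d_{22}$ denoting the elements of D:

```latex
\begin{align*}
D &= \Lambda\Lambda^{\top}, \qquad
\Lambda = \begin{pmatrix} \lambda_{11} & 0 \\ \lambda_{21} & \lambda_{22} \end{pmatrix},\\[4pt]
D &= \begin{pmatrix} \lambda_{11}^2 & \lambda_{11}\lambda_{21} \\
                     \lambda_{11}\lambda_{21} & \lambda_{21}^2 + \lambda_{22}^2 \end{pmatrix},\\[4pt]
\frac{d_{12}^2}{d_{11}\,d_{22}}
  &= \frac{\lambda_{21}^2}{\lambda_{21}^2 + \lambda_{22}^2} \in [0, 1]
  \qquad (\lambda_{11} \neq 0,\ \lambda_{21}^2 + \lambda_{22}^2 > 0).
\end{align*}
```

Whatever real values the optimizer chooses for the $\lambda$ parameters, the implied variances are non-negative and the implied squared correlation cannot leave the interval [0, 1].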

After adjusting for the estimation error by using model (5.13)-(5.14), fitted with the PROC MIXED code described above, $R^2_{\text{trial(r)}}$ is estimated to be equal to 0.606 (95% CI: [0.041, 1.170]). The value is based on the following estimate of the variance-covariance matrix D of the (random) treatment effects:

for which the condition number is equal to 8.0. It is worth noting that the elements of matrix (5.17) are about half the size of the elements of matrix (5.16). This is because (5.16) is obtained by attributing the total variability of the estimated treatment effects to the between-trial variability, whereas (5.17) is obtained after removing from the total variability the part due to the estimation of the treatment effects.
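The difference between the two matrices follows from the structure of model (5.13)-(5.14), in which each estimated treatment effect is the sum of a true random effect and an estimation error. Assuming that these two components are independent, and writing $\hat\alpha_i$ and $\hat\beta_i$ for the estimated effects on the surrogate and on the true endpoint in trial $i$ and $\Omega_i$ for the covariance matrix of the estimation errors (notation assumed here), the variability decomposes as:

```latex
\begin{equation*}
\operatorname{Var}\begin{pmatrix} \hat\alpha_i \\ \hat\beta_i \end{pmatrix}
  = \underbrace{D}_{\text{between-trial variability}}
  + \underbrace{\Omega_i}_{\text{estimation error}} .
\end{equation*}
```

The naive analysis attributes the whole left-hand side to D, while the adjusted analysis estimates D after accounting for the $\Omega_i$ part. Elements of about half the size therefore suggest that, in these data, the estimation error accounts for roughly half of the total variability.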

Note that the CI for the “adjusted” $R^2_{\text{trial(r)}}$ is much wider than the CI obtained for the “naive” regression model (5.15). Thus, after accounting for the fact that we analyze estimated, and not the true random, treatment effects, our uncertainty regarding the true value of $R^2_{\text{trial(r)}}$ increases.
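The loss of precision can be quantified directly from the two reported intervals. Assuming that both are symmetric normal-approximation (Wald-type) intervals, which is consistent with their symmetry around the point estimates, the implied standard errors are:

```latex
\begin{align*}
\text{naive:}    &\quad \frac{0.883 - 0.359}{2 \times 1.96} = \frac{0.524}{3.92} \approx 0.134,\\
\text{adjusted:} &\quad \frac{1.170 - 0.041}{2 \times 1.96} = \frac{1.129}{3.92} \approx 0.288.
\end{align*}
```

The adjusted interval is thus roughly twice as wide as the naive one (1.129/0.524 ≈ 2.15), and its upper limit exceeds 1, the largest admissible value for a squared correlation.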

The linear regression model obtained with an adjustment for estimation errors is

with the standard errors of the intercept and slope estimated to be equal to 0.079 and 0.295, respectively. There is only a slight difference between the estimated values of the intercept and slope as compared to the “naive” regression (5.15); this is not always the case. The standard errors of the coefficients of equation (5.18) are larger than the corresponding estimates for (5.15).

Regression line (5.18) is labeled “predicted” in Figure 5.1. The 95% prediction limits, presented in the plot, indicate the range of effects on OS that can be expected for a given effect on PFS.

The moderate correlation at the trial level is reflected by a surrogate threshold effect (STE; Section 4.5) equal to 0.56 (indicated by the vertical dashed line in Figure 5.1). Hence, one should observe an $HR_{PFS}$ smaller than 0.56 in order to predict, with 95% confidence, an $HR_{OS}$ significantly smaller than 1.
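The STE can be read off the prediction band: it is the effect on the surrogate at which the upper 95% prediction limit for the effect on the true endpoint crosses zero. A sketch of this definition is given below, with $u(\cdot)$ denoting the upper prediction limit as a function of the log hazard ratio for PFS (notation assumed here), together with the reported value converted to the log scale used for the treatment effects:

```latex
\begin{align*}
\mathrm{STE}:\quad & u\bigl(\log HR_{PFS}\bigr) = 0,\\
& \log(0.56) \approx -0.580 .
\end{align*}
```

Effects on PFS more extreme than this threshold, i.e., $\log HR_{PFS} < -0.580$, lead to a prediction interval for $\log HR_{OS}$ that lies entirely below zero.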