# Survival Endpoint: MACE

We solve the parameter optimization using back-propagation, which in principle works for any combination of differentiable loss functions. Moving from a continuous endpoint to a survival endpoint is therefore not very difficult. The structure of the network does not need any modification except that the output layer changes from a least-squares loss to a survival loss. Here we choose the accelerated failure time (AFT) model over the Cox model because the latter only uses observations with events, which make up only 4.47% of the whole population. To simplify the problem, we assume that the event time follows a Weibull distribution. In the output layer of the CNN model, the loss function is the negative log-likelihood of a Weibull distribution with fixed shape parameter *k* = 2 and scale parameter *λ* being the linear combination of all the neurons in the output layer.
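The survival loss described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it assumes right-censored data, and it assumes the network output is the logarithm of the Weibull scale so that the scale stays positive. The function and argument names are hypothetical.

```python
import math

def weibull_nll(times, events, log_scales, shape=2.0):
    """Mean negative log-likelihood of right-censored Weibull data.

    times      : observed follow-up times (must be > 0)
    events     : 1 if the event was observed, 0 if the patient was censored
    log_scales : per-patient log of the Weibull scale (ASSUMED network output)
    shape      : Weibull shape parameter, fixed at 2 as in the text
    """
    total = 0.0
    for t, d, log_scale in zip(times, events, log_scales):
        z = math.log(t) - log_scale          # log(t / scale)
        cum_hazard = math.exp(shape * z)     # H(t) = (t / scale) ** shape
        # log h(t) = log(shape) - log(scale) + (shape - 1) * log(t / scale)
        log_hazard = math.log(shape) - log_scale + (shape - 1.0) * z
        # Events contribute log f = log h - H; censored cases contribute log S = -H.
        total += d * log_hazard - cum_hazard
    return -total / len(times)
```

Because censored patients still contribute the survival term, every observation enters the loss, not only the small fraction with events.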

As in the previous section, the comparison includes linear AFT models, DNNs, CNNs, and CNNs with text information. Random forest is left out because adapting tree-based methods to survival analysis is not straightforward. Table 8.2 shows the performance in terms of AUC, concordance, and concordance on patients with cardiovascular events. For reference, the well-known Framingham Risk Score has a concordance of only 0.69. Several conclusions can be drawn from the table. First, it is relatively easy to tell cardiovascular patients from non-cardiovascular patients (concordance of 0.73 to 0.77). Second, it is much more difficult to tell high-risk cardiovascular patients from low-risk cardiovascular patients (concordance on CV of 0.49 to 0.54, close to the 0.5 expected from random ranking). Finally, DNNs and CNNs provide a more accurate prediction of whether a patient will have an event by a specific time (average AUC of 0.75 to 0.78, versus 0.67 for the linear model).
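For readers unfamiliar with the concordance metric, the sketch below computes Harrell's concordance index in plain Python. It is an illustration only: the chapter does not state which concordance estimator or tie-handling rule was used, and the function name is hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs that the model ranks correctly.

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter time than patient j. The pair is concordant when
    patient i also has the higher predicted risk. Tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Under this reading, the concordance on patients with cardiovascular events would be obtained by passing only the patients whose event indicator is 1.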

TABLE 8.2

Concordance and AUC of Various Methods

| Methods | Concordance | Concordance on CV | Average AUC |
|---|---|---|---|
| Linear Cox model | 0.73 | 0.49 | 0.67 |
| DNN + AFT | 0.76 | 0.49 | 0.75 |
| CNN + AFT | 0.76 | 0.52 | 0.77 |
| CNN + AFT + text | 0.77 | 0.54 | 0.78 |

# Virtual Twin Method

In the previous sections, we demonstrated how a powerful machine learning framework can be constructed to predict multiple endpoints. Such a framework is of great importance in itself, as it helps physicians evaluate the disease progression of new patients. However, its connection with drug development is less apparent. In this section, we present our solution using the virtual twin method.

Causal inference on observational data has always been challenging, especially when the data set involves extended periods, multiple sources, and potentially different scopes for data collection. This is, however, the case for most EHR data sets. Therefore, we need a rigorous design so that we can avoid drawing conclusions based on spurious correlations. The framework used in this study allows us to discuss the causal relationship between medications and outcomes because of two features. First, the CNN model possesses the ability to approximate any form of interaction between the treatments and patients' medical conditions. Thus, it captures heterogeneities much better than traditional parametric models, where the choice of functional form plays an overly important role in whether the model is mis-specified. Second, the EHR data set contains rich information about the patients, and the neural network makes use of it by incorporating raw text information into the modeling. Consequently, the key assumptions that we are going to make in this study, such as the ignorability assumption, are more likely to hold.

Based on the fitted model, we can estimate the individual treatment effect for a given patient by plugging in two patient matrices whose only difference is the medication record. The difference between the two predicted outcomes then constitutes an estimate of the effect of changing from one medication to another. In this study, this interpretation relies on the ignorability assumption, which is believed to hold given the amount of information collected for each patient. Figure 8.5 shows the logic of this procedure.

FIGURE 8.5

Virtual twin for individual treatment effect.
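The virtual twin procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions: the patient is a flat dictionary rather than the patient matrix used in the chapter, and `toy_model` is a hypothetical stand-in for the fitted CNN with made-up coefficients.

```python
import copy

def individual_treatment_effect(predict, patient, med_key, treated, control):
    """Predict the outcome for two twins that differ only in the medication."""
    twin_treated = copy.deepcopy(patient)
    twin_control = copy.deepcopy(patient)
    twin_treated[med_key] = treated
    twin_control[med_key] = control
    # Estimated ITE: predicted outcome under treatment minus under control.
    return predict(twin_treated) - predict(twin_control)

def toy_model(patient):
    # HYPOTHETICAL outcome model with invented coefficients; it only stands
    # in for the fitted network so that the example can run end to end.
    effect = 0.6 + 0.01 * patient["age"]
    return 7.0 + 0.02 * patient["bmi"] - patient["metformin"] * effect

patient = {"age": 50, "bmi": 30.0, "metformin": 0}
ite = individual_treatment_effect(toy_model, patient, "metformin", 1, 0)
```

The twins are built from deep copies, so the original record is left unchanged and every covariate other than the medication is identical across the two predictions.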

Note that the individual treatment effect should not be interpreted as a reliable indication of what would have happened had the patient taken another type of medication. Although the bias is controlled, our estimate may still have a high variance because of the noisy nature of the EHR data set. Nonetheless, the estimated individual treatment effects (ITEs) can be used collectively. For example, one can estimate the average treatment effect (ATE) by averaging the ITEs over all individuals, or one can estimate the ATE for a subpopulation. Tree-based methods can serve the latter purpose by automatically identifying the subpopulations in which the ATE is significantly higher or lower than the overall ATE for the population.
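The aggregation step amounts to a plain average, either over everyone or over a selected subpopulation. The sketch below uses a hypothetical helper name and a boolean mask to mark subgroup membership.

```python
from statistics import mean

def average_treatment_effect(ites, mask=None):
    """Average the individual treatment effects.

    ites : estimated ITE for each patient
    mask : optional booleans selecting a subpopulation; None means everyone
    """
    if mask is None:
        selected = list(ites)
    else:
        selected = [effect for effect, keep in zip(ites, mask) if keep]
    if not selected:
        raise ValueError("the selected subpopulation is empty")
    return mean(selected)
```

Averaging reduces the variance that makes any single ITE unreliable, which is why the collective estimates are easier to interpret than the individual ones.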

Using the virtual twin framework on the EHR data set, we study the effect of metformin versus non-metformin medications. This involves, first, estimating the individual treatment effects from the customized CNN and, second, using a regression tree to identify subgroups. From the pool of patients, we did learn some suggestive subgroups with effect heterogeneities, part of which are summarized in Figure 8.6. It can be inferred from the tree structure that most of the splits are associated with either demographic information, such as age, or important measurements, such as LDL, HDL, or BMI. Naturally, these patterns require further verification before they can be used in practice.

FIGURE 8.6

Partial tree structure of effect heterogeneity.
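The subgroup search can be illustrated with the first step of a regression tree: finding the single split that best separates patients by their estimated ITEs. This is a sketch under assumptions, not the chapter's procedure. It uses squared error as the split criterion, patient records are plain dictionaries, and all names are hypothetical. A full regression tree would apply the same search recursively within each resulting subgroup.

```python
from statistics import mean

def sum_squared_error(values):
    center = mean(values)
    return sum((v - center) ** 2 for v in values)

def best_split(rows, ites, feature_names):
    """Return the split that minimizes the within-group squared error of the ITEs."""
    best = None
    for name in feature_names:
        values = sorted({row[name] for row in rows})
        # Candidate thresholds are midpoints between adjacent distinct values,
        # so both sides of every split are guaranteed to be non-empty.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [e for row, e in zip(rows, ites) if row[name] <= threshold]
            right = [e for row, e in zip(rows, ites) if row[name] > threshold]
            score = sum_squared_error(left) + sum_squared_error(right)
            if best is None or score < best["sse"]:
                best = {
                    "feature": name,
                    "threshold": threshold,
                    "left_ate": mean(left),    # subgroup ATE at or below threshold
                    "right_ate": mean(right),  # subgroup ATE above threshold
                    "sse": score,
                }
    return best  # None when no feature takes more than one value
```

Each returned split defines two subgroups together with their subgroup ATEs, which is the kind of information a tree of effect heterogeneity summarizes.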