# Continuous Endpoints: HbA1c, BMI

Because diabetic patients often suffer from obesity, BMI is an important index for tracking diabetes progression. To evaluate the validity of the proposed network structure, we examine its ability to predict BMI from the predictors.

Although this is a standard regression problem, predicting BMI is intrinsically difficult because of the amount of missingness in this data set and the noisy nature of EHR data. We therefore compare several methods: ordinary least-squares (OLS) regression, random forest (randomForest), a simple deep neural network (DNN), and the proposed CNN with and without TF-IDF weighted text. OLS and random forest are standard regression methods in the statistics and machine learning communities; their details are beyond the scope of this study. The simple DNN uses the same set of predictors with quarterly repeated measures. The only difference is that the DNN applies no convolutional layers to the data; instead, its input layer is a flattened vector of length 819 (21 × 39). It has 10 fully connected layers with 400 neurons each and uses the same dropout rate as the CNN. Table 8.1 reports the performance of these methods in terms of R-square based on fivefold cross-validation.
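The NumPy sketch below illustrates the shape of the DNN baseline: each patient's 21 × 39 predictor matrix is flattened to 819 inputs and passed through 10 fully connected layers of 400 units, with no convolution. Several details are assumptions not stated in the text: ReLU activations, He initialization, a single linear output unit for BMI, and a dropout rate of 0.5 (`DROPOUT_RATE` is a hypothetical placeholder for the unspecified CNN rate). Which axis holds predictors and which holds quarters is also not stated here. Only the forward pass is shown; the training loop is omitted.

```python
import numpy as np

INPUT_SHAPE = (21, 39)  # per-patient matrix from the text; axis meaning is assumed
HIDDEN_UNITS = 400
N_HIDDEN_LAYERS = 10
DROPOUT_RATE = 0.5      # hypothetical placeholder for "same as the CNN"


def init_dnn(rng):
    """Weights for 10 hidden ReLU layers of 400 units plus one linear output (assumed)."""
    n_inputs = INPUT_SHAPE[0] * INPUT_SHAPE[1]  # 21 * 39 = 819
    sizes = [n_inputs] + [HIDDEN_UNITS] * N_HIDDEN_LAYERS + [1]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))  # He init
        params.append((w, np.zeros(fan_out)))
    return params


def count_params(params):
    """Total number of weights and biases."""
    return sum(w.size + b.size for w, b in params)


def forward(params, x, rng=None, train=False):
    """x: (batch, 21, 39) -> predicted BMI of shape (batch,)."""
    h = x.reshape(x.shape[0], -1)  # flatten to (batch, 819); no convolution
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # fully connected layer + ReLU
        if train:
            # Inverted dropout: drop units with prob DROPOUT_RATE, rescale the rest
            keep = rng.random(h.shape) >= DROPOUT_RATE
            h = h * keep / (1.0 - DROPOUT_RATE)
    w, b = params[-1]
    return (h @ w + b).ravel()


rng = np.random.default_rng(0)
params = init_dnn(rng)
x = rng.normal(size=(8, *INPUT_SHAPE))  # synthetic batch of 8 patients
y_hat = forward(params, x)
print(y_hat.shape, count_params(params))
```

Under these assumptions the network has 1,772,001 trainable parameters, most of them in the nine 400 × 400 hidden-to-hidden layers, which is one reason a fully connected model on flattened inputs is costly compared with weight-sharing convolutions.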

Table 8.1 illustrates the difficulty of predicting BMI for diabetic patients: all methods achieve R-squares of only about 0.5. The simple DNN (0.51) does no better than the random forest (0.51), which is extremely robust in noisy settings. The proposed CNN, however, raises R-square by two percentage points (to 0.53), which implies that the trajectories of the predictors' values contribute to the prediction beyond the values themselves. Adding text information yields a further gain of one percentage point (to 0.54). This supports the hypothesis that the text contains extra information about the patients, but such a marginal improvement can hardly justify the increased training time and model complexity.

TABLE 8.1

R-Squares of Various Methods (Fivefold Cross-Validation)

| Method | R² |
|---|---|
| OLS | 0.46 |
| Random Forest | 0.51 |
| Deep Neural Network | 0.51 |
| Convolutional Neural Network | 0.53 |
| CNN with TF-IDF Weighted Text | 0.54 |
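The R-square values in Table 8.1 come from fivefold cross-validation. The sketch below shows one common way to compute such a score, using OLS as a stand-in for any of the compared methods. It makes assumptions the text does not state: folds are drawn by a random permutation, and the reported score is the mean of the per-fold R-squares (pooling predictions across folds is an alternative). The data are synthetic and have no connection to the study's EHR data.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def cv_r_squared(X, y, k=5, seed=0):
    """Mean held-out R-square over k random folds (averaging is an assumption)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = fit_ols(X[train], y[train])  # fit on the other k - 1 folds
        scores.append(r_squared(y[test], predict_ols(beta, X[test])))
    return float(np.mean(scores))


# Synthetic data only: 500 "patients", 10 predictors, additive noise
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=2.0, size=500)
print(round(cv_r_squared(X, y), 2))
```

Because every method is scored on the same held-out folds, differences such as the 0.01 gain from adding text can be compared directly, although a gain that small may be within fold-to-fold variation.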