# Statistical Analysis

**Fig. 4.5** Overview of the classification approach for prognostic prediction of lung cancer

We used Cox regression analysis to assess the effect of the quantitative classification on disease-free survival, adjusting for potential confounding factors [118], and calculated the hazard ratio (HR) and its 95% confidence interval (CI). To estimate the probability of disease-free survival according to the quantitative classification, we generated disease-free survival curves using the Kaplan-Meier method and confirmed the significance of differences with a log-rank test [119]. All reported *P* values were two-sided, and we defined statistical significance as *P* < 0.05. All statistical analyses were performed using R (a freely available software environment for statistical computing, version 3.0.2).
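As an illustration of the two survival tools named above, the following is a minimal sketch of a Kaplan-Meier estimator and a two-group log-rank test. It is written in plain Python for readability and is not the R code used in the study; the function names `kaplan_meier` and `logrank_test` and the toy follow-up data are assumptions introduced only for this example.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  : follow-up time of each patient
    events : 1 if recurrence (or death) was observed at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)  # events plus censorings at t
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test on lists; returns (chi-square statistic, P value).

    Assumes at least one event time at which both groups still have patients at risk,
    otherwise the variance is zero and the statistic is undefined.
    """
    all_times = times_a + times_b
    all_events = events_a + events_b
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti in times_a if ti >= t)  # at risk in group A just before t
        n_b = sum(1 for ti in times_b if ti >= t)
        d_a = sum(1 for ti, ei in zip(times_a, events_a) if ti == t and ei == 1)
        d_b = sum(1 for ti, ei in zip(times_b, events_b) if ti == t and ei == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n  # observed minus expected events in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of a chi-square with 1 d.f.
    return chi2, p_value


# Made-up follow-up times in months (not study data): 1 = recurrence, 0 = censored.
high_t, high_e = [6, 12, 18, 24, 30, 60], [1, 1, 1, 0, 1, 0]
low_t, low_e = [20, 40, 60, 60, 60, 60], [0, 1, 0, 0, 0, 0]
print(kaplan_meier(high_t, high_e))
print(logrank_test(high_t, high_e, low_t, low_e))
```

In practice a maintained survival-analysis package would be preferable, since it also provides confidence intervals for the survival curve.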

# Experiments and Results

We retrospectively identified data on 454 patients with non-small cell lung cancer (NSCLC). Each patient had undergone a preoperative thin-section CT examination under identical settings. All patients subsequently had histologically or cytologically confirmed NSCLC and had records containing information on the clinical and pathological features and any recurrence of disease. Preoperative CT scans of the entire NSCLC lesion had been acquired using multi-detector row CT scanners (Aquilion; Toshiba Medical Systems, Tochigi, Japan) at the National Cancer Center Hospital East.

**Fig. 4.6** Relationship between CT value histogram-based risk score and recurrence of the lung cancers

We generated CT histograms from the segmented nodules using a bin size of 15 HU over the range −1000 to 500 HU. The frequency values of each histogram were normalized by the nodule volume to allow histograms to be compared among nodules. From the CT histograms, we extracted ten quantitative features: the mean and standard deviation of the CT value, skewness, kurtosis, the CT value at the peak of the histogram, the frequency at the peak of the histogram, and the 10th, 25th, 75th, and 90th percentiles (the CT values yielding 10%, 25%, 75%, and 90%, respectively, of the area under the histogram measured from the minimum CT value). In our study of feature selection [105], we found that two features, the frequency at the peak of the CT histogram and the 90th percentile, were an appropriate combination for representing the histogram pattern.
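The histogram construction and the two selected features can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the voxel count stands in for the nodule volume (equal voxel size), CT values outside the range are clamped into the end bins, and the percentile is interpolated linearly within a bin. The function names are hypothetical.

```python
def ct_histogram(ct_values, lo=-1000, hi=500, bin_width=15):
    """Normalized CT histogram of one segmented nodule (requires at least one voxel).

    Assumption: all voxels have equal size, so dividing by the voxel count
    is equivalent to normalizing by nodule volume.
    """
    n_bins = (hi - lo) // bin_width  # (500 - (-1000)) / 15 = 100 bins
    counts = [0] * n_bins
    for v in ct_values:
        idx = int((v - lo) // bin_width)
        idx = min(max(idx, 0), n_bins - 1)  # assumption: clamp out-of-range values
        counts[idx] += 1
    n_voxels = len(ct_values)
    return [c / n_voxels for c in counts]


def peak_of_histogram(freq, lo=-1000, bin_width=15):
    """Return (CT value at the center of the peak bin, frequency at the peak)."""
    i = max(range(len(freq)), key=freq.__getitem__)
    return lo + (i + 0.5) * bin_width, freq[i]


def histogram_percentile(freq, q, lo=-1000, bin_width=15):
    """CT value below which a fraction q of the histogram area lies."""
    target = q * sum(freq)
    cum = 0.0
    for i, f in enumerate(freq):
        if f > 0 and cum + f >= target:
            # Linear interpolation inside the bin that crosses the target area.
            return lo + (i + (target - cum) / f) * bin_width
        cum += f
    return lo + len(freq) * bin_width  # fallback for rounding at q close to 1


# Made-up voxel values in HU (not study data).
voxels = [-700] * 8 + [30] * 2
freq = ct_histogram(voxels)
peak_ct, peak_freq = peak_of_histogram(freq)
p90 = histogram_percentile(freq, 0.90)
print(peak_ct, peak_freq, p90)
```

The pair `(peak_freq, p90)` corresponds to the two features that the text selects for describing the histogram pattern.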

By applying cluster analysis to the CT histograms using the selected features, we computed a histogram-based risk score for each lung cancer. Figure 4.6 shows the relationship between the histogram-based risk score and tumor recurrence. We selected a cutoff for the risk score of the pulmonary nodules using X-tile plots, based on its association with the patients' recurrence-free survival. X-tile plots provide a method to assess the association between variables and survival [120]. We generated the plots with X-tile software (version 3.6.1; Yale University School of Medicine, New Haven, CT, USA). We assigned patients with a histogram-based risk score >340 to the group at high risk of disease recurrence (high-risk group) and those with a histogram-based risk score <340 to the group at low risk of disease recurrence (low-risk group).

Figure 4.7 shows the multivariate Cox regression analysis. The analysis indicated that the classification (HR: 7.87; 95% CI: 1.75–35.37; *P* = 0.007), the pathological stage (HR: 8.39; 95% CI: 4.15–16.96; *P* < 0.001), and lymphatic permeation (HR: 2.02; 95% CI: 1.07–3.83; *P* = 0.03) remained significant independent factors in disease-free survival. The disease-free survival curves according to the histogram-based risk score show that the five-year disease-free survival probability for patients with NSCLC with high-risk scores was 67.3% (95% CI, 58.7–77.7), whereas that for patients with low-risk scores was 99.1% (95% CI, 97.9–100.0). The difference in disease-free survival between the two groups was also significant (*P* < 0.001).
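The hazard ratios and confidence intervals reported for the multivariate Cox model can be checked for internal consistency. The sketch below assumes that the intervals are Wald intervals, symmetric on the log scale, which is the usual default for Cox regression output but is not stated in the text. Under that assumption, the standard error of the log hazard ratio, the Wald z statistic, and a two-sided *P* value can be recovered from the three reported numbers. The function name `wald_from_hr_ci` is hypothetical.

```python
from math import erfc, log, sqrt
from statistics import NormalDist


def wald_from_hr_ci(hr, ci_low, ci_high, level=0.95):
    """Recover (standard error of log HR, Wald z, two-sided P) from an HR and its CI.

    Assumption: the interval is exp(log(HR) +/- z_crit * SE), i.e. a Wald interval.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% CI
    se = (log(ci_high) - log(ci_low)) / (2.0 * z_crit)
    z = log(hr) / se
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided tail of the standard normal
    return se, z, p_value


# Values as reported for the multivariate analysis.
for name, hr, lo, hi in [
    ("classification", 7.87, 1.75, 35.37),
    ("pathological stage", 8.39, 4.15, 16.96),
    ("lymphatic permeation", 2.02, 1.07, 3.83),
]:
    se, z, p = wald_from_hr_ci(hr, lo, hi)
    print(f"{name}: SE={se:.3f}, z={z:.2f}, P={p:.4f}")
```

For the classification, this gives a *P* value of approximately 0.007, in agreement with the reported value. The check is approximate, because the published hazard ratios and interval limits are rounded.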

K. Mori et al.

**Fig. 4.7** Multivariate analysis of prognostic factors. (**a**) Hazard ratios along with 95% confidence intervals. (**b**) Multivariate analysis

# Conclusion

This subsection has described a framework within which CT value histograms are used to represent the internal structure of pulmonary nodules as a computational anatomical model, with the aim of stratifying patients into high- and low-risk groups. The approach has been illustrated using data from preoperative thin-section CT images of lung cancer. This framework provides prognostic value that complements clinicopathological risk factors and more accurately predicts recurrence for patients with early-stage lung cancers. Because the computational anatomical model based on CT value histograms extends naturally to benign cases, the framework described in this chapter might be used for other applications, such as risk stratification of pulmonary nodules detected in LDCT screening.