In this study, an atlas ranking and selection approach to be used together with multi-atlas segmentation for the automatic delineation of organs-at-risk from CT scans was evaluated. This approach selects a subset of optimal atlas candidates from an atlas pool for multi-atlas segmentation on the basis of local anatomy, thus potentially reducing the adverse impact of inter-subject variability on multi-atlas segmentation and improving the segmentation accuracy. A STAPLE-based contour fusion approach that incorporates the previous image was used to improve the fusion performance. This multi-atlas segmentation system was evaluated on head and neck cancer patients for esophagus segmentation and on thoracic cancer patients from a public benchmark dataset for segmentation of the heart, lungs, spinal cord, and esophagus. The findings may have a positive impact on contouring for radiation treatment planning by saving clinicians’ time and improving contouring efficiency and consistency [20-22].
In the implementation, the segmentation was run on a Windows 7-based PC with an 8-core Intel Core i7 3.4-GHz CPU and 8 GB of memory. The deformable registration was performed independently on a Windows server with an 8-core 3-GHz Intel Xeon CPU and 8 GB of memory. Multithread computing was enabled in the deformable registration algorithm, and two registration tasks were allowed to be run simultaneously on the server. Each segmentation task required around five minutes to complete in a thoracic cancer patient and around three minutes in a head and neck cancer patient, with about half the time spent on deformable registration.
The study showed that online atlas selection improved overall multi-atlas segmentation. The improvement in several poor cases was significant. For example, the MAS-AS results for patients 2, 6, 9, 10, and 11 in the head and neck cancer dataset showed significant improvement over the MAS results, as shown in Table 5.2. However, using online atlas selection did not result in improvement in every case. For example, the MAS-AS results for patients 4 and 8 were slightly inferior to the MAS results (Table 5.2). Nevertheless, online atlas selection improved the robustness of multi-atlas segmentation by preventing a worst-case scenario.
The cases showing inferior results for MAS-AS indicate that the atlas selection was not perfect. This was mostly due to the imperfect similarity metrics used to rank the atlases. For example, the cross-correlation coefficient used in the first phase of atlas selection is shift- and rotation-variant and is thus sensitive to misalignment. In addition, rotation was not corrected in the rigid alignment. As a result, atlases with similar local anatomy but different scanning positions, such as different neck flexion in the scanning setup for head and neck cancer patients, may be ranked lower than they deserve.
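The sensitivity of the cross-correlation coefficient to misalignment can be illustrated with a minimal sketch. The toy image, the 8-pixel shift, and the function name `cross_correlation_coefficient` are illustrative assumptions and are not taken from the study's implementation.

```python
import numpy as np

def cross_correlation_coefficient(a, b):
    """Normalized (Pearson-style) cross-correlation of two equally shaped arrays."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# Hypothetical toy "anatomy": a bright disc on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
image = ((yy - 32) ** 2 + (xx - 32) ** 2 < 10 ** 2).astype(float)

# The same anatomy shifted by 8 pixels, mimicking residual misalignment.
shifted = np.roll(image, shift=8, axis=1)

cc_aligned = cross_correlation_coefficient(image, image)
cc_shifted = cross_correlation_coefficient(image, shifted)
print(f"aligned: {cc_aligned:.3f}, shifted by 8 px: {cc_shifted:.3f}")
```

Although both images contain identical anatomy, the shifted copy scores well below the aligned one, which is the mechanism by which a well-matched atlas can be under-ranked.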
In the second phase of atlas selection, the registration error may be a major obstacle for atlas selection. The histogram-based KL divergence metric was used to reduce the impact from the imperfect deformable registration. The histogram was created from a local region for calculating the KL divergence. Choosing the proper size of this region is critical to obtaining a correct atlas ranking. A larger region may better counteract the registration error but reduce atlas selection efficacy because unrelated anatomy may be included. In the study, it was found that the local region, as defined by the union of the deformed contours, expanded by 1 mm, gave the best results in most cases, but it is acknowledged that using a different size for the local region might result in better atlas selection in different cases.
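A histogram-based KL divergence over a local region can be sketched as follows. This is a minimal illustration under stated assumptions: the bin count, the intensity range, the direction of the divergence (test relative to atlas), the smoothing constant `eps`, and the synthetic images are all hypothetical choices and do not come from the study.

```python
import numpy as np

def local_kl_divergence(test_img, atlas_img, mask, bins=32,
                        value_range=(-1000.0, 1000.0), eps=1e-10):
    """KL(P_test || P_atlas) of intensity histograms inside a boolean mask."""
    p, _ = np.histogram(test_img[mask], bins=bins, range=value_range)
    q, _ = np.histogram(atlas_img[mask], bins=bins, range=value_range)
    # Smooth empty bins so that the logarithm stays finite, then normalize.
    p = p.astype(float) + eps
    q = q.astype(float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(0)
shape = (16, 32, 32)

# Hypothetical local region, standing in for the expanded union of deformed contours.
mask = np.zeros(shape, dtype=bool)
mask[4:12, 10:22, 10:22] = True

# Synthetic intensities (HU-like values); real use would take deformed atlas images.
test_img = rng.normal(40.0, 20.0, shape)
atlases = {
    "atlas_similar": rng.normal(45.0, 20.0, shape),
    "atlas_different": rng.normal(-150.0, 60.0, shape),
}

scores = {name: local_kl_divergence(test_img, img, mask) for name, img in atlases.items()}
ranking = sorted(scores, key=scores.get)  # smaller divergence ranks first
print(ranking)
```

Because the histogram discards voxel positions inside the region, small residual registration errors change the score little. That same property explains the trade-off described above: a larger region tolerates more registration error but mixes in unrelated anatomy.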
The study also found that the presence of air bubbles is one of the major obstacles in atlas-based segmentation of the esophagus. Several poor head and neck cancer cases, such as patients 5 and 10, were caused by the presence of air bubbles. One potential solution to this issue is to detect air bubbles inside the esophagus. Once the air bubbles are detected, they can be replaced with intensity values similar to those of the esophagus in the CT image, and the modified CT image can then be used for multi-atlas segmentation. In addition, it was noticed that an air bubble can significantly change the similarity between the atlas and the test image in the atlas selection process. Performing atlas selection in local segments of the esophagus may resolve the air bubble issue. Moreover, a similarity comparison using the entire long, winding esophagus is not always locally accurate in selecting the atlas.
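The air bubble replacement idea can be sketched in a few lines. This is only an illustration under assumptions: the bubble detection is reduced to a simple threshold, the value of -300 HU is a hypothetical cutoff, and `region_mask` stands for a rough region of interest around the esophagus that some upstream detector would have to supply. None of these details come from the study.

```python
import numpy as np

def fill_air_bubbles(ct_hu, region_mask, air_threshold_hu=-300.0):
    """Replace air-like voxels inside a region with the median tissue intensity.

    Returns the modified copy of the image and the boolean mask of replaced voxels.
    """
    air = region_mask & (ct_hu < air_threshold_hu)
    tissue = region_mask & ~air
    filled = ct_hu.copy()
    if air.any() and tissue.any():
        filled[air] = np.median(ct_hu[tissue])
    return filled, air

# Hypothetical toy volume: soft tissue at 30 HU with one air voxel at -950 HU.
ct = np.full((5, 5, 5), 30.0)
region = np.zeros(ct.shape, dtype=bool)
region[1:4, 1:4, 1:4] = True
ct[2, 2, 2] = -950.0

filled, air = fill_air_bubbles(ct, region)
print(int(air.sum()), filled[2, 2, 2])
```

The original image is left untouched, so the modified copy can be used for registration and atlas selection while the final contours are still reviewed on the unmodified CT.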
FIGURE 5.8 Comparison of the Dice similarity coefficient between different methods on the 2017 AAPM Thoracic Auto-segmentation Challenge benchmark dataset for the esophagus, spinal cord, lung, and heart. The red dot shows the results of MAS-AS on the whole dataset (24 cases), the blue dots show the results of other methods on the live dataset (12 cases), and the green dots show the variability of human operators on three select cases. Numbers are taken from Yang et al.
In thoracic cancer patients, delineating the esophagus segment near the heart is most challenging for atlas-based segmentation. Neighboring structures such as the heart, lungs, and aorta may push or pull the esophagus, resulting in different shapes and locations from day to day and from patient to patient. In addition, the low contrast between the esophagus and the surrounding tissues makes it difficult to perform auto-segmentation. The difficulty of delineating the esophagus was further verified on the thoracic benchmark dataset, where MAS-AS yielded worse DSC and MSD values for the esophagus than for the lungs, heart, and spinal cord. While automatic segmentation methods could perform close to inter-observer variability for the lungs, heart, and spinal cord, more work appears to be needed before these methods can successfully segment the esophagus. One possible improvement for MAS-AS is to use nearby structures as a constraint to assist esophagus segmentation [24, 25].
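For readers unfamiliar with the two metrics, the following sketch computes the Dice similarity coefficient (DSC) and a mean surface distance (MSD) on binary masks. It is a minimal illustration, not the evaluation code used for the benchmark: the symmetric averaging of the two directed distances is one common definition and is assumed here, the brute-force distance computation is only suitable for small, non-empty masks, and the toy cubes and voxel spacing are hypothetical.

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient of two boolean masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else float(2.0 * np.logical_and(a, b).sum() / total)

def surface_points(mask, spacing):
    """Physical coordinates of mask voxels with at least one face neighbor outside."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        interior &= np.roll(padded, 1, axis) & np.roll(padded, -1, axis)
    surface = padded & ~interior
    return (np.argwhere(surface) - 1) * np.asarray(spacing, dtype=float)

def mean_surface_distance(a, b, spacing):
    """Average of the two directed mean surface distances (assumed definition)."""
    pa = surface_points(a, spacing)
    pb = surface_points(b, spacing)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
    return float(0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean()))

# Hypothetical toy contours: two 10-voxel cubes offset by 2 voxels along one axis.
manual = np.zeros((24, 24, 24), dtype=bool)
auto = np.zeros_like(manual)
manual[5:15, 5:15, 5:15] = True
auto[7:17, 5:15, 5:15] = True

spacing_mm = (1.0, 1.0, 1.0)  # assumed isotropic voxels
dsc = dice(manual, auto)
msd = mean_surface_distance(manual, auto, spacing_mm)
print(f"DSC = {dsc:.2f}, MSD = {msd:.2f} mm")
```

For a thin, elongated structure such as the esophagus, a displacement of a few voxels removes a large fraction of the overlap, so DSC penalizes it far more than it would for a large organ such as a lung. This is one reason the two metrics are reported together.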
Although the atlas selection scheme presented here represents an improvement over traditional MAS schemes, comparison with other available schemes on the thoracic benchmark dataset shows that MAS-AS performed similarly to, but not better than, current state-of-the-art approaches. Other methods evaluated on the thoracic benchmark dataset, a combination of deep learning and atlas-based methods, had a mean DSC ranging from 0.55 to 0.72 and a mean MSD ranging from 2.03 mm to 13.10 mm for the esophagus. The DSC and MSD from the MAS-AS procedure fall within these ranges (Figures 5.8 and 5.9). Of note, the statistics from the other methods are from the online test cases, which included only 12 of the 24 cases in this analysis because offline test cases were not included. Even if the above analysis for MAS-AS is restricted to only the 12 online test cases, the conclusions remain the same. Another important comparison is with the inter-rater reference. Multiple experts contoured three cases in the benchmark dataset, and DSC and MSD were computed as the inter-rater reference. Relative to the ground truth, the mean DSC and MSD from three different raters on the three cases were 0.82 ± 0.04 and 1.07 mm ± 0.25 mm, respectively. The MAS-AS approach, as well as the other methods participating in the 2017 AAPM Thoracic Auto-segmentation Challenge, performed worse than the interobserver reference. This suggests that progress in MAS-AS and other auto-contouring methods is needed before these methods can perform as well as experts for esophagus delineation.
FIGURE 5.9 Comparison of the mean surface distance between different methods on the 2017 AAPM Thoracic Auto-segmentation Challenge benchmark dataset for the esophagus, spinal cord, lung, and heart. The diamond shows the results of MAS-AS on the whole dataset (24 cases), the circles show the results of other methods on the live dataset (12 cases), and the triangles show the variability of human operators on three select cases. Numbers are taken from Yang et al.
Similar findings regarding the effectiveness of MAS-AS on segmentation of the lung, heart, and spinal cord were also observed. The main difference between segmentation of these organs and the esophagus was that automatic segmentation methods compared more favorably to the interobserver performance and occasionally generated even better metrics. The mean MSD and DSC for MAS-AS were within the range of values found from other methods, with the exception that the DSC for MAS-AS was 0.01 lower than any other value for the left lung and 0.02 lower than any other value for the spinal cord. However, these differences are small, and the mean DSC of MAS-AS for these two organs was usually within one standard deviation of other methods. Additionally, the DSC values generated by MAS-AS are comparable with those reported in the literature for these organs-at-risk [27-30]. The mean inter-observer DSC was 0.96 ± 0.02 for both the left and right lung, 0.93 ± 0.02 for the heart, and 0.86 ± 0.04 for the spinal cord. With respect to MSD, the interobserver performance was 1.51 mm ± 0.67 mm for the right lung, 1.87 mm ± 0.87 mm for the left lung, 2.21 mm ± 0.59 mm for the heart, and 0.88 mm ± 0.23 mm for the spinal cord. For both the left and right lungs, the measured DSC and MSD of the MAS-AS method were within one standard deviation of the inter-observer performance. For the heart, the DSC was within two standard deviations of the inter-observer performance but the MSD was not. For the spinal cord, the DSC was within two standard deviations of inter-observer performance and the MSD was within one standard deviation. The above indicates that MAS-AS may generate contours comparable to those of experts, but in aggregate they are still inferior. One explanation is that while MAS-AS often generates acceptable contours, it can make errors in some cases. One piece of data supporting this is that the median was superior to the mean for both DSC and MSD for every organ on the benchmark dataset.
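The two comparisons used above, checking whether a value lies within k standard deviations of the inter-observer reference and comparing the median with the mean, can be made concrete with a short sketch. The per-case DSC values and the method mean of 0.90 are hypothetical numbers chosen for illustration; only the inter-observer heart DSC of 0.93 ± 0.02 is taken from the text.

```python
import statistics

def within_k_sd(value, ref_mean, ref_sd, k=1):
    """True if value lies within k standard deviations of the reference mean."""
    return abs(value - ref_mean) <= k * ref_sd

# Inter-observer heart DSC from the text: 0.93 +/- 0.02.
# The method mean of 0.90 is a hypothetical value for illustration.
print(within_k_sd(0.90, 0.93, 0.02, k=1))
print(within_k_sd(0.90, 0.93, 0.02, k=2))

# Hypothetical per-case DSC values: mostly acceptable contours and one failure.
case_dsc = [0.74, 0.76, 0.73, 0.75, 0.77, 0.72, 0.31, 0.74]
mean_dsc = statistics.mean(case_dsc)
median_dsc = statistics.median(case_dsc)
print(f"mean = {mean_dsc:.2f}, median = {median_dsc:.2f}")
```

A single failed case pulls the mean well below the median while leaving the median almost unchanged, which is the pattern one would expect if a method usually produces acceptable contours but occasionally makes large errors.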
Although the results of MAS-AS varied slightly between the three sites from which the benchmark dataset was collected, differences between the sites did not serve as the major determinant of performance. For example, DSC ranged from 0.50 to 0.83 with an average of 0.71 ± 0.10 for the esophagus at the best of the three sites in the dataset. At the worst site, DSC ranged from 0.31 to 0.76 with an average of 0.62 ± 0.15. At the third site, DSC ranged from 0.55 to 0.75 with an average of 0.66 ± 0.08. Although the performance of MAS-AS may depend slightly on site-specific scanning procedures, the above suggests that MAS-AS is robust across several different site-specific scanning protocols.
Finally, the way the manual contours were drawn might have biased the evaluation of segmentation results. The manual contours were drawn in axial 2D slices, which created sharp edges at the top and bottom slices of the esophagus and spinal cord when they were viewed in 3D. However, the contour deformation was processed using a 3D mesh, which necessarily smoothed the two ends of the esophagus or spinal cord. This may have created a discrepancy between auto-segmented and manual contours. Disagreement was found mostly in the slices at the two ends. In addition, in head and neck cancer patients, the inferior stopping slice of the manual esophagus contours may not have been consistent across patients if the patients were positioned with different neck flexion. This potentially reduced the measured segmentation accuracy when these contours were used for evaluation.
Selecting a subset of optimal atlas candidates using local anatomy similarity improves multi-atlas segmentation. The online atlas selection approach improved the robustness of multi-atlas segmentation of thoracic organs including the esophagus, spinal cord, lung, and heart from CT scans. However, improvement in the robustness of such a method is needed to perform at the level of human experts. For example, exploring alternate image features such as the image texture and shape features that are commonly used for content-based image retrieval [31-34] for atlas ranking and selection is one possibility. Another possible improvement could be to use deep learning to measure the similarity between atlases and test images.
- 1. Chen, A, et al., Evaluation of multiple-atlas-based strategies for segmentation of the thyroid gland in head and neck. Physics in Medicine and Biology, 2012. 57(1): pp. 93-111.
- 2. Klein, S, et al., Automatic segmentation of the prostate in 3D MR images by atlas matching using localized mutual information. Medical Physics, 2008. 35(4): pp. 1407-1417.
- 3. Yang, J, et al., Automatic contouring of brachial plexus using a multi-atlas approach for lung cancer radiation therapy. Practical Radiation Oncology, 2013. 3(4): pp. e139-e147.
- 4. Sjoberg, C, et al., Clinical evaluation of multi-atlas based segmentation of lymph node regions in head and neck and prostate cancer patients. Radiation Oncology, 2013. 8: p. 229.
- 5. Kirisli, HA, et al., Evaluation of a multi-atlas based method for segmentation of cardiac CTA data: a large-scale, multicenter, and multivendor study. Medical Physics, 2010. 37(12): pp. 6279-6291.
- 6. Isgum, I, et al., Multi-atlas-based segmentation with local decision fusion - application to cardiac and aortic segmentation in CT scans. IEEE Transactions on Medical Imaging, 2009. 28(7): pp. 1000-1010.
- 7. Warfield, SK, KH Zou, and WM Wells, Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging, 2004. 23(7): pp. 903-921.
- 8. Langerak, TR, et al., Label fusion in atlas-based segmentation using a selective and iterative method for performance level estimation (SIMPLE). IEEE Transactions on Medical Imaging, 2010. 29(12): pp. 2000-2008.
- 9. Sabuncu, MR, et al., A generative model for image segmentation based on label fusion. IEEE Transactions on Medical Imaging, 2010. 29(10): pp. 1714-1729.
- 10. Ramus, L and G Malandain, Multi-atlas based segmentation: application to the head and neck region for radiotherapy planning, in Medical Image Analysis for the Clinic: A Grand Challenge, B van Ginneken, et al., Editors. 2010, CreateSpace Independent Publishing Platform, pp. 281-288. www.amazon.com/Medical-Image-Analysis-Clinic-Challenge/dp/1453759395
- 11. Aljabar, P, et al., Classifier Selection Strategies for Label Fusion Using Large Atlas Databases. Berlin: Springer Berlin Heidelberg, 2007.
- 12. Iglesias, JE and MR Sabuncu, Multi-atlas segmentation of biomedical images: a survey. Medical Image Analysis, 2015. 24(1): pp. 205-219.
- 13. Yang, J, et al., Atlas ranking and selection for automatic segmentation of the esophagus from CT scans. Physics in Medicine and Biology, 2017. 62(23): pp. 9140-9158.
- 14. Forsythe, GE, MA Malcolm, and CB Moler, Computer Methods for Mathematical Computations. Englewood Cliffs, NJ: Prentice Hall, 1976.
- 15. Zhou, R, et al., Cardiac atlas development and validation for automatic segmentation of cardiac substructures. Radiotherapy and Oncology, 2017. 122(1): pp. 66-71.
- 16. Wang, H, et al., Implementation and validation of a three-dimensional deformable registration algorithm for targeted prostate cancer radiotherapy. International Journal of Radiation Oncology*Biology*Physics, 2005. 61(3): pp. 725-735.
- 17. Yang, J, et al., Automatic segmentation of parotids from CT scans using multiple atlases, in Medical Image Analysis for the Clinic: A Grand Challenge, B van Ginneken, et al., Editors. 2010, CreateSpace Independent Publishing Platform, pp. 323-330. www.amazon.com/Medical-Image-Analysis-Clinic-Challenge/dp/1453759395
- 18. Lu, WG, et al., Automatic re-contouring in 4D radiotherapy. Physics in Medicine and Biology, 2006. 51(5): pp. 1077-1099.
- 19. Yang, J, et al., CT images with expert manual contours of thoracic cancer for benchmarking autosegmentation accuracy. Medical Physics, 2020. 47(7): pp. 3250-3255.
- 20. Yang, J, et al., A statistical modeling approach for evaluating auto-segmentation methods for image-guided radiotherapy. Computerized Medical Imaging and Graphics, 2012. 36(6): pp. 492-500.
- 21. Chao, KSC, et al., Reduce in variation and improve efficiency of target volume delineation by a computer-assisted system using a deformable image registration approach. International Journal of Radiation Oncology*Biology*Physics, 2007. 68(5): pp. 1512-1521.
- 22. Reed, VK, et al., Automatic segmentation of whole breast using atlas approach and deformable image registration. International Journal of Radiation Oncology*Biology*Physics, 2009. 73(5): pp. 1493-1500.
- 23. Fieselmann, A, et al., Automatic detection of air holes inside the esophagus in CT images, in Bildverarbeitung für die Medizin 2008, T Tolxdorff, et al., Editors. 2008, Berlin: Springer, pp. 397-401.
- 24. Yang, J, LH Staib, and JS Duncan, Neighbor-constrained segmentation with level set based 3-D deformable models. IEEE Transactions on Medical Imaging, 2004. 23(8): pp. 940-948.
- 25. Gao, Y, et al., A 3D interactive multi-object segmentation tool using local robust statistics driven active contours. Medical Image Analysis, 2012. 16(6): pp. 1216-1227.
- 26. Yang, J, et al., Autosegmentation for thoracic radiation treatment planning: a grand challenge at AAPM 2017. Medical Physics, 2018. 45(10): pp. 4568-4581.
- 27. Horsfield, MA, et al., Rapid semi-automatic segmentation of the spinal cord from magnetic resonance images: application in multiple sclerosis. Neuroimage, 2010. 50(2): pp. 446-455.
- 28. Kohlberger, T, et al., Automatic multi-organ segmentation using learning-based segmentation and level set optimization. Medical Image Computing and Computer Assisted Intervention, 2011. 14(Pt 3): pp. 338-345.
- 29. Bai, W, et al., Multi-atlas segmentation with augmented features for cardiac MR images. Medical Image Analysis, 2015. 19(1): pp. 98-109.
- 30. Feulner, J, et al., A probabilistic model for automatic segmentation of the esophagus in 3-D CT scans. IEEE Transactions on Medical Imaging, 2011. 30(6): pp. 1252-1264.
- 31. Newsam, SD and C Kamath, Comparing shape and texture features for pattern recognition in simulation data. Proceedings of SPIE 5672, Image Processing: Algorithms and Systems IV, 2005: pp. 106-117.
- 32. Howarth, P and S Rüger, Evaluation of texture features for content-based image retrieval, in Proceedings of the International Conference on Image and Video Retrieval, P Enser, et al., Editors. 2004, Berlin: Springer, pp. 326-334.
- 33. Manjunath, BS and WY Ma, Texture features for browsing and retrieval of image data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996. 18(8): pp. 837-842.
- 34. Yang, J, et al., Diffusion tensor image registration using tensor geometry and orientation features. Medical Image Computing and Computer-Assisted Intervention - MICCAI, Pt II, Proceedings, 2008. 5242(Pt 2): pp. 905-913.