Evaluation Criteria

The Dice similarity coefficient (DSC) was used to evaluate the performance of organ segmentation [59]:

DSC = 2|A ∩ B| / (|A| + |B|)

where A is the manually segmented organ (i.e. the ground truth) and B is the organ automatically segmented by the network. The DSC ranges from 0 to 1, with the latter indicating perfect performance. Chapter 15 gives alternative measures that could be used for this evaluation.
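As a concrete illustration, the DSC can be computed directly from two binary masks. The sketch below uses NumPy and a hypothetical `dice_coefficient` helper, not code from this chapter's experiments:

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice similarity coefficient between two binary masks.
    a: manually segmented organ (ground truth); b: network output."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    if total == 0:        # both masks empty: treat as a perfect match
        return 1.0
    return 2.0 * intersection / total

# identical masks yield the maximum score of 1
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
print(dice_coefficient(mask, mask))  # 1.0
```

Because the score is normalized by the total mask volume, it penalizes both over- and under-segmentation symmetrically.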


The performance of the network with and without data augmentation in organ segmentation was evaluated in terms of the DSC. The segmentation results of all organs are summarized in Figure 11.7 and Table 11.1. The Dice scores showed significant improvement after data augmentation, especially for small or indistinguishable organs. A visual comparison of manual segmentation and automatic segmentation (with and without data augmentation) for all organs from both LCTSC and PCT is shown in Figure 11.8a and b. The results indicate that data augmentation can boost the segmentation performance for deep neural networks.


In this chapter, a range of data augmentation methods were summarized. Affine transformations remain the most widely used in practice because they are easy to implement and can run in real time owing to their low time complexity. An interesting characteristic of these augmentation methods is that they can be combined. For example, samples drawn from GANs can be further augmented with geometric transformations such as flipping, rotation, and translation to create more samples. Such hybrids of techniques from different data augmentation algorithmic groups have the potential to further boost the performance of large-capacity deep learning models.
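One way to realize such a combination is a single helper that randomly flips, rotates, and translates each GAN-generated sample. The function name, probabilities, and shift range below are illustrative assumptions, not the chapter's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_geometric_augment(image):
    """Randomly flip, rotate (multiples of 90 degrees), and translate
    a 2D image; a hypothetical helper, not the chapter's code."""
    if rng.random() < 0.5:                    # random horizontal flip
        image = np.flip(image, axis=1)
    image = np.rot90(image, k=int(rng.integers(0, 4)))
    dy, dx = rng.integers(-2, 3, size=2)      # small wrap-around shift
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

# GAN-generated samples (stand-in random images) augmented further
gan_samples = rng.random((8, 32, 32))
augmented = np.stack([random_geometric_augment(s) for s in gan_samples])
print(augmented.shape)  # (8, 32, 32)
```

Each transformation only permutes pixels, so the augmented samples keep the intensity statistics of the generator's output while varying its geometry.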

Data augmentation can be applied not only at the training stage but also at the test stage. Test-stage augmentation is analogous to ensemble learning in the data space: instead of aggregating the predictions of different learning algorithms, predictions are aggregated across augmented images. For example, segmentation results can be predicted after the CT image is flipped, rotated, and scaled, and these results are then averaged to form the final segmentation result of the CT


FIGURE 11.7 Evaluation of organ segmentation performance in terms of DSC. (a) Data based on 60 patients from the LCTSC database, (b) Data based on 43 patients from the PCT database.

image. Such an approach prioritizes segmentation accuracy over prediction speed.
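A minimal sketch of such test-stage augmentation, assuming a placeholder `predict` function standing in for a trained segmentation network (here just an intensity threshold), might look like:

```python
import numpy as np

def predict(image):
    """Placeholder for a trained segmentation network: here it simply
    thresholds intensities to produce a probability-like mask."""
    return (image > 0.5).astype(float)

def tta_segment(image):
    """Predict on the original and flipped images, map each prediction
    back to the original orientation, and average the results."""
    preds = [predict(image)]
    for axis in (0, 1):
        pred = predict(np.flip(image, axis=axis))
        preds.append(np.flip(pred, axis=axis))  # undo the flip
    return np.mean(preds, axis=0)

rng = np.random.default_rng(0)
ct_slice = rng.random((64, 64))
final_mask = tta_segment(ct_slice)
print(final_mask.shape)  # (64, 64)
```

The key step is inverting each transformation on the prediction before averaging, so all predictions are aligned in the original image space.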

Combined with self-supervised learning, data augmentation can effectively improve the performance of a deep learning model. A common practice in data augmentation is to assign the same label to all augmented samples of the same source. However, if the augmentation introduces a large distributional discrepancy among them (e.g. rotations), forcing label invariance may be too difficult to learn and often reduces performance. To address this challenge, Lee et al. [60] proposed a simple yet effective idea: learning the joint distribution of the original and self-supervised labels of augmented samples. The joint learning framework is easier to train, and it enables aggregated inference that combines the predictions from different augmented samples to improve performance.
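The joint-label idea can be sketched as follows; the label-encoding scheme (organ label paired with a rotation index) is an illustrative assumption rather than the exact formulation in [60]:

```python
import numpy as np

NUM_ROTATIONS = 4  # rotations of 0, 90, 180, and 270 degrees

def joint_augment(image, organ_label):
    """Pair each rotated copy with a joint label that encodes both the
    original label and the rotation applied (illustrative scheme)."""
    samples = []
    for k in range(NUM_ROTATIONS):
        joint_label = organ_label * NUM_ROTATIONS + k
        samples.append((np.rot90(image, k), joint_label))
    return samples

image = np.arange(16.0).reshape(4, 4)
pairs = joint_augment(image, organ_label=2)
print([label for _, label in pairs])  # [8, 9, 10, 11]
```

Instead of forcing all four rotated copies onto one label, the classifier learns the (label, rotation) pair, and predictions over rotations can later be aggregated back to the original label.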

In the two application examples, data augmentation methods such as flipping, rotation, and random cropping were used to train a 3D CNN model to automatically segment multiple organs in

TABLE 11.1

Comparison of the Segmentation Results (Mean ± Standard Deviation) between Models with and without Data Augmentation

Organ          With Augmentation    Without Augmentation
Spinal cord    0.853 ± 0.043        0.842 ± 0.041
Right lung     0.965 ± 0.017        0.963 ± 0.015
Left lung      0.961 ± 0.014        0.957 ± 0.020
               0.916 ± 0.030        0.895 ± 0.035
               0.723 ± 0.118        0.657 ± 0.139
               0.951 ± 0.043        0.950 ± 0.032
               0.764 ± 0.105        0.736 ± 0.091
Left kidney    0.941 ± 0.038        0.932 ± 0.047
Gall bladder   0.812 ± 0.161        0.784 ± 0.217
               0.713 ± 0.121        0.676 ± 0.133
               0.954 ± 0.016        0.946 ± 0.033
               0.876 ± 0.074        0.857 ± 0.085
               0.586 ± 0.175        0.565 ± 0.143

patient-specific CT images using two datasets. The segmentation performance for most organs was acceptable (Dice above 0.8), but for some organs, such as the duodenum, esophagus, and pituitary, the performance of the network was relatively poor despite augmentation, because the organ and its surrounding tissues have similar pixel values in CT images, which makes the boundary difficult for the CNN model to detect. The results indicated that data augmentation alone may not be sufficient to solve this problem; to improve the segmentation performance for these organs, more high-quality data or multi-modality data may be needed.
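A typical training-stage random-crop augmentation for a 3D CNN applies the same crop to the CT volume and its label mask. The sketch below uses hypothetical patch sizes and array shapes, not the dimensions used in the chapter's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_3d(volume, mask, patch_size=(32, 64, 64)):
    """Extract the same random 3D patch from a CT volume and its
    segmentation mask (hypothetical patch size)."""
    starts = [int(rng.integers(0, dim - size + 1))
              for dim, size in zip(volume.shape, patch_size)]
    region = tuple(slice(s, s + size)
                   for s, size in zip(starts, patch_size))
    return volume[region], mask[region]

ct = rng.random((80, 128, 128))       # stand-in CT volume
seg = (ct > 0.5).astype(np.uint8)     # stand-in label volume
patch, label = random_crop_3d(ct, seg)
print(patch.shape, label.shape)  # (32, 64, 64) (32, 64, 64)
```

Applying identical spatial transforms to the image and its mask is what keeps the augmented pairs valid training examples for segmentation.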

Although data augmentation has become a key part of training deep neural networks for autosegmentation, promising and unexplored research pathways remain in the literature. For example, there is no consensus on the best strategy for combining various data augmentation methods. One important consideration is the intrinsic bias in the initial, limited dataset: no existing augmentation technique can correct a dataset that has very poor diversity with respect to the testing data. All of these data augmentation algorithms perform best under the assumption that the training data and testing data are drawn from the same distribution; if this does not hold, these methods are unlikely to be useful. Another practical question is how to determine the augmented dataset size, as there is no consensus on which ratio of original-to-final dataset size yields the best-performing model.


Deep learning models rely on big datasets to avoid overfitting, and data augmentation is a very useful technique for constructing bigger datasets. In this chapter, state-of-the-art data augmentation methods applied to automatic segmentation in radiation oncology were reviewed, including geometric transformation, intensity transformation, and artificial data generation. In addition, an example of data augmentation for segmentation was demonstrated on two datasets. The results indicated that data augmentation can be very important for training organ segmentation models and has become a critical part of deep learning-powered methods.


FIGURE 11.8 Examples of visual comparison of organ segmentation between the manual methods from the LCTSC or PCT database and the automatic methods, in axial, sagittal, coronal, and 3D views (from left to right). (a) LCTSC database showing left lung (yellow), right lung (cyan), heart (blue), spinal cord (green), and esophagus (red). (b) PCT database showing spleen (green), pancreas (white), left kidney (yellow), gallbladder (blue), esophagus (red), liver (bisque), stomach (magenta), and duodenum (purple).


Acknowledgments

This work was supported in part by Wisdom Tech, in part by an International Training Grant from the American Association of Physicists in Medicine (AAPM), and in part by various grants: NIH/NIBIB (R42EB019265-01A1, U01EB017140, R01EB026646), NIH/NCI (R01CA233888 and R01CA237267), and National Natural Science Foundation of China (11575180).


References

1. H Shan, X Jia, P Yan, Y Li, H Paganetti, and G Wang, "Synergizing medical imaging and radiotherapy with deep learning," Machine Learning: Science and Technology, vol. 1, no. 021001, 2020. https://iopscience.iop.org/article/10.1088/2632-2153/ab869f/meta
2. Y LeCun, L Bottou, Y Bengio, and P Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
3. A Krizhevsky, I Sutskever, and GE Hinton, "ImageNet classification with deep convolutional neural networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, 2017.
4. J Redmon, S Divvala, R Girshick, and A Farhadi, "You only look once: unified, real-time object detection," presented at Computer Vision and Pattern Recognition, 2016.
5. K He, G Gkioxari, P Dollar, and R Girshick, "Mask R-CNN," IEEE Transactions on Pattern Analysis, vol. 42, no. 2, pp. 386-397, 2020.
6. J Donahue et al., "Long-term recurrent convolutional networks for visual recognition and description," IEEE Transactions on Pattern Analysis, vol. 39, no. 4, pp. 677-691, 2017.
7. L-C Chen, G Papandreou, I Kokkinos, K Murphy, and AL Yuille, "DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis, vol. 40, no. 4, pp. 834-848, 2018.
8. X Fang, B Du, S Xu, BJ Wood, and P Yan, "Unified multi-scale feature abstraction for medical image segmentation," in Medical Imaging 2020: Image Processing, 2020, vol. 11313: International Society for Optics and Photonics, p. 1131319.
9. Y LeCun, Y Bengio, and G Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
10. A Halevy, P Norvig, and F Pereira, "The unreasonable effectiveness of data," IEEE Intelligent Systems, vol. 24, no. 2, pp. 8-12, 2009.
11. C Sun, A Shrivastava, S Singh, and A Gupta, "Revisiting unreasonable effectiveness of data in deep learning era," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 843-852.
12. G Wang, "A perspective on deep imaging," IEEE Access, vol. 4, pp. 8914-8924, 2016.
13. Z Hussain, F Gimenez, D Yi, and DL Rubin, "Differential data augmentation techniques for medical imaging classification tasks," presented at the American Medical Informatics Association Annual Symposium, 2017.
14. S-C Park, JH Cha, S Lee, W Jang, CS Lee, and JK Lee, "Deep learning-based deep brain stimulation targeting and clinical applications," Frontiers in Neuroscience, vol. 13, 2019.
15. E Gibson et al., "NiftyNet: a deep-learning platform for medical imaging," Computer Methods and Programs in Biomedicine, vol. 158, pp. 113-122, 2018.
16. H Shan et al., "3-D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2-D trained network," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1522-1534, 2018.
17. C Shorten and TM Khoshgoftaar, "A survey on image data augmentation for deep learning," Journal of Big Data, vol. 6, no. 1, p. 60, 2019. https://link.springer.com/article/10.1186/s40537-019-0197-0
18. R Takahashi, T Matsubara, and K Uehara, "Data augmentation using random image cropping and patching for deep CNNs," IEEE Transactions on Circuits and Systems for Video Technology, pp. 1-1, 2020.
19. X Dong et al., "Automatic multiorgan segmentation in thorax CT images using U-net-GAN," Medical Physics, vol. 46, no. 5, pp. 2157-2168, 2019.
20. K Men, JR Dai, and YX Li, "Automatic segmentation of the clinical target volume and organs at risk in the planning CT for rectal cancer using deep dilated convolutional neural networks," Medical Physics, vol. 44, no. 12, pp. 6377-6389, 2017.
21. CC Stearns and K Kannappan, "Method for 2-D affine transformation of images," Google Patents, 1995.
22. E Gibson et al., "Automatic multi-organ segmentation on abdominal CT with dense v-networks," IEEE Transactions on Medical Imaging, vol. 37, no. 8, pp. 1822-1834, 2018.
23. Q Dou et al., "3D deeply supervised network for automated segmentation of volumetric medical images," Medical Image Analysis, vol. 41, pp. 40-54, 2017.
24. XM Li, H Chen, XJ Qi, Q Dou, CW Fu, and PA Heng, "H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes," IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2663-2674, 2018.
25. X Feng, K Qing, NJ Tustison, CH Meyer, and Q Chen, "Deep convolutional neural network for segmentation of thoracic organs-at-risk using cropped 3D images," Medical Physics, vol. 46, no. 5, pp. 2169-2180, 2019.
26. B Ibragimov and L Xing, "Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks," Medical Physics, vol. 44, no. 2, pp. 547-557, 2017.
27. X Fang and P Yan, "Multi-organ segmentation over partially labeled datasets with multi-scale feature abstraction," IEEE Transactions on Medical Imaging, vol. 39, no. 11, pp. 3619-3629, 2020. doi: 10.1109/TMI.2020.3001036
28. S Wong, A Gatt, V Stamatescu, and MD McDonnell, "Understanding data augmentation for classification: when to warp?," in Digital Image Computing: Techniques and Applications, 2016, pp. 1-6.
29. O Ronneberger, P Fischer, and T Brox, "U-net: convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015: Springer, pp. 234-241.
30. HR Roth et al., "DeepOrgan: multi-level deep convolutional networks for automated pancreas segmentation," in Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 556-564.
31. N Nguyen and S Lee, "Robust boundary segmentation in medical images using a consecutive deep encoder-decoder network," IEEE Access, vol. 7, pp. 33795-33808, 2019.
32. WT Zhu et al., "AnatomyNet: deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy," Medical Physics, vol. 46, no. 2, pp. 576-589, 2019.
33. D Lachinov, E Vasiliev, and V Turlapov, "Glioma segmentation with cascaded UNet," in International MICCAI Brainlesion Workshop, 2018: Springer, pp. 189-198.
34. J Nalepa, M Marcinkiewicz, and M Kawulok, "Data augmentation for brain-tumor segmentation: a review," Frontiers in Computational Neuroscience, vol. 13, p. 83, 2019.
35. A Galdran et al., "Data-driven color augmentation techniques for deep skin image analysis," arXiv: Computer Vision and Pattern Recognition, 2017.
36. PJ Hu, F Wu, JL Peng, YY Bao, F Chen, and DX Kong, "Automatic abdominal multi-organ segmentation using deep convolutional neural network and time-implicit level sets," International Journal of Computer Assisted Radiology and Surgery, vol. 12, no. 3, pp. 399-411, 2017.
37. NV Chawla, KW Bowyer, LO Hall, and WP Kegelmeyer, "SMOTE: synthetic minority over-sampling technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.
38. H Inoue, "Data augmentation by pairing samples for images classification," arXiv preprint arXiv:1801.02929, 2018.
39. H Zhang, M Cisse, YN Dauphin, and D Lopez-Paz, "mixup: beyond empirical risk minimization," arXiv preprint arXiv:1710.09412, 2017.
40. I Goodfellow et al., "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
41. A Radford, L Metz, and S Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," arXiv preprint arXiv:1511.06434, 2015.
42. M Arjovsky, S Chintala, and L Bottou, "Wasserstein GAN," arXiv preprint arXiv:1701.07875, 2017.
43. J-Y Zhu, T Park, P Isola, and AA Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232.
44. V Sandfort, K Yan, PJ Pickhardt, and RM Summers, "Data augmentation using generative adversarial networks (CycleGAN) to improve generalizability in CT segmentation tasks," Scientific Reports, vol. 9, no. 1, p. 16884, 2019.
45. M Frid-Adar, E Klang, M Amitai, J Goldberger, and H Greenspan, "Synthetic data augmentation using GAN for improved liver lesion classification," in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), pp. 289-293, 2018.
46. Y-B Tang, S Oh, Y-X Tang, J Xiao, and RM Summers, "CT-realistic data augmentation using generative adversarial network for robust lymph node segmentation," in Medical Imaging 2019: Computer-Aided Diagnosis, 2019, vol. 10950: International Society for Optics and Photonics, p. 109503V.
47. D Zou, Q Zhu, and P Yan, "Unsupervised domain adaptation with dual-scheme fusion network for medical image segmentation," presented at the International Joint Conference on Artificial Intelligence (IJCAI), Yokohama, Japan, 2020.
48. J Yang et al., "Data from lung CT segmentation challenge," The Cancer Imaging Archive, 2017. http://doi.org/10.7937/K9/TCIA.2017.3r3fvz08
49. K Clark et al., "The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository," Journal of Digital Imaging, vol. 26, no. 6, pp. 1045-1057, 2013.
50. J Yang et al., "Autosegmentation for thoracic radiation treatment planning: a grand challenge at AAPM 2017," Medical Physics, vol. 45, no. 10, pp. 4568-4581, 2018.
51. HR Roth, A Farag, E Turkbey, L Lu, J Liu, and RM Summers, "Data from Pancreas-CT," The Cancer Imaging Archive, 2016. https://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU
52. Ö Çiçek, A Abdulkadir, SS Lienkamp, T Brox, and O Ronneberger, "3D U-Net: learning dense volumetric segmentation from sparse annotation," in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2016.
53. Z Peng et al., "A method of rapid quantification of patient-specific organ doses for CT using deep-learning-based multi-organ segmentation and GPU-accelerated Monte Carlo dose computing," Medical Physics, vol. 47, no. 6, pp. 2526-2536, 2020.
54. V Badrinarayanan, A Kendall, and R Cipolla, "SegNet: a deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis, vol. 39, no. 12, pp. 2481-2495, 2017.
55. CM Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
56. S Arlot and A Celisse, "A survey of cross-validation procedures for model selection," Statistics Surveys, vol. 4, pp. 40-79, 2010.
57. DP Kingma and J Ba, "Adam: a method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
58. M Abadi et al., "TensorFlow: a system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265-283.
59. LR Dice, "Measures of the amount of ecologic association between species," Ecology, vol. 26, no. 3, pp. 297-302, 1945.
60. H Lee, SJ Hwang, and J Shin, "Rethinking data augmentation: self-supervision and self-distillation," arXiv preprint arXiv:1910.05872, 2019.
