What Makes Deep Learning-Based Contouring So Different to Atlas-Based or Model-Based Approaches


Deep learning-based, model-based, and atlas-based segmentation are very different techniques; fully understanding the differences between them that lead to differences in performance would therefore require substantial theoretical analysis and experimental research. However, at a high level it is possible to assess their underlying assumptions, their use of data, and the order of magnitude of their degrees of freedom to appreciate what makes deep learning-based contouring so powerful. These attributes are summarized in Table 6.1.

Underlying Assumptions

The underlying assumptions of each segmentation approach are fundamental to what makes each approach so distinct. While one atlas-based segmentation method may differ from another, or one deep learning architecture from another, the class of segmentation approach remains the same, as do the main assumptions in which it is grounded.

Model-based segmentation typically assumes that, while anatomy varies between patients, these differences can be modeled by a limited number of modes of variation. It further assumes that the segmentation at a specific anatomical location can be made locally. Together these assumptions constrain the segmentation: the model is fitted to best match the local image appearance, while the global shape is constrained to be a plausible variation. That the training data fully represents the likely variations in anatomy is implicitly assumed.

For atlas-based segmentation the assumption is that the atlas is anatomically similar to the patient, and that any differences in anatomy can be overcome by deformable image registration. Where multi-atlas fusion is used, the assumption is further made that the differences between contours represent random errors resulting either from inaccurate contouring of the atlas or from the inability of the registration to align the atlas to the patient.

Like model-based segmentation, deep learning contouring also assumes that the training data adequately reflects the variation of anatomy and image appearance observed in the patient population. Fewer assumptions are made as to the nature of the data, with the network free to learn what is important. However, deep learning is inherently statistical; variation seen infrequently in training will not strongly influence the weights learned in the model.

The need for representative data, whether used for training or as direct input to the segmentation method, is common to all approaches. However, the methods differ in how closely this data is expected to represent a new patient case.
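The random-error assumption behind multi-atlas fusion can be illustrated with a minimal sketch. Majority voting is used here purely as an assumed, simple fusion rule; the text does not state which fusion scheme any particular system uses, and the label arrays are hypothetical. If the errors of individual atlases are random rather than systematic, they tend to be outvoted.

```python
from collections import Counter


def majority_vote(atlas_labels):
    """Fuse per-voxel labels proposed by several registered atlases.

    atlas_labels: list of equal-length label lists, one per atlas,
    in the same flattened voxel order. Returns the most frequent
    label at each voxel.
    """
    if not atlas_labels:
        raise ValueError("at least one atlas is required")
    n_voxels = len(atlas_labels[0])
    if any(len(labels) != n_voxels for labels in atlas_labels):
        raise ValueError("all atlases must label the same voxels")
    fused = []
    for votes in zip(*atlas_labels):
        # most_common(1) returns the label with the highest vote count
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Three hypothetical atlases labelling five voxels (1 = organ, 0 = background).
atlases = [
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],  # under-segments the fourth voxel
    [1, 1, 1, 1, 0],  # over-segments the first voxel
]
fused = majority_vote(atlases)
print(fused)
```

In this toy case each isolated error is outvoted by the other two atlases. An error shared by most atlases, such as a systematic registration failure, would survive the vote, which is why the random-error assumption matters.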

Use of Data

As noted in the previous section, all the categories of methods considered here use previous cases to inform the segmentation of new ones. However, the methods differ in how they use this data. Both model-based and deep learning-based approaches use the data to train the parameters of a segmentation system. While deep learning-based contouring may have many more parameters to train and makes weaker assumptions about how the image and contour relate, both methods encode knowledge as a segmentation model. The previous data is used for training, but only the model is required at run-time to segment a new patient case. In contrast, atlas-based segmentation uses the input cases directly when segmenting a new patient, and no attempt is made to encode important features of the input data. This means that negligible effort is required to prepare a segmentation system before processing the patient case. However, the full data for all the atlases are required at run-time and must be available wherever the system is deployed. In one respect this could make atlas-based methods potentially more powerful; no information is lost in creating a model. Unfortunately, the requirement that the atlas data be available restricts the use of “big data” in commercial systems. Furthermore, the strong assumption that deformable image registration is sufficient to overcome differences in appearance between the atlas and the patient limits how well this “full information” is used.

Degrees of Freedom

For any of the methods considered, their complexity makes it nearly impossible to calculate the true degrees of freedom provided. While each method may have a large number of free parameters, regularizers that constrain this freedom also introduce correlations in behavior between local outputs. For example, the mesh node locations in an active shape model are constrained and correlated by restricting the number of eigenmodes to form more likely shapes. Therefore, each node cannot be said to have complete freedom with respect to another. Despite this, examination of the parameters, such as the number of mesh nodes, can give an indication of the number of degrees of freedom available.
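How a small number of eigenmodes correlates the mesh nodes can be sketched as follows. This is an illustrative toy, not the implementation of any cited method: the function name, the three-node mesh, the modes, and the eigenvalues are all hypothetical, and clamping each mode weight to three standard deviations is an assumed convention for keeping shapes plausible.

```python
import math


def synthesize_shape(mean_shape, modes, eigenvalues, b, limit=3.0):
    """Return mean_shape + sum_k b[k] * modes[k].

    Each weight b[k] is clamped to +/- limit * sqrt(eigenvalues[k])
    (an assumed plausibility constraint) before it is applied.
    """
    if not (len(modes) == len(eigenvalues) == len(b)):
        raise ValueError("one eigenvalue and one weight are needed per mode")
    shape = [float(c) for c in mean_shape]
    for mode, eigenvalue, weight in zip(modes, eigenvalues, b):
        bound = limit * math.sqrt(eigenvalue)
        weight = max(-bound, min(bound, weight))
        for i, component in enumerate(mode):
            shape[i] += weight * component
    return shape


# Hypothetical toy model: 3 nodes in 3D (9 coordinates) but only 2 modes.
mean_shape = [0, 0, 0, 1, 0, 0, 0, 1, 0]
modes = [
    [0, 0, 0, 1, 0, 0, 0, 0, 0],  # mode 1 moves the second node along x
    [0, 0, 0, 0, 0, 0, 0, 1, 0],  # mode 2 moves the third node along y
]
eigenvalues = [4.0, 1.0]

# The first weight exceeds its bound (3 * sqrt(4) = 6) and is clamped.
shape = synthesize_shape(mean_shape, modes, eigenvalues, b=[10.0, -0.5])
print(shape)
```

Although the mesh nominally has nine coordinates, every shape the model can produce is determined by just two weights. This is the sense in which counting three degrees of freedom per node overstates the freedom actually available.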

Taking heart segmentation as an example, the differences in degrees of freedom between these three classes can be gauged. Zhao et al. used a modification of the active shape model for segmentation of the heart to aid radiotherapy planning [31]. While the number of eigenmodes used is not reported, around 900 surface nodes were used. Each node has three degrees of freedom, so the model can be assumed to have approximately three thousand degrees of freedom. In contrast, Rikxoort et al. used an atlas-based method for cardiac segmentation on CT. Since registration provides the ability to adapt to the patient, it can be examined to understand the method’s degrees of freedom. The Rikxoort method uses a B-spline registration. At its finest resolution, a spline control point is placed every four pixels, following subsampling of the image by a factor of two. This would result in approximately 80,000 degrees of freedom, assuming a typical image size of 150 slices of 512 x 512 pixels. If each atlas can be considered to provide independent information, the use of 10 atlases would raise this to 800,000 degrees of freedom. Finally, considering deep learning-based segmentation, the number of free parameters to be tuned for the methods used in the 2017 AAPM Thoracic Autosegmentation Challenge was reported to be between 14 million and 66 million [25], albeit for segmentation of five organs and not just the heart. Therefore, the power of deep learning methods to outperform the other approaches is likely to come from their greater flexibility to encode knowledge, with more than an order of magnitude more free parameters than the other approaches.
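The arithmetic behind these estimates can be reproduced in a few lines. The sketch below uses only the figures quoted in the text; as an assumption of this sketch, it counts one degree of freedom per control point, which is what reproduces the quoted figure of roughly 80,000. Counting three displacement components per control point would triple the atlas figures without changing the overall comparison.

```python
# Active shape model: about 900 surface nodes, three coordinates each.
asm_nodes = 900
asm_dof = asm_nodes * 3

# Atlas-based registration: control-point grid on a subsampled image.
rows, cols, slices = 512, 512, 150  # typical image size assumed in the text
subsample = 2                       # image subsampled by a factor of two
spacing = 4                         # one control point every four pixels


def grid_points(size):
    """Control points along one axis (fractional count kept for a rough estimate)."""
    return size / subsample / spacing


control_points = grid_points(rows) * grid_points(cols) * grid_points(slices)
n_atlases = 10
multi_atlas_dof = n_atlases * control_points

# Deep learning: range of free parameters reported for the 2017 challenge.
dl_min, dl_max = 14_000_000, 66_000_000

print(f"Active shape model:  {asm_dof:,}")
print(f"Single atlas:        {control_points:,.0f}")
print(f"Ten atlases:         {multi_atlas_dof:,.0f}")
print(f"Deep learning:       {dl_min:,} to {dl_max:,}")
print(f"Ratio to ten atlases: {dl_min / multi_atlas_dof:.1f}x to {dl_max / multi_atlas_dof:.1f}x")
```

The computed values are 2,700 for the shape model, 76,800 for a single atlas, and 768,000 for ten atlases, consistent with the rounded figures above. Even the smallest reported network has more than ten times as many free parameters as the ten-atlas estimate.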

Summary of This Part of the Book

This part of the book explores the various choices to be made when designing a deep learning network for organ-at-risk segmentation. Figure 6.6 illustrates the process for the training and use of a deep learning-based segmentation system. Data acquired for the purpose of training a deep learning system is usually separated into three categories: training, validation, and test. Where insufficient data is available, data augmentation techniques are commonly used to ensure that the network is robust to variations in the data and to prevent overfitting. Augmentation can be performed both during training and during testing. The training data are passed into the deep learning network, resulting in a segmentation. During training, this output is compared to an expected output using a loss function, and the loss is used to determine the gradients with which the network parameters are updated. The same loss function can be used with the validation data to ensure that the network is not overfitting to the training set. Alternatively, quantitative validation approaches can be used to give an indication of potential clinical performance. The independent test set can then be evaluated with the same quantitative measures to confirm the performance observed during training.
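The separation of data into training, validation, and test sets can be sketched as below. The function name, the 70/15/15 proportions, the fixed random seed, and the patient identifiers are all assumptions made for illustration; the text does not prescribe any particular split.

```python
import random


def split_cases(case_ids, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Randomly partition case identifiers into train, validation, and test lists.

    Splitting is done per case (for example per patient) so that no case
    contributes data to more than one set.
    """
    ids = sorted(case_ids)              # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)    # seeded shuffle, independent of global state
    n_test = round(len(ids) * test_fraction)
    n_val = round(len(ids) * val_fraction)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


# Twenty hypothetical patient identifiers.
cases = [f"patient_{i:03d}" for i in range(20)]
train, val, test = split_cases(cases)
print(len(train), len(val), len(test))
```

Keeping the test cases untouched until the end is what allows them to confirm, independently, the performance observed on the validation set during training.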

FIGURE 6.6 The training and use of a deep learning segmentation system. Many aspects of training and use are covered within this part of the book. Evaluation and curation of data will be considered in the third part.

When used in the clinic, a known segmentation is not available for evaluation of performance; rather, the contours are inspected and edited for clinical use. This subjective clinical evaluation can also provide insight into how well the network is performing.

Data curation for training, validation, and testing, quantitative evaluation measures, and appropriate clinical use of deep learning segmentation are considered within the third part of this book. In this part of the book, the more technical aspects are reviewed. First, the network architecture is considered. Chapter 7 gives a detailed overview of the various types of network designed to date. No attempt is made at experimental comparison on account of the difficulty in performing such a comparison in a meaningful way. However, the strengths and weaknesses of each approach are considered, and previously reported benchmarking results are discussed. Recognizing that the U-net [32] is a popular choice for image segmentation, Chapter 8 evaluates the difference in performance between 2D and 3D implementations of this particular architecture. Various constraints could be considered as to what is a “fair” evaluation. In this chapter, what can be achieved for a given graphics card memory is the primary consideration. Chapter 9 considers another design aspect: whether it is better to perform segmentation of a single organ at a time or to use a multi-label approach. Again, the U-net architecture is evaluated, this time using images resampled to the same input resolutions. Whether multi-class or multi-label segmentation is performed has an impact on the choice of loss function used in network optimization.

Going beyond the network design, Chapter 10 reviews a range of loss functions, considering their strengths and weaknesses with respect to the task of image segmentation. Chapter 11 then considers how data augmentation can be used to improve training, particularly where small quantities of data are available. A wider range of data augmentation strategies is discussed, and an evaluation of the impact of a simple data augmentation approach is presented. Finally, Chapter 12 touches on clinical use, considering what might cause organ segmentation to fail. An underlying assumption of deep learning segmentation is that the training data is representative of the use-case scenario. This chapter examines how far a network can be stretched beyond those boundaries.


  • 1. B Ibragimov, L Xing, Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks. Med. Phys. 44 (2017):547-557.
  • 2. R Trullo, C Petitjean, D Nie, D Shen, S Ruan, Joint segmentation of multiple thoracic organs in CT images with two collaborative deep architectures. In: Deep Learn. Med. Image Anal. Multimodal Learn. Clin. Decis. Support, 2017:21-29. https://doi.org/10.1007/978-3-319-67558-9_3.
  • 3. K Men, J Dai, Y Li, Automatic segmentation of the clinical target volume and organs at risk in the planning CT for rectal cancer using deep dilated convolutional neural networks. Med. Phys. 44 (2017):6377-6389. https://doi.org/10.1002/mp.12602.
  • 4. K Men, X Chen, Y Zhang, T Zhang, J Dai, J Yi, Y Li, Deep deconvolutional neural network for target segmentation of nasopharyngeal cancer in planning computed tomography images. Front. Oncol. 1 (2017):1-9. https://doi.org/10.3389/fonc.2017.00315.
  • 5. B Ibragimov, F Pernus, P Strojan, L Xing, Development of a novel deep learning algorithm for autosegmentation of clinical tumor volume and organs at risk in head and neck radiation therapy planning. Int. J. Radiat. Oncol. 96 (2016):S226. https://doi.org/10.1016/j.ijrobp.2016.06.561.
  • 6. R Trullo, C Petitjean, S Ruan, B Dubray, D Nie, D Shen, Segmentation of organs at risk in thoracic CT images using a SharpMask architecture and conditional random fields. Proc. - Int. Symp. Biomed. Imaging. (2017):1003-1006. https://doi.org/10.1109/ISBI.2017.7950685.
  • 7. T Lustberg, J van Soest, M Gooding, D Peressutti, P Aljabar, J van der Stoep, W van Elmpt, A Dekker, Clinical evaluation of atlas and deep learning based automatic contouring for lung cancer. Radiother. Oncol. 126 (2018):312-317. https://doi.org/10.1016/j.radonc.2017.11.012.
  • 8. T Lustberg, J Van der Stoep, D Peressutti, P Aljabar, W Van Elmpt, J Van Soest, M Gooding, A Dekker, EP-2124: time-saving evaluation of deep learning contouring of thoracic organs at risk. Radiother. Oncol. 127 (2018):S1169. https://doi.org/10.1016/s0167-8140(18)32433-2.
  • 9. D Peressutti, P Aljabar, J van Soest, T Lustberg, J van der Stoep, A Dekker, W van Elmpt, MJ Gooding, TU-FG-605-7 deep learning contouring of thoracic organs at risk. Med. Phys. 44(6) (2017):3159.
  • 10. H Wang, B Raj, On the origin of deep learning. (2017):1-72. http://arxiv.org/abs/1702.07800.
  • 11. J Schmidhuber, Deep learning in neural networks: an overview. Neural Networks 61 (2015):85-117. https://doi.org/10.1016/j.neunet.2014.09.003.
  • 12. A Maier, C Syben, T Lasser, C Riess, A gentle introduction to deep learning in medical image processing. Z. Med. Phys. 29 (2019):86-101. https://doi.org/10.1016/j.zemedi.2018.12.003.
  • 13. H Wang, B Raj, A survey: time travel in deep learning space: an introduction to deep learning models and how deep learning models evolved from the initial ideas. (2015):1-43. http://arxiv.org/abs/1510.04781.
  • 14. SK Zhou, H Greenspan, D Shen, eds., Deep Learning for Medical Image Analysis, Academic Press, 2017.
  • 15. AC Muller, S Guido, Introduction to Machine Learning with Python: A Guide for Data Scientists, O'Reilly Media, 2016.
  • 16. A Ng, K Katanforoosh, YB Mourri, Deep learning specialization, (n.d.). www.coursera.org/specializations/deep-learning (accessed September 4, 2020).
  • 17. A Amini, A Soleimany, Introduction to deep learning, MIT. (2020). http://introtodeeplearning.com/ (accessed September 4, 2020).
  • 18. A Amini, Introduction to deep learning, YouTube. (2020). www.youtube.com/watch?v=njKP3FqW3Sk (accessed September 4, 2020).
  • 19. J Johnson, Detection and segmentation, YouTube. (2017). www.youtube.com/watch?v=nDPWywWRIRo (accessed September 8, 2020).
  • 20. WS McCulloch, W Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5 (1943):115-133.
  • 21. F Rosenblatt, Perceptron simulation experiments. Proc. IRE. 48 (1960):301-309.
  • 22. TR Willoughby, G Starkschall, NA Janjan, II Rosen, Evaluation and scoring of radiotherapy treatment plans using an artificial neural network. Int. J. Radiat. Oncol. Biol. Phys. 34 (1996):923-930. https://doi.org/10.1016/0360-3016(95)02120-5.
  • 23. Y LeCun, B Boser, JS Denker, D Henderson, RE Howard, W Hubbard, LD Jackel, Backpropagation applied to digit recognition. Neural Comput. 1 (1989):541-551. www.ics.uci.edu/~welling/teaching/273ASpring09/lecun-89e.pdf.
  • 24. Y LeCun, L Bottou, Y Bengio, P Haffner, Gradient-based learning applied to document recognition. Proc. IEEE. 86 (1998):2278-2324. https://doi.org/10.1109/5.726791.
  • 25. J Yang, H Veeraraghavan, SG Armato, K Farahani, JS Kirby, J Kalpathy-Cramer, W van Elmpt, A Dekker, X Han, X Feng, P Aljabar, B Oliveira, B van der Heyden, L Zamdborg, D Lam, M Gooding, GC Sharp, Autosegmentation for thoracic radiation treatment planning: a grand challenge at AAPM 2017. Med. Phys. 45 (2018):4568-4581. https://doi.org/10.1002/mp.13141.
  • 26. IBM, 704 Data Processing System, IBM Arch. (n.d.). www.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP704.html (accessed October 3, 2020).
  • 27. Computer History Museum, IBM 704 electronic data processing system. Comput. Hist. Museum, (n.d.). www.computerhistory.org/revolution/early-computer-companies/5/113/489 (accessed October 3, 2020).
  • 28. NVIDIA, The Ultimate PC GPU NVIDIA Titan RTX. (2019). www.nvidia.com/content/dam/en-zz/Solutions/titan/documents/titan-rtx-for-creators-us-nvidia-1011126-r6-web.pdf.
  • 29. NVIDIA, NVIDIA announces financial results for fourth quarter and fiscal 2019. (2019). https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2019.
  • 30. NVIDIA, Cuda C programming guide, 2015.
  • 31. X Zhao, Y Wang, G Jozsef, Robust shape-constrained active contour for whole heart segmentation in 3-D CT images for radiotherapy planning. In: 2014 IEEE Int. Conf. Image Process., IEEE, 2014:1-5. https://doi.org/10.1109/ICIP.2014.7024999.
  • 32. O Ronneberger, P Fischer, T Brox, U-net: convolutional networks for biomedical image segmentation. Lect. Notes Comput. Sci. (Including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 9351 (2015):234-241. https://doi.org/10.1007/978-3-319-24574-4_28.