Introduction to Auto-Segmentation in Radiation Oncology

Jinzhong Yang, Gregory C. Sharp, and Mark J. Gooding


In the past two decades, advances in radiation therapy have allowed delivery of radiation to the treatment target with an optimized spatial dose distribution that minimizes radiation toxicity to adjacent normal tissues [1,2]. In particular, with the advent of intensity modulated radiation therapy (IMRT) [1], fast and accurate delineation of targets and relevant organs at risk (OARs) from computed tomography (CT) images is extremely important for treatment planning in order to achieve a favorable dose distribution. Traditionally, these structures are manually delineated by clinicians. Manual delineation is time-consuming, labor intensive, and often subject to inter- and intra-observer variability [3,4]. With technological developments in medical image computing, as well as the growing availability of curated image and contour data, auto-segmentation has become increasingly available and important in radiation oncology, providing fast and accurate contour delineation. Auto-segmentation tools are gradually being adopted into routine clinical practice.

Evolution of Auto-Segmentation

The improvement in the performance of auto-segmentation algorithms has evolved alongside the capability of the algorithms to use prior knowledge for new segmentation tasks [5]. In the early stages of development, limited by computing power and the availability of segmented data, most segmentation techniques used little or no prior knowledge, relying on the developer to encode their belief in what would provide good segmentation. These methods are referred to as low-level segmentation approaches, and include intensity thresholding, region growing, and heuristic edge detection algorithms [6-8]. More advanced techniques were developed in an attempt to avoid heuristic approaches, leading to the introduction of uncertainty models and optimization methods. Region-based techniques, such as active contours, level-sets, graph cuts, and watershed algorithms, have been used in medical image auto-segmentation [9-12]. Probability-based auto-segmentation techniques, such as Gaussian mixture models, clustering, k-nearest neighbor, and Bayesian classifiers, rose in popularity with the turn of the century thanks to the availability of higher computing power [13, 14]. These techniques employ a limited quantity of prior knowledge in the form of statistical information about each organ's appearance acquired from example images.
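As a rough illustration of these low-level methods, the following Python sketch applies intensity thresholding and a simple region-growing step (approximated here by keeping only the connected component containing a seed voxel) to a toy image; the image, threshold value, and seed location are invented for illustration:

```python
import numpy as np
from scipy import ndimage

# Toy 2D "CT slice": a bright square "organ" on a dark background.
image = np.zeros((64, 64))
image[20:40, 20:40] = 100.0
image += np.random.default_rng(0).normal(0, 5, image.shape)  # mild noise

# Low-level approach 1: intensity thresholding.
threshold_mask = image > 50.0

# Low-level approach 2: region growing, approximated here by
# thresholding followed by keeping only the connected component
# that contains a user-supplied seed voxel.
seed = (30, 30)
labels, _ = ndimage.label(threshold_mask)
region_mask = labels == labels[seed]
```

Note that neither step uses any knowledge of organ shape or typical appearance; the quality of the result rests entirely on the chosen threshold and seed.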

In the last two decades, a large amount of exploratory work has been invested in making better use of prior knowledge, such as the shape and appearance characteristics of anatomical structures, to compensate for the insufficient soft-tissue contrast of CT data, which prevents accurate boundary definition using low-level segmentation methods. The approaches can be grouped as (multi-)atlas-based segmentation, model-based segmentation, and machine learning-based segmentation [15], taking into account prior knowledge using differing techniques and to differing extents.

Single atlas-based segmentation uses one reference image, referred to as an atlas, in which structures of interest are already segmented, as prior knowledge for new segmentation tasks [16]. The segmentation of a new patient image relies on deformable registration, finding the transformation between the atlas and the patient image to map the contours from the atlas to the patient image. Various deformable registration algorithms have been used for this purpose [17-22], with intensity-based algorithms being popular to achieve full automation. The segmentation performance largely depends on the performance of deformable registration, which, in turn, depends on the similarity of the morphology of the organs of interest between the atlas and the new image. To achieve good segmentation results, various atlas selection strategies have been proposed [23-29]. Alternatively, using an atlas that reflects the anatomy of an average patient can potentially improve segmentation performance [30, 31].
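The contour-mapping step can be sketched as follows, assuming a deformable registration has already produced a displacement field; here a uniform shift stands in for a real deformable result, and nearest-neighbor interpolation keeps the propagated labels discrete. All data are invented for illustration:

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Atlas label map: label 1 marks the pre-segmented organ in the atlas.
atlas_labels = np.zeros((32, 32), dtype=np.int32)
atlas_labels[10:20, 10:20] = 1

# Assume registration produced a displacement field (dy, dx) mapping
# each patient voxel back into atlas coordinates; a uniform shift
# stands in here for a true deformable result.
dy = np.full((32, 32), 3.0)
dx = np.full((32, 32), -2.0)

yy, xx = np.mgrid[0:32, 0:32]
coords = np.stack([yy + dy, xx + dx])

# Nearest-neighbor interpolation (order=0) keeps labels discrete.
patient_labels = map_coordinates(atlas_labels, coords, order=0)
```

The accuracy of `patient_labels` is bounded by the accuracy of the displacement field, which is why registration quality dominates atlas-based segmentation performance.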

Atlas-based segmentation is also impacted by inter-subject variability, since inaccurate contouring in the atlas will be propagated to the patient image. Instead of using a single atlas, multi-atlas approaches use a number of atlases (normally around ten) as prior knowledge for segmentation of new images [32-37]. Similar to single atlas-based approaches, deformable registration is used to map atlas contours from each atlas to the patient image. Then an additional step, frequently referred to as label/contour fusion, is performed to combine the individual segmentations from each atlas to produce a final segmentation that is the best estimate of the true segmentation [29, 38-41]. Multi-atlas segmentation has been shown to minimize the effects of inter-subject variability and improve segmentation accuracy over single atlas approaches. In the past decade, multi-atlas segmentation has been one of the most effective segmentation approaches in different grand challenges [42-44]. This approach has been validated for clinical radiation oncology applications in contouring normal head and neck tissue [45], cardiac substructures [46], and the brachial plexus [34], among others. Commercial implementations of multi-atlas segmentation are also available from multiple vendors [15]. Part I of this book focuses on multi-atlas segmentation, particularly considering atlas selection strategies, deformable registration choice, and the impact of label/contour fusion.
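A minimal sketch of the simplest fusion rule, majority voting, using toy data in place of real propagated atlas segmentations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three binary segmentations of the same patient image, each as if
# propagated from a different atlas: a common core plus random
# per-atlas disagreement (5% of voxels flipped).
true_mask = np.zeros((16, 16), dtype=bool)
true_mask[4:12, 4:12] = True
segmentations = [true_mask ^ (rng.random(true_mask.shape) < 0.05)
                 for _ in range(3)]

# Majority voting: a voxel is foreground if most atlases say so.
votes = np.sum(segmentations, axis=0)
fused = votes > len(segmentations) / 2
```

Because an error must occur in a majority of atlases to survive fusion, the fused mask is typically closer to the truth than any individual propagated segmentation; more sophisticated fusion schemes weight each atlas by its estimated reliability.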

When more contoured images are available, characteristic variations in the shape or appearance of structures of interest can be used for auto-segmentation. Statistical shape models (SSMs) or statistical appearance models (SAMs) can model the normal range of shape or appearance using a training set of atlases. These approaches have the benefit, compared to atlas-based methods, of restricting the final segmentation results to anatomically plausible shapes described by the models [47]. Consequently, they have shown good performance where registration is poor, for example where there is limited contrast. However, model-based segmentation is less able to accommodate extreme anatomical variation, being limited to the specific shapes characterized by the statistical models, particularly where the size and content of the training data are limited. In radiation oncology applications, model-based segmentation is mostly used for the segmentation of structures in the pelvic region [48-50].
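The core of a statistical shape model, a mean shape plus principal modes of variation learned by PCA, can be sketched on toy landmark data; the shapes and the single mode of variation below are invented for illustration:

```python
import numpy as np

# Training shapes: each row is one shape, represented by 2D landmark
# coordinates flattened to (x1, y1, x2, y2, ...). Toy data: a base
# shape plus random amounts of one mode of variation (stretch in x).
rng = np.random.default_rng(0)
base = np.array([0., 0., 1., 0., 1., 1., 0., 1.])   # unit square
mode = np.array([0., 0., 1., 0., 1., 0., 0., 0.])   # stretch in x
shapes = np.stack([base + rng.normal(0, 0.3) * mode for _ in range(20)])

# Statistical shape model: mean shape + principal components (PCA
# via SVD of the centered training matrix).
mean_shape = shapes.mean(axis=0)
centered = shapes - mean_shape
_, s, vt = np.linalg.svd(centered, full_matrices=False)

# New shapes are constrained to mean + b * mode, with the coefficient
# b limited to a plausible range (e.g. a few standard deviations).
first_mode = vt[0]
std = s[0] / np.sqrt(len(shapes))
plausible = mean_shape + 2 * std * first_mode
```

During segmentation, the model is fitted to the image by searching over these coefficients, which is exactly what restricts the result to anatomically plausible shapes, and also what limits flexibility for anatomy outside the training set.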

To take better advantage of prior knowledge, without using data directly as in atlas-based contouring, general machine learning approaches have been used to offer greater flexibility over model-based methods (although model-based methods are effectively limited-capacity machine learning models). Machine learning approaches can aid in segmentation by learning appropriate priors of organ shapes and of image context and appearance for voxel classification [51-53]. Support vector machines and tree ensembles (e.g. random forests) have shown promising results in thoracic, abdominal, and pelvic tumor and normal tissue segmentation [54-56]. These generally employ human-engineered features, usually derived from the image intensity histograms, and use large databases of patients as inputs to train the segmentation model.
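A minimal sketch of this voxel-classification idea, using hand-engineered intensity features and a small k-nearest-neighbor classifier in place of a full support vector machine or random forest; the toy image and feature choices are invented for illustration:

```python
import numpy as np
from scipy import ndimage

# Toy image: a circular "organ" (intensity ~80) on background (~20).
rng = np.random.default_rng(2)
radius = np.hypot(*np.mgrid[-16:16, -16:16])
truth = radius < 8
image = np.where(truth, 80.0, 20.0) + rng.normal(0, 5, truth.shape)

# Hand-engineered per-voxel features: raw and smoothed intensity.
features = np.stack([image.ravel(),
                     ndimage.gaussian_filter(image, 2).ravel()], axis=1)
labels = truth.ravel()

# Train on a random subset of voxels, classify every voxel with k-NN.
train_idx = rng.choice(len(labels), 200, replace=False)

def knn_predict(x, train_x, train_y, k=5):
    """Majority vote of the k nearest training samples in feature space."""
    d = np.linalg.norm(train_x - x, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    return nearest.mean() > 0.5

pred = np.array([knn_predict(f, features[train_idx], labels[train_idx])
                 for f in features])
accuracy = (pred == labels).mean()
```

The same pattern, richer engineered features fed into a trained classifier, underlies the support vector machine and random forest approaches cited above; the key limitation is that a human must choose the features.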

Deep learning is a specific part of the broader field of machine learning where algorithms are able to learn data representations on their own. More specifically, deep learning uses artificial neural networks with multiple (two or more) hidden layers (those between input and output layers) to learn features from a dataset by modeling complex non-linear relationships. The rise of deep learning is attributed to the availability of more curated data, advances in computing power (e.g. Graphics Processing Unit (GPU) acceleration), and more efficient algorithms. Previously, deep architectures were prone to model overfitting; however, algorithmic advances over the past decade have allowed for the use of very deep architectures (100+ layers) to achieve "superhuman" performance in some tasks. Furthermore, the application of GPUs to speed up computations has allowed the field to progress rapidly.

Convolutional neural networks (CNNs) are of particular interest in computer vision tasks (e.g. segmentation, detection, classification) as these learn the filters or kernels that were previously engineered for use in traditional approaches [57]. CNNs allow for the classification of each individual pixel in the image; however, this becomes computationally expensive as the same convolutions are computed several times due to the large overlap between input patches from neighboring pixels. Fully convolutional networks (FCNs), introduced by Long et al. [58], overcome the loss of spatial information resulting from the implementation of fully connected layers as final layers of classification CNNs. Most FCNs used for medical image segmentation are based on 2D or 3D variants of successful methods adapted from computer vision. Improvements in 3D convolution computation efficiency and hardware, in particular the fast increase in available GPU memory, have enabled the extension of these methods to 3D imaging. The most popular medical image segmentation FCN architecture is the U-net and its variants [59-61]. In more recent grand challenges, deep learning-based segmentation approaches have been the most powerful and dominant segmentation approaches [44, 62]. Part II of this book focuses on deep learning-based auto-segmentation, considering a range of topics including architecture design and selection, loss function choice, and data augmentation methods.
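The kind of feature a convolutional layer computes can be illustrated with a single hand-crafted kernel; in a CNN, such kernels are learned from data rather than fixed in advance. The toy image below is invented for illustration:

```python
import numpy as np
from scipy.signal import convolve2d

# Toy image: a vertical step edge.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# A classic hand-engineered edge filter (Sobel). A trained CNN learns
# banks of kernels like this, instead of a human designing them.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# 'same' padding keeps the feature map the size of the input, as is
# common in U-net style segmentation networks.
feature_map = convolve2d(image, sobel_x, mode='same')
```

The feature map responds strongly only at the columns adjacent to the edge and is zero over the flat regions; a deep network stacks many such learned feature maps, with non-linearities in between, to build up from edges to organ-level structure.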

Evaluation of Auto-Segmentation

Although auto-segmentation algorithms have been available for more than two decades, clinical use of auto-segmentation is limited. This is partly due to the lack of an effective approach for their evaluation, and a perception that auto-segmentation is of lower quality than human segmentation. In recent years, the concept of a "grand challenge" has emerged as an unbiased and effective approach for evaluating different segmentation approaches [42-44]. In a grand challenge, the participants are invited to evaluate their algorithms using a common benchmark dataset, with the algorithm performance being scored by an impartial third party. This framework allows the different segmentation approaches to be evaluated more evenly, reduces the risk of evaluation error due to overfitting to test cases, and allows direct comparison of methods. Grand challenges attract some of the best academic and industrial researchers in the field. The competition is friendly and stimulates scientific discussion among participants, potentially leading to new ideas and collaboration.
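Challenge organizers typically score submissions with overlap and surface-distance metrics; the Dice similarity coefficient is one widely used overlap measure, shown here as a generic illustration rather than as the scoring protocol of any particular challenge:

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice similarity coefficient between two binary masks:
    2|A∩B| / (|A| + |B|), from 0 (no overlap) to 1 (identical)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example: an automatic and a manual mask offset by one voxel.
auto = np.zeros((10, 10), dtype=bool)
auto[2:8, 2:8] = True
manual = np.zeros((10, 10), dtype=bool)
manual[3:9, 3:9] = True
score = dice_coefficient(auto, manual)
```

Overlap metrics such as Dice are usually reported alongside surface-distance metrics (e.g. mean surface distance or Hausdorff distance), since a high overlap score can still hide clinically relevant boundary errors.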

This book was inspired by the 2017 AAPM Thoracic Auto-segmentation Challenge held as an event of the 2017 Annual Meeting of the American Association of Physicists in Medicine (AAPM) [44]. This grand challenge invited participants from around the globe to apply their algorithms to perform auto-segmentation of OARs from real patient CT images collected from a variety of institutions. The organs to be segmented were the esophagus, heart, lungs, and spinal cord. The grand challenge consisted of two phases: an offline contest and an online contest. The offline contest was conducted in advance of the AAPM 2017 Annual Meeting. The training data consisted of planning CT scans from 36 different patients with curated contours. These were made available to the participants prior to the offline contest through The Cancer Imaging Archive (TCIA) [63]. The participants were given one month to train or refine their algorithms using the training data. An additional 12 test cases were distributed to the participants, without contours, for the offline contest. Participants were given three weeks to process these test cases with their algorithms and submit the segmentation results to the grand challenge website. The segmentations were then evaluated by the organizers of the grand challenge. More than 100 participants had registered on the challenge website by the time the offline contest concluded, and 11 participants submitted their offline results to the contest. Seven participants from the offline contest participated in the online challenge, with three remote and four on-site participants. The online contest was held at the AAPM 2017 Annual Meeting and was followed by a symposium focusing on the challenge. During the online contest, the participants had two hours to process 12 previously unseen test cases. The segmentations were evaluated by the organizers and the challenge results were announced at the symposium the day after the online competition.
This grand challenge provided a unique opportunity for participants to compare their automatic segmentation algorithms with those of others from academia, industry, and government in a structured, direct way using the same datasets. All online challenge participants were invited to contribute a chapter to this book (although not all chose to do so) addressing a specific strength of their segmentation algorithms. All chapter authors were encouraged to use the same common benchmark dataset to demonstrate the aspects of the methods in a consistent manner.
