Evaluation Metrics

The Dice similarity coefficient and 95% Hausdorff distance were used to quantify the accuracy of the segmented structures. These metrics are described in Chapter 15. In contrast to the implementation described in Chapter 15, evaluation was performed on voxelized segmentations using Plastimatch. To overcome inconsistencies in the superior and inferior extents of manually labeled tubular structures (spinal cord and esophagus), both the segmented structures and the ground truth structures were cropped by 10 mm at the superior and inferior borders, measured with respect to the ground truth structures.
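The cropping step and the Dice computation on voxelized masks can be sketched as follows. This is a minimal illustration, not the Plastimatch implementation: each structure is represented as a set of (x, y, z) voxel indices, the superior-inferior direction is assumed to be the z axis, and the function names, the slice thickness argument, and the example masks are all hypothetical. The cropping rule shown (keep only slices at least 10 mm inside the ground truth extent) is one reading of the description above.

```python
def dice(a, b):
    """Dice similarity coefficient of two voxel sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty masks
    return 2.0 * len(a & b) / (len(a) + len(b))


def crop_to_truth_extent(seg, truth, slice_thickness_mm, margin_mm=10.0):
    """Keep only slices at least margin_mm inside the ground-truth z-extent."""
    z_values = [z for (_, _, z) in truth]
    z_min_mm = min(z_values) * slice_thickness_mm + margin_mm
    z_max_mm = max(z_values) * slice_thickness_mm - margin_mm

    def inside(voxel):
        return z_min_mm <= voxel[2] * slice_thickness_mm <= z_max_mm

    return {v for v in seg if inside(v)}, {v for v in truth if inside(v)}


# Synthetic tubular structure: the two contours agree in-plane but
# start and stop on different slices.
truth = {(0, 0, z) for z in range(20)}
seg = {(0, 0, z) for z in range(4, 26)}

before = dice(seg, truth)
seg_c, truth_c = crop_to_truth_extent(seg, truth, slice_thickness_mm=2.5)
after = dice(seg_c, truth_c)
print(f"DSC before cropping: {before:.3f}, after cropping: {after:.3f}")
```

In this synthetic case the disagreement lies entirely in where the contours begin and end, so cropping removes it; in-plane differences would be unaffected.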

For a percentile r, the directed percent Hausdorff measure from X to Y is the rth percentile of the distances from each point in X to its closest point in Y.
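A brute-force version of this definition is sketched below, assuming X and Y are lists of point coordinates in mm. The percentile rule (linear interpolation between order statistics) and the symmetric variant (the larger of the two directed values) are assumptions made for illustration; other conventions exist, and neither is claimed to match the Chapter 15 or Plastimatch implementations.

```python
import math


def percentile(values, r):
    """r-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * r / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def directed_percent_hausdorff(x_points, y_points, r=95.0):
    """r-th percentile of the distances from each point in X to its closest point in Y."""
    nearest = [min(math.dist(x, y) for y in y_points) for x in x_points]
    return percentile(nearest, r)


def percent_hausdorff(x_points, y_points, r=95.0):
    """Symmetric variant: the larger of the two directed measures (assumed convention)."""
    return max(directed_percent_hausdorff(x_points, y_points, r),
               directed_percent_hausdorff(y_points, x_points, r))


# Synthetic points on a line, 1 mm apart.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
Y = [(0.0, 0.0, 0.0)]

hd95_xy = directed_percent_hausdorff(X, Y, 95.0)
hd100_xy = directed_percent_hausdorff(X, Y, 100.0)
hd95_yx = directed_percent_hausdorff(Y, X, 95.0)
print(hd95_xy, hd100_xy, hd95_yx)
```

The example shows why the measure is directed: every point of Y lies on X, so the Y-to-X value is zero, while the X-to-Y value is not. Using the 95th percentile rather than the maximum makes the measure less sensitive to a few outlying points.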

Experimental Steps

The LCTSC data [18], described in Chapter 1, were used for the experiments. The 36 training cases were used to optimize the registration and atlas fusion parameters. The remaining 24 offline and online test cases were used to evaluate accuracy.

Four different deformable image registration strategies were carried out. The first three strategies used the B-spline algorithm, and the last strategy used the demons algorithm. All strategies used multiple registration stages, as shown in Table 4.3. The B-spline methods used five stages and the demons algorithm used four stages. For all strategies, the first two stages were rigid, followed by the deformable stages.

Three parameters were varied for the B-spline based model: (1) the image subsampling rate (Res), varied from 2 mm to 6 mm; (2) the regularization weight and type (Reg), varied between the curvature regularizer with weights from 0 to 100 and the third order regularizer with weights from 1 to 10; and (3) the B-spline grid spacing (GS), varied from 10 mm to 100 mm. For the last strategy (demons), the width of the Gaussian kernel used to smooth the displacement field was varied from 1 mm to 4 mm. The value ranges for each of these parameters were selected based on the authors’ prior experience with image registration. To narrow down the parameter ranges, a few experiments were run on a subset of atlases. Table 4.1 describes the acronyms used in the chapter. The fixed and constant parameters for all registration strategies are described in Tables 4.2 and 4.3. Only one parameter was varied for a given strategy, keeping all other parameters fixed. The fixed parameter

TABLE 4.1

Acronyms Used for Each Parameter and Their Explanations

| Acronym Used in Text | Explanation of Parameter |
| --- | --- |
| Res | Image sub-sampling rate |
| Reg | Regularization weight and type |
| GS | Grid spacing |
| Demons | Width of Gaussian kernel |

TABLE 4.2

Parameters Varied for Each Registration Strategy

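| Registration strategy | Rigid-1 | Rigid-2 | Deformable-1 | Deformable-2 | Deformable-3 |
| --- | --- | --- | --- | --- | --- |
| Res 1 | 4 x 4 x 3 | 4 x 4 x 3 | 4 x 4 x 3 | 2 x 2 x 3 | 2 x 2 x 3 |
| Res 2 | 4 x 4 x 3 | 4 x 4 x 3 | 4 x 4 x 3 | 3 x 3 x 3 | 3 x 3 x 3 |
| Res 3 | 4 x 4 x 3 | 4 x 4 x 3 | 4 x 4 x 3 | 4 x 4 x 3 | 4 x 4 x 3 |
| Res 4 | 4 x 4 x 12 | 4 x 4 x 12 | 4 x 4 x 12 | 2 x 2 x 6 | 2 x 2 x 6 |
| Res 5 | 6 x 6 x 6 | 6 x 6 x 6 | 6 x 6 x 6 | 3 x 3 x 3 | 2 x 2 x 3 |
| Res 6 | 6 x 6 x 6 | 6 x 6 x 6 | 6 x 6 x 6 | 3 x 3 x 3 | 3 x 3 x 3 |
| Res 7 | 6 x 6 x 6 | 6 x 6 x 6 | 6 x 6 x 6 | 6 x 6 x 6 | 6 x 6 x 6 |
| Reg 0 | NA | NA | Curvature 0 | Curvature 0 | Curvature 0 |
| Reg 1 | NA | NA | Curvature 100 | Curvature 1 | Curvature 0.1 |
| Reg 2 | NA | NA | Curvature 100 | Curvature 10 | Curvature 0.1 |
| Reg 3 | NA | NA | Curvature 100 | Curvature 10 | Curvature 1 |
| Reg 4 | NA | NA | Curvature 100 | Curvature 100 | Curvature 100 |
| Reg 5 | NA | NA | Third order 100 | Third order 10 | Third order 1 |
| Reg 6 | NA | NA | Third order 1000 | Third order 10 | Third order 1 |
| Reg 7 | NA | NA | Third order 1000 | Third order 100 | Third order 1 |
| Reg 8 | NA | NA | Third order 1000 | Third order 100 | Third order 10 |
| GS 1 | NA | NA | 100 x 100 x 100 | 30 x 30 x 30 | 10 x 10 x 10 |
| GS 2 | NA | NA | 100 x 100 x 100 | 30 x 30 x 30 | 20 x 20 x 20 |
| GS 3 | NA | NA | 100 x 100 x 100 | 50 x 50 x 50 | 10 x 10 x 10 |
| GS 4 | NA | NA | 100 x 100 x 100 | 50 x 50 x 50 | 20 x 20 x 20 |
| GS 5 | NA | NA | 100 x 100 x 100 | 50 x 50 x 50 | 30 x 30 x 30 |
| GS 6 | NA | NA | 100 x 100 x 100 | 50 x 50 x 50 | 50 x 50 x 50 |
| GS 7 | NA | NA | 100 x 100 x 100 | 100 x 100 x 100 | 100 x 100 x 100 |
| Demons 1 | NA | NA | 2 | 1 | NA |
| Demons 2 | NA | NA | 3 | 2 | NA |
| Demons 3 | NA | NA | 4 | 3 | NA |
| Demons 4 | NA | NA | 5 | 4 | NA |

Columns are the registration stages. Res, GS, and Demons values are in mm; Reg values give the regularizer type and weight.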

settings are marked in bold in Table 4.2. For example, when the image subsampling rate is varied, the grid spacing and the regularization type and weight are held constant, as seen in Table 4.3. Once the optimal parameters were determined, each query image was segmented using them, and the segmentations were evaluated against the ground truth for each anatomical structure.
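The one-parameter-at-a-time design can be sketched as follows. The baseline values are the constant settings listed in Table 4.3 and the candidate schedules are a small subset of Table 4.2; the dictionary layout and the function name are hypothetical and are not a Plastimatch configuration format.

```python
# Baseline schedule per stage (Rigid-1, Rigid-2, Deformable-1..3), from Table 4.3.
BASELINE = {
    "res": ["6x6x6", "6x6x6", "6x6x6", "3x3x3", "3x3x3"],
    "reg": ["NA", "NA", "Curvature 100", "Curvature 1", "Curvature 0.1"],
    "gs": ["NA", "NA", "100x100x100", "50x50x50", "30x30x30"],
}

# A few candidate schedules from Table 4.2 (not the full list).
CANDIDATES = {
    "res": {"Res 1": ["4x4x3", "4x4x3", "4x4x3", "2x2x3", "2x2x3"]},
    "reg": {"Reg 2": ["NA", "NA", "Curvature 100", "Curvature 10", "Curvature 0.1"]},
    "gs": {
        "GS 3": ["NA", "NA", "100x100x100", "50x50x50", "10x10x10"],
        "GS 4": ["NA", "NA", "100x100x100", "50x50x50", "20x20x20"],
    },
}


def one_at_a_time(baseline, candidates):
    """Build one strategy per candidate, replacing a single parameter of the baseline."""
    strategies = {}
    for param, schedules in candidates.items():
        for name, schedule in schedules.items():
            settings = dict(baseline)   # all other parameters stay at baseline
            settings[param] = schedule  # only this parameter is varied
            strategies[name] = settings
    return strategies


strategies = one_at_a_time(BASELINE, CANDIDATES)
for name, settings in strategies.items():
    print(name, settings)
```

Varying one parameter at a time keeps the number of runs linear in the number of candidate schedules, at the cost of not exploring interactions between parameters.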

TABLE 4.3

Constant Parameters for the Registration Strategies

| Parameter varied | Constant parameters | Rigid-1 | Rigid-2 | Deformable-1 | Deformable-2 | Deformable-3 |
| --- | --- | --- | --- | --- | --- | --- |
| Image subsampling (mm) | Regularizer type and weight | NA | NA | Curvature 100 | Curvature 1 | Curvature 0.1 |
| Image subsampling (mm) | Grid spacing (mm) | NA | NA | 100 x 100 x 100 | 50 x 50 x 50 | 30 x 30 x 30 |
| Regularizer type and weight | Image subsampling (mm) | 6 x 6 x 6 | 6 x 6 x 6 | 6 x 6 x 6 | 3 x 3 x 3 | 3 x 3 x 3 |
| Regularizer type and weight | Grid spacing (mm) | NA | NA | 100 x 100 x 100 | 50 x 50 x 50 | 30 x 30 x 30 |
| Grid spacing (mm) | Image subsampling (mm) | 6 x 6 x 6 | 6 x 6 x 6 | 6 x 6 x 6 | 3 x 3 x 3 | 3 x 3 x 3 |
| Grid spacing (mm) | Regularizer type and weight | NA | NA | Curvature 100 | Curvature 1 | Curvature 0.1 |
| Width of Gaussian kernel | Image subsampling (mm) | 6 x 6 x 6 | 6 x 6 x 6 | 6 x 6 x 6 | 3 x 3 x 3 | NA |

Results

The voxel sampling rate was not found to affect the performance of either registration algorithm substantially. The B-spline method was also found to be relatively robust to variations in the control-point spacing. However, both methods were most sensitive to the tuning of the regularizer. Figure 4.1 compares the average Dice similarity coefficient (DSC) over the five organs at risk (OARs) when varying (a) the width of the Gaussian kernel and (b) the curvature regularizer weight. Increasing the smoothness lowers the performance as measured by DSC. A Gaussian kernel of 1 mm width and a curvature regularizer weight of 0.1 led to the highest average DSC.
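To make the role of the Gaussian kernel width concrete, the sketch below smooths a one-dimensional displacement profile with kernels of 1 mm and 4 mm. It assumes that the "width" is the standard deviation of the Gaussian, that the kernel is truncated at three standard deviations, and that edges are handled by replication; these choices, the function names, and the synthetic profile are illustrative and are not taken from the demons implementation used in the experiments.

```python
import math


def gaussian_kernel(sigma_mm, spacing_mm):
    """Normalized 1D Gaussian kernel sampled on the voxel grid, truncated at 3 sigma."""
    sigma_vox = sigma_mm / spacing_mm
    radius = max(1, math.ceil(3 * sigma_vox))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, kernel):
    """Convolve with edge replication so the output has the same length as the input."""
    radius = len(kernel) // 2
    last = len(values) - 1
    out = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index at the borders
            acc += w * values[j]
        out.append(acc)
    return out


def max_jump(values):
    """Largest change in displacement between neighboring voxels."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Synthetic displacement profile (mm) with a sharp 5 mm step, 1 mm voxel spacing.
step = [0.0] * 10 + [5.0] * 10
narrow = smooth(step, gaussian_kernel(1.0, 1.0))
wide = smooth(step, gaussian_kernel(4.0, 1.0))
print(f"max jump: raw {max_jump(step):.2f}, "
      f"1 mm {max_jump(narrow):.2f}, 4 mm {max_jump(wide):.2f}")
```

A wider kernel spreads the step over more voxels, which gives a smoother field but also limits how well the deformation can follow sharp local differences between atlas and query anatomy.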

GS 4 was the best overall optimization strategy. The segmentation accuracy of the lungs was significantly higher than that of the other structures (DSC = 0.95 ± 0.02, 95% HD = 5.29 ± 3.25). The segmentation accuracy of the esophagus and spinal cord was relatively low (spinal cord: DSC = 0.8 ± 0.08, 95% HD = 12.34 ± 13.9; esophagus: DSC = 0.59 ± 0.08, 95% HD = 9.47 ± 6.57) due to poor soft-tissue contrast. Figure 4.2 shows, as box plots, the (a) DSC and (b) 95% HD achieved by the best performing strategy (GS 4) over the 24 test cases, for each OAR and for the average over the OARs; the plots indicate the [5, 95]% confidence interval and the corresponding outliers. Figure 4.3 shows the segmentation performance for the best performing strategy. Rows 1-5 show the 5th, 25th, 50th, 75th, and 95th percentile cases, with the 5th percentile being the worst and the 95th percentile being the best segmentation. The 5th percentile example produces a poor segmentation due to the presence of a tumor in the left lung.
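The summary values and the choice of representative cases can be sketched as follows. The per-case scores are synthetic placeholders, not results from this study, and two conventions are assumed: that the "±" values are sample standard deviations, and that a percentile case is picked with a nearest-rank rule over the sorted scores. The function names are hypothetical.

```python
import statistics


def summarize(scores):
    """Mean and sample standard deviation of a list of per-case scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def representative_cases(scores_by_case, percentiles=(5, 25, 50, 75, 95)):
    """Map each percentile to the case whose score sits nearest that rank."""
    ranked = sorted(scores_by_case, key=scores_by_case.get)  # worst to best
    picks = {}
    for p in percentiles:
        index = round((len(ranked) - 1) * p / 100)
        picks[p] = ranked[index]
    return picks


# Synthetic DSC values for 24 hypothetical test cases.
scores_by_case = {f"case{i:02d}": 0.50 + 0.02 * i for i in range(24)}

mean_dsc, sd_dsc = summarize(list(scores_by_case.values()))
picks = representative_cases(scores_by_case)
print(f"DSC = {mean_dsc:.2f} ± {sd_dsc:.2f}")
print(picks)
```

Sorting from worst to best means the 5th percentile pick is a poor case and the 95th percentile pick is a good one, which matches the row ordering described for Figure 4.3.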

FIGURE 4.1 Average DSC over the five OARs achieved by varying (a) width of the Gaussian kernel and (b) curvature regularizer weight for the 24 test cases.

FIGURE 4.2 (a) DSC and (b) 95% HD achieved by the best performing strategy for the five OARs and their average over the 24 test cases.

FIGURE 4.3 Segmentations generated using the best performing strategy. Rows 1-5 show the 5th, 25th, 50th, 75th, and 95th percentile images, from worst to best, from the 24 test cases.

Summary

Identifying the optimal registration and voting parameters is a challenging exercise. For practical purposes, numerous algorithm details such as atlas selection and label fusion parameters must be held constant during testing. However, one must consider that interplay could exist between these algorithm settings. For example, a registration with high regularization may require a different number of atlases than a registration with low regularization.

In general, B-spline and demons registrations performed similarly. Neither algorithm was very sensitive to the voxel sampling rate, which means that faster registrations at higher subsampling rates can be considered. For the B-spline grid spacing, an intermediate schedule with a final grid spacing of 20 mm was preferred, although the final segmentation results were not highly sensitive to these parameters: average Dice similarity varied by only a few percent over fairly broad parameter ranges. Both algorithms were more affected by the choice of regularizer parameters, with smaller regularization penalty terms preferred for both B-spline and demons registrations.

References

  • 1. Alven, J, et al. (2016). “Uberatlas: fast and robust registration for multi-atlas segmentation.” 80:249-255.
  • 2. Bai, J, et al. (2012). “Atlas-based automatic mouse brain image segmentation revisited: model complexity vs. image registration.” 30(6): 789-798.
  • 3. Datteri, R, et al. (2011). “Estimation of registration accuracy applied to multi-atlas segmentation.” MICCAI Workshop on Multi-Atlas Labeling and Statistical Fusion.
  • 4. Doshi, J, et al. (2016). “MUSE: multi-atlas region segmentation utilizing ensembles of registration algorithms and parameters, and locally optimal atlas selection.” 127: 186-195.
  • 5. Rueckert, D, et al. (1999). “Nonrigid registration using free-form deformations: application to breast MR images.” 18(8): 712-721.
  • 6. Heckemann, RA, et al. (2010). “Improving intersubject image registration using tissue-class information benefits robustness and accuracy of multi-atlas based anatomical segmentation.” 51(1): 221-227.
  • 7. Lotjonen, JM, et al. (2010). “Fast and robust multi-atlas segmentation of brain magnetic resonance images.” 49(3): 2352-2365.
  • 8. Sjoberg, C, et al. (2013). “Multi-atlas based segmentation using probabilistic label fusion with adaptive weighting of image similarity measures.” 110(3): 308-319.
  • 9. Yeo, BT, et al. (2008). “Effects of registration regularization and atlas sharpness on segmentation accuracy.” 12(5): 603-615.
  • 10. Zaffino, P, et al. (2016). “Plastimatch MABS, an open source tool for automatic image segmentation.” 43(9): 5155-5160.
  • 11. Shah, KD, et al. (2020). “A generalized framework for analytic regularization of uniform cubic B-spline displacement fields.” arXiv:2010.02400
  • 12. Shackleford, JA, et al. (2012). “Analytic regularization of uniform cubic B-spline deformation fields.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer.
  • 13. Shackleford, JA, et al. (2010). “On developing B-spline registration algorithms for multi-core processors.” 55(21): 6329.
  • 14. Joshi, S, et al. (2004). “Unbiased diffeomorphic atlas construction for computational anatomy.” 23(Supplement 1): S151-S160.
  • 15. Avants, BB, et al. (2008). “Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain.” 12(1): 26-41.
  • 16. Vercauteren, T, et al. (2009). “Diffeomorphic demons: efficient non-parametric image registration.” 45(1): S61-S72.
  • 17. Thirion, J-P (1998). “Image matching as a diffusion process: an analogy with Maxwell’s demons.” Medical Image Analysis 2(3): 243-260.
  • 18. Yang, J, et al. (2018). “Autosegmentation for thoracic radiation treatment planning: a grand challenge at AAPM 2017.” 45(10): 4568-4581.
 