Assessment Based on Intended Clinical Use

The end purpose of auto-contouring in radiation oncology is for the auto-contours to be used in the clinic for treatment planning. To this end, the best form of assessment is to understand the impact that their use would have on clinical practice. Ideally, one would study the impact on patient outcomes, since this is the endpoint for assessment of any intervention. It may be argued that such an assessment would be pointless, since auto-contours should be checked and approved before use in treatment planning. However, it is known that having an auto-contour as a starting point improves consistency between manual observers [32]; auto-contouring will therefore have some impact on the treatment plan and thus on patient outcome. Nevertheless, contouring occurs very early in the treatment planning process, and subsequent steps are likely to obscure any meaningful observation of differences in patient outcome that can be attributed solely to the difference in contouring methodology.

Evaluation of Time Saving

If contours are to be edited following auto-contouring, as some have deemed inevitable [33], then the benefit of auto-contouring lies in reducing the contouring time required as far as possible. It follows that evaluating the time saved in reaching a clinically acceptable contour is a good way to quantify the clinical impact of auto-contouring. Correspondingly, this is the most common approach to clinical benefit assessment, as observed in Table 15.1.

Most studies follow the same pattern: the structures for a set of patients are first manually contoured, with the time taken for manual contouring recorded. Subsequently, an auto-contouring method is applied to the same set of patients, and the auto-contours are edited to an acceptable standard, with the time taken for this step also recorded.
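The timing data collected in such a study can be summarized straightforwardly. The sketch below is illustrative only: the structure names and timings are invented for demonstration and are not taken from any cited study.

```python
# Hedged sketch: summarizing per-structure time saving from a hypothetical
# timing study. All names and numbers are invented for illustration.

def summarize_time_saving(manual_minutes, editing_minutes):
    """Return absolute and relative time saved per structure.

    manual_minutes / editing_minutes: dicts mapping structure name to the
    recorded time (minutes) for de novo contouring and for editing the
    auto-contour, respectively.
    """
    summary = {}
    for structure, manual in manual_minutes.items():
        editing = editing_minutes[structure]
        saved = manual - editing
        summary[structure] = {
            "saved_min": saved,
            "saved_pct": 100.0 * saved / manual,
        }
    return summary

# Hypothetical per-structure timings for one patient (minutes)
manual = {"parotid_l": 10.0, "spinal_cord": 6.0}
edited = {"parotid_l": 4.0, "spinal_cord": 1.5}

for name, stats in summarize_time_saving(manual, edited).items():
    print(f"{name}: {stats['saved_min']:.1f} min saved ({stats['saved_pct']:.0f}%)")
```

In a real study, the same summary would be aggregated over patients and observers, ideally alongside a measure of contour similarity to check that edited and de novo contours are comparable.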

Challenges for Study Design

Evaluating contouring time inherently takes time. As mentioned previously, the clinician's time is valuable; asking them to contour the same cases twice purely for evaluation purposes therefore represents a barrier to performing a large assessment. Consequently, studies tend to be small, with the largest study listed in Table 15.1 evaluating only 27 patients [34].

Since the sample sizes are small, all investigations have used the same set of patients for manual contouring and for editing of auto-contours. This introduces a risk that the observers become familiar with the cases, and that this familiarity influences the time required for contouring or editing. Steps can be taken to mitigate this risk, such as randomizing the order in which cases are contoured or ensuring a long gap between de novo contouring and editing [35]. The scenario of repeated contouring is artificial and introduces the risk that the recorded times do not reflect clinical practice, since observers know that they are being monitored for the evaluation. This could influence the results either positively or negatively, depending on the observers' general feeling towards auto-contouring. This effect can be mitigated to some extent by evaluating how similar the contours are after manual contouring and after editing, with comparison to interobserver variation.

A better approach to overcome these risks could be to perform the evaluation on a large number of consecutive patients, studying the time taken for contouring within the clinical workflow before and after introducing an auto-contouring system. While the patient cases assessed would differ, and the structures to be contoured for each patient would vary according to their treatment, such timing would reflect an average clinical workflow. Such a study does not appear to have been performed to date. This is perhaps unsurprising, as it is preferable to conduct a small cohort study when commissioning a system prior to clinical use, to avoid disrupting the clinical workflow with an unproven system. Nevertheless, such an impact analysis would be very beneficial in demonstrating the true clinical impact of auto-contouring.

A further consideration relates to what is being compared. Some studies (e.g. [7, 32]) compared against manual contouring, while others considered the impact on their current clinical practice, which may already include a different form of auto-contouring (e.g. [35, 36]) or semi-automatic tools (e.g. [30]). The study suggested above would likewise reflect the impact of auto-contouring on clinical practice, rather than an artificial comparison against purely manual contouring. However, changing existing clinical practice will also influence the time taken. Where clinical staff are familiar with a set of tools for manual clinical contouring, editing of auto-contours may be best achieved with a different set of tools. Such a change in the suitability of the contouring/editing tools should be expected, and with it a learning curve. Thus, the impact on clinical practice would need to be assessed after a period of adoption.

Impact of Auto-Contouring on Planning

Beyond contouring, the contours themselves are an input to treatment planning. Thus, to get an indication of whether the use of auto-contouring affects the patient, it is necessary to look at the dosimetric impact in planning. To make the discussion easier, what could be done will be considered before what has been done. Figure 15.12 illustrates the combinations of assessments that could be calculated. Three possible contouring approaches can be considered: manual clinical contouring, unedited auto-contouring, and edited auto-contouring. A treatment plan can be created from each of these contour sets, assuming these were the contours available to the planner. These plans will be referred to as A, B, and C. However, if one assumes that the true structure boundary could be given by any one of the contour sets, it is possible to calculate the DVH for any plan with any contour set, leading to nine possible DVHs. These combinations will be referred to in the form A1, A2, etc., where A1 represents the DVH of the plan made using contour set 1 evaluated with the contours of set 1, and A2 the DVH of the plan created using contour set 1 evaluated with contour set 2.
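The nine combinations can be made concrete by simply enumerating them. The sketch below is illustrative bookkeeping using the A/B/C and 1/2/3 labelling convention described above; it is not part of any cited study.

```python
# Enumerate the nine plan/contour DVH combinations.
# Plans A, B, C are optimized on contour sets 1, 2, 3 respectively;
# each plan's dose can then be evaluated against any of the three contour sets.

PLANS = {"A": "manual", "B": "unedited auto", "C": "edited auto"}
CONTOUR_SETS = {1: "manual", 2: "unedited auto", 3: "edited auto"}

combinations = [
    (f"{plan}{cs}",
     f"plan on {PLANS[plan]} contours, DVH on {CONTOUR_SETS[cs]} contours")
    for plan in PLANS
    for cs in CONTOUR_SETS
]

for label, description in combinations:
    print(label, "-", description)
```

Printed in order, the labels run A1, A2, A3, B1, ... C3, matching the combinations discussed in the studies below.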

Voet et al. [37] asked whether auto-contours need editing by first evaluating B3 for the elective nodal volumes in the head and neck. B3 shows the impact on dose to structures should a plan be generated with unedited contours, on the assumption that the adjusted auto-contours are correct and represent the true anatomy. Therefore, this could be used to assess potential underdosage to the Planning Target Volume (PTV) resulting from not editing contours.

FIGURE 15.12 Possible investigations into the impact on treatment planning. Three possible contour sets could lead to three different treatment plans. The dose to structures can be investigated assuming each contour set is correct, leading to nine possible DVH curves. The most common approach so far is to compare the dose from Plan A using contour sets 1 and 3.

Subsequently, they compared C2 to C3. C3 represents the DVH for OAR doses that would be produced during clinical planning using edited auto-contours, whereas C2 gives the DVH that would be found if the contours were not edited. However, this comparison assumes that the OAR dose is not critical to the plan generated. Thus, they suggest that auto-contours of OARs only need editing if they are approaching a critical planning dose.

Yang et al. [10] performed a comparison using A2 and A3 to assess the impact of editing on the reported OAR dose. Similarly, Van Dijk et al. [35] used A2 and A3 to study differences in the impact on OAR dose of two different auto-contouring approaches. Kieselmann et al. [38] compared the OAR dose using B1 and B2 for head and neck contouring on magnetic resonance imaging (MRI). However, the PTV was manually drawn; thus, while the dose was optimized on unedited OARs and the DVH compared with that using the clinical contours, only the clinical PTV was used in planning. Kaderka et al. [39] similarly compared OAR dose, using A1 and A2, for breast OAR contouring.

For target contouring, Dipasquale et al. [34] made a target dose assessment comparing C3 to C1 for the whole breast PTV. Here, the aim was to assess whether planning on edited contours would provide appropriate dose coverage, assuming manually drawn contours represent the true anatomy. In contrast, Simoes et al. [40] compared dose using B3 and C3 to suggest that planning on unedited contours was acceptable for the whole breast PTV.

In all cases, comparison between doses was made by measuring the difference in selected DVH parameters, typically those used for planning, either as OAR constraint parameters, such as V20Gy for the lungs, or as PTV coverage measures, such as V95%.
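Such DVH parameters are simple to compute once a contour defines which voxels belong to a structure: Vx is the fraction of the structure volume receiving at least dose x. A minimal sketch follows, assuming uniform voxel volume; the function name and example numbers are illustrative, not taken from any cited study.

```python
import numpy as np

def v_dose(structure_doses_gy, threshold_gy):
    """Vx: percentage of the structure volume receiving >= threshold_gy.

    structure_doses_gy: 1-D array of dose values (Gy), one per voxel
    inside the structure, assuming uniform voxel volume.
    """
    doses = np.asarray(structure_doses_gy, dtype=float)
    return 100.0 * np.mean(doses >= threshold_gy)

# Hypothetical voxel doses inside two structures (Gy)
lung_doses = np.array([5.0, 12.0, 22.0, 30.0])
ptv_doses = np.array([58.0, 60.0, 61.0, 57.0])
prescription = 60.0  # Gy

v20 = v_dose(lung_doses, 20.0)                 # lung V20Gy
v95 = v_dose(ptv_doses, 0.95 * prescription)   # PTV V95% (>= 57 Gy)
print(f"Lung V20Gy = {v20:.0f}%  PTV V95% = {v95:.0f}%")
```

Evaluating the same plan's dose grid against two different contour sets (e.g. A2 vs A3) then amounts to recomputing these parameters with a different voxel selection.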

Challenges for Study Design

As has been noted, there are nine combinations of plans and contour sets that could be used to assess the dosimetric impact on planning. Table 15.3 summarizes these possibilities, showing which combinations have been used in at least one research study.

TABLE 15.3

Use of Plan and Contour Combinations in Studies of the Impact of Auto-Contouring

                          Contour set used for dose evaluation
Plan based on             1. Manual     2. Automatic    3. Adjusted Automatic
A. Manual                 [39, 69]      [10, 35, 39]    [10, 35]
B. Automatic              [37, 38]      [38, 69]        -
C. Adjusted Automatic     -             -               [34, 37, 40]

While there appears to be a broad choice of contour/plan combinations, the purpose of evaluation is similar for all studies: can auto-contours be used clinically, and do they require editing? Thus, most studies either hold the plan constant and vary the contours for assessment, or use the same contour set for evaluation but vary the plan. There is an underlying assumption in these assessments that the clinical contours, whether manual (1) or adjusted (3), and the plans based on them (A or C), represent the reference contours and the "correct" plan. Therefore, the choice becomes contour set 1 or 3 vs 2 on a constant plan, or plan A or C vs B on a constant contour set.

Only two studies fall outside of this. Dipasquale et al. [34] sought to show that plans on edited auto-contours are equivalent to those on manual contours, without considering the possibility that the contours would be used unedited. Voet et al. [37] evaluated B3 in isolation, without a comparator, for target volumes, since the plan had been optimized assuming the unedited contours were correct, and the "true" contours could then be used to assess what the under-dosage would be.

The assumption that the clinical contour is the truth raises the same concern as for quantitative assessment: inter-observer variability. In addition to inter-observer variability in the contours, inter-observer variability in the planning may also come into play. Several studies take steps to address this through the use of auto-planning systems, producing plans free from human subjectivity [38, 40]. The same studies also measure the impact of inter-observer variation in contours on the planning, so as to place the impact of auto-contouring into context.

While the need to edit contours and perform planning generally keeps studies small (<30 cases), the use of auto-planning reduces the demand for human effort, facilitating slightly larger studies, with Simoes et al. [40] performing an evaluation on 87 patient cases.

Summary of Clinical Impact Evaluation

The main purpose of auto-contouring is to reduce the burden of contouring. Therefore, to truly evaluate the clinical impact of auto-contouring, it is natural to measure the time saved by using it. In this chapter, it has been shown that there are a number of challenges in implementing such experiments in practice, particularly with regard to inter-observer variability. Ultimately, auto-contouring should be of such a standard as not to require editing. Yet human nature appears to be to correct within the margins of inter-observer variation, as was evidenced by subjective tests in which blinded observers would edit original clinical contours. Therefore, a better clinical test of auto-contouring, or of human editing, is to evaluate the dosimetric impact that editing, or the absence of editing, will have on the patient. An alternative approach would be to blindly introduce clinical contours from another observer into the assessment. This would demonstrate the potential impact of an auto-contouring method performing at human expert level. Nevertheless, all clinical impact studies are time consuming and challenging to implement for large numbers of cases, and therefore might be best reserved for clinical commissioning rather than algorithm development.
