# Sensitivity Analyses

Once an analysis is performed based on a certain set of assumptions about the missing values, the results should be supported using appropriate sensitivity analyses that address the same research hypothesis. A key objective of the sensitivity analyses should be to evaluate how different assumptions influence the initial results that were obtained.

Although most primary analyses are performed under MAR assumptions, it is generally not possible to rule out MNAR, and the planned sensitivity analyses should therefore include MNAR scenarios. However, data analysis under MNAR assumptions is complex, and most of the common likelihood-based methods require specification of the joint distribution of the data and the missing data mechanism (Ibrahim and Molenberghs 2009). The approaches generally rely on maximum likelihood estimation, based on mixed-effects models and normally distributed outcomes, and are intended to handle dropouts in clinical trials involving longitudinal data. Examples of approaches for MNAR include selection models (Diggle and Kenward 1994), pattern-mixture models (Little and Wang 1996), and shared-parameter models (Little 1995; Kenward 1998). Shared-parameter models take into account the dependence between the measurement and the missingness processes, typically using random effects (Wu and Bailey 1989). In contrast, both selection models and pattern-mixture models factor the joint distribution of the full data and the missingness mechanism into products of conditional distributions. Selection models are based on assumptions about the distribution of outcomes for all subjects and the distribution of missingness indicators conditional on the hypothetical complete outcomes. Pattern-mixture models, which are relatively more transparent and clinically interpretable, involve the conditional distribution of the data given the missingness pattern. When the number of missing-data patterns is large relative to the sample size, there may be inadequate data to estimate the parameters with a reasonable degree of precision. Thus, pattern-mixture models are not typically applied in situations with arbitrary missingness but are generally restricted to cases with monotone missingness, where the number of patterns is manageable.
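The three model classes described above can be written compactly as factorizations of the joint distribution. The notation here is an assumption introduced for illustration and is not taken from the cited papers: $Y$ denotes the full (partly unobserved) outcome vector, $R$ the vector of missingness indicators, $b$ a vector of random effects, and $\theta$, $\psi$ generic parameters whose meaning differs from one factorization to the next.

$$
\begin{aligned}
\text{Selection model:} \quad & f(y, r \mid \theta, \psi) = f(y \mid \theta)\, f(r \mid y, \psi) \\
\text{Pattern-mixture model:} \quad & f(y, r \mid \theta, \psi) = f(r \mid \psi)\, f(y \mid r, \theta) \\
\text{Shared-parameter model:} \quad & f(y, r \mid \theta, \psi) = \int f(y \mid b, \theta)\, f(r \mid b, \psi)\, f(b)\, db
\end{aligned}
$$

In the pattern-mixture form, $f(y \mid r, \theta)$ must be specified separately for each observed value of $r$, which is why a large number of patterns relative to the sample size leaves too little data per pattern. The shared-parameter form, as written, assumes that $Y$ and $R$ are conditionally independent given $b$, which is a common but not universal simplification.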

Tipping-point analyses are performed as an alternative approach to assess the robustness of study results to departures from an assumed missingness mechanism. The approach essentially involves repeating the analysis over a range of values for the missing data, or for a parameter governing them, and searching for the tipping point at which the study conclusion is reversed (e.g., from significant to non-significant). Since tipping-point analyses require exploring alternative model assumptions and parameter values, evaluation of the results may be cumbersome. To facilitate the interpretation of results from tipping-point analyses, alternative graphical displays have been proposed (see, e.g., Liublinska and Rubin 2014). The relatively transparent MNAR methods of pattern-mixture models and tipping-point analyses are practical sensitivity models to assess the robustness of the primary MAR results.
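The search described above can be illustrated with a minimal sketch based on a delta adjustment. Everything in it is an assumption made for illustration rather than a method prescribed by the sources cited here: a continuous outcome where higher values are better, a normal model fitted to the observed values of each arm as a crude MAR-type imputation model, a shift `delta` applied only to imputed values in the treatment arm, and a normal reference value of 1.96 in place of the usual t reference distribution. The imputation step does not propagate parameter uncertainty, so a production analysis would use a proper multiple imputation procedure. All function names and the data are hypothetical.

```python
import math
import random
import statistics


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates) if m > 1 else 0.0
    return pooled, math.sqrt(within + (1 + 1 / m) * between)


def impute(observed, n_missing, delta, rng):
    """Draw n_missing values from a normal fitted to the observed data
    (a simplified MAR-type model), then shift each draw by delta."""
    mu = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    return [rng.gauss(mu, sd) + delta for _ in range(n_missing)]


def z_statistic(trt_obs, n_trt_miss, ctl_obs, n_ctl_miss, delta,
                n_imputations=50, seed=0):
    """Pooled z statistic for the difference in means (treatment minus control)."""
    rng = random.Random(seed)  # same seed for every delta, so only the shift varies
    estimates, variances = [], []
    for _ in range(n_imputations):
        # Only the treatment arm is penalized; control is imputed without a shift.
        trt = list(trt_obs) + impute(trt_obs, n_trt_miss, delta, rng)
        ctl = list(ctl_obs) + impute(ctl_obs, n_ctl_miss, 0.0, rng)
        estimates.append(statistics.fmean(trt) - statistics.fmean(ctl))
        variances.append(statistics.variance(trt) / len(trt)
                         + statistics.variance(ctl) / len(ctl))
    estimate, std_err = pool_rubin(estimates, variances)
    return estimate / std_err


def find_tipping_point(trt_obs, n_trt_miss, ctl_obs, n_ctl_miss, deltas,
                       z_crit=1.96):
    """Return the first delta in `deltas` at which significance is lost, else None."""
    for delta in deltas:
        z = z_statistic(trt_obs, n_trt_miss, ctl_obs, n_ctl_miss, delta)
        if z < z_crit:
            return delta
    return None


# Hypothetical data: 40 observed and 10 missing outcomes per arm.
trt_obs = [2.0 + 0.1 * ((7 * i) % 11 - 5) for i in range(40)]
ctl_obs = [1.0 + 0.1 * ((3 * i) % 11 - 5) for i in range(40)]
deltas = [-0.5 * k for k in range(21)]  # increasingly unfavorable shifts, 0 to -10
print(find_tipping_point(trt_obs, 10, ctl_obs, 10, deltas))
```

The returned shift is then judged clinically: if significance is lost only at a shift that is implausibly unfavorable to the experimental arm, the primary MAR result can be regarded as robust.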

# Estimands and Other Recent Regulatory Developments

Recently, the issue of missing values has been addressed in the context of the clinical question being asked, and hence of the quantity to be estimated, called the estimand, and of the nature of the sensitivity analyses that need to be performed. This was motivated by the aforementioned National Research Council (NRC) report that addressed various aspects of missing data in clinical trials (NRC 2010). There have since been efforts involving diverse stakeholders to formulate a general framework to align trial objectives and planned inference (Akacha et al. 2017; ICH E9 (R1) 2017; LaVange and Permutt 2016).

The NRC report covered the underlying issues associated with missing data but did not give any recommendation about a specific method for handling them. However, it cautioned against the use of single imputation methods, such as LOCF mentioned earlier. The report emphasized the importance of sensitivity analyses, as well as the necessity of preventive steps that need to be taken at the design and conduct stages of a clinical trial. Some of the suggested measures include implementation of novel designs, other than the usual parallel group design; enhanced patient consent; encouraging patient compliance; and making greater efforts to collect post-discontinuation data.

As highlighted in the NRC report, the objective of an analysis strategy involving data with missing values should be to rule out bias in favor of the experimental drug that may have been introduced as a result of missing information or as a consequence of the action taken to handle the missing values. However, the choice of methods for the primary and sensitivity analyses is often complicated by various factors, including a lack of clarity about the intended objective, the actual target of inference, and the patient population to be included. This has led to the need to establish a framework based on the concept of estimands. In the following, we provide a high-level overview of the current thinking about estimands, while stressing that the concept is still evolving and that many issues need to be addressed for effective implementation of the framework under discussion.

In a broad sense, an estimand is the quantity that is the target of inference in order to address the scientific question of interest posed by the trial objective (ICH E9 (R1) 2017). As such, it may be characterized by various attributes, including the population of interest, the variable (or endpoint), the handling of intercurrent or post-randomization events, and the summary statistics associated with the outcome variable.

The population of interest typically consists of the set of all study participants as defined by the protocol inclusion and exclusion criteria. This is referred to as the intent-to-treat (ITT) population. Usually, the ITT population is used to address the primary objective of establishing a treatment effect. However, there may be many valid scientific questions, each requiring its own estimand, all of which may be used by regulatory agencies to assess the strength of the study results. In certain cases, the estimand may relate only to a subset of the randomized patients satisfying certain criteria, including any potential intercurrent or post-randomization events. In the literature this is often referred to as the “principal stratum.” For example, the principal stratum may be the set of patients in which failure to adhere to treatment would not occur. In this case, the primary hypothesis relates to the treatment effect in this stratum. The variables used to characterize the estimand may be actual assessments taken during the study or functions of the measurements as well as intercurrent events. Finally, the population-level summary measure or statistic is a key component in the construction of the estimand and forms the basis for treatment effect comparisons.

Since intercurrent or post-randomization events can affect interpretation of results, there should be a clear specification of how they are incorporated in the construction of an estimand. Several strategies have been proposed to address intercurrent events, depending on the therapeutic and experimental contexts. In one approach, called the treatment policy strategy, the value for the variable of interest is used without regard to the occurrence of intercurrent events. This approach is in alignment with the principle of ITT. However, under this strategy an estimand cannot be constructed with respect to a variable that cannot be measured after the intercurrent event. An alternative approach is the composite strategy, in which the intercurrent event is taken to be a component of the variable. For example, a responder may be defined in terms of a composite of no use of rescue medication and a favorable clinical outcome. In other situations, the estimand may be defined with respect to a principal stratum, which may be a subset of the study population that would not experience the intercurrent event. In contrast to the usual subgroup analysis, principal stratification is defined based on a patient's potential post-randomization events. When the design involves repeated measurements, one may focus only on the responses observed prior to the occurrence of the intercurrent event. This is also referred to as the while-on-treatment strategy. For example, if the goal is to assess the treatment effect on a given symptom and a patient dies, one may consider only the effect on symptoms before death. Lastly, the strategy may involve defining a hypothetical scenario in which the intercurrent event would not occur and formulating the scientific question under that putative scenario. For example, when rescue medication is permitted in the protocol, the strategy requires assessing the outcome that would have been observed had no rescue medication been provided.
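To make the distinction between strategies concrete, the following minimal sketch derives the analysis variable for a single patient under three of the strategies described above. The record layout, the function names, the convention that lower symptom scores are favorable, and the use of the last pre-event value for the while-on-treatment strategy are all assumptions made for illustration. The hypothetical and principal stratum strategies are omitted because they cannot be obtained by direct derivation from observed data and require modeling assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatientRecord:
    # Symptom score at each scheduled visit; None means not measured.
    scores: List[Optional[float]]
    # Index of the first visit affected by the intercurrent event
    # (e.g., rescue medication or death); None if the event never occurred.
    event_visit: Optional[int]


def treatment_policy_value(rec: PatientRecord) -> Optional[float]:
    """Final-visit value, used regardless of the intercurrent event.
    Returns None when no measurement exists after the event (e.g., death)."""
    return rec.scores[-1]


def composite_responder(rec: PatientRecord, threshold: float) -> bool:
    """Responder only if the event did not occur AND the final score is favorable."""
    final = rec.scores[-1]
    return rec.event_visit is None and final is not None and final <= threshold


def while_on_treatment_value(rec: PatientRecord) -> Optional[float]:
    """Last score observed strictly before the intercurrent event."""
    cutoff = len(rec.scores) if rec.event_visit is None else rec.event_visit
    pre_event = [s for s in rec.scores[:cutoff] if s is not None]
    return pre_event[-1] if pre_event else None


# Hypothetical patient who started rescue medication at the third visit (index 2).
rescued = PatientRecord(scores=[5.0, 4.0, 1.0, 1.0], event_visit=2)
print(treatment_policy_value(rescued))      # final score, event ignored
print(composite_responder(rescued, 2.5))    # event counts as non-response
print(while_on_treatment_value(rescued))    # last score before the event
```

The same patient yields a different analysis value under each strategy, which is why the strategy must be fixed as part of the estimand definition before the analysis method is chosen.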

Underpinning the estimand framework is the importance of valid sensitivity analyses to assess the robustness of inferences from the main analysis. As discussed in the previous section, this should involve a number of analyses targeting the impact of deviations from some of the relevant underlying assumptions. The sensitivity analysis, as well as the estimand to which it is aligned, should be prespecified in the trial protocol.

The ongoing effort to develop a viable framework that brings the target of estimation, the method of estimation, and the sensitivity analysis into consonance with the objective of the trial can lead to better formulation of research hypotheses and interpretation of results, and can facilitate collaboration among diverse stakeholders. In particular, it promotes alignment between pharmaceutical companies and regulatory agencies on expectations about trial design, data collection, and analytical strategies. However, as the concept evolves, further work may be needed to address operational issues that arise in the course of implementing the framework.

# Concluding Remarks

As discussed in this section, one of the major threats to the validity of evidence from RCTs is the potential for bias associated with missing data. Despite the availability of regulatory guidelines and novel statistical approaches to address the issue, there is no silver bullet to solve the problem. Modern statistical analysis tools rely on untestable assumptions and often require borrowing auxiliary information from experimental units with complete information. While sensitivity analyses are generally recommended as important tools to assess the degree to which results depend on model assumptions, the appropriateness of the approaches is heavily dependent on the extent of their coherence with the formulation of the original analysis. The only foolproof way to solve the missingness problem is not to have missing data. Although several preventive steps have been proposed that may be taken at the design and conduct stages of a trial to minimize missing data, in reality, missing values are unavoidable.

The recent efforts to define a framework in terms of estimands appear to be a step in the right direction, as they may enhance communication between regulatory agencies and pharmaceutical companies by ensuring alignment early in the process and by explicitly defining the clinical question being addressed. From a broader perspective, the effort to harmonize the trial objectives with the analytical approaches and regulatory expectations may also contribute toward the overarching goal of improving the efficiency of the drug development paradigm. However, the concept is still evolving, and its refinement and successful implementation will undoubtedly require a gradual and iterative approach involving input from all stakeholders.