THE USE OF RWE TO SUPPORT LICENSING AND LABEL ENHANCEMENT
Regulatory agencies rely mostly on data from randomized controlled trials (RCTs) to make major decisions relating to the safety and efficacy of alternative treatment options. As pointed out in earlier sections, this is partly due to the higher internal validity of RCTs relative to non-randomized studies. However, there are situations in which RCTs may not be appropriate for operational or ethical reasons. Under such circumstances, it may be necessary to use information from observational studies.
Real-world data (RWD) has been defined as “data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources,” while real-world evidence (RWE) pertains to the “evidence about the usage and potential benefits or risks of a medical product derived from analysis of RWD” (US FDA 2019b). Historically, regulators have typically used RWE from observational studies for post-approval safety monitoring and related regulatory decisions. Healthcare providers employ RWE to assess benefits and risks from pharmacoeconomic perspectives in order to support coverage decisions and guidelines for use in clinical practice. More recently, pharmaceutical companies have been using RWD to generate RWE that supports additional efficacy or safety claims in the product label. With progress in health information technology and modern analytics, RWE can now be used to address important regulatory questions and to demonstrate the value of medical products. As a result, regulatory agencies have begun formulating programs to promote the application of RWE. A case in point is the 21st Century Cures Act of 2016, which was intended to establish a framework for the use of RWE in regulatory decision-making in the US (Public Law No: 114-255 (December 13, 2016)). In addition, FDA has issued important guidance on the topic of RWD and RWE (see, e.g., FDA 2018b and FDA 2018c).
On the other hand, observational studies have many limitations, especially in the context of regulatory use. Randomization can be employed in real-world settings (Sherman et al. 2016). However, observational studies are non-interventional, and hence randomization is absent. A major inherent problem with nonrandom assignment of subjects to treatment is the likely bias in treatment comparisons. Biases arise from the lack of comparability among treatment groups with respect to known and unknown confounding factors (Deeks et al. 2003). Other shortcomings include data quality, accessibility of data sources, and the protection of patient privacy and confidentiality (Alemayehu and Mardekian 2011). Recent studies also suggest that the results of observational studies tend to depend on study design, data source, and analytical procedures (Madigan et al. 2013a). This goes to the heart of the generalizability of their results. Bartlett et al. (2019) assessed the feasibility of using RWD to replicate US-based clinical trials published in high-impact journals in 2017. Their study, which had certain limitations, reported that only 15% of these trials could be replicated using currently available RWD.
As a result, a growing body of literature addresses approaches to maximizing the value of RWE in healthcare decision-making, from both methodological and operational perspectives (Berger et al. 2014; Rosenbaum and Rubin 1983; Waning and Montagne 2001). Unsurprisingly, regulatory agencies are also in the process of evaluating the potential of RWE in drug development and approval (FDA 2018c; Sutter 2016).
In this chapter, we summarize some of the statistical and regulatory issues in the use of RWE in drug development, with particular reference to recent developments in the US and other regions. A brief outline is provided of the common approaches used in the design, analysis, and reporting of observational studies. In addition, selected examples are provided to highlight current regulatory viewpoints on the role of RWE in drug development.
Methodological and Operational Considerations
In the literature, a confounder is defined as a variable that is associated with both the response and the treatment under study. In RCTs, randomization ensures, in expectation, comparability of treatment groups with respect to both observed and latent confounders (Collins and Lanza 2009). In the absence of randomization, it is generally impossible to eliminate the impact of all potential confounders. In some cases, the associated bias cannot be completely removed by design or modeling when no control exists for the underlying condition (Bosco et al. 2010; Psaty and Siscovick 2010). One such case is confounding by indication, which is common in drug safety studies where the indication is also a risk factor for the outcome. Therefore, best practices should be followed in the design and analysis of observational studies, and caution should be exercised in interpreting their results.
Alternative design options are available for observational studies. Prospective cohort studies are often used to compare treatment regimens between two groups of subjects, both identified prospectively according to predefined criteria. One group uses the drug of interest and the other uses a suitably chosen comparator. The subjects are then followed over time, and the outcome of interest is compared between the two groups using models that adjust for relevant confounders. Such designs tend to be resource-intensive and generally require lengthy data collection, particularly for rare events. In some cases, retrospective cohort studies, which are relatively less costly, may be executed; however, such studies may be limited by the availability of data for analysis (Kleinbaum et al. 2013).
Matched case-control designs are often appealing because they are cheaper and less time-consuming than prospective cohort studies. In such designs, subjects with a given outcome (cases) are matched to subjects without the outcome (controls) according to a prespecified matching criterion. Exposure rates in the two groups are then compared using analysis methods that take into account the correlation introduced by the matching. The selection of a suitable control group is essential to obtaining valid results. Because study subjects are selected according to their outcome values, case-control studies tend to suffer from selection bias and limited generalizability of results (Kleinbaum et al. 2013; Madigan et al. 2013b).
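For the simplest 1:1 matched design with a binary exposure, the analysis that respects the matching reduces to a count of discordant pairs. Concordant pairs drop out of the conditional likelihood, and the conditional maximum-likelihood odds ratio is b/c. Here b counts pairs with an exposed case and an unexposed control, and c counts the reverse. The sketch below assumes exactly this 1:1 design and uses a Wald interval on the log scale. The function name and the pair counts are hypothetical and only for illustration. For M:1 matching or covariate adjustment, conditional logistic regression would be the usual generalization.

```python
import math
from typing import Sequence, Tuple


def matched_pair_odds_ratio(
    pairs: Sequence[Tuple[int, int]], z: float = 1.96
) -> Tuple[float, float, float]:
    """Conditional MLE of the odds ratio for 1:1 matched case-control pairs.

    Each pair is (case_exposed, control_exposed), coded 1 = exposed, 0 = unexposed.
    Concordant pairs carry no information about the odds ratio and drop out;
    only the two kinds of discordant pairs (b and c) are used.
    Returns (odds_ratio, lower, upper) with a Wald interval on the log scale.
    """
    # b: case exposed, control unexposed
    b = sum(1 for case, control in pairs if case == 1 and control == 0)
    # c: case unexposed, control exposed
    c = sum(1 for case, control in pairs if case == 0 and control == 1)
    if b == 0 or c == 0:
        raise ValueError("need discordant pairs of both kinds to estimate the OR")
    odds_ratio = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical data: 100 matched pairs
pairs = [(1, 0)] * 30 + [(0, 1)] * 10 + [(1, 1)] * 25 + [(0, 0)] * 35
or_hat, lo, hi = matched_pair_odds_ratio(pairs)
print(f"OR = {or_hat:.2f} (95% CI {lo:.2f} to {hi:.2f})")  # OR = 3.00
```

The 60 concordant pairs have no influence on the estimate. This is exactly why ignoring the matching and pooling cases and controls into an unmatched 2x2 table can give a different, generally biased, answer.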
A common approach for handling observed confounders is to use traditional models, such as the standard linear model, generalized linear models, or the Cox proportional hazards model, in which the confounders are included as covariates. While these procedures have many desirable properties, including ease of interpretation, they can be sensitive to departures from model assumptions. For example, they may lead to misleading results in the presence of multicollinearity or influential points. They may also yield inefficient estimators when the number of covariates is large relative to the number of observations. Regularization approaches, including ridge regression and the lasso (Tibshirani 1996), have been proposed as viable solutions to mitigate some of these issues (see, e.g., Hastie et al. 2009). However, regression approaches alone cannot be fully relied upon to address confounding.
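A small simulation makes the covariate-adjustment idea concrete. It assumes a single measured confounder, a hypothetical "severity" variable, that raises both the probability of treatment and the outcome, which mimics confounding by indication. The true treatment effect is set to -1.0. Ordinary least squares is fitted with and without the confounder using `numpy.linalg.lstsq`. All variable names and coefficients are illustrative assumptions. The sketch also assumes the linear model is correctly specified, which is exactly the assumption the paragraph above warns may fail.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 5_000
severity = rng.normal(size=n)  # hypothetical measured confounder
# Sicker patients are more likely to be treated (confounding by indication)
treated = (rng.uniform(size=n) < 1 / (1 + np.exp(-1.5 * severity))).astype(float)
# True treatment effect is -1.0; severity independently worsens the outcome
outcome = -1.0 * treated + 2.0 * severity + rng.normal(size=n)


def ols_coefficients(X, y):
    """Least-squares coefficients for y ~ X (X includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(n)
naive = ols_coefficients(np.column_stack([ones, treated]), outcome)[1]
adjusted = ols_coefficients(np.column_stack([ones, treated, severity]), outcome)[1]
print(f"unadjusted: {naive:.2f}, adjusted: {adjusted:.2f} (true effect -1.00)")
```

Without the confounder in the model, the treatment coefficient absorbs part of the severity effect and even changes sign. With it, the estimate is close to the true value. No such correction is possible for a confounder that is not recorded.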
The propensity-score technique, introduced by Rosenbaum and Rubin (1983), is one of the most widely used methods for handling observed confounders. The underlying principle is based on the concept of counterfactual causality (Heckman 2005). Consider two treatment groups indicated by Z, which takes the value 1 if the subject is exposed and 0 otherwise. The propensity score (PS) for an individual is then defined as the conditional probability of being treated given the covariates:
$$p_i = \Pr(Z_i = 1 \mid \text{covariates for subject } i).$$
The propensity scores are typically estimated using standard logistic regression models. The estimated individual propensity scores can then be used for 1:1 or M:1 matching, grouping subjects with respect to their PS values (D’Agostino 1998). A drawback of matching is a potential loss of observations when suitable matches cannot be found at the low or high end of the PS distribution. Such unmatched observations are trimmed, and the remaining matched subjects are analyzed. As with matching on individual characteristics, PS matching introduces correlation among the matched observations, which should be taken into account in the analysis.
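The two steps above can be sketched as follows. First, a logistic regression for Z is fitted by Newton-Raphson. Second, treated subjects are greedily matched 1:1, without replacement, to the nearest control on the logit of the PS. The matching uses a caliper of 0.2 standard deviations of the logit PS, a commonly suggested choice that is assumed here rather than taken from this chapter. Treated subjects with no control inside the caliper are trimmed. The covariates (`age`, `comorbid`), their coefficients, and the helper names are hypothetical. Greedy matching also depends on the order in which treated subjects are processed, and optimal matching is an alternative.

```python
import numpy as np


def fit_logistic(X, z, n_iter=25):
    """Logistic regression fitted by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - p)                       # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def greedy_match(logit_ps, z, caliper):
    """Greedy 1:1 nearest-neighbour matching on logit(PS), without replacement.

    Treated subjects with no control within the caliper stay unmatched (trimmed).
    """
    available = set(np.flatnonzero(z == 0).tolist())
    pairs = []
    for t in np.flatnonzero(z == 1):
        if not available:
            break
        controls = np.fromiter(available, dtype=int)
        dist = np.abs(logit_ps[controls] - logit_ps[t])
        k = int(np.argmin(dist))
        if dist[k] <= caliper:
            pairs.append((int(t), int(controls[k])))
            available.remove(int(controls[k]))
    return pairs


# Hypothetical cohort with two measured confounders
rng = np.random.default_rng(7)
n = 2_000
age = rng.normal(60, 10, n)
comorbid = rng.binomial(1, 0.3, n)
z = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.05 * (age - 60) + 0.8 * comorbid))))

X = np.column_stack([np.ones(n), age - 60, comorbid])
logit_ps = X @ fit_logistic(X, z)
ps = 1 / (1 + np.exp(-logit_ps))
pairs = greedy_match(logit_ps, z, caliper=0.2 * logit_ps.std())
print(f"{len(pairs)} pairs matched; {int(z.sum()) - len(pairs)} treated subjects trimmed")
```

The returned pairs would then be analyzed with a method that respects the pairing, for example a paired difference or a conditional model, in line with the correlation noted above.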
Alternatives to PS matching are also available. In some applications, the estimated propensity score is included as one of the covariates in the outcome model. However, this approach has been shown to give biased estimates (Austin 2009a; Imbens 2004). Another approach is stratification, in which subjects are categorized into disjoint subsets based on prespecified PS thresholds. A method proposed by Cochran (1968) consists of dividing subjects into five equal-sized groups. In this framework, the analysis may be performed in either of two ways. The stratum-specific estimates may be pooled, weighted by the inverse of their variances. Alternatively, standard techniques such as analysis of variance (ANOVA), logistic regression, or Cox proportional hazards models may be used, with the PS strata included as a stratification term in the model. If each treatment group is not adequately represented in each stratum, the method may suffer from loss of information. Stratifying by PS groups has also been shown to be a good diagnostic method for assessing effect modification and residual confounding. It can also help elucidate the treatment effect with respect to the original confounding variables (Gaffney and Mardekian 2009).
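The inverse-variance pooling option can be sketched as follows, assuming a continuous outcome and the treated-minus-control mean difference as the stratum-specific estimate. Subjects are split into PS quintiles. Strata in which either arm has fewer than two subjects are skipped, which is the loss of information described above. For brevity, the simulation uses the true PS instead of an estimated one. The data-generating model, the true effect of 1.0, and the function name are hypothetical. Because five strata do not remove all within-stratum imbalance, some residual bias is expected.

```python
import numpy as np


def ps_stratified_effect(ps, z, y, n_strata=5):
    """Stratify on PS quantiles, then pool stratum-specific mean differences
    (treated minus control) with inverse-variance weights."""
    edges = np.quantile(ps, np.linspace(0, 1, n_strata + 1))
    # searchsorted maps each PS to its quantile group; clip keeps the maximum in the top group
    stratum = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, n_strata - 1)
    estimates, variances = [], []
    for s in range(n_strata):
        y1 = y[(stratum == s) & (z == 1)]
        y0 = y[(stratum == s) & (z == 0)]
        if len(y1) < 2 or len(y0) < 2:
            continue  # one arm essentially absent: this stratum's information is lost
        estimates.append(y1.mean() - y0.mean())
        variances.append(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    w = 1 / np.asarray(variances)
    pooled = float(np.sum(w * np.asarray(estimates)) / np.sum(w))
    return pooled, float(np.sqrt(1 / np.sum(w)))


# Hypothetical data: one confounder x, true treatment effect 1.0
rng = np.random.default_rng(11)
n = 10_000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))  # true PS used for brevity; in practice it is estimated
z = rng.binomial(1, ps)
y = 1.0 * z + 1.5 * x + rng.normal(size=n)

naive = y[z == 1].mean() - y[z == 0].mean()
pooled, se = ps_stratified_effect(ps, z, y)
print(f"naive: {naive:.2f}, stratified: {pooled:.2f} (SE {se:.2f}), true 1.00")
```

Inverse-variance weighting targets a precision-weighted average of the stratum effects. If the treatment effect varies across strata, this can differ from the population average treatment effect.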
Another method assigns each subject a weight equal to the inverse of the probability of receiving the treatment that the subject actually received. This method is known as inverse probability of treatment weighting (IPTW). In the notation above, the weight for individual i is given by:
$$w_i = \frac{Z_i}{p_i} + \frac{1 - Z_i}{1 - p_i}.$$
Given outcome Y, the average treatment effect δ is estimated by:
$$\hat{\delta} = \frac{1}{n}\sum_{i=1}^{n} \frac{Z_i Y_i}{\hat{p}_i} - \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - Z_i)\, Y_i}{1 - \hat{p}_i},$$
where $\hat{p}_i$ is the estimated propensity score and n is the total number of subjects.
The rationale for the IPTW analysis is that subjects with a relatively high $p_i$ are overrepresented in the treatment group, so their observations are down-weighted; the reverse is true for control subjects. Inference about δ may be performed using suitable estimates of the standard error of $\hat{\delta}$ (Lunceford and Davidian 2004). One limitation of the IPTW approach is that the weights may be unstable when estimated propensity scores are close to 0 for treated subjects or close to 1 for control subjects (Robins et al. 2000).
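The IPTW estimator of δ translates directly into code. The sketch below computes it in the Horvitz-Thompson form written above. It also computes a normalized (Hájek) variant, a common alternative in which the weights are rescaled to sum to 1 within each arm. That variant is included as an assumption for comparison, not as this chapter's method. The simulation uses the true PS in place of $\hat{p}_i$ for brevity. Its data-generating model and the true effect of 2.0 are hypothetical. Standard errors are not computed here.

```python
import numpy as np


def iptw_ate(ps, z, y):
    """Horvitz-Thompson form: mean(Z*Y/p) - mean((1-Z)*Y/(1-p))."""
    return float(np.mean(z * y / ps) - np.mean((1 - z) * y / (1 - ps)))


def iptw_ate_normalized(ps, z, y):
    """Ratio (Hajek) form: weights normalized to sum to 1 within each arm,
    usually more stable when a few weights are very large."""
    w1 = z / ps
    w0 = (1 - z) / (1 - ps)
    return float(np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0))


# Hypothetical data: one confounder x, true average treatment effect 2.0
rng = np.random.default_rng(5)
n = 20_000
x = rng.normal(size=n)
ps = 1 / (1 + np.exp(-x))  # true PS for brevity; use estimated p_i in practice
z = rng.binomial(1, ps)
y = 2.0 * z + x + rng.normal(size=n)

weights = z / ps + (1 - z) / (1 - ps)  # w_i as defined in the text
print(f"HT: {iptw_ate(ps, z, y):.2f}, normalized: {iptw_ate_normalized(ps, z, y):.2f}, "
      f"max weight: {weights.max():.1f}")
```

Monitoring the largest weights, as in the last line, is a simple check for the instability noted above. A few subjects with PS near 0 or 1 can dominate the estimate.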
The pros and cons of the abovementioned procedures are discussed in Austin (2007; 2009b). Note that in certain settings PS matching may not always be preferable to conventional multivariable methods (Sturmer et al. 2006). The performance of the method also generally depends on the appropriateness of the variables included in constructing the scores. Further, in the face of high dimensionality and collinearity, the models may not perform adequately (Schneeweiss et al. 2009). For a discussion of modern analytic approaches, such as data-mining techniques for propensity score estimation, see, e.g., Setoguchi et al. (2008).
Instrumental variable (IV) techniques are often used to handle unmeasured confounders in observational studies (Newhouse and McClellan 1998). An unmeasured confounder may simply be a variable, such as disease severity at the time of treatment initiation, that is not captured in the RWD. More subtly, it may be the reason why the prescriber selects one treatment over the alternative for a given subject. That is, clinical judgment often leads to confounding that cannot be controlled for by any measured subject characteristics. To be used reliably, an IV must satisfy two conditions: (1) it must be strongly associated with treatment; and (2) it must have no direct effect on the outcome, affecting it only through the treatment. One example of an IV is an abrupt change in medical practice, such as one following an important innovation. Another involves treatment preference that is independent of patient factors. Instances include distance to specialists (McClellan et al. 1994), geographic area (Stukel et al. 2007), physician prescribing preference (Brookhart et al. 2006), and hospital formulary (Schneeweiss et al. 2007). The IV is then incorporated into the analysis to adjust for unmeasured confounders, for example through two-stage least squares or by comparing outcomes across levels of the IV. However, one can never be certain that unmeasured confounding has been adequately addressed, and it remains a limitation of the analysis of observational data. Another way to address unmeasured confounding is a tipping-point analysis. This approach estimates the amount of unmeasured confounding that would be required to change the study conclusion.
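The simplest IV estimator is the Wald ratio, sketched here for a single binary instrument. It divides the instrument-outcome association by the instrument-treatment association. With one binary instrument and no covariates, this ratio coincides with two-stage least squares. In the simulation, an unmeasured confounder `u` drives both treatment and outcome, and a hypothetical binary instrument `iv` affects the outcome only through treatment. The true treatment effect is 1.0. All coefficients are illustrative assumptions. Real instruments can rarely be verified to satisfy the second condition.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
u = rng.normal(size=n)           # unmeasured confounder, e.g. disease severity
iv = rng.binomial(1, 0.5, n)     # hypothetical binary instrument, e.g. prescriber preference
# Treatment depends on the instrument and on the unmeasured confounder
t = (0.8 * iv + u + rng.normal(size=n) > 0.5).astype(float)
# Outcome depends on treatment (true effect 1.0) and on u, but not directly on iv
y = 1.0 * t + 2.0 * u + rng.normal(size=n)

naive = y[t == 1].mean() - y[t == 0].mean()
first_stage = t[iv == 1].mean() - t[iv == 0].mean()   # instrument strength (condition 1)
reduced_form = y[iv == 1].mean() - y[iv == 0].mean()  # instrument-outcome association
wald = reduced_form / first_stage
print(f"naive: {naive:.2f}, IV (Wald): {wald:.2f}, first stage: {first_stage:.2f}")
```

The first-stage difference sits in the denominator, so a weak instrument inflates both the variance and any bias of the ratio. This is why the first condition requires a strong association with treatment.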
In addition to the methodological issues discussed above, effective use of data from observational studies requires addressing important operational challenges. Healthcare data may come from different sources, including electronic health records (EHRs) and claims databases. They therefore typically require special provisions for data storage, computing environment, data standards, and protection of privacy and confidentiality (Alemayehu and Mardekian 2011). Depending on the source, different nomenclatures, coding conventions, and units are often used for medical terms. Because the data are not collected for research purposes, data entry errors are common, often leading to issues such as misclassification, missing values, and outliers. In addition, most of the available data may be unstructured, for example, free-text clinical notes.
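A record-level harmonization step might look like the sketch below. It converts body weight recorded in different units to a common unit and flags missing values, unrecognized units, and implausible outliers instead of silently dropping them. The record layout, the unit spellings, and the plausibility limits (2 to 350 kg) are hypothetical assumptions chosen for illustration, not a published standard. Only the pound-to-kilogram factor is a defined constant.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LB_TO_KG = 0.45359237  # exact by definition of the international pound
KG_UNITS = {"kg", "kgs", "kilogram", "kilograms"}
LB_UNITS = {"lb", "lbs", "pound", "pounds"}


@dataclass
class WeightRecord:
    patient_id: str
    source: str               # hypothetical source label, e.g. "ehr" or "claims"
    value: Optional[float]
    unit: Optional[str]


def harmonize_weight(
    rec: WeightRecord, low_kg: float = 2.0, high_kg: float = 350.0
) -> Tuple[Optional[float], str]:
    """Convert a body-weight entry to kg and flag common data-quality problems.

    Returns (weight_kg, flag) with flag in {"ok", "missing", "unknown_unit", "implausible"}.
    The plausibility limits are illustrative assumptions, not clinical standards.
    """
    if rec.value is None:
        return None, "missing"
    unit = (rec.unit or "").strip().lower()
    if unit in KG_UNITS:
        kg = rec.value
    elif unit in LB_UNITS:
        kg = rec.value * LB_TO_KG
    else:
        return None, "unknown_unit"
    kg = round(kg, 1)
    if not low_kg <= kg <= high_kg:
        return kg, "implausible"  # possible entry error or outlier: keep, but flag for review
    return kg, "ok"


records = [
    WeightRecord("p1", "ehr", 154.0, "lbs"),
    WeightRecord("p2", "claims", 70.0, "KG"),
    WeightRecord("p3", "ehr", None, "kg"),
    WeightRecord("p4", "ehr", 700.0, "kg"),
    WeightRecord("p5", "claims", 80.0, "st"),
]
for rec in records:
    print(rec.patient_id, harmonize_weight(rec))
```

Returning a flag alongside the value keeps the decision about how to treat suspect records explicit in the analysis, for example exclusion versus imputation.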
As a result, concerted efforts by various stakeholders are required to establish a framework for harmonizing healthcare data. Recent activities in this regard include increased use of the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) to code and classify diagnoses from inpatient and outpatient records. The National Drug Code (NDC) system, which is maintained by the FDA, is another tool for coding prescription drugs and insulin products. In addition, initiatives are underway in the US to harmonize data collection across states (Porter et al. 2015).
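One routine NDC harmonization step when linking drug data across sources is normalizing codes to a single layout. NDCs are printed as 10-digit codes in one of three hyphenated configurations (4-4-2, 5-3-2, or 5-4-1), whereas many claims databases store an 11-digit 5-4-2 form. The sketch below pads the one short segment with a leading zero. It assumes the input keeps its hyphens, because an unhyphenated 10-digit code does not reveal which segment is short. The function name and the example codes are hypothetical and do not refer to real products.

```python
def ndc_to_11_digit(ndc: str) -> str:
    """Normalize a hyphenated NDC to the 11-digit 5-4-2 layout.

    10-digit NDCs come in 4-4-2, 5-3-2, or 5-4-1 configurations; each has exactly one
    short segment, which is left-padded with a zero. Codes already in 5-4-2 pass through.
    """
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected a hyphenated NDC, got {ndc!r}")
    labeler, product, package = parts
    layout = (len(labeler), len(product), len(package))
    if layout not in {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}:
        raise ValueError(f"unrecognized NDC layout {layout} in {ndc!r}")
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# Illustrative codes only; they are not meant to identify real products
for code in ["1234-5678-90", "12345-678-90", "12345-6789-0"]:
    print(code, "->", ndc_to_11_digit(code))
```

Each of the three 10-digit layouts maps to a distinct 11-digit string, so codes from sources using different print formats can then be joined on a single key.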