High-Dimensional Biomarkers in Drug Discovery: The QSTAR Framework
Introduction: From a Single Trial to a High-Dimensional Setting
In contrast with the analyses presented in previous chapters, which focused on data obtained from clinical trials, this chapter focuses on drug discovery experiments. Our aim is to find genetic biomarkers for the phenotypic data of a set of compounds under development. The data for the analysis consist of (1) an m × n gene expression matrix (X) that contains the expression measurements of m genes for n compounds, (2) an n × 1 vector of phenotypic data (Y), and (3) an n × 1 vector describing the chemical structure of the compounds (Z). Figure 16.1 illustrates the relationship between the three variables. Our goal is to model the relationship between gene expression and the phenotypic data, taking into account that the chemical structure of a compound may (or may not) influence both variables. This modeling approach, called QSTAR (Quantitative Structure-Transcription-Assay Relationship), is discussed further in Section 16.2. The connection between the QSTAR framework and the surrogacy framework is illustrated in Section 16.4.
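To make the data layout concrete, the following Python sketch builds a small simulated dataset with the three components described above. The function names (simulate_qstar_data, gene_unit), the use of a single binary fingerprint feature for Z, and all numeric values are hypothetical and chosen only for illustration; indices are 0-based, so gene i in the code corresponds to gene i + 1 in the text.

```python
import random

def simulate_qstar_data(m, n, seed=0):
    """Toy QSTAR data (hypothetical values): m genes x n compounds."""
    rng = random.Random(seed)
    # Z: one binary fingerprint feature per compound (1 = substructure present)
    z = [rng.randint(0, 1) for _ in range(n)]
    # Y: one bioactivity value (e.g. pIC50) per compound, shifted when Z = 1
    y = [5.0 + 1.5 * zj + rng.gauss(0.0, 0.5) for zj in z]
    # X: m x n expression matrix; row i = gene i, column j = compound j
    x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    return x, y, z

def gene_unit(x, y, z, i):
    """Observation units (X_ij, Y_j, Z_j), j = 0..n-1, for a single gene i."""
    return list(zip(x[i], y, z))

x, y, z = simulate_qstar_data(m=1000, n=20)
unit = gene_unit(x, y, z, i=0)
print(len(x), len(x[0]), len(unit))  # 1000 20 20
```

Taking one row of X together with Y and Z yields the n triplets for a single gene, which is the structure that mirrors one single-trial dataset; repeating this for all m rows gives the high-dimensional problem.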
Although the experimental setting considered in this chapter differs from the clinical trial setting, the experimental unit for a given gene i, (X_ij, Y_j, Z_j), is similar to that of the single-trial setting discussed in Chapter 4. The main difference is that, whereas a single-trial setting involves one dataset with one surrogate and one true endpoint, the current setting involves a high-dimensional dataset with m candidates (genes) from which biomarkers for the phenotypic data must be selected. Because, for a specific gene, the observation unit and the association structure among the three variables are similar to those of the single-trial setting, in this chapter we propose a joint model for transcriptomic and phenotypic data, conditional on the chemical structure, as the fundamental modeling tool for data integration within the QSTAR framework. We elaborate on the high dimensionality of the data in Section 16.3.
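The gene-by-gene logic can be sketched as a screening loop: for each of the m genes, estimate the association between X_i and Y after accounting for Z. The sketch below is a stand-in, not the chapter's estimation procedure (the joint model is given in Section 16.5): it assumes a single binary fingerprint feature and uses the partial correlation of X_i and Y given Z, computed from least-squares residuals. The function names are hypothetical.

```python
import math
from statistics import mean

def residuals_given_binary(values, z):
    """Residuals of `values` after least-squares regression on a binary covariate z.

    With one 0/1 covariate plus an intercept, the fitted value is the group mean,
    so each residual is the value minus the mean of its fingerprint group.
    """
    group_means = {g: mean(v for v, zj in zip(values, z) if zj == g) for g in set(z)}
    return [v - group_means[zj] for v, zj in zip(values, z)]

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def adjusted_associations(x, y, z):
    """One Z-adjusted (partial) correlation between gene i and Y per row of x."""
    ry = residuals_given_binary(y, z)
    return [pearson(residuals_given_binary(row, z), ry) for row in x]

z = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 11, 12, 13]
x = [[5, 6, 7, 0, 1, 2],   # tracks Y within each fingerprint group
     [3, 2, 1, 5, 4, 3],   # moves against Y within each group
     [3, 0, 3, 3, 0, 3]]   # unrelated to Y within each group
print([round(r, 3) for r in adjusted_associations(x, y, z)])  # [1.0, -1.0, 0.0]
```

In this toy example, the first gene is perfectly associated with Y once Z is accounted for, even though its unadjusted correlation with Y is negative: compounds carrying the fingerprint feature have higher Y and lower expression. This is why the chemical structure is conditioned on rather than ignored.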
The joint modeling approach can be used to uncover, for a given set of compounds, the association between gene expression and biological activity (i.e., the phenotypic variable, for example, the half maximal inhibitory concentration, IC50, or its negative base-10 logarithm, pIC50), taking into account the influence of the chemical structure of the compound on both variables. The model allows us to detect genes that are associated with the bioactivity data, hence facilitating the identification of potential genomic biomarkers for the compound’s efficacy. In addition, the effect of every chemical structural feature on both gene expression and pIC50, and their associations, can be investigated simultaneously. The joint model is presented in Section 16.5.
Figure 16.1: Relationship between gene expression (X_ij, for gene i = 1, ..., m and compound j = 1, ..., n), chemical structure (Z_j, a fingerprint feature, FF), and phenotypic data (Y_j, i.e., bioactivity data) within the QSTAR framework. The relationship and association structure are similar to those of the single-trial setting discussed in Chapter 4. In surrogacy terminology, X_ij represents the “surrogate” endpoint, Y_j represents the “true” endpoint, and Z_j represents the treatment variable.
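Because the phenotypic variable is often reported as pIC50 rather than IC50, the conversion is worth spelling out. The sketch below assumes, following common practice, that IC50 is expressed in mol/L before the logarithm is taken; the helper names are hypothetical.

```python
import math

def pic50_from_ic50(ic50_molar):
    """pIC50 = -log10(IC50), assuming IC50 is given in mol/L."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)

def pic50_from_nanomolar(ic50_nm):
    """Convert an IC50 in nM to mol/L before taking the logarithm."""
    return pic50_from_ic50(ic50_nm * 1e-9)

for ic50_nm in (10.0, 100.0, 1000.0):
    print(ic50_nm, round(pic50_from_nanomolar(ic50_nm), 2))
```

Under this convention, an IC50 of 1 µM corresponds to a pIC50 of 6, and each tenfold decrease in IC50 (a more potent compound) raises pIC50 by one unit, so higher pIC50 values indicate higher activity.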
Biomarker identification is a major application of microarray experiments in early drug development, where it often parallels and facilitates compound selection. Many studies have been devoted to identifying genes that are correlated with a biological activity of interest, for instance, the inhibition of a certain enzyme. It is equally important to detect toxicity at the early stages of development. Reliable toxicity biomarkers are very helpful in this respect because they allow cost-effective testing of other drug candidates in the compound series under investigation. For example, Lin et al. (2010) and Tilahun et al. (2010) identified gene-specific biomarkers for continuous outcomes (the distance traveled by rats under treatment and the Hamilton Depression (HAMD) scores of psychiatric patients). Van Sanden et al. (2012) identified gene-specific biomarkers for toxicity data measured as a binary response. The joint modeling framework (Lin et al., 2010; Tilahun et al., 2010) that we present in this chapter allows us to (1) identify gene signatures of activity for directing chemistry, (2) determine chemical substructures of compounds (called fingerprint features, FF) that are associated with the bioassay data for one or more biological targets of interest, and (3) investigate whether the association between the compounds and the bioassay is confirmed by gene expression changes (either on- or off-target related).
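Items (2) and (3) rest on estimating how a fingerprint feature shifts both the bioassay and gene expression. A minimal sketch, assuming separate linear models Y_j = μ + α Z_j + ε_j and X_ij = μ_i + β_i Z_j + ε_ij with one binary fingerprint feature (an illustrative assumption; the chapter's joint model, which additionally models the association between the two responses, is given in Section 16.5). For a binary covariate, the least-squares slope reduces to a difference in group means. The function names are hypothetical.

```python
from statistics import mean

def fingerprint_effect(values, z):
    """Least-squares slope on a binary fingerprint z: mean(Z = 1) - mean(Z = 0)."""
    present = [v for v, zj in zip(values, z) if zj == 1]
    absent = [v for v, zj in zip(values, z) if zj == 0]
    if not present or not absent:
        raise ValueError("z must contain compounds with and without the feature")
    return mean(present) - mean(absent)

def fingerprint_effects(x, y, z):
    """Return (alpha, betas): FF effect on the bioassay Y and on each gene row of x."""
    alpha = fingerprint_effect(y, z)
    betas = [fingerprint_effect(row, z) for row in x]
    return alpha, betas

z = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 11, 12, 13]
x = [[5, 6, 7, 0, 1, 2], [3, 2, 1, 5, 4, 3], [3, 0, 3, 3, 0, 3]]
alpha, betas = fingerprint_effects(x, y, z)
print(alpha, betas)
```

A sizable α points to a substructure associated with the bioassay (item 2); genes whose β_i is also sizable respond to the same substructure, which is the kind of pattern item (3) examines. In practice, such estimates would be accompanied by inference that accounts for testing m genes.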
Identifying relevant genes that are associated with the biological response already provides valuable information, but showing that this association is caused by the presence or absence of particular chemical substructures provides additional information that is especially useful in drug design for improving or prioritizing compounds. The methods discussed in this chapter are applied to the two case studies presented in Section 16.6.