Chemical Spaces and Activity Landscapes
As noted in Section 6.1, the similarity property principle, stating that similar structures (should) possess similar properties,5,6 plays a very important role in many areas of chemistry, pharmacology, and toxicology. However, despite its usefulness, this principle is a generalization rather than a fundamental law of nature. Thus, it has limitations and exceptions that may be caused by various factors. Recognition and analysis of this fact led to the introduction of the activity landscape (or, in general, property landscape) metaphor.47,48 (The term structure-activity landscape is also commonly used.) It represents the relationship between the structures and activity values as a hypersurface over chemical space, in the same way as the Earth's surface in a real landscape defines the relationship between geographical locations and altitude values.49,50 Using suitable dimensionality reduction and visualization techniques, this hypersurface can be represented graphically in 2D or 3D for convenient human perception.24,47 Similar to real terrain, the activity landscape is far from uniform. In some regions of chemical space it can resemble flat prairies or gently rolling hills where the structure-to-activity function is smooth or continuous and the similarity property principle is obeyed. However, in other regions this function can be steep or discontinuous, resembling rugged gorges or peaks. Such discontinuities have been termed activity cliffs, defined as pairs of structurally similar compounds having a large difference in activity.49,51,52
Figure 6.2 Calculation of some similarity measures for the cytidine (A) and lamivudine (B) molecules (Scheme 6.1). (a) Tanimoto similarity based on the molecular fingerprint representation implemented in ChemAxon Instant JChem.19,43 Some of the substructural patterns (linear paths, branches, cycles) found in these molecules are shown on the left; the fragments of the 512-bit hashed fingerprint vectors are shown on the right. The bits set for both molecules are highlighted in magenta, the bits set only for molecules A or B are highlighted in red and blue, respectively. The size quantities (bit counts) and the similarity function value are shown in the box. (b) Tanimoto similarity based on the molecular graph representation. The intersection of the molecular graphs (maximum common edge substructure) is shown in magenta, the fragments present only in molecules A or B are shown in red and blue, respectively. The size quantities (edge counts) and the similarity function value are shown in the box. (c) Tanimoto and Carbo similarity functions based on the vector representation of atomic charges on the topological (2D) structure level using the Molecular Field Topology Analysis (MFTA).30,31,44 The molecular supergraph and the Gasteiger charge values for the two molecules are shown. Atoms in the supergraph are colored according to the sign of the charge values: red for positive, blue for negative, and white for values close to zero. The size quantities (vector dot products) and the similarity function values are shown in the box. (d) Tanimoto and Carbo similarity functions based on the vector representation of calculated 1D (global) molecular descriptors and predicted properties. The parameter values obtained from the PubChem Compound Database for cytidine (CID 6175)45 and lamivudine (CID 73339)46 are shown in the table. (MW: molecular weight, g mol⁻¹; HAtoms: heavy atom count; Charge: formal charge; XLogP3: calculated logarithm of octanol-water partition coefficient; TPSA: topological polar surface area, Å²; HBD: hydrogen bond donor count; HBA: hydrogen bond acceptor count; RotB: rotatable bond count). The size quantities (vector dot products) and the similarity function values are shown in the box.
Scheme 6.1 Structures of the compounds cytidine and lamivudine for the example of similarity calculations.
Among the most striking examples of activity cliffs is the so-called magic methyl effect, i.e. large changes in activity due to the introduction of only one (or a few) methyl groups.53 In some cases, the increase in potency (commonly attributed to modified conformational behavior, shape complementarity or solvation) can be very significant, up to 2 or 3 orders of magnitude. In Scheme 6.2 this is illustrated by the orexin receptor antagonists54 3 and 4, as well as by the histone methyltransferase EZH2 inhibitors55 5 and 6. It should be noted that additional methyls usually lead to more modest potency boosts and can just as likely cause a decrease in activity.53 In other cases, their introduction can lead to changes in selectivity profile, activity type (e.g. negative and positive allosteric modulators of mGluR5 metabotropic glutamate receptor56 7-9), metabolism, or solubility.53 Fluorine substitution can produce even more versatile effects, mediated by its steric similarity to hydrogen and by changes in electronic effects, conformational behavior, physico-chemical properties, metabolic stability, or reaction mechanisms.57,58 For example, norepinephrine 10 and its fluorinated analogs 11 and 12 (Scheme 6.3) have different selectivity profiles towards α- and β-adrenergic receptors,57,59 probably due to the stabilization of different conformers by the OH···F hydrogen bond. 5-Fluorouracil 13 is a mechanism-based suicide inhibitor of thymidylate synthase. Similar to uracil, it is incorporated into the intermediate covalently bound to the enzyme, but cannot eliminate the oxidized cofactor dihydrofolate due to the lack of the C5 proton.57 Fluorinated derivatives of drugs are also commonly used to control their metabolism. In 7-fluoroprostacyclin 14, the electron-withdrawing fluorine atom helps to minimize the enol ether hydrolysis, increasing its half-life to >1 month, compared to 10 min for the parent compound.57 Substitution of fluorine for hydrogen at metabolically labile sites prevents their oxidation by cytochrome P450 (e.g. in 4'-fluoroflurbiprofen 15).60
Scheme 6.2 Examples of the activity cliffs caused by methyl substitution.
Scheme 6.3 Examples of the activity cliffs caused by fluorine substitution.
Activity cliffs usually have both ‘good’ and ‘bad’ sides.49 The most significant problems they cause occur during the modeling of structure-activity relationships.48 When the property landscape is steep or discontinuous, the approaches used in this field usually perform poorly or require models of much higher complexity, resulting in significant prediction errors for cliff and other compounds.23 The presence of activity cliffs also complicates the identification of active hit compounds during the screening of compound libraries. Only a diverse subset of limited size is usually tested to save resources, and compounds with small structural differences (such as the methyl substituent in compounds 5 and 6) may easily be missed.55 On the other hand, activity cliffs (i.e. structures with much higher activity than might be expected based on other known data) open interesting possibilities for further discovery and optimization of promising compounds as well as for better understanding of their mechanism of action and modeling of the structure-activity relationships.49,51,61,62 Although the concept of activity cliffs is widely accepted, the detailed criteria for their recognition can be debated and numerous approaches have been proposed. One of them involves the calculation of the Structure-Activity Landscape Index (SALI) for each pair of compounds.636 Somewhat reminiscent of an estimated gradient, it is defined as the absolute difference in activity divided by the similarity-based distance between the molecules (eqn (6.23)), usually normalized relative to the largest SALI value in the data set. If this parameter exceeds a specified threshold, the pair is considered an activity cliff.
In addition, it is possible to analyze the cliff relationships in the data set as a whole (represented by the so-called SALI networks) and estimate how well these relationships are predicted by different models.646 The SALI parameters can also be used to estimate the statistical significance of the activity cliffs detected from a particular molecular representation.66 An alternative approach to the classification of the structure-activity relationships and landscapes (continuous, discontinuous, and heterogeneous) is based on the structure-activity relationship index.67
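As an illustration, the pairwise SALI calculation can be sketched as follows. This is a minimal sketch, not a reference implementation: it assumes the commonly used form SALI(i, j) = |A_i - A_j| / (1 - S_ij), which matches the verbal definition above and the singularity at S = 1. The function names, the activity values (taken to be on a logarithmic scale), the similarity values, the 0.5 threshold, and the handling of identical representations are all hypothetical choices made for the example.

```python
import math
from itertools import combinations


def sali(activity_i, activity_j, similarity):
    """Assumed form: |A_i - A_j| / (1 - S_ij), with S_ij in [0, 1]."""
    difference = abs(activity_i - activity_j)
    distance = 1.0 - similarity
    if distance <= 0.0:
        # Identical representations: the index is singular.
        # Convention assumed here: infinite if the activities differ, else zero.
        return math.inf if difference > 0.0 else 0.0
    return difference / distance


def sali_cliffs(activities, similarities, threshold=0.5):
    """Return compound pairs whose normalized SALI exceeds the threshold.

    activities:   dict mapping compound name -> activity (log scale)
    similarities: dict mapping (name_a, name_b), with name_a < name_b, -> similarity
    """
    names = sorted(activities)
    raw = {
        (a, b): sali(activities[a], activities[b], similarities[(a, b)])
        for a, b in combinations(names, 2)
    }
    # Normalize by the largest finite value (an assumption; infinite values
    # cannot serve as a normalization constant).
    finite = [value for value in raw.values() if math.isfinite(value)]
    largest = max(finite, default=0.0)
    cliffs = []
    for pair, value in raw.items():
        if math.isinf(value):
            normalized = math.inf
        else:
            normalized = value / largest if largest > 0.0 else 0.0
        if normalized > threshold:
            cliffs.append(pair)
    return cliffs


# Hypothetical toy data: three compounds with made-up activities and similarities.
activities = {"A": 9.0, "B": 6.0, "C": 6.2}
similarities = {("A", "B"): 0.9, ("A", "C"): 0.5, ("B", "C"): 0.8}
cliffs = sali_cliffs(activities, similarities)
```

In this toy data set only the pair A/B combines high similarity with a large activity difference, so it is the only pair flagged. Pairs with identical representations are always flagged when their activities differ, which reflects the singularity of the index rather than a meaningful magnitude.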
The SALI and similar parameters are quite useful for the analysis of activity cliffs. However, they also have certain disadvantages. Firstly, they depend on the actual numerical values of the activity and similarity measures, yet, as shown in Section 6.2.1 above, there is no single ‘true’ measure of molecular similarity (or molecular description in general). Different descriptions and similarity measures focusing on different facets of a structure may be more or less relevant to different properties (in particular, structural characteristics critical for a property or activity of interest may be missing from the description or masked by other, insignificant features). Thus, compound pairs classified as activity cliffs using one combination of molecular representation, similarity function and activity measure may not be detected as such using some other combination. In addition, the calculated similarity values (especially those based on molecular fingerprint representations) are not always intuitive and may be difficult to interpret from a chemical perspective.68 This has even led to the (somewhat provocative) question of whether all activity cliffs might actually be artifacts of an inadequate molecular description.49,69 Indeed, in many cases we can hope to select and/or optimize the description and modeling techniques (using available information on the targets and mechanisms of action, previous experience, educated guesses, trial-and-error, and/or automated learning techniques) in order to better capture the relevant structural features and correctly predict the activity cliffs that were mispredicted using other approaches.52,70-73
Additional problems with the SALI parameter are caused by its relative scale and the singularity at S = 1. As a result, it can distort the actual magnitude of the activity differences, leading to the detection of irrelevant (minor) cliffs. For compounds having the same representation the SALI values are infinite, and for highly similar compounds they are extremely sensitive to small variations in the similarity value that are not really significant (see Section 6.2.1).49,M Thus, discrete activity cliff criteria were proposed that are directly based on the general definition.13,49,51,a A pair of compounds should be considered an activity cliff if it meets the following conditions (which can of course be adjusted to a particular problem but need to be clearly specified):
- (1) both compounds are active, their potency is characterized by Ki (preferably) or IC50 values, and at least one compound has potency in the nanomolar range. (Alternatively, confirmed inactive compounds may also be considered);74
- (2) the compounds satisfy a pre-established similarity criterion; and
- (3) the potency of compounds differs by at least 2 orders of magnitude (2 logarithmic units).
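The three conditions above can be expressed as a simple predicate. This is a minimal sketch under stated assumptions: potencies are given as Ki values in nM and converted to pKi, ‘nanomolar range’ is interpreted as Ki below 1000 nM, the similarity criterion is supplied by the caller as a precomputed boolean, and all function and parameter names are hypothetical.

```python
import math


def p_ki(ki_nanomolar):
    """Convert a Ki value in nM to pKi = -log10(Ki in mol/L)."""
    return 9.0 - math.log10(ki_nanomolar)


def is_discrete_cliff(ki_a_nm, ki_b_nm, is_similar,
                      nanomolar_limit_nm=1000.0, min_log_difference=2.0):
    """Check the three discrete activity cliff conditions for one compound pair.

    ki_a_nm, ki_b_nm: measured Ki values in nM (both compounds assumed active)
    is_similar:       result of the pre-established similarity criterion
    """
    # (1) at least one compound has potency in the nanomolar range
    #     (assumption: Ki below 1000 nM).
    one_nanomolar = min(ki_a_nm, ki_b_nm) < nanomolar_limit_nm
    # (3) potency differs by at least 2 logarithmic units by default.
    log_difference = abs(p_ki(ki_a_nm) - p_ki(ki_b_nm))
    return one_nanomolar and is_similar and log_difference >= min_log_difference


# Hypothetical examples.
strong_pair = is_discrete_cliff(5.0, 2000.0, is_similar=True)     # 400-fold difference
weak_pair = is_discrete_cliff(5.0, 100.0, is_similar=True)        # 20-fold difference
dissimilar_pair = is_discrete_cliff(5.0, 2000.0, is_similar=False)
```

Passing the similarity decision in as a boolean keeps the predicate independent of any particular representation, so the same check can be reused with fingerprint thresholds, MMP-based criteria, or 3D similarity.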
These conditions allow us to define the similarity criterion explicitly, taking into account the features of a problem and the desired level of interpretability.13,49,61,75 One option is to require the value of the fingerprint-based Tanimoto similarity (or other similarity measure) to be above a certain threshold, but limits on the structural modification or even on the similarity of 3D structures or binding modes can also be specified.76,77 The so-called MMP cliffs68,75 (inspired by the matched molecular pair concept, see Section 6.3.2) proved to be especially useful, easily interpretable, and chemically intuitive. In this approach, the two structures are considered similar if they differ only in one substructural fragment (terminal substituent or central core), defining a structural transformation. The size of the transformations is restricted to keep them chemically acceptable: the transformed fragments cannot exceed 13 non-hydrogen atoms and must be at least two-fold smaller than the unchanged part of the molecule, and the difference in fragment size cannot exceed eight non-hydrogen atoms. These limits were chosen to allow the addition of a substituted six-membered ring (e.g. a phenolic substituent) or the replacement of a five- or six-membered ring by a substituted condensed two-ring system containing up to 10 ring atoms. (If the activity difference in a matched molecular pair is small, the transformation corresponds to a bioisosteric replacement.)
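The size restrictions on MMP transformations can be checked with a few comparisons once the fragment sizes are known. This is a minimal sketch, assuming the non-hydrogen atom counts of the unchanged part and of the two exchanged fragments have already been obtained from some fragmentation procedure (not shown); the function and parameter names are hypothetical.

```python
def is_acceptable_transformation(unchanged_atoms, fragment_a_atoms, fragment_b_atoms,
                                 max_fragment_atoms=13, min_size_ratio=2,
                                 max_size_difference=8):
    """Apply the three size limits to one candidate transformation.

    All counts refer to non-hydrogen atoms. A hydrogen 'fragment' has size 0.
    """
    larger_fragment = max(fragment_a_atoms, fragment_b_atoms)
    # Transformed fragments cannot exceed 13 non-hydrogen atoms.
    within_size_limit = larger_fragment <= max_fragment_atoms
    # Fragments must be at least two-fold smaller than the unchanged part.
    small_relative_to_core = min_size_ratio * larger_fragment <= unchanged_atoms
    # The difference in fragment size cannot exceed eight non-hydrogen atoms.
    similar_sizes = abs(fragment_a_atoms - fragment_b_atoms) <= max_size_difference
    return within_size_limit and small_relative_to_core and similar_sizes


# Hypothetical examples.
methyl_scan = is_acceptable_transformation(20, 0, 1)       # H -> CH3 on a 20-atom core
ring_addition = is_acceptable_transformation(30, 5, 13)    # difference of exactly 8 atoms
too_large = is_acceptable_transformation(40, 6, 14)        # fragment above 13 atoms
core_too_small = is_acceptable_transformation(10, 6, 7)    # core less than twice the fragment
```

The magic methyl pairs discussed earlier (a hydrogen replaced by a methyl group) pass these limits trivially, which is one reason MMP cliffs remain easy to interpret.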
The distribution of the activity cliffs was thoroughly analyzed61,68,75 using the MMP and fingerprint/Tanimoto similarity criteria and the high-confidence activity data from the ChEMBL78 database. The results indicate that activity cliffs are in fact fairly common, rather than rare and exceptional. For all analyzed targets the proportion of the cliff-forming compounds determined from the different fingerprint similarity measures ranges from 34% to 41%. Based on the MMP similarity, it was lower (close to 28%), but approximately one out of three compounds was still involved in one or more activity cliffs. Interestingly, the consistency between different measures was rather poor (only ~15% of the compounds are recognized as cliff-forming by all fingerprint criteria and ~11% by all fingerprint and MMP-based criteria). Thus, the application of structurally conservative and chemically interpretable MMP similarity measures seems preferable.61 For different targets some variability in the activity cliff distributions is present, but the differences are not substantial. Based on the MMP similarity, the proportion of cliff-forming compounds ranges from 20% to 38% and the proportion of the cliffs (relative to all compound pairs meeting the similarity criteria) ranges from 5% to 15%.75 It should be noted that most of the activity cliffs (>95%) are not isolated, but form disjoint clusters of various topologies that may include up to dozens of compounds (so-called coordinated cliffs).79 Some techniques to extract the structure-activity relationship information from the coordinated cliff networks have been proposed.52,80 This approach seems quite promising, since historical analysis shows that the compound optimization paths originating from the cliff compounds are much more likely to reach the most potent compounds in the series,52,61,81 but these possibilities are severely underutilized in the practice of medicinal chemistry.61,82
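The notions of cliff-forming compounds and coordinated cliffs can be illustrated by treating cliff pairs as edges of a graph and extracting its connected components. This is a minimal sketch on hypothetical data; it is not the procedure used in the cited analyses, and the function names, compound labels, and data set size are assumptions made for the example.

```python
from collections import defaultdict


def cliff_clusters(cliff_pairs):
    """Group cliff-forming compounds into disjoint clusters (connected components)."""
    neighbors = defaultdict(set)
    for a, b in cliff_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)
    visited = set()
    clusters = []
    for start in sorted(neighbors):
        if start in visited:
            continue
        # Depth-first traversal collecting everything reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        visited |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical data set: 10 compounds, 4 detected cliff pairs.
total_compounds = 10
pairs = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c5", "c6")]

clusters = cliff_clusters(pairs)
cliff_forming_fraction = sum(len(c) for c in clusters) / total_compounds
# A cluster with more than two compounds contains overlapping (coordinated) cliffs.
coordinated = [c for c in clusters if len(c) > 2]
isolated = [c for c in clusters if len(c) == 2]
```

Here three of the four cliffs share compounds and fall into one coordinated cluster, while the pair c5/c6 is an isolated cliff.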