Applicability of Molecular Similarity Measures
First, each molecular similarity measure involves an explicit or, more often, implicit specification of its applicability domain: a set of structures (in other words, a region in chemical space) and a set of problems for which its application is possible and meaningful. If a similarity measure is used beyond its applicability domain, the results tend to be unpredictable or irrelevant. It is probably neither possible nor necessary to define some ‘true’ or ‘universal’ similarity measure, because chemical structures (and compounds) are complex objects with many different facets. In different situations, we may focus on similarities and differences in scaffolds, substituents, and functional groups; in low-level patterns of atoms and bonds; in physical or physico-chemical properties; in chemical reactivity; in interactions with specific biological targets; or in overall physiological effects. Each of these facets is objective, but any similarity measure reflecting them also involves some elements of subjective perception and cognitive processes. Thus, a similarity measure should be selected (or constructed) so that it properly captures the features of a structure that are important for a specific problem. In addition, some kind of testing or validation [5] is desirable in order to confirm that the important features are indeed captured.
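The dependence of similarity on the chosen facet can be illustrated with a minimal Python sketch. It is a toy example under stated assumptions, not a recommended implementation: the two compounds and their feature sets are hypothetical and assigned by hand, and the Tanimoto (Jaccard) coefficient is used only as one common choice of set-based similarity.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient of two feature sets."""
    if not a and not b:
        # Convention assumed here: two empty feature sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical, hand-assigned features for two illustrative compounds.
# These are toy descriptors, not the output of any real fingerprinting tool.
mol_a = {
    "scaffold": {"benzene"},
    "groups": {"carboxylic_acid", "hydroxyl"},
}
mol_b = {
    "scaffold": {"benzene"},
    "groups": {"carboxylic_acid", "amine"},
}


def facet_similarities(x: dict, y: dict) -> dict:
    """Compute the Tanimoto coefficient separately for each facet of x."""
    return {facet: tanimoto(x[facet], y[facet]) for facet in x}


sims = facet_similarities(mol_a, mol_b)
for facet, value in sims.items():
    print(f"{facet}: {value:.2f}")
```

In this sketch the same pair of compounds is identical with respect to the scaffold facet and only about one third similar with respect to functional groups, so which value is the relevant one depends entirely on the problem at hand.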