The Protein Structure-Function Paradigm
Knowledge of a protein's three-dimensional structure plays an important role in understanding the molecular mechanisms underlying its function. A structure can often provide more functional insight than the sequence alone (Fig. 2.2). For example, it reveals the overall conformation of the protein and its biological multimeric state, as well as the binding sites, interaction surfaces, and spatial relationships of catalytic residues. Protein-ligand complexes show the nature of the ligand and its precise binding site, which helps in postulating the catalytic mechanism. The PDBsum resource provides pictorial analyses for every structure in the PDB, along with detailed information extracted from resources such as SwissProt, the Catalytic Site Atlas (CSA), Pfam, and CATH, which are beneficial for structure-function studies.
Figure 2.2 From protein structure to function.
Furthermore, ab initio prediction of binding pockets and clefts on the protein structure, using methods such as pvSOAR, CASTp, SURFACE, SiteEngine, and THEMATICScan, also provides useful information about protein function.
Proteins are composed of one or more building blocks called domains, which are distinct, compact units of protein structure. These domains often combine in a mosaic manner in multidomain proteins (domain shuffling), generating new or modified functions. Structure is more highly conserved than sequence during evolution, so structural similarity is helpful in recognizing even distantly related proteins. Domains are considered to have the same fold if they share the same orientation and connectivity of secondary structures. Specific domains within a protein often have distinct functional roles, but sometimes more than one domain is involved in a particular function, for example where an enzyme's active site is formed at the interface between two domains.
Evolutionarily related proteins having high fold similarity often share functional similarity. As a result, protein functions can sometimes be inferred by comparing the structure of the query protein with that of an experimentally characterized protein. Structural relationships can be captured using well-established algorithms such as DALI, SSAP, STRUCTAL, CE, MAMMOTH, FATCAT, and CATHEDRAL.
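As a rough illustration of what a structure comparison measures, the sketch below computes a distance-matrix RMSD (dRMSD) between two sets of C-alpha coordinates. It is not the algorithm used by DALI or any of the other tools listed above; it assumes the residue correspondence is already known and that both structures have the same number of residues, which real alignment methods must work out for themselves. The function name and the toy coordinates are invented for the example.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance-matrix RMSD between two equally sized coordinate lists.

    Compares every intramolecular pairwise distance in structure A with the
    corresponding distance in structure B, so no superposition is needed.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("at least two residues are required")

    squared_diff_sum = 0.0
    pair_count = 0
    for i, j in combinations(range(len(coords_a)), 2):
        dist_a = math.dist(coords_a[i], coords_a[j])
        dist_b = math.dist(coords_b[i], coords_b[j])
        squared_diff_sum += (dist_a - dist_b) ** 2
        pair_count += 1
    return math.sqrt(squared_diff_sum / pair_count)


# Toy coordinates (hypothetical, in angstroms), not taken from a real structure.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
# Same shape, rotated 90 degrees about the z axis and translated.
moved = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in reference]
# Same points stretched along x, so the internal distances change.
stretched = [(2.0 * x, y, z) for x, y, z in reference]

score_moved = drmsd(reference, moved)
score_stretched = drmsd(reference, stretched)
```

Because only internal distances are compared, the rotated and translated copy scores essentially zero, while the stretched copy scores higher. Practical methods add the hard parts omitted here: finding the residue equivalences, handling insertions and deletions, and assessing statistical significance.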
Protein structure classification databases such as CATH and SCOP have enabled detailed analysis of structure-function relationships between evolutionarily related proteins. CATH and SCOP extract structural information from the PDB and classify domains hierarchically into classes, folds, and homologous superfamilies, based on their structural relationships and evolutionary origin. Studies based on these resources have shown that the structure-function relationship of proteins is very complex and that fold similarity may not always be sufficient to infer functional similarity. For example, some folds, such as the Rossmann fold and the TIM barrel, can carry out a large number of different functions, and many different folds can be associated with the same function.
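The hierarchical nature of such a classification can be sketched with CATH-style four-part codes, where the parts stand for Class, Architecture, Topology (the fold level), and Homologous superfamily. The helper below reports the deepest level at which two domains share a classification. The function names are invented for the example, and the codes are used only as illustrative inputs that should be checked against a current CATH release.

```python
# Levels of a four-part CATH code, from most general to most specific.
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")


def parse_cath_code(code):
    """Split a code such as '3.40.50.720' into a tuple of four integers."""
    parts = code.split(".")
    if len(parts) != len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a four-part numeric code: {code!r}")
    return tuple(int(p) for p in parts)


def deepest_shared_level(code_a, code_b):
    """Return the most specific level shared by two domains, or None."""
    shared = None
    for level, a, b in zip(LEVELS, parse_cath_code(code_a), parse_cath_code(code_b)):
        if a != b:
            break
        shared = level
    return shared


# Illustrative inputs only.
same_fold = deepest_shared_level("3.40.50.720", "3.40.50.300")
same_class = deepest_shared_level("3.40.50.720", "3.20.20.70")
unrelated = deepest_shared_level("1.10.8.10", "3.40.50.720")
```

Two domains that agree only down to the topology level share a fold but sit in different superfamilies, which is exactly the situation in which fold similarity alone may not justify transferring a function.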
At the time of writing, the CATH database comprises around 1400 protein folds and approximately 3000 homologous domain superfamilies. These have been found to account for at least 70% of protein domain sequences, which suggests that a relatively limited number of folds carry out the huge diversity of functions observed in protein function space [17-19]. In the majority of CATH domain superfamilies (>90%), domains have highly similar structures and functions. However, these conserved superfamilies tend to be small and highly specific to certain species or subkingdoms of life. Most of the remaining superfamilies are highly populated, accounting for >50% of all known domains, and can incorporate large amounts of structural and functional diversity. Structural and functional diversity between domains in enzyme superfamilies can be attributed to the use of different sets of residues in the active site, the addition of secondary structure embellishments to the core domain structure, or domain recruitment.
A recent study on the diversity of functional sites in CATH superfamilies showed that, for most superfamilies, the spatial locations of functional sites are limited. By contrast, members of the most diverse superfamilies show considerable functional plasticity, as their relatives can exploit different sites for interacting with protein partners or for binding small ligands.
By subclassifying these diverse superfamilies into functional families, it is possible to group relatives sharing a common functional site and similar functional properties. Structural similarity with a protein in a CATH functional family can therefore be used to infer functional properties more accurately. Functional family classification also provides a means to understand the mechanisms of functional divergence within a superfamily during evolution.
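The inference step described above can be sketched as a simple annotation transfer: a query structure is scored against a representative of each functional family, and the annotation of the best-scoring family is transferred only if the score clears a threshold. Everything here is hypothetical: the family identifiers, the annotations, the similarity scores (assumed to lie between 0 and 1, higher meaning more similar), and the cut-off of 0.8 are placeholders rather than values from CATH or from any published method.

```python
def transfer_annotation(hits, annotations, min_score=0.8):
    """Pick the best-scoring functional family and return its annotation.

    hits        -- dict mapping family id to a similarity score in [0, 1]
    annotations -- dict mapping family id to a functional description
    min_score   -- hypothetical cut-off below which nothing is transferred
    """
    if not hits:
        return None
    family, score = max(hits.items(), key=lambda item: item[1])
    if score < min_score:
        return None  # too dissimilar to justify transferring a function
    return family, annotations.get(family)


# Hypothetical families, annotations, and scores for one query structure.
family_annotations = {
    "FF1": "hypothetical function A",
    "FF2": "hypothetical function B",
}
query_hits = {"FF1": 0.62, "FF2": 0.91}

prediction = transfer_annotation(query_hits, family_annotations)
strict_prediction = transfer_annotation(query_hits, family_annotations, min_score=0.95)
```

Returning nothing when no family scores above the cut-off reflects the caution in the text: structural similarity supports functional inference only when the query matches relatives that share a functional site.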