A.5 The Genetic Code: Mapping RNA to Amino Acids

But before we can discuss translation, we need to understand what is being translated into what. Translation requires a code. Like Morse code or any other coding system, the genetic code maps sequences in one alphabet to sequences in a second alphabet. In our case, it maps one set of monomers (nucleotide bases) to a second set of monomers (amino acids). In Morse code, for example, “dot-dash” maps to the letter A. In the ASCII computer code, the 7-bit binary sequence “100 0001” maps to the upper-case letter A.⁴
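Any such code can be modeled as a lookup table from symbols in one alphabet to symbols in another. The Python sketch below is purely illustrative. It uses a hypothetical, deliberately partial Morse table (just three letters) and Python's built-in `ord` to recover the 7-bit ASCII pattern for “A”.

```python
# A code is a lookup from one alphabet to another.
# Hypothetical partial Morse table: only three letters are included.
MORSE = {"A": ".-", "B": "-...", "C": "-.-."}

def to_morse(text: str) -> str:
    """Encode each letter with the (partial) Morse table, separated by spaces."""
    return " ".join(MORSE[ch] for ch in text)

def to_ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII pattern for a character (ord("A") is 65)."""
    return format(ord(ch), "07b")

print(to_morse("A"))       # .-
print(to_ascii_bits("A"))  # 1000001
```

The genetic code works the same way, except that its "words" are built from nucleotide bases and its output symbols are amino acids.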

The genetic code is this kind of map. It maps the nucleotide sequences of mRNA to the amino acid sequences of the polypeptides that make up proteins. Working out this mapping, that is, deciphering the genetic code, was a major achievement of molecular biology during the 1950s and 1960s.

Twenty different amino acids make up the protein alphabet, but the mRNA transcript has only four nucleotide bases (A, U, C, and G). This is an obvious mismatch: there are too few nucleotide letters to map uniquely onto a 20-letter amino acid alphabet. If you tried it one letter at a time, you would have 16 orphan amino acids. Even if you map from pairs of nucleotides (AA, AU, AC, AG, etc.), there are still not enough: there are just 16 (4 × 4) ways to combine the four letters two at a time. That would leave you with four orphan amino acids.
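The counting argument is easy to check by brute force. The sketch below enumerates every possible word of a given length over the four mRNA bases and reports how many of the 20 amino acids would be left without a word. The function name `coverage` is a hypothetical helper chosen for illustration.

```python
from itertools import product

BASES = "AUCG"       # the four mRNA bases
N_AMINO_ACIDS = 20   # size of the amino acid alphabet

def coverage(word_length):
    """Return (number of distinct base words, amino acids left without a word)."""
    words = {"".join(p) for p in product(BASES, repeat=word_length)}
    return len(words), max(N_AMINO_ACIDS - len(words), 0)

for k in (1, 2, 3):
    n_words, orphans = coverage(k)
    print(f"length {k}: {n_words} words, {orphans} orphan amino acids")
```

This prints 4 words with 16 orphans for single bases and 16 words with 4 orphans for pairs. Triplets give 64 words with no orphans, which makes three the shortest word length that can cover all 20 amino acids.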

Building a coding map sufficient to account for all 20 amino acids requires a minimum of three letters (AAA, AAU, AAC, etc.), and this is how the genetic code does it, using triplets of nucleotide bases. The nucleotide triplets are called codons. But now there is a huge surplus, with far too many triplets for a one-to-one match. There are 64 (4 × 4 × 4) ways to combine the four letters three at a time, many more unique combinations than are needed. As a result, the code is redundant: 18 of the 20 amino acids correspond to more than one codon, and only methionine and tryptophan have a single codon each. For example, six different triplets can represent the amino acid arginine, and another six can represent leucine. These triplet sequences are synonymous; they all mean the same thing.

FIGURE A.4 This is the canonical representation of the genetic code, showing the 64 codons and the amino acids they code for.

Source: National Human Genome Research Institute (www.genome.gov/genetics-glossary/Genetic-Code)
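The redundancy can be counted directly from a machine-readable version of the table. The sketch below assumes the compact 64-letter string used for the standard code (NCBI translation table 1). In that string, codons are listed in U, C, A, G order at each position, amino acids are given as one-letter codes, and `*` marks a stop codon. The names `CODON_TABLE` and `synonyms` are illustrative only.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated UUU, UUC, UUA, UUG, UCU, ... GGG.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the third base fastest, matching the string's ordering.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid (and for stop, "*").
synonyms = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))                                    # 64
print(synonyms["R"], synonyms["L"])                        # 6 6
print(synonyms["*"])                                       # 3
print(sorted(aa for aa, n in synonyms.items() if n == 1))  # ['M', 'W']
```

Arginine (R) and leucine (L) each have six codons, three codons are stops, and methionine (M) and tryptophan (W) are the only amino acids with a single codon.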

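If you study the table closely, you will see three special codons (UAA, UAG, and UGA) labeled stop codons. These are boundary markers for the end of a gene's protein-coding sequence. Much like punctuation in human texts, they serve as signals to the molecules involved in translation. When the ribosome reaches a stop codon, no amino acid is added; instead, release factors recognize the codon and the finished polypeptide is released.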
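The role of a stop codon as a boundary marker can be sketched as a simple loop. This is a hypothetical `translate` function, not a model of the ribosome. It assumes the input already begins at the first codon of the reading frame and reads codons left to right until it meets a stop codon (`*`). The codon table is the same compact standard-code string as before, so the block is self-contained.

```python
from itertools import product

# Standard genetic code (one-letter amino acids, "*" = stop), U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon.

    Assumes the sequence starts at the beginning of the reading frame;
    any trailing bases that do not form a full codon are ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: end of the coding sequence
            break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGGCUUGGUAAGGC"))  # MAW
```

Here AUG, GCU, and UGG give methionine, alanine, and tryptophan. The stop codon UAA ends translation, so the trailing GGC is never read.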