# Substitution ciphers (background)

This section considers the following types of classical ciphers: simple (or mono-alphabetic) substitution, polygram substitution, and homophonic substitution. The difference between codes and ciphers is also noted. Polyalphabetic substitution ciphers are considered in §7.3.3.

(i) Mono-alphabetic substitution

Suppose the ciphertext and plaintext character sets are the same. Let *m =* т^тз ... be a plaintext message consisting of juxtaposed characters *тщ **e **A,* where *A* is some fixed character alphabet such as *A = ***{А, ***B,... , Z}.* A *simple substitution cipher* or *mono- alpliabetic substitution cipher* employs a permutation e over *A,* with encryption mapping

*E _{e}(m) =* e(mi)e(m2)e(m3)____Here juxtaposition indicates concatenation (rather than

multiplication), and e(m,j is the character to which m, is mapped by e. This corresponds to Definition 1.27.

7.48 Example (*trivial shift cipher/Caesar cipher*) A *shift cipher* is a simple substitution cipher

with the permutation e constrained to an alphabetic shift through A characters for some fixed *k.* More precisely, if |A| = *s,* and *пгц* is associated with the integer value *i,* 0 < *i < s* — 1, then Cj = e(m,) = m, + A mod *s.* The decryption mapping is defined by *d(c,) = c, - *A mod *s.* For English text, *s =* 26, and characters A through Z are associated with integers 0 through 25. For A = 1, the message *m =* HAL is encrypted to c = IBM. According to folklore, Julius Caesar used the key A = 3. □

The shift cipher can be trivially broken because there are only s = A keys (e.g., s = 26) to exhaustively search. A similar comment holds for affine ciphers (Example 7.49). More generally, see Fact 7.68.

7.49 Example (*affine cipher-historical)* The affine cipher on a 26-letter alphabet is defined by

*ек{х) = ax+ b* mod 26, where 0 < *a, b <* 25. The key is (a, *b).* Ciphertext с = *ек{х)* is decrypted using d/<(c) = (c — *b)a*~^{1} mod 26, with the necessary and sufficient condition for invertibility that gcd(a, 26) = 1. Shift ciphers are a subclass defined by *a* = 1. □

- 7.50 Note (
*recognizing simple substitution*) Mono-alphabetic substitution alters the frequency of individual plaintext characters, but does not alter the frequency distribution of the overall character set. Thus, comparing ciphertext character frequencies to a table of expected letter frequencies (unigram statistics) in the plaintext language allows associations between ciphertext and plaintext characters. (E.g., if the most frequent plaintext character X occurred twelve tunes, then the ciphertext character that X maps to will occur twelve times). - (ii) Polygram substitution

A simple substitution cipher substitutes for single plaintext letters. In contrast, *polygram substitution ciphers* involve groups of characters being substituted by other groups of characters. For example, sequences of two plaintext characters (*digrams*) may be replaced by other digrams. The same may be done with sequences of three plaintext characters (*trigram*0, or more generally using n-grams.

In full digram substitution over an alphabet of 26 characters, the key may be any of the 26^{2} digrams, arranged in a table with row and column indices corresponding to the first and second characters in the digram, and the table entries being the ciphertext digrams substituted for the plaintext pans. There are then (26^{2})! keys.

7.51 Example *(Playfair cipher* - *historical)* A digram substitution may be defined by arranging the characters of a 25-letter alphabet *(I* and *J* are equated) in a 5 x 5 matrix *M.* Adjacent plaintext characters are paired. The pair (рьрг) is replaced by the digram (сз, c_{4}) as follows. If pi and *p**2* are in distinct rows and columns, they define the comers of a submatrix (possibly *M* itself), with the remaining comers сз and c_{4}; c_{3} is defined as the character in the same column as pi. If *pi* and *p**2* are in a common row, сз is defined as the character immediately to the right of *pi* and c_{4} that immediately right of *p**2* (the first column is viewed as being to the right of the last). If *p* and *p**2* are in the same column, the characters immediately (circularly) below them are сз and c_{4}. If *pi = p**2**,* an infrequent plaintext character (e.g., *X*) is inserted between them and the plaintext is re-grouped. While cryptanalysis based on single character frequencies fails for the Playfair cipher (each letter may be replaced by any other), cryptanalysis employing digram frequencies succeeds. □

The key for a Playfair cipher is the 5x5 square. A mnemonic aid may be used to more easily remember the square. An example is the use of a meaningful keyphrase, with repeated letters deleted and the remaining alphabet characters included alphabetically at the end. The keyphrase “PLAYFAIR IS A DIGRAM CIPHER’’ would define a square with rows PLAYF, IRSDG, MCHEB, KNOQT, UVWXZ. To avoid the trailing characters always being from the end of the alphabet, a further shift cipher (Example 7.48) could be applied to the resulting 25-character string.

Use of keyphrases may seriously reduce the key space entropy. This effect is reduced if the keyphrase is not directly written into the square. For example, the non-repeated keyphrase characters might be written into an 8-column rectangle (followed by the remaining alphabet letters), the trailing columns being incomplete. The 25-character string obtained by reading the columns vertically is then used to fill the 5x5 square row by row.

- 7.52 Example
*(Hill cipher*-*historical*) An n-gram substitution may be defined using an invertible*n*x*n*matrix*A =*as the key to map an n-character plaintext*m... m*to a ciphertext n-gram c, =_{n}*J2*j=i*Decryption involves using A^{a}ij^{m}j> г = 1,... ,n.^{-1}. Here characters A-Z, for example, are associated with integers 0-25. This polygram substitution cipher is a linear transformation, and falls under known-plaintext attack. □ - (iii) Homophonic substitution

The idea of homophonic substitution, introduced in §1.5, is for each fixed key *к* to associate with each plaintext unit (e.g., character) *m* a set *S(k, m)* of potential corresponding ciphertext units (generally all of common size). To encrypt *m* under *k,* randomly choose one element from this set as the ciphertext. To allow decryption, for each fixed key this one-to-many encryption function must be injective on ciphertext space. Homophonic substitution results in ciphertext data expansion.

In homophonic substitution, | *S(k, m)* | should be proportional to the frequency of *m* in the message space. The motivation is to smooth out obvious irregularities in the frequency distribution of ciphertext characters, which result from irregularities in the plaintext frequency distribution when simple substitution is used.

While homophonic substitution complicates cryptanalysis based on simple frequency distribution statistics, sufficient ciphertext may nonetheless allow frequency analysis, in conjunction with additional statistical properties of plaintext manifested in the ciphertext. For example, in long ciphertexts each element of *S(k,m)* will occur roughly the same number of times. Digram distributions may also provide information.

(iv) Codes vs. ciphers

A technical distinction is made between *ciphers* and *codes.* Ciphers are encryption techniques which are applied to plaintext units (bits, characters, or blocks) independent of their semantic or linguistic meaning; the result is called ciphertext. In contrast, cryptographic codes operate on linguistic units such as words, groups of words, or phrases, and substitute (replace) these by designated words, letter groups, or number groups called *codegroups. *The key is a dictionary-like *codebook* listing plaintext units and their corresponding codegroups, indexed by the former; a corresponding codebook for decoding is reverse-indexed.

When there is potential ambiguity, codes in this context (vs. ciphers) may be qualified as *cryptographic codebooks,* to avoid confusion with error-correcting codes (EC-codes) used to detect and/or correct non-malicious errors and authentication codes (А-codes, or MACs as per Definition 9.7) which provide data origin authentication.

Several factors suggest that codes may be more difficult to break than ciphers: the key (codebook) is vastly larger than typical cipher keys; codes may result in data compression (cf. Fact 7.71); and statistical analysis is complicated by the large plaintext unit block size (cf. Note 7.74). Opposing this are several major disadvantages: the coding operation not being easily automated (relative to an algorithmic mapping); and identical encryption of repeated occurrences of plaintext units implies susceptibility to known-plaintext attacks, and allows frequency analysis based on observed traffic. This implies a need for frequent rekeying (changing the codebook), which is both more costly and inconvenient. Consequently, codes are not commonly used to secirre modem telecommunications.