# Advanced attacks on hash functions

A deeper understanding of hash function security can be obtained through consideration of various general attack strategies. The resistance of a particular hash function to known general attacks provides a (partial) measure of security. A selection of prominent attack strategies is presented in this section, with the intention of providing an introduction sufficient to establish that designing (good) cryptographic hash functions is not an easily mastered art. Many other attack methods and variations exist; some are general methods, while others rely on peculiar properties of the internal workings of specific hash functions.

## Birthday attacks

*Algorithm-independent attacks* are those which can be applied to any hash function, treating it as a black-box whose only significant characteristics are the output bitlength *n* (and MAC key bitlength for MACs), and the running time for one hash operation. It is typically assumed the hash output approximates a uniform random variable. Attacks falling under this categoiy include those based on hash-result bitsize (page 336); exhaustive MAC key search (page 336); and birthday attacks on hash functions (including memoryless variations) as discussed below.

(i) Yuval’s birthday attack on hash functions

Yuval’s birthday attack was one of the first (and perhaps the most well-known) of many cryptographic applications of the birthday paradox arising from the classical occupancy distribution (§2.1.5): when drawing elements randomly, with replacement, from a set of *N* elements, with high probability a repeated element will be encountered after О(/]V) selections. Such attacks are among those called *square-root attacks.*

The relevance to hash functions is that it is easier to find collisions for a one-way hash function than to find pre-images or second preimages of specific hash-values. As a result, signature schemes which employ one-way hash functions may be vulnerable to Yuval’s attack outlined below. The attack is applicable to all unkeyed hash functions (cf. Fact 9.33), with running time *0(2*^{m}/^{2}) varying with the bitlength m of the hash-value.

9.92 Algorithm Yuval’s birthday attack

INPUT: legitimate message *x*; fraudulent message a,-_{2}; г/г-bit one-way hash function *h. *OUTPUT: *x, x _{2}* resulting from minor modifications of

*xi, X*

*2*with

*h{x)*=

*h{x'*

_{2})- (thus a signature on
*x*serves as a valid signature on*x*_{2}). - 1. Generate
*t =*2^{m}/^{2}minor modifications*x*of*x.* - 2. Hash each such modified message, and store the hash-values (grouped with corresponding message) such that they can be subsequently searched on hash-value. (This can done in
*Oft)*total time using conventional hashing.) - 3. Generate minor modifications
*x*of_{2}*x*, computing_{2}*h(x*for each and checking for matches with any_{2})*x*above; continue until a match is found. (Each table lookup will require constant time; a match can be expected after about / candidates*x'*_{2}.) - 9.93 Remark
*(application of birthday attack*) The idea of this attack can be used by a dishonest signer who provides to an unsuspecting party his signature on*x*and later repudiates signing that message, claiming instead that the message signed was*x*or by a dishonest verifier, who is able to convince an unsuspecting party to sign a prepared message_{2},*x*, and later claim that party’s signature on*x*_{2}. This remark generalizes to other schemes in which the hash of a message is taken to represent the message itself.

Regarding practicality, the collisions produced by the birthday attack are “real” (vs. pseudo-collisions or compression function collisions), and moreover of direct practical consequence when messages are constructed to be meaningful. The latter may often be done as follows: alter inputs via individual minor modifications which create semantically equivalent messages (e.g., substituting tab characters in text files for spaces, unprintable characters for each other, etc.). For 128-bit hash functions, 64 such potential modification points are required to allow 2^{04} variations. The attack then requires O(2^{04}) tune (feasible with extreme parallelization); and while it requires space for 0(2^{64}) messages (which is impractical), the memory requirement can be addressed as discussed below.

(ii) Memoryiess variation of birthday attack

To remove the memory requirement of Algorithm 9.92, a deterministic mapping may be used which approximates a random walk through the hash-value space. By the birthday paradox, in a random walk through a space of 2^{m} points, one expects to encounter some point a second time (i.e., obtain a collision) after *0(*2^{m}/^{2}) steps, after which the walk will repeat its previous path (and begin to cycle). General memoryless cycle-finding techniques may then be used to find this collision. (Here *memoryless* means requiring negligible memory, rather than in the stochastic sense.) These include Floyd’s cycle-finding algorithm (§3.2.2) and improvements to it.

Following Algorithm 9.92, let *g* be a function such that *g(x*i, *H) = x[* is a minor modification, determined by the hash-value Я, of message *x* (each bit of Я might define whether or not to modify *x* at a pre-determined modification point). If *xi* is fixed, then *g* essentially maps a hash-result to a message and it is convenient to write *g _{Xl}* (Я) =

*x.*Moreover, let

*g*be injective so that distinct hashes Я result in distinct

*x.*Then, with fixed messages

*xi,x*and using some easily distinguishable property (e.g., parity) which splits the space of hash-values into two roughly equal-sized subsets, define a function

_{2},*r*mapping hash-results to hash-results by:

The memoryless collision search technique (see above) is then used to find two inputs to *r *which map to the same output (i.e., collide). If *h* behaves statistically as a random mapping then, with probability 0.5, the parity will differ in Я and Я' for the colliding inputs, in which case without loss of generality *h(g _{Xl}* (Я)) =

*li(g*This yields a colliding pair of variations

_{x}._{2}(H')).*x = g*(Я),

_{Xl}*x'*of distinct messages

_{2}= g_{x}._{2}(H')*x, x*respectively, such that

_{2},*h(x[) = h(x*as per the output of Algorithm 9.92.

_{2}),(iii) Illustrative application to MD5

Actual application of the above generic attack to a specific hash function raises additional technicalities. To illustrate how these may be addressed, such application is now examined, with assumptions and choices made for exposition only. Let *h* be an iterated hash function processing messages in 512-bit blocks and producing 128-bit hashes (e.g., MD5, RIPEMD- 128). To minimize computational expense, restrict *r* (effectively *g* and *h*) in equation (9.4) to single 512-bit blocks of *x**, such that each iteration of *r* involves only the compression function *f* on inputs one message block and the current chaining variable.

Let the legitimate message input *xi* consist of *s* 512-bit blocks (,s > 1, prior to MD- strengthening). Create a fraudulent message *x _{2}* of equal bitlength. Allow

*x*to differ from a;: up to and including the

_{2}*j*block, for any fixed

^{th}*j < s*— 1. Use the

*(j +*l)

^{st}block of

*x*denoted

_{t},*В,*(г = 1,2), as a matching/replacement block, to be replaced by the 512-bit blocks resulting from the collision search. Set all blocks in

*x*subsequent to

_{2}*B,*identically equal to those in

*x x*will then differ from

*x*only in the single block (j + 1). For maximum freedom in the construction of

_{t}*x*choose

_{2},*j = s*- 1. Let ci, c

_{2}be the respective 128-bit intermediate results (chaining variables) after the iterated hash operates on the first

*j*blocks of

*xi, x*Compression function

_{2}.*f*maps (128 + 512 =) 640-bit inputs to 128-bit outputs. Since the chaining variables depend on aq,

*g*may be defined independent of

_{x},{= g)*x*here (cf. equation (9.4)); assume both entire blocks

_{t }*В,*may be replaced without practical implication. Let

*g(H) = В*denote an injective mapping from the space of 128-bit hash- values to the space of 512-bit potential replacement blocks, defined as follows: map each two-bit segment of

*H*to one of four 8-bit values in the replacement block

*B.*(A practical motivation for this is that if .r, is an ASCII message to be printed, and the four 8-bit values are selected to represent non-printable characters, then upon printing, the resulting blocks

*В*are all indistinguishable, leaving no evidence of adversarial manipulation.)

The collision-finding function *r* for this specific example (corresponding to the generic equation (9.4)) is then:

Collisions for MD5 (and similar hash functions) can thus be found in *0(*2^{04}) operations and without significant storage requirements.