# Hash Functions and Data Integrity

Contents in Brief

- 9.1
**Introduction.............................321** - 9.2
**Classification and framework....................322** - 9.3
**Basic constructions and general results...............332** - 9.4
**Unkeyed hash functions (MDCs)..................338** - 9.5
**Keyed hash functions (MACs)...................352** - 9.6
**Data integrity and message authentication.............359** - 9.7
**Advanced attacks on hash functions................368** - 9.8
**Notes and further references....................376**

## Introduction

*Cry ptographic hash functions* play a fundamental role in modem cryptography. While related to conventional hash functions commonly used in non-cryptographic computer applications - in both cases, larger domains are mapped to smaller ranges - they differ in several important aspects. Our focus is restricted to cryptographic hash functions (hereafter, simply hash functions), and in particular to their use for data integrity and message authentication.

Hash functions take a message as input and produce an output referred to as a *hash- code, hash-result, hash-value,* or simply *hash.* More precisely, a hash function *h* maps bit- strings of arbitrary finite length to strings of fixed length, say ?г bits. For a domain *D* and range *R* with *h : D—rR* and *I)* > *R.,* the function is many-to-one, implying that the existence of *collisions* (pairs of inputs with identical output) is unavoidable. Indeed, restricting *h* to a domain of /-bit inputs (< > *n),* if *h* were “random” in the sense that all outputs were essentially equiprobable, then about *2 ^{I}~”* inputs would map to each output, and two randomly chosen inputs would yield the same output with probability

*2~*(independent of

^{n}*t).*The basic idea of cryptographic hash functions is that a hash-value serves as a compact representative image (sometimes called an

*imprint, digital fingerprint,*or

*message digest*) of an input string, and can be used as if it were uniquely identifiable with that string.

Hash functions are used for data integrity in conjunction with digital signature schemes, where for several reasons a message is typically hashed first, and then the hash-value, as a representative of the message, is signed in place of the original message (see Chapter 11). A distinct class of hash functions, called message authentication codes (MACs), allows message authentication by symmetric techniques. MAC algorithms may be viewed as hash functions which take two functionally distinct inputs, a message and a secret key, and produce a fixed-size (say n-bit) output, with the design intent that it be infeasible in practice to produce the same output without knowledge of the key. MACs can be used to provide data integrity and symmetric data origin authentication, as well as identification in symmetric-key schemes (see Chapter 10).

A typical usage of (unkeyed) hash functions for data integrity is as follows. The hash- value corresponding to a particular message *x* is computed at time 7. The integrity of this hash-value (but not the message itself) is protected in some manner. At a subsequent time *T-**2**,* the following test is carried out to determine whether the message has been altered, i.e., whether a message *x'* is the same as the original message. The hash-value of *x'* is computed and compared to the protected hash-value; if they are equal, one accepts that the inputs are also equal, and thus that the message has not been altered. The problem of preserving the integrity of a potentially large message is thus reduced to that of a small fixed-size hash- value. Since the existence of collisions is guaranteed in many-to-one mappings, the unique association between inputs and hash-values can, at best, be in the computational sense. A hash-value should be uniquely identifiable with a single input *in practice,* and collisions should be *computationally* difficult to find (essentially never occurring in practice).

**Chapter outline**

The remainder of this chapter is organized as follows. §9.2 provides a framework including standard definitions, a discussion of the desirable properties of hash functions and MACs, and consideration of one-way functions. §9.3 presents a general model for iterated hash functions, some general construction techniques, and a discussion of security objectives and basic attacks (i.e., strategies an adversary may pursue to defeat the objectives of a hash function). §9.4 considers hash functions based on block ciphers, and a family of functions based on the MD4 algorithm. §9.5 considers MACs, including those based on block ciphers and customized MACs. §9.6 examines various methods of using hash functions to provide data integrity. §9.7 presents advanced attack methods. §9.8 provides chapter notes with references.