Intra- Versus Inter-Pipeline
The maximum clock frequency of a circuit is inversely proportional to its critical path, that is, the longest propagation delay between any two registers of the circuit. Consequently, the more complex the logic between pipeline stages, the longer the propagation delay and the lower the clock frequency of the system. Conversely, the more deeply pipelined a design is, the higher the clock frequency at which it can operate, but the more clock cycles a given computation requires to complete.
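A minimal Python sketch of this relationship follows. It assumes each pipeline stage is described only by its combinational delay in nanoseconds, plus a fixed register overhead `t_reg_ns` standing in for setup and clock-to-output time. The function name `pipeline_timing` and all delay values are hypothetical.

```python
def pipeline_timing(stage_delays_ns, t_reg_ns=0.0):
    """Return (critical path in ns, f_max in MHz, latency in cycles).

    Hypothetical model: each entry of stage_delays_ns is the combinational
    delay of one pipeline stage; t_reg_ns is an assumed per-register overhead.
    """
    # The slowest register-to-register path sets the clock period.
    critical_ns = max(stage_delays_ns) + t_reg_ns
    # A period in ns converts to a frequency in MHz as 1000 / period.
    f_max_mhz = 1000.0 / critical_ns
    # One clock cycle is spent in each pipeline stage.
    latency_cycles = len(stage_delays_ns)
    return critical_ns, f_max_mhz, latency_cycles


# Hypothetical 8 ns block of logic split into 1, 2, and 4 balanced stages,
# with register overhead ignored.
for stages in ([8.0], [4.0, 4.0], [2.0] * 4):
    crit, f, cycles = pipeline_timing(stages)
    print(f"{len(stages)} stage(s): {crit:.1f} ns, {f:.0f} MHz, {cycles} cycle(s)")
```

In this sketch, splitting the hypothetical 8 ns block into more stages raises the clock frequency, but the number of cycles grows by the same factor, so the time to complete a single computation does not shrink.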
In round-based algorithms such as AES, inter-pipeline refers to the registers that, every clock cycle, store the result of one round and feed it to the next round. In rolled round architectures, only one pipeline register separates successive round computations. These registers can be placed at the end of the round logic or within it, for example at the outputs of the BRAMs that compute the T-boxes.
Intra-pipeline refers to additional registers placed between the AES round operations to shorten the critical path and increase the clock frequency. Intra-pipelining can be applied to both unrolled and rolled round structures.
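The following sketch models one rolled AES round as a fixed sequence of operations with hypothetical delays. The inter-pipeline register always closes the round, and each index in `cuts` adds an intra-pipeline register after that operation. The operation names are the standard AES round transformations, but the delay values, the helper `round_pipeline`, and the treatment of each operation as indivisible are illustrative assumptions; the model also ignores that the final AES round omits MixColumns.

```python
# Hypothetical combinational delays (ns) of one AES round's operations, in order.
# These numbers are illustrative assumptions, not measurements.
ROUND_OPS = [
    ("SubBytes", 3.0),
    ("ShiftRows", 0.0),
    ("MixColumns", 1.5),
    ("AddRoundKey", 0.5),
]


def round_pipeline(cuts, ops=ROUND_OPS, t_reg_ns=0.0, n_rounds=10):
    """Place intra-pipeline registers after the operation indices in `cuts`.

    The inter-pipeline register always closes the round (after the last op).
    Returns (stage delays, cycles per round, critical path ns, cycles per block).
    """
    boundaries = sorted(set(cuts) | {len(ops) - 1})
    stages = []
    start = 0
    for end in boundaries:
        # Logic between two consecutive registers forms one pipeline stage.
        stages.append(sum(delay for _, delay in ops[start:end + 1]))
        start = end + 1
    critical_ns = max(stages) + t_reg_ns
    cycles_per_round = len(stages)
    return stages, cycles_per_round, critical_ns, cycles_per_round * n_rounds


for cuts in ([], [0], [0, 2]):
    stages, cpr, crit, total = round_pipeline(cuts)
    print(f"cuts={cuts}: stages={stages}, {cpr} cycle(s)/round, "
          f"critical path {crit} ns, {total} cycles/block")
```

With these made-up delays, cutting after SubBytes shortens the critical path from 5.0 ns to 3.0 ns at the cost of two cycles per round. A second cut after MixColumns adds a third cycle without further gain, because SubBytes alone still sets the critical path.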
Strongly unrolled round architectures should aim for as many pipeline registers as possible to achieve the highest clock frequency, since the added latency does not reduce their throughput when streaming independent data blocks [14, 17]. For rolled architectures, however, the trade-off between the number of cycles per round and the duration of each cycle must be considered when designing a pipelined AES structure.
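To illustrate why added latency does not hurt a fully unrolled design that streams independent blocks, the sketch below counts clock cycles for both architectures. It assumes AES-128 (10 rounds, 128-bit blocks), an unrolled pipeline that accepts a new block every cycle, and a rolled design that processes one block at a time. The function names and the clock frequencies in the usage lines are hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits
N_ROUNDS = 10     # AES-128


def unrolled_cycles(n_blocks, stages_per_round, n_rounds=N_ROUNDS):
    # Fill latency of the unrolled pipeline, then one finished block per cycle.
    return n_rounds * stages_per_round + (n_blocks - 1)


def rolled_cycles(n_blocks, cycles_per_round, n_rounds=N_ROUNDS):
    # One round unit reused n_rounds times; assumes one block in flight at a time.
    return n_blocks * n_rounds * cycles_per_round


def throughput_gbps(n_blocks, cycles, f_mhz):
    # bits * f_mhz / cycles gives Mbit/s; divide by 1000 for Gbit/s.
    return n_blocks * BLOCK_BITS * f_mhz / (cycles * 1000.0)


n = 1000
for stages, f_mhz in ((1, 200.0), (4, 600.0)):  # hypothetical frequencies
    u = unrolled_cycles(n, stages)
    r = rolled_cycles(n, stages)
    print(f"{stages} cycle(s)/round at {f_mhz:.0f} MHz: "
          f"unrolled {u} cycles, {throughput_gbps(n, u, f_mhz):.1f} Gbit/s; "
          f"rolled {r} cycles, {throughput_gbps(n, r, f_mhz):.2f} Gbit/s")
```

For 1000 blocks, going from one to four stages per round adds only 30 cycles of fill latency to the unrolled design (1009 versus 1039 cycles), so any frequency gain carries over almost directly into throughput. The rolled design, by contrast, pays the extra cycles on every block.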
Rolled implementations with 1, 4, or 8 cycles per round have been presented [4, 5, 9, 20, 23], achieving progressively higher clock frequencies as the number of cycles per round increases.
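The trade-off for rolled designs can be made concrete with an idealized model, sketched below under stated assumptions: the round logic (hypothetically 8 ns) splits into `cycles_per_round` perfectly balanced slices, every cycle carries a fixed 0.5 ns register overhead, and only one block is in flight. Neither the numbers nor the helper `rolled_tradeoff` come from the cited implementations.

```python
def rolled_tradeoff(cycles_per_round, t_round_ns, t_reg_ns, n_rounds=10):
    """Idealized rolled AES datapath split into `cycles_per_round` equal stages.

    Assumes perfectly balanced slices and a fixed per-register overhead.
    Returns (f_max in MHz, latency of one block in ns).
    """
    # Critical path of each cycle: one slice of the round plus register overhead.
    period_ns = t_round_ns / cycles_per_round + t_reg_ns
    f_max_mhz = 1000.0 / period_ns
    # One block passes through n_rounds rounds, each taking cycles_per_round cycles.
    latency_ns = n_rounds * cycles_per_round * period_ns
    return f_max_mhz, latency_ns


# Hypothetical 8 ns round logic and 0.5 ns register overhead.
for c in (1, 4, 8):
    f, lat = rolled_tradeoff(c, t_round_ns=8.0, t_reg_ns=0.5)
    print(f"{c} cycle(s)/round: {f:6.1f} MHz, {lat:6.1f} ns per block")
```

Under these assumptions the clock frequency rises from about 118 MHz to 400 MHz and about 667 MHz for 1, 4, and 8 cycles per round, while the time to encrypt one block grows from 85 ns to 100 ns and 120 ns, because each extra register adds its overhead to every round. With zero register overhead the per-block time would stay at 80 ns for all three options, which shows that in this idealized model the register overhead alone determines how much latency the extra cycles cost.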