Efficient bit permutation using delta swaps

A common operation in cryptographic block cipher algorithms is to shuffle the input bits in some specified way. While such bit permutations are easy to do in hardware, they tend to be difficult to implement efficiently in software. The Initial Permutation (IP) in the DES block cipher is a good case in point. It is defined as the following rearrangement of the 64 input bits:

DES Initial Permutation

A naive implementation would loop through each of the 64 input bits one by one and place them at the correct position in the output. Needless to say, this would be rather slow. A pre-computed lookup table for a 64-bit permutation would be too huge to even consider, while multiple piece-wise lookup tables would most likely not offer any performance benefit over a naive loop. However, there is a clever solution using only 30 bitwise operations (15 XORs, 10 shifts, 5 ANDs). The key idea is to repeatedly swap non-overlapping groups of bits with an operation called a delta swap:


#define DELTA_SWAP(a, b, delta, mask) \
b = (a ^ (a >> delta)) & mask; \
a = a ^ b ^ (b << delta);

view raw

deltaswap.c

hosted with ❤ by GitHub

In a nutshell, the delta swap operation can simultaneously swap multiple bit pairs where the two bits in each pair are separated by the same distance from each other. Let’s examine how this works. (Notation: \oplus means XOR, \ll means bitwise shift left, \gg means bitwise shift right)

We start with the input block represented as a sequence of 64 bits:

a=a_{1}a_{2}...a_{64}

Suppose we want to swap the bit at position i with the bit at position j, where j > i. The distance between the two bits is

\delta=j-i

Now notice that if we construct a temporary value b=b_{1}b_{2}...b_{64} such that

b_{n} = \left\{ \begin{array}{lr}a_{i} \oplus a_{j} & : n \in \{i,j\} \\ 0 & : n \notin \{i,j\} \end{array} \right.

then a \oplus b swaps the bits a_{i} and a_{j} as required, because of the following properties of XOR:

a_{i} \oplus (a_{i} \oplus a_{j})=a_{j} \\ a_{j} \oplus (a_{i} \oplus a_{j})=a_{i}

a_{i} \oplus 0=a_{i}

bitbox1

This is exactly how a delta swap works. The temporary value b is constructed by shifting the input a to the right \delta bits, then XORing with the unshifted version of a. This places a_{i} \oplus a_{j} in b_{j}. The other bits of b are then zeroed out by ANDing with a mask that only has the bit at position j turned on. Finally, the bit swap in a is accomplished by XORing a with an unshifted b and then with b shifted left \delta bits. This is illustrated diagrammatically below.

bitbox12

In order to swap multiple bit pairs simultaneously, we need only turn on the appropriate bits in the AND mask. Specifically, for each 1 bit in the mask, the corresponding bit in the input will be swapped with the input bit that is \delta positions to its left.

Now that we’re armed with the delta swap technique, let’s return to the original problem of implementing the DES Initial Permutation efficiently. Notice that if we arrange the 64-bit input block in the DES algorithm as an 8 x 8 bit matrix, then the Initial Permutation can be visualized as a flip of the matrix about its diagonal (the one that extends from bit 57 to bit 08), followed by the shuffling of its rows. We can accomplish this with a sequence of five delta swaps. The first three flip the matrix about its diagonal, and the last two shuffle the rows.

DeltaSwap1

DeltaSwap2

DeltaSwap3

DeltaSwap4

DeltaSwap5

The DES Final Permutation (FP), being the inverse of the Initial Permutation, can be implemented by performing the same delta swaps in reverse order. The reason why the delta swap technique is so useful is that it works for any permutation of bits. In fact, it can be shown that for 64-bit permutations the required number of delta swaps will never exceed 11.