Efficient bit permutation using delta swaps

A common operation in cryptographic block cipher algorithms is to shuffle the input bits in some specified way. While such bit permutations are easy to do in hardware, they tend to be difficult to implement efficiently in software. The Initial Permutation (IP) in the DES block cipher is a good case in point. It is defined as the following rearrangement of the 64 input bits:

DES Initial Permutation

A naive implementation would loop through each of the 64 input bits one by one and place them at the correct position in the output. Needless to say, this would be rather slow. A pre-computed lookup table for a 64-bit permutation would be too huge to even consider, while multiple piece-wise lookup tables would most likely not offer any performance benefit over a naive loop. However, there is a clever solution using only 30 bitwise operations (15 XORs, 10 shifts, 5 ANDs). The key idea is to repeatedly swap non-overlapping groups of bits with an operation called a delta swap:

In a nutshell, the delta swap operation can simultaneously swap multiple bit pairs where the two bits in each pair are separated by the same distance from each other. Let’s examine how this works. (Notation: \oplus means XOR, \ll means bitwise shift left, \gg means bitwise shift right)

We start with the input block represented as a sequence of 64 bits:

a=a_{1}a_{2}...a_{64}

Suppose we want to swap the bit at position i with the bit at position j, where j > i. The distance between the two bits is

\delta=j-i

Now notice that if we construct a temporary value b=b_{1}b_{2}...b_{64} such that

b_{n} = \left\{ \begin{array}{lr}a_{i} \oplus a_{j} & : n \in \{i,j\} \\ 0 & : n \notin \{i,j\} \end{array} \right.

then a \oplus b swaps the bits a_{i} and a_{j} as required, because of the following properties of XOR:

a_{i} \oplus (a_{i} \oplus a_{j})=a_{j} \\ a_{j} \oplus (a_{i} \oplus a_{j})=a_{i}

a_{i} \oplus 0=a_{i}

bitbox1

This is exactly how a delta swap works. The temporary value b is constructed by shifting the input a to the right \delta bits, then XORing with the unshifted version of a. This places a_{i} \oplus a_{j} in b_{j}. The other bits of b are then zeroed out by ANDing with a mask that only has the bit at position j turned on. Finally, the bit swap in a is accomplished by XORing a with an unshifted b and then with b shifted left \delta bits. This is illustrated diagrammatically below.

bitbox12

In order to swap multiple bit pairs simultaneously, we need only turn on the appropriate bits in the AND mask. Specifically, for each 1 bit in the mask, the corresponding bit in the input will be swapped with the input bit that is \delta positions to its left.

Now that we’re armed with the delta swap technique, let’s return to the original problem of implementing the DES Initial Permutation efficiently. Notice that if we arrange the 64-bit input block in the DES algorithm as an 8 x 8 bit matrix, then the Initial Permutation can be visualized as a flip of the matrix about its diagonal (the one that extends from bit 57 to bit 08), followed by the shuffling of its rows. We can accomplish this with a sequence of five delta swaps. The first three flip the matrix about its diagonal, and the last two shuffle the rows.

DeltaSwap1

DeltaSwap2

DeltaSwap3

DeltaSwap4

DeltaSwap5

The DES Final Permutation (FP), being the inverse of the Initial Permutation, can be implemented by performing the same delta swaps in reverse order. The reason why the delta swap technique is so useful is that it works for any permutation of bits. In fact, it can be shown that for 64-bit permutations the required number of delta swaps will never exceed 11.

Advertisements

2 Responses to Efficient bit permutation using delta swaps

  1. bouanto says:

    Hello.
    Thanks for this great article.
    I wonder if you know an algorithm to find one of the most efficient sequences of delta swaps giving a conversion table.
    I need this because I am currently optimizing my des implementation.
    I updated my implementation to use delta swaps for IP and FP, thanks to your article.
    I found a way to improve the E permutations but I am at a loss to find the most efficient delta swaps for the P permutations.
    Could you please tell me how you found these efficient delta swaps?
    Thanks for your help.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: