compsci: Duff's device for loop unrolling

Loop unrolling is a performance optimisation technique that aims to reduce comparisons between executions of a loop body.

Duff's device is a beautiful/surprising technique for implementing that in C, relying on the ablity to intertwine case statements inside a do while loop:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

The idea is to handle both an initial <N cases for the loop unrolling with the same structure that handles the per N unrolled loop iterations.

The partially separated first case and unrolled loop version makes it clearer how this magic is working:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
        case 0: *to = *from++;
        case 7: *to = *from++;
        case 6: *to = *from++;
        case 5: *to = *from++;
        case 4: *to = *from++;
        case 3: *to = *from++;
        case 2: *to = *from++;
        case 1: *to = *from++;
    }
    while (--n > 0) {
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
        *to = *from++;
    }
}
Published on: 31 Jul 2023