Fastest Luhn algorithm checksum on this street

cross-posted from: https://lemmyrs.org/post/87159

One of the digits of your credit card number is not like the rest of them: it’s purpose is to check for value correctness, this way any website or form knows what if checksum matches - the value was typed in correctly. Or you are lucky with a typo because it’s not a very good checksum - it was designed to be easy to calculate on a mechanical device and been sticking around mostly for historical reasons.

To check if a number is valid according to Luhn checksum you would go from right to left, double every second digit, add all the digits together, if result ends with 0 - checksum matches, if it ends with some other value - it does not.

For example let’s check the number 1594: write down the number as a bunch of digits

1 5 9 4

double every second digit from the right

2 5 18 4

add all the digits

2 + 5 + 1 + 8 + 4 = 20

ends with 0, so checksum is valid

Three key optimizations help to calculate it fast:

You can split longer sums into short ones
You can skip second splitting into digits by doing multiplication as usual and adding number of digits above 5 to the total
SWAR can to perform a bunch of operations on individual digits at once

To illustrate first optimization let’s calculate 1 5 9 4 sum in two parts

1 5 => 2 5 => 2 + 5 = 7
9 4 => 18 4 => 1 + 8 + 4 = 13
7 + 13 = 20 - checksum is valid

to illustrate the second one

// digits themselves
1 5 9 4 => 2 5 18 4 => 2 + 5 + 18 + 4 = 29
// correction
0 0 1 0 => 0 + 0 + 1 + 0 = 1
total: 29 + 1 = 30

Result is off by 10, but for checksum validity purposes it’s good enough.

Last trick comes from doing SWAR and skipping extracting digits in the first place.

User input comes as an ASCII string “1594”, which we split into chunks of size 8 to fit into 64 bit register and pad with “0” from the right:

"1594" => "00001594"
// subtract  0x3030303030303030 ("00000000")
// to get decimal values
"00001594" - "00000000" => [0, 0, 0, 0, 1, 5, 9, 4]

At this point it is easy to check if string consists of only decimal digits just by adding 0x4646464646464646 and looking for overflows and underflows in both results - can be done for both at once

Multiplying every other digit by 2 and adding them together is done with a regular multiplication by 0x0201020102010201, and math will do the rest:

    A   B
  * 1   2
----------
   2A  2B
 A  B

Top byte of the lower half of the result will contain 2A + B for 2 digit case or the full result for all 8 digits in the actual code. Because all the digits are below 10 - there’s no overflow from earlier digits.

Whole code looks like this:

fn fold10_swar(mask1: u64, mask2: u64, raw: &[u8]) -> Option<u64> {
    let mut sum = 0;

    for c in raw.rchunks(8) {
        let mut buf = [b'0'; 8];
        copy_from_small_slice(&mut buf, c);

        let mut v = u64::from_le_bytes(buf);
        // try to overflow value up
        let a = v.wrapping_add(0x4646464646464646);
        // and down
        v = v.wrapping_sub(0x3030303030303030);
        // if either direction overflows - there are non 0..9 digits so
        // checksum can't possibly be valid, otherwise all the values are digits
        if (a | v) & 0x8080808080808080 == 0 {
            // Calculate number of digits above 5 located at positions that would double
            // Doubling them would result to values above 9 which will necessitate subtracting
            // 9. But in mod 10 arithmetic -9 and +1 is the same so simply adding
            // count of such digits is enough
            sum += u64::from((mask2.wrapping_sub(v) & 0x8080808080808080).count_ones());
            sum += v.wrapping_mul(mask1) >> 56;
        } else {
            return None;
        }
    }
    Some(sum)
}

And good news - luhn3 crate can check the correctness or calculate a digit to use as a checksum for you and it works somewhat fast - on a 5 year old processor when compiled with target-cpu=native it can verify a 16 digit credit card checksum in about 3 nanoseconds - in 12 or so CPU cycles, benchmarks included.