An intermediate result (much stronger mixer).
A frustrating search.
When working on finding good mixers, I got really frustrated by the extremely slow process of testing a new set of constants or other tweaks to functions that were "kind of good".My criterion for "success" was that the output from a simple counter, starting at 0, passed to the mixing function, and then taking the output from the mixing function and feeding it to PractRand with -tf 2 should not generate any failures up to $2^{45}$ bytes.
Eventually, I came to realize that what I really wanted was not just a function that would satisfy the "generate PractRand-passing output when fed a unitary counter" but rather "a decent approximation of a random permutation". Such a permutation would give a higgledy-piggledy sequence from a very regular one, regardless of what the regularity looks like (just kidding; the regularities of interest are affine transformations, of course there are many other sequences one could think of as "regular"). Previosuly, I had been testing with feeding gray-code sequences to the mixing function, hoping that the unit Hamming distance between adjacent inputs would reveal flaws more quickly. For the functions I tested, I couldn't find that it made any difference w.r.t. how quickly testing failed.
As I was thinking about more sequences that would be "very well ordered" I came up with a very simple testing procedure that seems to work well: A function that pass the subtests described below is also very likely to work well on any kind of (affine) counter update.
The fact that many of the subtest input sequences intersect does not affect the tests since they are run independently.
Test procedure
Instead of testing the full output, which is very slow, test many short output sequences in the hopes that anomalies in the mixer will be revealed more quickly. There's quite some conjecturing here: the (lack of) failures on many short tests hopefully correlate with overall quality. If the mixer is good, this is also a very lengthy process (the way I have been doing it, 128 TB of data will be tested if there are no failures) but running many small tests (1 TB a piece) can be done in parallell on a machine with many cores but without a using a lot of RAM.
Naming
To avoid confusion, let's call this test procedure "RR-b-x-TF2-0.94". "RR" to indicate "rotate and bit reversal", "b" for the word size in input/output bits, x for the tlmax-exponent, TF2 to indicate that PractRand was run with "-tf 2" and "0.94" for the version of PractRand I am currently using. The same procedure can be used for any permutation of size $2^{8b}$. Pseudo-code for RR-b-x-TF2-0.94:
# The tests I have run have been with b = 64 and x = tlmax = 40,
# i.e. 1 TB per subtest.
# rfun is either the identity function or a b-bit bit-order reversal function.
for rfun in {identity, bitreversal}
# Rotate counter 0 to (b - 1) bits.
for r in {0, ..., b - 1}
# No failure seen for this rfun & rotation so far.
failure = 0
# 2^x bytes, 2^x / b * 8 b-bit words.
for ctr in {0, ..., (2^tlmax / b * 8) - 1}
# Feed PractRand, break on failure.
# RORb is a b-bit right rotate function.
if feedPractRandFails(RORb(rfun(ctr), r))
# Note the failure level.
failure = ceil(log_2(ctr))
break for
end if
end for
emitTestScore(rfun, r, failure)
end for
end for
# i.e. 1 TB per subtest.
# rfun is either the identity function or a b-bit bit-order reversal function.
for rfun in {identity, bitreversal}
# Rotate counter 0 to (b - 1) bits.
for r in {0, ..., b - 1}
# No failure seen for this rfun & rotation so far.
failure = 0
# 2^x bytes, 2^x / b * 8 b-bit words.
for ctr in {0, ..., (2^tlmax / b * 8) - 1}
# Feed PractRand, break on failure.
# RORb is a b-bit right rotate function.
if feedPractRandFails(RORb(rfun(ctr), r))
# Note the failure level.
failure = ceil(log_2(ctr))
break for
end if
end for
emitTestScore(rfun, r, failure)
end for
end for
Test results
MurmurHash3 finalizer
RR-64-40-TF2-0.94 fails the MurmurHash3 finalizer quickly. This is to be expected considering that it is known to fail quickly when fed a unitary counter.Offset | IDENTIY | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 17 | 18 | 18 | 18 | 17 | 16 | 16 | 16 | 16 | 15 | 15 | 17 | 15 | 14 | 15 | 15 |
16 | 14 | 14 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 16 | 15 | 15 |
32 | 16 | 16 | 17 | 17 | 16 | 16 | 14 | 14 | 16 | 15 | 14 | 15 | 15 | 15 | 15 | 15 |
48 | 15 | 14 | 14 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 16 | 17 | 17 | 17 | 17 |
Offset | REVERSE | |||||||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 15 | 17 | 18 | 18 | 18 | 17 | 17 | 16 | 16 | 14 | 14 | 14 | 14 | 14 | 14 | 17 |
16 | 17 | 17 | 16 | 17 | 17 | 17 | 17 | 18 | 19 | 17 | 18 | 18 | 17 | 16 | 14 | 14 |
32 | 15 | 16 | 17 | 18 | 18 | 17 | 17 | 17 | 17 | 16 | 14 | 14 | 14 | 14 | 15 | 15 |
48 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 18 | 18 | 19 | 19 | 18 | 18 | 18 | 17 | 16 |
MurmurHash3, Variant 13
RR-64-40-TF2-0.94 fails Variant 13 quickly too. This is also to be expected considering that it is known to fail almost as fast as the MurmurHash3 finalizer.Offset | IDENTIY | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 19 | 17 | 18 | 18 | 18 | 17 | 18 | 18 | 18 | 18 | 18 | 19 | 19 | 16 | 17 | 17 |
16 | 17 | 17 | 16 | 17 | 16 | 16 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 18 | 18 |
32 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 20 | 20 | 19 | 20 | 19 | 19 |
48 | 17 | 17 | 16 | 17 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 18 | 18 | 19 | 19 |
Offset | REVERSE | |||||||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 16 | 17 | 18 | 18 | 18 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 20 | 19 |
16 | 20 | 19 | 21 | 18 | 19 | 21 | 19 | 22 | 20 | 21 | 22 | 21 | 20 | 22 | 22 | 19 |
32 | 18 | 18 | 18 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 20 | 20 | 20 | 19 |
48 | 21 | 18 | 18 | 21 | 19 | 22 | 20 | 21 | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 18 |
Ettinger's mixer
Tommy Ettinger came up with a quite different construction, compared to MurmurHash3 (and variants) and my own rrmxmx (in C):
uint64_t ettingerMixer(uint64_t v) {
uint64_t z = (v ^ 0xDB4F0B9175AE2165UL) * 0x4823A80B2006E21BUL;
z ^= rol64(z, 52) ^ rol64(z, 21) ^ 0x9E3779B97F4A7C15UL;
z *= 0x81383173UL;
return z ^ z >> 28;
}
According to him it passes 32 TB of PractRand but with -tf unspecificed, defaulting to 1 instead of 2 which is what I use for all tests.
I was a bit too quick here and tested his function with RIGHT-rotates and not LEFT-rotates which is what Ettinger used. The table below is now updated to reflect a correct implementation of his mixer.
The performace w.r.t. RR-64-40-TF2-0.94, compared to my incorrect implementation, is about the same. 122 failed tests out of 128:
uint64_t z = (v ^ 0xDB4F0B9175AE2165UL) * 0x4823A80B2006E21BUL;
z ^= rol64(z, 52) ^ rol64(z, 21) ^ 0x9E3779B97F4A7C15UL;
z *= 0x81383173UL;
return z ^ z >> 28;
}
Offset | IDENTIY | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 40 | 16 | 16 | 15 | 14 | 15 | 16 | 16 | 15 | 12 | 12 | 13 | 14 | 15 | 14 | 15 |
16 | 15 | 16 | 15 | 16 | 16 | 17 | 18 | 19 | 17 | 17 | 17 | 17 | 18 | 17 | 17 | 18 |
32 | 19 | 20 | 21 | 23 | 24 | 25 | 26 | 27 | 28 | 28 | 30 | 30 | 32 | 24 | 19 | 16 |
48 | 16 | 19 | 19 | 19 | 22 | 22 | 24 | 31 | 40 | 40 | 40 | 40 | 36 | 34 | 40 | 40 |
Offset | REVERSE | |||||||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 12 | 13 | 15 | 14 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 17 |
16 | 17 | 18 | 19 | 19 | 22 | 20 | 20 | 17 | 23 | 25 | 25 | 25 | 26 | 30 | 30 | 29 |
32 | 30 | 34 | 33 | 36 | 21 | 19 | 17 | 16 | 17 | 16 | 16 | 17 | 21 | 21 | 19 | 19 |
48 | 22 | 22 | 21 | 20 | 21 | 20 | 19 | 20 | 19 | 18 | 16 | 16 | 16 | 15 | 15 | 14 |
Changing the rotation constants to {12, 43} (corresponding to right rotates instead of left rotatets) gives a function with about the same number of failures, 120 out of 128:
Offset | IDENTIY | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 40 | 15 | 15 | 14 | 15 | 15 | 16 | 16 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 18 | 18 | 17 | 18 | 17 | 18 | 19 | 19 | 18 |
32 | 17 | 18 | 19 | 21 | 23 | 19 | 17 | 17 | 19 | 19 | 19 | 21 | 24 | 28 | 31 | 35 |
48 | 36 | 37 | 38 | 34 | 36 | 36 | 35 | 38 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
Offset | REVERSE | |||||||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 17 | 16 | 17 | 17 | 17 | 16 |
16 | 18 | 17 | 17 | 16 | 24 | 23 | 22 | 21 | 24 | 25 | 25 | 22 | 19 | 17 | 19 | 17 |
32 | 16 | 16 | 17 | 19 | 19 | 19 | 21 | 22 | 25 | 27 | 26 | 25 | 24 | 23 | 22 | 21 |
48 | 20 | 19 | 18 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 16 | 16 | 15 | 15 |
Testing rrmxmx
I already knew that rrmxmx fails PractRand -tf 2 -te 1 after $2^{44}$ bytes when fed a unitary counter, starting at 0. If I recall correctly that took about one week of computing time to find out. The table below took about 20 minutes to generate and it clearly shows that rrmxmx is far from optimal (note that x = 33, quick and dirty):Offset | IDENTIY | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 25 | 25 | 33 | 33 | 33 | 33 | 33 |
16 | 33 | 28 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
32 | 33 | 31 | 33 | 33 | 33 | 33 | 30 | 30 | 30 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
48 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
Offset | REVERSE | |||||||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 31 | 28 | 31 | 33 | 33 | 33 |
16 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 28 | 33 | 33 | 33 | 33 |
32 | 28 | 28 | 30 | 25 | 22 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
48 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
Intermediate result, my best so far
Compared to my previous function, rrmxmx, this is much more well behaved and has passed 128 TB of PractRand -tf 2 with a unitary counter (one single run, inputs $\{0, \ldots, 2^{44} - 1\}$).
Testing with RR-64-40-TF2-0.94, there is 1 failure:
Trying to fix this failure (ROR(reversed, 14)) has so far only given me variants with more failures or a single failure happening
in a different subtest with less than $2^{39}$ bytes.
Offset | IDENTIY | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
16 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
32 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
48 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
Offset | REVERSE | |||||||||||||||
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 39 | 40 |
16 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
32 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
48 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
Compared to rrmxmx it is slightly slower, doing 2 rotations instead of the shift in the middle of rrmxmx. Temporarily, let's call it "rrxmrrxmsx_0" (in C):
I found this function a few months ago but have been hoping to find something of similar speed that doesn't fail any of the 128 subtests.
If I do find one, I'll post it here!
// Old mixer, my rrmxmx
uint64_t rrmxmx(uint64_t v) {
v ^= ror64(v, 49) ^ ror64(v, 24);
v *= 0x9FB21C651E98DF25L;
v ^= v >> 28;
v *= 0x9FB21C651E98DF25L;
return v ^ v >> 28;
}
// New mixer, "rrxmrrxmsx_0", failing one subtest of RR-64-40-TF2-0.94.
// With a unit counter starting at 0, it has passed 128 TB of
// PractRand 0.94 -tf 2 without anomalies found past 2 TB.
uint64_t rrxmrrxmsx_0(uint64_t v) {
v ^= ror64(v, 25) ^ ror64(v, 50);
v *= 0xA24BAED4963EE407UL;
v ^= ror64(v, 24) ^ ror64(v, 49);
v *= 9FB21C651E98DF25UL;
return v ^ v >> 28;
}
uint64_t rrmxmx(uint64_t v) {
v ^= ror64(v, 49) ^ ror64(v, 24);
v *= 0x9FB21C651E98DF25L;
v ^= v >> 28;
v *= 0x9FB21C651E98DF25L;
return v ^ v >> 28;
}
// New mixer, "rrxmrrxmsx_0", failing one subtest of RR-64-40-TF2-0.94.
// With a unit counter starting at 0, it has passed 128 TB of
// PractRand 0.94 -tf 2 without anomalies found past 2 TB.
uint64_t rrxmrrxmsx_0(uint64_t v) {
v ^= ror64(v, 25) ^ ror64(v, 50);
v *= 0xA24BAED4963EE407UL;
v ^= ror64(v, 24) ^ ror64(v, 49);
v *= 9FB21C651E98DF25UL;
return v ^ v >> 28;
}
I kind of gave up on finding better constants for functions having the same form as the original rrmxmx. To check that there really is a reasonable correlation between "quite good behaviour" when feeding the function with a unitary counter and not failing many subtests, I am running RR-64-40-TF2-0.94 on it. It fails much much less badly than MurmurHash3 and Variant13 but still much worse than rrxmrrxmxs.