An intermediate result (much stronger mixer).
A frustrating search.
When working on finding good mixers, I got really frustrated by the extremely slow process of testing a new set of constants or other tweaks to functions that were "kind of good". My criterion for "success" was that feeding a simple counter, starting at 0, through the mixing function and passing the mixer's output to PractRand with -tf 2 should not generate any failures up to $2^{45}$ bytes.
Eventually, I came to realize that what I really wanted was not just a function that would "generate PractRand-passing output when fed a unitary counter" but rather "a decent approximation of a random permutation". Such a permutation would give a higgledy-piggledy sequence from a very regular one, regardless of what the regularity looks like (just kidding; the regularities of interest here are affine transformations, though of course there are many other sequences one could think of as "regular"). Previously, I had been feeding Gray-code sequences to the mixing function, hoping that the unit Hamming distance between adjacent inputs would reveal flaws more quickly. For the functions I tested, I couldn't find that it made any difference w.r.t. how quickly testing failed.
As I was thinking about more sequences that would be "very well ordered", I came up with a very simple testing procedure that seems to work well: a function that passes the subtests described below is also very likely to work well on any kind of (affine) counter update.
The fact that many of the subtest input sequences intersect does not affect the tests since they are run independently.
Test procedure
Instead of testing the full output, which is very slow, test many short output sequences in the hope that anomalies in the mixer will be revealed more quickly. There is quite some conjecturing here: the (lack of) failures on many short tests hopefully correlates with overall quality. If the mixer is good, this is still a very lengthy process (the way I have been doing it, 128 TB of data will be tested if there are no failures), but running many small tests (1 TB apiece) can be done in parallel on a machine with many cores without using a lot of RAM.
Naming
To avoid confusion, let's call this test procedure "RR-b-x-TF2-0.94": "RR" to indicate "rotate and bit reversal", "b" for the word size in input/output bits, "x" for the tlmax exponent, "TF2" to indicate that PractRand was run with "-tf 2", and "0.94" for the version of PractRand I am currently using. The same procedure can be used for any permutation of size $2^{b}$. Pseudo-code for RR-b-x-TF2-0.94 (feedPractRandFails applies the mixer under test to its argument, feeds the result to PractRand and returns true when PractRand reports a failure):
# The tests I have run have been with b = 64 and x = tlmax = 40,
# i.e. 1 TB per subtest.
# rfun is either the identity function or a b-bit bit-order reversal function.
for rfun in {identity, bitreversal}
# Rotate counter 0 to (b - 1) bits.
for r in {0, ..., b - 1}
# No failure seen for this rfun & rotation so far.
failure = 0
# 2^x bytes, 2^x / b * 8 b-bit words.
for ctr in {0, ..., (2^tlmax / b * 8) - 1}
# Feed PractRand, break on failure.
# RORb is a b-bit right rotate function.
if feedPractRandFails(RORb(rfun(ctr), r))
# Note the failure level.
failure = ceil(log_2(ctr))
break for
end if
end for
emitTestScore(rfun, r, failure)
end for
end for
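To make this concrete, here is a rough C sketch of how the input stream for one subtest could be generated and piped into PractRand. It is a sketch only, not the exact driver used for the results below: mixer is a placeholder for whatever 64-bit permutation is under test, and the RNG_test command line just mirrors the options mentioned in this post.

// Hypothetical generator for one RR-64-40-TF2-0.94 subtest. It applies the
// mixer under test to ROR64(rfun(ctr), r) and writes the 64-bit results to
// stdout, e.g. "./subtest 14 1 | RNG_test stdin64 -tf 2 -tlmax 40"
// (adjust the PractRand options to your build).
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static inline uint64_t ror64(uint64_t v, unsigned r) {
  return r ? (v >> r) | (v << (64 - r)) : v;
}

// Reverse the bit order of a 64-bit word.
static uint64_t bitreverse64(uint64_t v) {
  uint64_t out = 0;
  for (int i = 0; i < 64; i++) {
    out = (out << 1) | (v & 1);
    v >>= 1;
  }
  return out;
}

extern uint64_t mixer(uint64_t v); // placeholder: the permutation under test

int main(int argc, char **argv) {
  unsigned r = argc > 1 ? (unsigned)atoi(argv[1]) % 64 : 0; // rotation, 0..63
  int reverse = argc > 2 ? atoi(argv[2]) : 0;               // 0: identity, 1: bit reversal
  uint64_t buf[4096];
  uint64_t ctr = 0;
  for (;;) {
    for (size_t i = 0; i < 4096; i++, ctr++) {
      uint64_t x = reverse ? bitreverse64(ctr) : ctr;
      buf[i] = mixer(ror64(x, r));
    }
    if (fwrite(buf, sizeof buf[0], 4096, stdout) != 4096)
      return 0; // stop when the PractRand end of the pipe closes
  }
}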
Test results
MurmurHash3 finalizer
RR-64-40-TF2-0.94 fails the MurmurHash3 finalizer quickly. This is to be expected considering that it is known to fail quickly when fed a unitary counter. (Row offset plus column index gives the rotation r; each entry is the failure level for that subtest, with the maximum value (here 40) meaning no failure was seen.)

IDENTITY

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 17 | 18 | 18 | 18 | 17 | 16 | 16 | 16 | 16 | 15 | 15 | 17 | 15 | 14 | 15 | 15 |
16 | 14 | 14 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 16 | 15 | 15 |
32 | 16 | 16 | 17 | 17 | 16 | 16 | 14 | 14 | 16 | 15 | 14 | 15 | 15 | 15 | 15 | 15 |
48 | 15 | 14 | 14 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 16 | 17 | 17 | 17 | 17 |
REVERSE

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 15 | 17 | 18 | 18 | 18 | 17 | 17 | 16 | 16 | 14 | 14 | 14 | 14 | 14 | 14 | 17 |
16 | 17 | 17 | 16 | 17 | 17 | 17 | 17 | 18 | 19 | 17 | 18 | 18 | 17 | 16 | 14 | 14 |
32 | 15 | 16 | 17 | 18 | 18 | 17 | 17 | 17 | 17 | 16 | 14 | 14 | 14 | 14 | 15 | 15 |
48 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 18 | 18 | 19 | 19 | 18 | 18 | 18 | 17 | 16 |
MurmurHash3, Variant 13
RR-64-40-TF2-0.94 fails Variant 13 quickly too. This is also to be expected considering that it is known to fail almost as fast as the MurmurHash3 finalizer.

IDENTITY

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 19 | 17 | 18 | 18 | 18 | 17 | 18 | 18 | 18 | 18 | 18 | 19 | 19 | 16 | 17 | 17 |
16 | 17 | 17 | 16 | 17 | 16 | 16 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 18 | 18 |
32 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 20 | 20 | 19 | 20 | 19 | 19 |
48 | 17 | 17 | 16 | 17 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 18 | 18 | 19 | 19 |
REVERSE

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 16 | 17 | 18 | 18 | 18 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 20 | 19 |
16 | 20 | 19 | 21 | 18 | 19 | 21 | 19 | 22 | 20 | 21 | 22 | 21 | 20 | 22 | 22 | 19 |
32 | 18 | 18 | 18 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 20 | 20 | 20 | 19 |
48 | 21 | 18 | 18 | 21 | 19 | 22 | 20 | 21 | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 18 |
Ettinger's mixer
Tommy Ettinger came up with quite a different construction compared to MurmurHash3 (and variants) and my own rrmxmx (in C):
#include <stdint.h>

// rol64 is a 64-bit left rotate.
static inline uint64_t rol64(uint64_t x, unsigned r) { return (x << r) | (x >> (64 - r)); }

uint64_t ettingerMixer(uint64_t v) {
  uint64_t z = (v ^ 0xDB4F0B9175AE2165UL) * 0x4823A80B2006E21BUL;
  z ^= rol64(z, 52) ^ rol64(z, 21) ^ 0x9E3779B97F4A7C15UL;
  z *= 0x81383173UL;
  return z ^ z >> 28;
}
According to him it passes 32 TB of PractRand, but with -tf unspecified, defaulting to 1 instead of the 2 that I use for all tests.
I was a bit too quick here and tested his function with RIGHT rotates and not the LEFT rotates that Ettinger used. The table below has been updated to reflect a correct implementation of his mixer.
The performance w.r.t. RR-64-40-TF2-0.94, compared to my incorrect implementation, is about the same, 122 failed tests out of 128:
IDENTITY

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 40 | 16 | 16 | 15 | 14 | 15 | 16 | 16 | 15 | 12 | 12 | 13 | 14 | 15 | 14 | 15 |
16 | 15 | 16 | 15 | 16 | 16 | 17 | 18 | 19 | 17 | 17 | 17 | 17 | 18 | 17 | 17 | 18 |
32 | 19 | 20 | 21 | 23 | 24 | 25 | 26 | 27 | 28 | 28 | 30 | 30 | 32 | 24 | 19 | 16 |
48 | 16 | 19 | 19 | 19 | 22 | 22 | 24 | 31 | 40 | 40 | 40 | 40 | 36 | 34 | 40 | 40 |
REVERSE

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 12 | 13 | 15 | 14 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 17 |
16 | 17 | 18 | 19 | 19 | 22 | 20 | 20 | 17 | 23 | 25 | 25 | 25 | 26 | 30 | 30 | 29 |
32 | 30 | 34 | 33 | 36 | 21 | 19 | 17 | 16 | 17 | 16 | 16 | 17 | 21 | 21 | 19 | 19 |
48 | 22 | 22 | 21 | 20 | 21 | 20 | 19 | 20 | 19 | 18 | 16 | 16 | 16 | 15 | 15 | 14 |
Changing the rotation constants to {12, 43} (corresponding to right rotates instead of left rotates) gives a function with about the same number of failures, 120 out of 128:
IDENTITY

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 40 | 15 | 15 | 14 | 15 | 15 | 16 | 16 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 18 | 18 | 17 | 18 | 17 | 18 | 19 | 19 | 18 |
32 | 17 | 18 | 19 | 21 | 23 | 19 | 17 | 17 | 19 | 19 | 19 | 21 | 24 | 28 | 31 | 35 |
48 | 36 | 37 | 38 | 34 | 36 | 36 | 35 | 38 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
REVERSE

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 17 | 16 | 17 | 17 | 17 | 16 |
16 | 18 | 17 | 17 | 16 | 24 | 23 | 22 | 21 | 24 | 25 | 25 | 22 | 19 | 17 | 19 | 17 |
32 | 16 | 16 | 17 | 19 | 19 | 19 | 21 | 22 | 25 | 27 | 26 | 25 | 24 | 23 | 22 | 21 |
48 | 20 | 19 | 18 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 16 | 16 | 15 | 15 |
Testing rrmxmx
I already knew that rrmxmx fails PractRand -tf 2 -te 1 after $2^{44}$ bytes when fed a unitary counter starting at 0. If I recall correctly, that took about one week of computing time to find out. The table below took about 20 minutes to generate and it clearly shows that rrmxmx is far from optimal (note that x = 33, quick and dirty):

IDENTITY

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 25 | 25 | 33 | 33 | 33 | 33 | 33 |
16 | 33 | 28 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
32 | 33 | 31 | 33 | 33 | 33 | 33 | 30 | 30 | 30 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
48 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
REVERSE

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 31 | 28 | 31 | 33 | 33 | 33 |
16 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 28 | 33 | 33 | 33 | 33 |
32 | 28 | 28 | 30 | 25 | 22 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
48 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
Intermediate result, my best so far
Compared to my previous function, rrmxmx, this one is much better behaved and has passed 128 TB of PractRand -tf 2 with a unitary counter (one single run, inputs $\{0, \ldots, 2^{44} - 1\}$).
Testing with RR-64-40-TF2-0.94, there is one failure (ROR(reversed, 14)). Trying to fix this failure has so far only given me variants with more failures, or a single failure happening in a different subtest at less than $2^{39}$ bytes.
IDENTITY

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
16 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
32 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
48 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
REVERSE

Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 39 | 40 |
16 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
32 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
48 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
Compared to rrmxmx it is slightly slower, doing two rotations instead of the shift in the middle of rrmxmx. Temporarily, let's call it "rrxmrrxmsx_0".
I found this function a few months ago but have been hoping to find something of similar speed that doesn't fail any of the 128 subtests.
If I do find one, I'll post it here! In C:
#include <stdint.h>

// ror64 is a 64-bit right rotate.
static inline uint64_t ror64(uint64_t x, unsigned r) { return (x >> r) | (x << (64 - r)); }

// Old mixer, my rrmxmx
uint64_t rrmxmx(uint64_t v) {
  v ^= ror64(v, 49) ^ ror64(v, 24);
  v *= 0x9FB21C651E98DF25UL;
  v ^= v >> 28;
  v *= 0x9FB21C651E98DF25UL;
  return v ^ v >> 28;
}

// New mixer, "rrxmrrxmsx_0", failing one subtest of RR-64-40-TF2-0.94.
// With a unit counter starting at 0, it has passed 128 TB of
// PractRand 0.94 -tf 2 without anomalies found past 2 TB.
uint64_t rrxmrrxmsx_0(uint64_t v) {
  v ^= ror64(v, 25) ^ ror64(v, 50);
  v *= 0xA24BAED4963EE407UL;
  v ^= ror64(v, 24) ^ ror64(v, 49);
  v *= 0x9FB21C651E98DF25UL;
  return v ^ v >> 28;
}
I kind of gave up on finding better constants for functions of the same form as the original rrmxmx. To check that there really is a reasonable correlation between "quite good behaviour" when fed a unitary counter and not failing many subtests, I am running RR-64-40-TF2-0.94 on it. It fails much, much less badly than MurmurHash3 and Variant 13, but still much worse than rrxmrrxmsx_0.
Comments

Thanks for testing my code (or maybe something like it), and I'm glad it at least does better than MurmurHash. You may have made this correction in your tested code, but I used left rotations instead of right rotations with the given amounts. Did the code you tested look like this, or were the rotations reversed in direction?
uint64_t ettingerMixer(uint64_t v) {
uint64_t z = (v ^ 0xDB4F0B9175AE2165UL) * 0x4823A80B2006E21BUL;
z ^= (z << 52 | z >> 12) ^ (z << 21 | z >> 43) ^ 0x9E3779B97F4A7C15UL;
z *= 0x81383173UL;
return z ^ z >> 28;
}
I'm testing this with right rotations now and it seems a little weaker (PractRand found a Gap-16:B anomaly on the lowest bit at 8 GB of tests), but it hasn't failed at 128 GB, which has me rather surprised. I wonder how rrxmrrxmsx (this hash needs a better name, heh) does with left rotations instead of right rotations, or with the right rotations changed to 39, 14, 40, and 15. If they both do well, then that may suggest the weight of the rotation matters in some key way.
My very bad: I did indeed reverse the rotations, doing right rotates instead of the left rotates in your original. There was yet another bug in my version of your code (most likely causing the tests to fail less than they should have). I am currently re-running the tests with a (hopefully) correct version. The results are about the same: 119 tests failed out of the 119 run so far.
The rotate amounts do matter. I haven't tried all neighbours (adjacent ror amounts) in "rrxmrrxmsx", but the ones I have tried performed worse, in some cases much worse. None of the adjacent rors or multipliers I have tested perform as well as the constants above.
If I had a server park to run my tests on, it would of course be interesting to not stop each subtest at 1 TB but at 4 TB or so.
Something else I have not done, since I am fairly certain what the outcome would be, is to run a reduced-round 64-bit block cipher (Blowfish and RC5 are the candidates I thought of) through the test and see where the failures occur as a function of the number of rounds used.
Oh yeah, the rotate amounts definitely matter, but I wonder if there's a symmetric aspect to ror64(num, r) vs. ror64(num, 64 - r). At least in my hash (even though it's known to have issues with most rotations of the input), changing the implementation to rotate right instead of left with the same constants doesn't seem to cause failures. My test of the right-rotate version is at 8 terabytes now, with only the same one anomaly mentioned at 8 gigabytes. If rrxmrrxmsx doesn't have a symmetric aspect to its rotations, that might be a clue that some other part of my hash has a different property regarding rotation amounts that might be useful. I'm curious whether the 119 that failed are the same ones that failed the first time, with right rotates?
It looks like the initial XOR I used is not enough of a scrambler for any input pattern other than adding 1, so there's probably not much point in testing the hash I gave any further. I'll try to set up some kind of system to automate the rotation/reversal of inputs to other hashes, most likely variants on rrxmrrxmsx, and test on my end.
So far, I have found no particularly strong strategy for choosing rotation constants. The MO has been along the tedious lines of "conjecture some, run a lot of tests, check the outcome, repeat".
When I constructed rrmxmx there was a particular search strategy (good avalanche properties, up to and including 4th-order SAC) and some sheer luck; replacing the rotation constants and/or the multiplicative constants with anything else I have found has given me worse properties.
The same goes for rrxmrrxmsx. :-/
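For illustration, a first-order SAC estimate can be sketched like this (a rough sketch only, not the search program itself; mixer stands for the function under test and the sample generator is a splitmix64-style helper): flip each input bit over many random samples and check how far each output bit's flip frequency is from 0.5. This only shows the first-order criterion; the search mentioned above also looked at higher orders.

#include <stdint.h>
#include <stdio.h>

extern uint64_t mixer(uint64_t v); // placeholder: the mixer under test

// splitmix64-style generator for random input samples.
static uint64_t next_sample(uint64_t *s) {
  uint64_t z = (*s += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

int main(void) {
  enum { SAMPLES = 1 << 20 };
  static uint32_t flips[64][64]; // flips[i][j]: times output bit j flipped when input bit i was flipped
  uint64_t s = 12345;
  for (int n = 0; n < SAMPLES; n++) {
    uint64_t x = next_sample(&s), y = mixer(x);
    for (int i = 0; i < 64; i++) {
      uint64_t d = y ^ mixer(x ^ (1ULL << i));
      for (int j = 0; j < 64; j++)
        flips[i][j] += (uint32_t)((d >> j) & 1);
    }
  }
  double worst = 0.0; // largest deviation from the ideal flip probability 0.5
  for (int i = 0; i < 64; i++)
    for (int j = 0; j < 64; j++) {
      double bias = (double)flips[i][j] / SAMPLES - 0.5;
      if (bias < 0) bias = -bias;
      if (bias > worst) worst = bias;
    }
  printf("worst first-order SAC bias: %g\n", worst);
  return 0;
}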
But I guess *something* could be said regarding the rotations:
If chosen too low, the most precious (well mixed) high order bits from the multiplication won't propagate to the lowest order bits.
If chosen too high, very few of the highest order bits will affect the lowest order bits. So some kind of balance has to be struck.
To make searching even worse, there is a strong correlation between the rotations chosen and the multipliers chosen. And worse^2: the quality is sensitive to the order in which the permutation steps are executed.
Say you have
P(v, r_1, r_2, m) = (v ^ ror64(v, r_1) ^ ror64(v, r_2)) * m
Now P(P(v, a_1, a_2, m_1), b_1, b_2, m_2) might perform very differently from P(P(v, b_1, b_2, m_2), a_1, a_2, m_1). Differently in the sense that the first composition might perform very well but the second only so-so.
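In C, that building block might be sketched as follows (nothing new, just the definition above restated; the rotation amounts are assumed to be distinct and non-zero):

#include <stdint.h>

// 64-bit right rotate (r must be in 1..63 here).
static inline uint64_t ror64(uint64_t v, unsigned r) {
  return (v >> r) | (v << (64 - r));
}

// The xor-rotate-rotate-multiply step P(v, r_1, r_2, m) described above.
// For odd m and distinct non-zero rotation amounts this is a permutation
// of the 64-bit words.
static inline uint64_t P(uint64_t v, unsigned r1, unsigned r2, uint64_t m) {
  return (v ^ ror64(v, r1) ^ ror64(v, r2)) * m;
}

// The two compositions discussed above,
//   P(P(v, a1, a2, m1), b1, b2, m2)   vs.   P(P(v, b1, b2, m2), a1, a2, m1),
// are different permutations and can have very different statistical quality.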
The initial xor doesn't hurt (it's kind of nice to avoid the trivial fixed point at 0), but it doesn't help all that much either.
The generator program I wrote is flexible enough to handle xor-shifts at any point as well as additive constants in any step. But so far I have not seen any case where adding (rather than xoring) a constant has improved the statistical properties.
rrmxmx and rrxmrrxmsx are just different run-time parameters to that program. When running PractRand, the generator's speed is not the bottleneck anyway. In a benchmarking/deployment situation, one would of course fix all the constants and the order in which they are used.
Please get in touch with me via Facebook Messenger (AFAIK, my name is unique) if you want to discuss this further; I am somewhat reluctant to write about too many failed roads taken and things that are a bit too much "work in progress". :-)
(The right-instead-of-left rotation was not the only bug; I also accidentally kept your counter increment. For any rotation but 0, this made the LSB always set to 1, which may or may not have influenced the results. I am re-running the 128 tests now and will post the results above, since I do make a reference to them.)
Fantastic blog describing your work developing and testing the rrmxmx and the new variant rrxmrrxmsx_0 algorithms.
The code is freely provided on the blog but there is no licence specified. Is this openly available for reuse?
Thanking you in advance.
Alex
Glad that you find it useful!
Tommy Ettinger has since come up with better constants, which you might be interested in:
https://github.com/tommyettinger/sarong/blob/master/src/main/java/sarong/PelicanRNG.java
Other things have kept me busy as of late, but I am currently running tests on yet another tweak that might or might not be marginally slower (or faster) than rrxmrrxmsx. It will probably be another week or two before the tests have finished ("RR-64-41-TF2-0.94" in the nomenclature above, that is, testing each tweak with 2 TB of data).
Consider the code above to be in the public domain. I would of course appreciate being credited (if possible) and hearing of any use you put it to.
Hi Mr. Evensen,
greetings from a hashophile from Balkania.
You are welcome to see, comment on, and do whatever you want with:
https://github.com/rurban/smhasher/issues/73#issuecomment-543944019
Thanks for sharing your mixer and cooking methodology. I wonder what corpus of keys could be torturous to my 'Pippip'. Can you share your key feed, is it some kind of a simple loop?
My dream, for so long, is to have all English n-grams up to n = 1..9 at my fingertips, easily searchable; you can find something useful to you on TFD:
https://forum.thefreedictionary.com/postsm1118964_MASAKARI--The-people-s-choice--General-Purpose-Grade--English-wordlist.aspx#1118964
Since they are tens of billions, I need good distribution and superb speed.
Thanks a lot for your latest blog posts! I am currently searching for a good mixer for my own hashtable implementation, and your latest two posts inspired me to implement your testing protocol in an open source C++ project: https://github.com/martinus/better-faster-stronger-mixer
My goal with this repository is to collect all kinds of mixer implementations, and to provide benchmarks and PractRand results. I have added your rrmxmx and rrxmrrxmsx_0 there as well, and created a graph with the most important results.
Note that I believe I have stumbled upon a very interesting mixer that I've called mumxmumxx2. It passes all PractRand 0.95 tests with -tf 2 -tlmin 10 -tlmax 40, and is on my machine a bit faster than rrxmrrxmsx_0. Here is the source: https://github.com/martinus/better-faster-stronger-mixer/blob/master/include/mixer/mumxmumxx2.h#L8-L13
The "mumx" operation performs a 128bit multiplication, and xor's the upper and lower 64bit results.
You have mail. :-)
Sorry for not posting at a more regular rate. Great to see that someone besides me is interested in these matters.
I don't quite see how mumxmumxx2 and mumxmumxx3 are bijective? The 64 least significant bits are, but once you fold back the 64 MSBs I don't think you have a bijection anymore. Is there some dark magic in the constants chosen?
I did a small experiment, implementing mumx but with 64-bit multiplies of 32-bit words, folding the top half back onto the lower half.
The coverage after trying 8 different (random, odd) multipliers was between 63.12% and 63.38%: $2^{32}$ distinct inputs mapped to roughly $2.71828 \times 10^{9}$ outputs (seems awfully close to $e \cdot 10^{9}$, doesn't it?).
Whipping up a not particularly fast pseudo-random function $f: \{0,1\}^{32} \to \{0,1\}^{32}$ and running the same test gave me numbers in the same ballpark. So multiplying and folding back gives the same distribution properties as one would expect from a pseudo-random function, but not from a pseudo-random *permutation*.
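A minimal sketch of that kind of coverage experiment (the multiplier below is just an arbitrary odd constant, not one of the eight that were actually tried):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// 32-bit analogue of mumx: 64-bit product of a 32-bit input and a 32-bit
// multiplier, with the top half folded back onto the bottom half.
static inline uint32_t mumx32(uint32_t v, uint32_t m) {
  uint64_t p = (uint64_t)v * m;
  return (uint32_t)(p >> 32) ^ (uint32_t)p;
}

int main(void) {
  const uint32_t mult = 0x9E3779B9u;          // arbitrary odd multiplier
  uint8_t *seen = calloc((size_t)1 << 29, 1); // 2^32 bits = 512 MB bitmap
  if (!seen) return 1;
  uint64_t distinct = 0;
  uint32_t v = 0;
  do {
    uint32_t o = mumx32(v, mult);
    uint8_t bit = (uint8_t)(1u << (o & 7));
    if (!(seen[o >> 3] & bit)) {
      seen[o >> 3] |= bit;
      distinct++;
    }
  } while (++v != 0); // all 2^32 inputs
  // A random function over 2^32 values is expected to cover about
  // 1 - 1/e ~ 63.2% of the outputs; a permutation would cover 100%.
  printf("coverage: %.2f%%\n", 100.0 * (double)distinct / 4294967296.0);
  free(seen);
  return 0;
}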
To be honest, I did not really check whether these functions are bijective. You are of course right, they are not. I am mainly interested in fast hashing algorithms for hashmaps, and while bijectivity would be a good feature to have, I don't believe it is strictly necessary for good performance.
I'll try to add coverage calculation for the 32-bit variants to my code. Interestingly, for mumxmumxx2 I also get ~63.2% coverage (the same as for a single mumx call), and e.g. wyhash3 only has ~48.3% coverage.
Depending on how you use it, it might be a fairly large problem.
This construction is problematic unless the function is guaranteed to be bijective:
uint64_t acc = 0;
for(int i = 0; i < WORDS_TO_HASH; i++) {
// acc now only maps to a fraction of the full state space.
acc = mixer(acc + word[i]);
}
return acc;
as is this one (which avoids one problem but is about as bad):
uint64_t acc = 0;
for(int i = 0; i < WORDS_TO_HASH; i++) {
// This way, acc can cover the full state space but...
acc = mixer(acc) + word[i];
}
// ... all bits in all words do not affect acc.
// A final round is needed...
acc = mix(acc);
// But now we're back to the problem we tried to deal with;
// acc can not assume all 2^64 values.
Having the mixer be bijective makes analysis easier and equidistribution possible.