The testing procedure I've used is described in an earlier post. There is one extension, compared to the earlier post: every rotation and reversal is also complemented for a total of 256 subtests. I'll call the extended version "RRC-64-40-TF2-0.94" for "Reversed, rotated, complemented-64 bits, 2^40 bytes, PractRand 0.94 with -tf 2". Without further ado, here's "Moremur", a statistically stronger but equally fast mixer:
uint64_t moremur(uint64_t x) {
x ^= x >> 27;
x *= 0x3C79AC492BA7B653UL;
x ^= x >> 33;
x *= 0x1C69B3F74AC4AE35UL;
x ^= x >> 27;
return x;
}
One strong flaw with this construction, i.e. the first operation being an xor-shift, is that there are many increments that will lead to few or none of the less significant bits being one.
This in turn implies that the less significant bits will be zero after the first two operations, xor-shift and multiply.
Particularly bad increments have this form: $\gamma = (2^s + 1)a, a \lt 2^s$, $s$ being the first shift and $a$ being sparse.
x ^= x >> 27;
x *= 0x3C79AC492BA7B653UL;
x ^= x >> 33;
x *= 0x1C69B3F74AC4AE35UL;
x ^= x >> 27;
return x;
}
In the table below, Moremur outperforms Murmur3 and Variant13 by quite some margin.
$\gamma$ | Murmur3 | Variant13 | Moremur |
---|---|---|---|
0x0000000000000001 |
$2^{17}$ | $2^{19}$ | $2^{33}$ |
0x0000000000000003 |
$2^{17}$ | $2^{17}$ | $2^{34}$ |
0x0000000000000005 |
$2^{17}$ | $2^{18}$ | $2^{34}$ |
0x0000000000000009 |
$2^{16}$ | $2^{18}$ | $2^{34}$ |
0x0000010000000001 |
$2^{18}$ | $2^{21}$ | $2^{41}$ |
0xffffffffffffffff |
$2^{17}$ | $2^{19}$ | $2^{33}$ |
0x0000000000ffffff |
$2^{17}$ | $2^{23}$ | $2^{37}$ |
0xffffff0000000001 |
$2^{18}$ | $2^{23}$ | $2^{43}$ |
0x0000000000555555 |
$2^{18}$ | $2^{28}$ | $2^{42}$ |
0x1111111111110001 |
$2^{25}$ | $2^{30}$ | $> 2^{46}$ |
0x7777777777770001 |
$2^{28}$ | $2^{34}$ | $> 2^{45}$ |
0x7f7f7f7f33333333 |
$2^{30}$ | $2^{33}$ | $> 2^{46}$ |
0x5555550000000001 |
$2^{21}$ | $2^{27}$ | $2^{45}$ |
0xc45a11730cc8ffe3 |
$2^{39}$ | $> 2^{42}$ | $2^{?}$ |
0x2b13b77d0b289bbd |
$2^{39}$ | $> 2^{42}$ | $2^{?}$ |
0x40ead42ca1cd0131 |
$2^{40}$ | $> 2^{42}$ | $2^{?}$ |
ror64(reverse(x, R) ^ |
FORWARD | REVERSED | |||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 33 | 19 | 19 | 19 | 22 | 22 | 24 | 25 | 28 | 27 | 28 | 29 | 16 | 16 | 16 | 17 | 16 | 17 | 19 | 19 | 20 | 28 | 27 | 28 | 34 | 29 | 29 | 35 | 36 | 37 | 37 | 38 | |
16 | 17 | 17 | 18 | 19 | 19 | 19 | 19 | 20 | 21 | 22 | 23 | 23 | 24 | 25 | 26 | 27 | 36 | 35 | 35 | 39 | 40 | 34 | 31 | 31 | 39 | 40 | 39 | 38 | 37 | 34 | 34 | 36 | |
32 | 28 | 29 | 30 | 31 | 31 | 32 | 33 | 33 | 34 | 34 | 34 | 34 | 34 | 33 | 32 | 32 | 28 | 34 | 33 | 32 | 32 | 31 | 33 | 33 | 34 | 34 | 34 | 35 | 35 | 36 | 37 | 36 | |
48 | 32 | 28 | 28 | 28 | 29 | 29 | 29 | 29 | 30 | 30 | 30 | 30 | 31 | 31 | 31 | 32 | 31 | 31 | 31 | 33 | 36 | 37 | 36 | 28 | 33 | 34 | 31 | 31 | 28 | 25 | 22 | 19 | |
ror64(reverse(x, R) ^ |
FORWARD | REVERSED | |||||||||||||||||||||||||||||||
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 33 | 19 | 19 | 19 | 22 | 22 | 24 | 25 | 28 | 29 | 18 | 28 | 15 | 16 | 16 | 17 | 16 | 17 | 19 | 21 | 21 | 26 | 28 | 30 | 34 | 31 | 34 | 34 | 36 | 36 | 37 | 37 | |
16 | 17 | 18 | 18 | 19 | 19 | 19 | 19 | 19 | 21 | 22 | 23 | 24 | 24 | 25 | 26 | 27 | 36 | 35 | 35 | 39 | 40 | 34 | 31 | 31 | 39 | 40 | 39 | 38 | 37 | 34 | 34 | 36 | |
32 | 28 | 29 | 30 | 30 | 32 | 33 | 33 | 34 | 34 | 35 | 35 | 34 | 34 | 32 | 33 | 32 | 28 | 34 | 33 | 32 | 32 | 31 | 33 | 33 | 34 | 34 | 34 | 35 | 36 | 36 | 36 | 36 | |
48 | 32 | 28 | 28 | 28 | 28 | 28 | 29 | 29 | 29 | 30 | 30 | 30 | 31 | 31 | 32 | 32 | 31 | 31 | 31 | 33 | 35 | 35 | 34 | 28 | 32 | 32 | 31 | 30 | 28 | 25 | 22 | 19 |
ror64(reverse(x, R) ^ |
FORWARD | REVERSED | |||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 17 | 18 | 18 | 18 | 17 | 16 | 16 | 16 | 16 | 15 | 15 | 17 | 15 | 14 | 15 | 15 | 15 | 17 | 18 | 18 | 18 | 17 | 17 | 16 | 16 | 14 | 14 | 14 | 14 | 14 | 14 | 17 | |
16 | 14 | 14 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 16 | 15 | 15 | 17 | 17 | 16 | 17 | 17 | 17 | 17 | 18 | 19 | 17 | 18 | 18 | 17 | 16 | 14 | 14 | |
32 | 16 | 16 | 17 | 17 | 16 | 16 | 14 | 14 | 16 | 15 | 14 | 15 | 15 | 15 | 15 | 15 | 15 | 16 | 17 | 18 | 18 | 17 | 17 | 17 | 17 | 16 | 14 | 14 | 14 | 14 | 15 | 15 | |
48 | 15 | 14 | 14 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 16 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 18 | 18 | 19 | 19 | 18 | 18 | 18 | 17 | 16 | |
ror64(reverse(x, R) ^ |
FORWARD | REVERSED | |||||||||||||||||||||||||||||||
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 17 | 17 | 18 | 17 | 17 | 15 | 16 | 16 | 16 | 15 | 15 | 17 | 14 | 15 | 15 | 15 | 14 | 17 | 18 | 18 | 17 | 17 | 16 | 17 | 16 | 15 | 14 | 14 | 14 | 14 | 16 | 17 | |
16 | 14 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 15 | 16 | 16 | 16 | 16 | 15 | 15 | 17 | 17 | 16 | 17 | 17 | 17 | 17 | 18 | 18 | 18 | 17 | 16 | 15 | 15 | 14 | 14 | |
32 | 16 | 17 | 17 | 17 | 16 | 14 | 14 | 14 | 15 | 15 | 14 | 15 | 14 | 15 | 15 | 14 | 15 | 15 | 17 | 18 | 18 | 17 | 17 | 17 | 17 | 16 | 15 | 14 | 14 | 14 | 14 | 15 | |
48 | 14 | 14 | 14 | 14 | 14 | 14 | 15 | 15 | 15 | 15 | 15 | 16 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 16 | 17 | 17 | 17 | 17 | 18 | 19 | 19 | 18 | 18 | 18 | 17 | 15 |
ror64(reverse(x, R) ^ |
FORWARD | REVERSED | |||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 19 | 17 | 18 | 18 | 18 | 17 | 18 | 18 | 18 | 18 | 18 | 19 | 19 | 16 | 17 | 17 | 16 | 17 | 18 | 18 | 18 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | |
16 | 17 | 17 | 16 | 17 | 16 | 16 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 18 | 18 | 20 | 19 | 21 | 18 | 19 | 21 | 19 | 22 | 20 | 21 | 22 | 21 | 20 | 22 | 22 | 19 | |
32 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 20 | 20 | 19 | 20 | 19 | 19 | 18 | 18 | 18 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 20 | 20 | 20 | 19 | |
48 | 17 | 17 | 16 | 17 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 18 | 18 | 19 | 19 | 21 | 18 | 18 | 21 | 19 | 22 | 20 | 21 | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 18 | |
ror64(reverse(x, R) ^ |
FORWARD | REVERSED | |||||||||||||||||||||||||||||||
Offset | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
0 | 19 | 17 | 18 | 18 | 18 | 17 | 18 | 18 | 18 | 18 | 18 | 18 | 19 | 16 | 17 | 17 | 16 | 17 | 18 | 18 | 18 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | |
16 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 17 | 18 | 20 | 19 | 21 | 18 | 18 | 21 | 19 | 22 | 20 | 21 | 22 | 21 | 20 | 22 | 22 | 19 | |
32 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 18 | 18 | 18 | 19 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 20 | 19 | 20 | 19 | |
48 | 16 | 17 | 16 | 16 | 17 | 17 | 16 | 17 | 16 | 17 | 17 | 17 | 18 | 18 | 19 | 19 | 21 | 19 | 19 | 21 | 19 | 22 | 20 | 21 | 19 | 18 | 18 | 18 | 18 | 18 | 18 | 18 |
Hej Pelle,
SvaraRaderaThank you for really good articles!
Here is another configuration that gets the same construction to 2^41 bytes for gamma=1, PractRand 0.94 -tf=2 -te=0:
uint64_t xmxmx(uint64_t x) {
x ^= x >> 27;
x *= 0xe9846af9b1a615dull;
x ^= x >> 25;
x *= 0xe9846af9b1a615dull;
x ^= x >> 27;
return x;
}
I've only tested it with a few other gammas which all have failed after 2^38 but I've had time to run it through your RRC scheme so I am not sure how it would hold up. Maybe you have some direct insight?
Found by method outlined here http://jonkagstrom.com/tuning-murmur3/index.html
Anyways, looking forward to read any future articles!
Best regards,
Jon
Ps. the constants in moremur have 'UL' suffix but I think you mean 'ULL' also the max PR under the matrices for Murmur3 and Variant13 seems to be off (says 2^40 for both).
I'm running RRC-64-40-TF2-0.94 on the constants you give above right now. So far it seems like they are about the same, performance-wise, as the ones I gave above.
RaderaAs for UL vs ULL, it was too long since I delved into any standard documents and ULL might mean uint128_t on some systems. For production code, one should of course ensure that the constants really are interpreted as 64-bit unsigned.
The maximum, 2^40, is actually correct for all tables. Murmur3 and Variant13 never reach anything close to 2^40 but that was the limit I chose to get the colours in the table to be the same, relatively speaking. Had I normalized/thresholded at, say, 2^25 or so, they would have been more yellow in colour.
Interesting, I guess it adds evidence of a possible limit for this construction, thank you for taking time and testing.
SvaraRaderaI read 'ul' as at least 32-bits while 'ull' as at least 64-bit (but I may be wrong).
Ahh that makes sense, I thought it was the maximum value in the corresponding matrix.
Maila mig på mangling@evensen.org, har en del kommentarer och svar på en del frågor du ställt på http://jonkagstrom.com/bit-mixer-construction/index.html
Radera/P
Den här kommentaren har tagits bort av bloggadministratören.
SvaraRaderaHow do you look for these numbers (multipliers and shifts)?
SvaraRaderaI noticed you use different multipliers and different shifts. Also multipliers are non-prime numbers. Would it be better if it did? I have syrup suspicion they would.
And, obviously, how to find then?
If I wanted to add another stage to those three (xmxmx) how would I look for multiplier and shift?