måndag 16 december 2019

Stronger, better, morer, Moremur; a better Murmur3-type mixer.

I'll keep this brief: The original constants used in MurmurHash3 and Stafford's Variant 13 are not as good as they could have been. This affects the strength of SplitMix64.

The testing procedure I've used is described in an earlier post. There is one extension, compared to the earlier post: every rotation and reversal is also complemented for a total of 256 subtests. I'll call the extended version "RRC-64-40-TF2-0.94" for "Reversed, rotated, complemented-64 bits, 2^40 bytes, PractRand 0.94 with -tf 2". Without further ado, here's "Moremur", a statistically stronger but equally fast mixer:

uint64_t moremur(uint64_t x) {
  x ^= x >> 27;
  x *= 0x3C79AC492BA7B653UL;
  x ^= x >> 33;
  x *= 0x1C69B3F74AC4AE35UL;
  x ^= x >> 27;

  return x;
}
One strong flaw with this construction, i.e. the first operation being an xor-shift, is that there are many increments that will lead to few or none of the less significant bits being one. This in turn implies that the less significant bits will be zero after the first two operations, xor-shift and multiply. Particularly bad increments have this form: $\gamma = (2^s + 1)a, a \lt 2^s$, $s$ being the first shift and $a$ being sparse.

In the table below, Moremur outperforms Murmur3 and Variant13 by quite some margin.

Table 1: Test length where PractRand fails
$\gamma$ Murmur3 Variant13 Moremur
0x0000000000000001
$2^{17}$ $2^{19}$ $2^{33}$
0x0000000000000003
$2^{17}$ $2^{17}$ $2^{34}$
0x0000000000000005
$2^{17}$ $2^{18}$ $2^{34}$
0x0000000000000009
$2^{16}$ $2^{18}$ $2^{34}$
0x0000010000000001
$2^{18}$ $2^{21}$ $2^{41}$
0xffffffffffffffff
$2^{17}$ $2^{19}$ $2^{33}$
0x0000000000ffffff
$2^{17}$ $2^{23}$ $2^{37}$
0xffffff0000000001
$2^{18}$ $2^{23}$ $2^{43}$
0x0000000000555555
$2^{18}$ $2^{28}$ $2^{42}$
0x1111111111110001
$2^{25}$ $2^{30}$ $> 2^{46}$
0x7777777777770001
$2^{28}$ $2^{34}$ $> 2^{45}$
0x7f7f7f7f33333333
$2^{30}$ $2^{33}$ $> 2^{46}$
0x5555550000000001
$2^{21}$ $2^{27}$ $2^{45}$
0xc45a11730cc8ffe3
$2^{39}$ $> 2^{42}$ $2^{?}$
0x2b13b77d0b289bbd
$2^{39}$ $> 2^{42}$ $2^{?}$
0x40ead42ca1cd0131
$2^{40}$ $> 2^{42}$ $2^{?}$
As can be seen from the tables below, these constants are a great improvement over the ones I've been aware of so far (original MurmurHash3 and Variant13). Moremur still fails RRC-64-40-TF2-0.94 but far from as badly as the earlier constants do.
Moremur failures, $2^{40}$ bytes maximum.
ror64(reverse(x, R) ^ 0x0000000000000000, r)
FORWARDREVERSED
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 3319191922222425282728291616161716171919202827283429293536373738
16 1717181919191920212223232425262736353539403431313940393837343436
32 2829303131323333343434343433323228343332323133333434343535363736
48 3228282829292929303030303131313231313133363736283334313128252219
ror64(reverse(x, R) ^ 0xFFFFFFFFFFFFFFFF, r)
FORWARDREVERSED
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 3319191922222425282918281516161716171921212628303431343436363737
16 1718181919191919212223242425262736353539403431313940393837343436
32 2829303032333334343535343432333228343332323133333434343536363636
48 3228282828282929293030303131323231313133353534283232313028252219
Murmur3 failures, $2^{40}$ bytes maximum.
ror64(reverse(x, R) ^ 0x0000000000000000, r)
FORWARDREVERSED
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1718181817161616161515171514151515171818181717161614141414141417
16 1414141414151515151516161616151517171617171717181917181817161414
32 1616171716161414161514151515151515161718181717171716141414141515
48 1514141414141515151515161717171717171717171717181819191818181716
ror64(reverse(x, R) ^ 0xFFFFFFFFFFFFFFFF, r)
FORWARDREVERSED
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1717181717151616161515171415151514171818171716171615141414141617
16 1414141415151515151516161616151517171617171717181818171615151414
32 1617171716141414151514151415151415151718181717171716151414141415
48 1414141414141515151515161717171717171716171717171819191818181715
Variant 13, $2^{40}$ bytes maximum.
ror64(reverse(x, R) ^ 0x0000000000000000, r)
FORWARDREVERSED
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1917181818171818181818191916171716171818181919202019202019202019
16 1717161716161717161716171717181820192118192119222021222120222219
32 1919191919191919191920201920191918181819192020192020192020202019
48 1717161717171617161717171818191921181821192220211918181818181818
ror64(reverse(x, R) ^ 0xFFFFFFFFFFFFFFFF, r)
FORWARDREVERSED
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 1917181818171818181818181916171716171818181919202019202019202019
16 1717161716171717161716171717171820192118182119222021222120222219
32 1919191919191919191920201920201918181819192020192020192020192019
48 1617161617171617161717171818191921191921192220211918181818181818
The constants were found through a stochastic hill climber I wrote using CUDA. The first testing I did of Murmur3 and Variant13 revealed that they had fairly weak higher order avalanche properties. Letting the fitness function include metrics for 1st to 4th order avalanche proved to be quite successful. The CUDA program was used for sieving; once some possibly good candidates turned up, I ran them through RRC-64-40-TF2-0.94. The constants above were the best I could find; the fitness function, combining several tests, was too insensitive to make any further improvements possible with the approach I used.

Conclusion

The shifts and constants above give a permutation that is statistically closer to a pseudo-random permutation than the ones used in SplitMix64, java.util.SplittableRandom and java.util.ThreadLocalRandom. In particular, the number of bad gammas/increments should be considerably fewer.

6 kommentarer:

  1. Hej Pelle,

    Thank you for really good articles!

    Here is another configuration that gets the same construction to 2^41 bytes for gamma=1, PractRand 0.94 -tf=2 -te=0:

    uint64_t xmxmx(uint64_t x) {
    x ^= x >> 27;
    x *= 0xe9846af9b1a615dull;
    x ^= x >> 25;
    x *= 0xe9846af9b1a615dull;
    x ^= x >> 27;
    return x;
    }

    I've only tested it with a few other gammas which all have failed after 2^38 but I've had time to run it through your RRC scheme so I am not sure how it would hold up. Maybe you have some direct insight?

    Found by method outlined here http://jonkagstrom.com/tuning-murmur3/index.html

    Anyways, looking forward to read any future articles!

    Best regards,

    Jon

    Ps. the constants in moremur have 'UL' suffix but I think you mean 'ULL' also the max PR under the matrices for Murmur3 and Variant13 seems to be off (says 2^40 for both).

    SvaraRadera
    Svar
    1. I'm running RRC-64-40-TF2-0.94 on the constants you give above right now. So far it seems like they are about the same, performance-wise, as the ones I gave above.

      As for UL vs ULL, it was too long since I delved into any standard documents and ULL might mean uint128_t on some systems. For production code, one should of course ensure that the constants really are interpreted as 64-bit unsigned.

      The maximum, 2^40, is actually correct for all tables. Murmur3 and Variant13 never reach anything close to 2^40 but that was the limit I chose to get the colours in the table to be the same, relatively speaking. Had I normalized/thresholded at, say, 2^25 or so, they would have been more yellow in colour.

      Radera
  2. Interesting, I guess it adds evidence of a possible limit for this construction, thank you for taking time and testing.

    I read 'ul' as at least 32-bits while 'ull' as at least 64-bit (but I may be wrong).

    Ahh that makes sense, I thought it was the maximum value in the corresponding matrix.

    SvaraRadera
    Svar
    1. Maila mig på mangling@evensen.org, har en del kommentarer och svar på en del frågor du ställt på http://jonkagstrom.com/bit-mixer-construction/index.html

      /P

      Radera
  3. Den här kommentaren har tagits bort av bloggadministratören.

    SvaraRadera
  4. How do you look for these numbers (multipliers and shifts)?
    I noticed you use different multipliers and different shifts. Also multipliers are non-prime numbers. Would it be better if it did? I have syrup suspicion they would.
    And, obviously, how to find then?
    If I wanted to add another stage to those three (xmxmx) how would I look for multiplier and shift?

    SvaraRadera

Applying the RRC-64-test to the degski64 and Lea64 mixers

In LXM: better splittable pseudorandom number generators (and almost as fast) by Guy L Steele and Sebastiano Vigna, approaches for pseud...