tag:blogger.com,1999:blog-3973020588456062013.post1619484469417855321..comments2023-10-19T11:01:02.359-07:00Comments on Mostly mangling: On the mixing functions in "Fast Splittable Pseudorandom Number Generators" , MurmurHash3 and David Stafford's improved variants on the MurmurHash3 finalizer.Pelle Evensenhttp://www.blogger.com/profile/03828317178444752816noreply@blogger.comBlogger14125tag:blogger.com,1999:blog-3973020588456062013.post-28996109149864497962023-01-24T14:29:27.782-08:002023-01-24T14:29:27.782-08:00Thank you very much! Yes, of course I wouldn't...Thank you very much! Yes, of course I wouldn't take anything without credit. I'm interested in applications like short string hashing and fast generation of pseudo-random values from randomized but not uniformly distributed sources. But nothing that requires gigabytes per second. For example, with strings like local file paths I was sure that MurmurHash3 will be more than good enough, but no, I've encountered more than a few collisions of 64-bit hashes. Which is, of course, always possible, but shouldn't be that common.<br />Oh, and before that I was sure FNV-1a will be more than enough, but had to upgrade to Murmur3 quite quickly :)<br /><br />P. S. I have now read every entry in your blog, looking forward for more, and thanks a lot for your work and for making it public,Alexhttps://www.blogger.com/profile/13586753411926201847noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-72577851462058783572023-01-23T05:11:04.002-08:002023-01-23T05:11:04.002-08:00Consider it public domain, credit me if you want t...Consider it public domain, credit me if you want to.<br /><br />Note however that this was one of my first (and worst) attempts at a good 64 bit mixer.<br />A better rrxmrrxmxs-mixer is here:<br />https://mostlymangling.blogspot.com/2019/01/<br /><br />A much better (and slightly slower) alternative is here:<br />https://mostlymangling.blogspot.com/2020/01/<br /><br />The best compromise speed/quality I've seen is Jon Maiga's mx3 mixer:<br />http://jonkagstrom.com/mx3/mx3_rev2.html<br /><br />The extra multiply and shift in mx3 is slightly slower than Murmur3-style mixers (3 xs, 2ms) but provides vastly superior results from a statistical standpoint. According to my measurements, it's on most platforms about as fast as NASAM.<br /><br />If your main platform is older GPUs (< GeForce RTX), NASAM is probably faster due to 64 bit multiplications being expensive.<br />On more recent GPUs and modern 64 bit CPUs, mx3 is faster.Pelle Evensenhttps://www.blogger.com/profile/03828317178444752816noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-21089397403819082822023-01-22T14:38:34.307-08:002023-01-22T14:38:34.307-08:00Hello, you're doing a great job and I enjoyed ...Hello, you're doing a great job and I enjoyed reading your block. What license do you publish your rmxmx code under? Can I use it in my projects?Alexhttps://www.blogger.com/profile/13586753411926201847noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-69165215654523582062019-07-04T07:11:45.600-07:002019-07-04T07:11:45.600-07:00Hey Pelle, interesting work. Do you also have vari...Hey Pelle, interesting work. Do you also have variants for 32-bit mixing?orlphttps://www.blogger.com/profile/18375542373525229077noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-51227767819571070942019-07-04T07:10:00.130-07:002019-07-04T07:10:00.130-07:00Den här kommentaren har tagits bort av skribenten.orlphttps://www.blogger.com/profile/18375542373525229077noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-9464691032478182262019-01-25T01:06:35.380-08:002019-01-25T01:06:35.380-08:00If you have the time, please test it with -tf 2 to...If you have the time, please test it with -tf 2 too. That is the "standard level" I have been using when testing MurmurHash3, Variant 13 and rrmxmx (as well as my latest and greatest).<br /><br />Sorry about the sloppiness, will rerun the test with *left*-rotation and post the results.Pelle Evensenhttps://www.blogger.com/profile/03828317178444752816noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-79481745074910355052019-01-24T23:33:31.493-08:002019-01-24T23:33:31.493-08:00I used the command line:
$ RNG_test.exe Ettinger -...I used the command line:<br />$ RNG_test.exe Ettinger -multithreaded -seed 0xa6948409<br /><br />But my PractRand code has been modified (not the tests, just adding random number generators). I added this code to mult.c in the "not recommended" section of the RNGs (rotate64 is defined by PractRand as a left rotation, not a ror64 like with the other code you're comparing with):<br /><br /> Uint64 Ettinger::raw64() {<br /> //// passes 32TB with one anomaly, unusual at only 2GB: [Low16/64]BCFN(2+2,13-2,T)<br /> uint64_t z = (++state ^ UINT64_C(0xDB4F0B9175AE2165)) * UINT64_C(0x4823A80B2006E21B);<br /> z = (z ^ rotate64(z, 52) ^ rotate64(z, 21) ^ UINT64_C(0x9E3779B97F4A7C15)) * UINT64_C(0x81383173);<br /> return z ^ z >> 28;<br /> }<br /> std::string Ettinger::get_name() const { return "Ettinger"; }<br /> void Ettinger::walk_state(StateWalkingObject *walker) {<br /> walker->handle(state);<br /> }<br /><br />I added this in mult.h:<br /><br /> class Ettinger : public vRNG64 {<br /> Uint64 state;<br /> public:<br /> Uint64 raw64();<br /> std::string get_name() const;<br /> void walk_state(StateWalkingObject *);<br /> };<br /><br />And it's added to the listing in RNG_from_name.h, for the section related to mult.h :<br /><br />REGISTER_RNG_0(Ettinger)<br /><br /><br />Hope that helps. It looks like this one might be strong only when the counter is 1 and nothing is rotated.Tommy Ettingerhttps://www.blogger.com/profile/04953541827437796598noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-32764373310313592222019-01-24T16:57:49.580-08:002019-01-24T16:57:49.580-08:00So there: https://mostlymangling.blogspot.com/2019...So there: https://mostlymangling.blogspot.com/2019/01/better-stronger-mixer-and-test-procedure.html<br />I'd be very interested to know what flags to PractRand you were using when it passed 32TB with a unitary counter.Pelle Evensenhttps://www.blogger.com/profile/03828317178444752816noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-54897199443109865632019-01-24T09:42:17.592-08:002019-01-24T09:42:17.592-08:00I ran a complete test, 128 instances with counters...I ran a complete test, 128 instances with counters as described above and the results were not encouraging:<br />---<br />Will test ./ettinger --increment 1 --seed 0 | RNG_test stdin64 -tf 2 -te 0 -multithreaded -tlmax 1TB -tlmin 1KB <br /> 00F:40 *01F:15* *02F:15* *03F:14* *04F:15* *05F:15* *06F:16* *07F:16*<br />*08F:15* *09F:16* *10F:15* *11F:15* *12F:15* *13F:16* *14F:16* *15F:16*<br />*16F:16* *17F:16* *18F:16* *19F:17* *20F:16* *21F:18* *22F:19* *23F:18*<br />*24F:18* *25F:17* *26F:19* *27F:17* *28F:17* *29F:18* *30F:19* *31F:20*<br />*32F:20* *33F:21* *34F:22* *35F:22* *36F:23* *37F:19* *38F:19* *39F:16*<br />*40F:17* *41F:19* *42F:19* *43F:22* *44F:24* *45F:28* *46F:33* *47F:35*<br />*48F:36* *49F:35* *50F:37* *51F:37* *52F:33* *53F:37* *54F:37* *55F:38*<br />*56F:39* 57F:40 58F:40 59F:40 60F:40 61F:40 62F:40 63F:40<br />*00R:14* *01R:15* *02R:15* *03R:16* *04R:15* *05R:16* *06R:16* *07R:16*<br />*08R:16* *09R:16* *10R:15* *11R:15* *12R:17* *13R:17* *14R:17* *15R:17*<br />*16R:18* *17R:19* *18R:18* *19R:19* *20R:20* *21R:18* *22R:17* *23R:21*<br />*24R:25* *25R:25* *26R:25* *27R:23* *28R:19* *29R:19* *30R:21* *31R:16*<br />*32R:16* *33R:16* *34R:21* *35R:19* *36R:19* *37R:19* *38R:21* *39R:22*<br />*40R:25* *41R:27* *42R:26* *43R:25* *44R:24* *45R:23* *46R:22* *47R:21*<br />*48R:20* *49R:19* *50R:18* *51R:17* *52R:17* *53R:17* *54R:17* *55R:17*<br />*56R:17* *57R:17* *58R:17* *59R:17* *60R:16* *61R:15* *62R:15* *63R:15*<br />120 failures out of 128 tests with max 2^40 bytes. <br />---<br />*XXF:AA* indicates that mix(ror64(ctr++, XX)) failed after 2^AA analyzed bytes. <br />XXR:AA indicates that mix(ror64(reverse(ctr++), XX)) passed (-tlmax at 1TB, 2^40).<br /><br />So it behaves quite well with gamma=1 but that is a rather special case. As an approximation of a random permutation, it's not particularly strong.<br /><br />I'll try to find time later tonight to post the best I've found so far, (1 failure after 2^39 bytes, at ror64(reverse(ctr), 14)).<br /><br />(Your proposal is still much better than MurmurHash3 and Variant13.)Pelle Evensenhttps://www.blogger.com/profile/03828317178444752816noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-64475310540993217072019-01-23T02:30:35.125-08:002019-01-23T02:30:35.125-08:00Something that you may want to check out, part of ...Something that you may want to check out, part of a post coming up Real Soon Now, is feeding a rotated and bit reversed unary counters. That is 128 possible counters, feeding each to the mixing function and run the output though PractRand. So far, many failures out of 128 tests (running with - tf 2 -tlmax 1TB or so) seems to be a strong indicator that the function will eventually fail for a non-rotated unary counters. The MurmurHash3 finalizer and Variant 13 fails quite spectacularly. So far I have one candidate that fails one out of 128 tests after 1TB on that particular test. It uses two (ror-ror-xor)-mul steps and one shift-xor. Not the same constants used above. I also have a candidate with no failures so far but it has only passed about 16 put of the 128 subtests. Pelle Evensenhttps://www.blogger.com/profile/03828317178444752816noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-46887425952929513872019-01-22T13:52:03.644-08:002019-01-22T13:52:03.644-08:00I found an interesting result based on this post, ...I found an interesting result based on this post, using xor-rotate-rotate but omitting one of the xorshifts. In this code, rotate64 is a left-rotation defined in a macro:<br /><br />uint64_t state = UINT64_C(0xEB5F2AE4); // any should work, this was tested<br />uint64_t random64() {<br />uint64_t z = (++state ^ UINT64_C(0xDB4F0B9175AE2165)) * UINT64_C(0x4823A80B2006E21B);<br />z = (z ^ rotate64(z, 52) ^ rotate64(z, 21) ^ UINT64_C(0x9E3779B97F4A7C15)) * UINT64_C(0x81383173);<br />return z ^ z >> 28;<br />}<br /><br />(I probably made some elementary C++ mistakes in the UINT64_C usage.) This is one of very few gamma=1 unary hashes I have tested that passes 32TB of PractRand, and is probably the fastest of the ones I have found due to omitting one of the xorshift-multiply lines. Any multiplications use an XLCG's step of a XOR with a constant that has the last three bits 101, then a a multiplication by a constant with last three bits 011 (this makes an input of 0 produce a non-zero output). The initial step acts like a multiplication by a large gamma, but the XOR preceding it means that the modular multiplicative inverse of the gamma can't be used accidentally as the stride between inputs. I don't know why this set of rotation amounts, xors, shifts, and multiplies works but most others, when substituted in, don't. Some multipliers were chosen for having a low population count, in the hope that they would be faster. The XOR constants are 2 to the 64 divided by generalizations of the golden ratio, with the last bit adjusted to be odd. I hope this hash can be useful or interesting!Tommy Ettingerhttps://www.blogger.com/profile/04953541827437796598noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-87954723811978474542018-09-14T21:58:47.738-07:002018-09-14T21:58:47.738-07:00I should have guessed that "x ^= ror(x, a) ^ ...I should have guessed that "x ^= ror(x, a) ^ ror(x, b)" implies number of bits is 64; I erroneously assumed it was generic. My example above doesn't really apply to your calculations which are only dealing with 64 bit numbers. It looks like the function is bijective when a number of bits is not divisible by 3. I knew I was missing something, thanks for clarification.cheriohttps://www.blogger.com/profile/06900099862551850114noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-39265882290345559702018-09-14T14:31:03.552-07:002018-09-14T14:31:03.552-07:00I assume you mean 101 in binary, 5 in (hexa)decima...I assume you mean 101 in binary, 5 in (hexa)decimal.<br />First case, x = 101_2:<br />ror64(0x5, 1) == 0x8000000000000002<br />ror64(0x5, 2) == 0x4000000000000001<br /><br /> 0x0000000000000005<br /> 0x8000000000000002<br />^ 0x4000000000000001<br />--------------------<br /> 0xC000000000000008<br /><br />Second case, x = 110_2<br />ror64(0x6, 1) == 0x0000000000000003<br />ror64(0x6, 2) == 0x8000000000000001<br /><br /> 0x0000000000000003<br /> 0x8000000000000001<br />^ 0x0000000000000006<br />--------------------<br /> 0x800000000000000A != 0xC000000000000008<br /><br />That x ^ ror64(x, a) ^ ror64(x, b) is bijective might not be entirely obvious. Note that x ^ ror64(x, a) isn't, for any a.<br /><br />I checked by verifying that there exists an inverse for every {a, b} when using 64 bits. Since rotates, shifts and xor all can be expressed by multiplication with a binary matrix, I constructed the relevant matrices in Octave and checked that there exists an inverse for each matrix. <br /><br />Perhaps not so elegant but it got the job done.<br /><br />Coming up is a post for word sizes 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 24, 30, 36, 40, 48, 56 & 64. Plus variants. I guess I should put the constants up on github, for the cases where not all pairs are bijective.<br /><br />One could of course also mix rotations and shifts or have more than 2 rotations/shifts for a single xor.<br />I did compute all invertible<br />pairs {a, b} for <br />f_2(x, a, b) { return x ^ (x >> a) ^ (x << b); }<br /><br />triples {a, b, c} for<br />f_3(x, a, b) { return x ^ (x >> a) ^ (x >> c) ^ (x << b); }<br /><br />and quadruples {a, b, c, d} for<br />f_3(x, a, b) { return x ^ (x >> a) ^ (x >> c) ^ (x << b) ^ (x << d); }<br />a for 32- and 64-bits a while ago.<br /><br />You might want to read this new post: http://mostlymangling.blogspot.com/2018/09/invertible-additions-mod-2.html<br />Pelle Evensenhttps://www.blogger.com/profile/03828317178444752816noreply@blogger.comtag:blogger.com,1999:blog-3973020588456062013.post-71106053134704562872018-09-14T10:40:13.089-07:002018-09-14T10:40:13.089-07:00This is a great writeup. I am having a terrible fe...This is a great writeup. I am having a terrible feeling the question I am about to ask may sound ridiculously stupid. Why is function "y = x ^ ror(x, a) ^ ror(x, b)" bijective? A simple case when x = 101, a = 1, b = 2 results in y = 000; similarly when x = 110 or x = 011 the function also produces y = 000. I am almost convinced I am missing somethingcheriohttps://www.blogger.com/profile/06900099862551850114noreply@blogger.com