Comment by jltsiren


If the next index is stored in the target array and the indexes are uniformly random, the pointer chase will likely fall into a cycle of length O(sqrt(n)) (a birthday-paradox effect), which is short enough to fit in cache.
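
A minimal sketch of the effect (array size and seed are arbitrary choices of mine): fill an array with uniformly random indices, follow them from a random start, and count the steps until an element repeats. The walk collapses into a small, cache-resident loop after roughly sqrt(n) steps.

```cpp
#include <cmath>
#include <cstdio>
#include <random>
#include <unordered_set>
#include <vector>

int main() {
    const std::size_t n = std::size_t{1} << 24;  // 16M elements
    std::mt19937_64 rng(42);
    std::uniform_int_distribution<std::size_t> dist(0, n - 1);

    // A random mapping (not a permutation): each slot points anywhere.
    std::vector<std::size_t> next(n);
    for (auto& v : next) v = dist(rng);

    // Chase pointers until we land on an element we have already seen.
    std::unordered_set<std::size_t> seen;
    std::size_t i = dist(rng);
    while (seen.insert(i).second) i = next[i];

    // Expect tail + cycle length on the order of sqrt(n).
    std::printf("steps before repeat: %zu (sqrt(n) ~ %.0f)\n",
                seen.size(), std::sqrt(static_cast<double>(n)));
}
```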

You can avoid this with two arrays. One contains random query positions, and the target array is also filled with random values. The next index is then a function of the next query position and the previous value read from the target array.
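
A hedged sketch of that two-array scheme (the names and the mixing function are my choices): `queries` supplies scripted random positions, `data` holds random values, and each new position mixes the next query with the value just read, so the dependent load chain never settles into a short cycle.

```cpp
#include <cstdint>
#include <random>
#include <vector>

int main() {
    const std::size_t n = std::size_t{1} << 24;  // power of two, so & (n-1) is mod n
    std::mt19937_64 rng(1);
    std::uniform_int_distribution<std::uint64_t> dist(0, n - 1);

    std::vector<std::uint64_t> queries(n), data(n);
    for (auto& q : queries) q = dist(rng);
    for (auto& d : data) d = dist(rng);

    // Each index depends on the previously loaded value, so the loads
    // serialize (good for a latency benchmark) without ever cycling.
    std::uint64_t sum = 0, prev = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t pos = (queries[i] ^ prev) & (n - 1);
        prev = data[pos];
        sum += prev;
    }
    volatile std::uint64_t sink = sum;  // keep the chase from being optimized away
    (void)sink;
}
```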

You could also sample the indexes from a different random distribution.

E.g., start with every element referencing the next element (i.e. i+1 with wrap-around), and then relabel the elements with a random shuffle. That way, you preserve the full cycle.
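
A sketch of that construction (the function name is mine): shuffle the labels, then link each element to its successor in the shuffled order. Relabeling the i -> i+1 cycle this way keeps a single cycle of length n; Sattolo's algorithm produces the same kind of permutation in one pass.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

std::vector<std::size_t> full_cycle(std::size_t n, std::uint64_t seed) {
    // Random relabeling of 0..n-1.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::mt19937_64 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    // Link each element to its successor in the shuffled order:
    // one cycle that passes through all n slots.
    std::vector<std::size_t> next(n);
    for (std::size_t i = 0; i < n; ++i)
        next[order[i]] = order[(i + 1) % n];
    return next;
}
```

A pointer chase over `next` then touches every element before it wraps around.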

  • You can do that, but it's inefficient with larger arrays. Iterating over 10 billion elements in a random order can take up to an hour, which is probably more than you are willing to wait for a single case in a benchmark. On the other hand, you will probably find a cycle in a uniformly random array of 10 billion elements within 0.1 seconds, which is not long enough to mitigate the noise in the measurements. So you need a way of generating unpredictable query positions for at least a few seconds without wasting too much time setting it up.

    • You could also use group theory to help you.

      Basically, pick a large prime number p as the size of your array and a multiplier x with 0 < x < p. Then visit your array in the order (i*x) modulo p.

      If your processor is smart enough to figure out that additive pattern, you can use (x^i) modulo p instead; with x a primitive root modulo p, the walk visits all of 1..p-1 before repeating.

      Basically, the idea is to look into the same theory used to produce PRNGs with long cycles.
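
      A minimal sketch of the multiplicative walk (p and x are illustrative choices, not from the thread; 7 is a known primitive root of the prime 2^31 - 1, so x^i mod p visits every index in 1..p-1 exactly once, with no fixed stride for a prefetcher to latch onto):

      ```cpp
      #include <cstdint>
      #include <cstdio>
      #include <vector>

      int main() {
          const std::uint64_t p = 2147483647;  // prime 2^31 - 1; the array is ~2 GiB
          const std::uint64_t x = 7;           // a primitive root modulo this p
          std::vector<std::uint8_t> data(p, 1);

          // j runs through x^1, x^2, ..., x^(p-1) mod p, hitting every index
          // in 1..p-1 once. (i*x) mod p would stride by x instead, which a
          // prefetcher may learn to predict.
          std::uint64_t sum = 0, j = x;
          for (std::uint64_t i = 1; i < p; ++i) {
              sum += data[j];
              j = (j * x) % p;  // no overflow: j < 2^31 and x is small
          }
          std::printf("sum = %llu\n", static_cast<unsigned long long>(sum));
      }
      ```

      Note that index 0 is never visited, so one element of the array is simply unused; shrink p (keeping it prime, with a matching primitive root) for smaller machines.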