← Back to context

Comment by pfdietz

7 hours ago

> There aren't any "spaces we haven't formalized"

Suppose that space of N points is partitioned into M relevant subsets, for now we assume of the same size. Then random sampling hits each of those subsets in O(M log M) time, even if we don't know what they are.

This sort of partitioning is long talked about in the testing literature, with the idea you should do it manually.

> what work is being offloaded

The need to write that program for explicitly enumerating the space.

Just to avoid potential confusion, the claim is that this is a function that generates a random permutation:

    pub fn shuffle(g: *Gen, T: type, slice: []T) void {
        if (slice.len <= 1) return;

        for (0..slice.len - 1) |i| {
            const j = g.range_inclusive(u64, i, slice.len - 1);
            std.mem.swap(T, &slice[i], &slice[j]);
        }
    }

And this is a function that enumerates all permutations, in order, exactly once:

    pub fn shuffle(g: *Gen, T: type, slice: []T) void {
        if (slice.len <= 1) return;

        for (0..slice.len - 1) |i| {
            const j = g.range_inclusive(u64, i, slice.len - 1);
            std.mem.swap(T, &slice[i], &slice[j]);
        }
    }

Yes, they are exactly the same function. What matters is Gen. If it looks like this

https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...

then you get a random permutation. If it rather looks like this

https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...

you enumerate all permutations.

What's being suggested also has the m log m partition behavior in the limit where N >> M. It might be easier to see why these are actually the same things with slightly different limits, imagine a huge N enumerated by an LFSR. We'll call our enumeration function rand() for tradition's sake. Now we're back to sampling.