← Back to context

Comment by pfdietz

6 hours ago

I already explained what the benefit is. What is it with this focus on offloading work from computers to people? Let people do things more easily without thinking, even if it burns more increasingly cheap cycles.

You haven't explained what the benefit is. There aren't "spaces we haven't formalized" because of the pigeonhole principle. There are M bits. You can generate every one of those 2^M values with any max cycle permutation.

What work is being offloaded from computers to people? It's exactly the same thing with more determinism and no logarithmic overhead.

  • > There aren't any "spaces we haven't formalized"

    Suppose that space of N points is partitioned into M relevant subsets, for now we assume of the same size. Then random sampling hits each of those subsets in O(M log M) time, even if we don't know what they are.

    This sort of partitioning is long talked about in the testing literature, with the idea you should do it manually.

    > what work is being offloaded

    The need to write that program for explicitly enumerating the space.

    • Just to avoid potential confusion, the claim is that this is a function that generates a random permutation:

          pub fn shuffle(g: *Gen, T: type, slice: []T) void {
              if (slice.len <= 1) return;
      
              for (0..slice.len - 1) |i| {
                  const j = g.range_inclusive(u64, i, slice.len - 1);
                  std.mem.swap(T, &slice[i], &slice[j]);
              }
          }
      

      And this is a function that enumerates all permutations, in order, exactly once:

          pub fn shuffle(g: *Gen, T: type, slice: []T) void {
              if (slice.len <= 1) return;
      
              for (0..slice.len - 1) |i| {
                  const j = g.range_inclusive(u64, i, slice.len - 1);
                  std.mem.swap(T, &slice[i], &slice[j]);
              }
          }
      

      Yes, they are exactly the same function. What matters is Gen. If it looks like this

      https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...

      then you get a random permutation. If it rather looks like this

      https://github.com/tigerbeetle/tigerbeetle/blob/809fe06a2ffc...

      you enumerate all permutations.

    • What's being suggested also has the m log m partition behavior in the limit where N >> M. It might be easier to see why these are actually the same things with slightly different limits, imagine a huge N enumerated by an LFSR. We'll call our enumeration function rand() for tradition's sake. Now we're back to sampling.