Comment by tobyjsullivan

1 year ago

The fact that the search works impressed me more than anything. Of course, like every great magic trick, it seems so simple once it is explained.

For the curious, here's the linked blog post describing how the project works: https://eieio.games/blog/writing-down-every-uuid/

Edit to add: I'd only tried searching for an exact UUID when I wrote this comment. I didn't realize it supports full text search! Now I'm even more impressed.

I'm really happy that the trick was magical to you - I was so surprised and delighted when I realized that this was possible, and I wasn't really sure if anyone else would feel the same way!

And of course, I'm proud to be providing so much utility here - finally we can find and use UUIDs tailor-fit to our needs

  • > If we didn’t care about generating valid UUID v4s, we could just generate 2^128 numbers, scramble them, convert the resulting bits into a hex string, and intersperse some dashes.

    You can do that anyway. You'd only need the twiddling if you wanted to limit the amount of numbers you generate to 2^122. Since you're willing to generate 128 bits:

        // get encrypted value
        uint128 bits = encipher(seed);
        // clear zero bits
        bits &= 0xFFFFFFFF FFFF 4FFF BFFFFFFFFFFF; // instead of 4 and B, you can use 0 and 3
        // set one bits
        bits |= 0x00000000 0000 4000 800000000000; // these have to be 4 and 8
    

    But since you're generating more numbers than you need, you're going to end up with 64 copies of each UUID in your final stream. This won't matter because no one will ever be able to notice, as long as your faux search functions avoid pointing it out.

    Exercise for further development: modify substring search so that it follows the expected behavior of finding matches in order within the page. [I don't recommend attempting this.]

    • I considered this, but I'd just be so unsatisfied if I finished scrolling through all 2^128 rows and realized I'd seen some duplicates!

      2 replies →

  • Memorable UUIDs? I think you're on to something here! (Also, dibs on 00000000-0000-4321-abcd-000000000001!)

  • FWIW, "search" doesn't work on mobile (Chrome on Android): I go to "Find in Page" and none of the magic happens. It's also bypassed on desktop when I manually open the search box via Edit->Find->Find... instead of using Ctrl+F.

    I wonder if there's (yet) a browser API you could hook into: the same way browsers allow JavaScript to manipulate the history [1], maybe there's a way to manipulate the Ctrl+F/find-in-page search results.

    [1] - https://developer.mozilla.org/en-US/docs/Web/API/History_API

    That is, right now you're capturing the Ctrl+F keypress and opening your own custom thing to read the user's search string and act on it. But what we'd really like is a way to be notified "The user just asked to search for 'xyz'. Would you like to capture that event, or let it go through to the browser's default behavior?"

    A quick Google search found nothing like that exists yet. I then asked ChatGPT about it, hoping that ChatGPT would at least hallucinate a plausible design for the API — and had mixed feelings when it didn't. It just printed that 'Browsers do not provide a way to listen for the "Find in Page" search event due to privacy and security concerns' and suggested capturing the Ctrl+F keypresses exactly as you have done.

    As someone else said, it would also be more like full-text search if you also considered the primary-key column, e.g. searching for "0390814603917539994005679487460590835" should jump to the 390814603917539994005679487460590835th row. (Highlighting-to-select pieces of the text also doesn't work: I'm not sure why not, since I would have thought the browser gives you at least that part for free.)

    Besides "search" not working on mobile, the styling on mobile is such that the "scrolling" does not convince: to my eyes it looks too obviously like "changing the values in the cells of a fixed table" as opposed to "scrolling through the table itself." You could maybe mitigate that by animating quickly among three different page layouts with the table vertically offset by different amounts.

    It occurs to me that if JavaScript has something like Python's `random.sample`-without-replacement, then you could set your `RANDOM_SEARCH_ITERATIONS` to 256 and achieve perfectly consistent (and exhaustive) "search" when the user has entered all but 1 or 2 hex digits of their desired result. And/or, you could just have the page secretly keep a history of the search results the user has already seen: this would prevent the user from finding out so quickly that "search + next + next + prev + prev" doesn't always get them back to where they started.

    Speaking of exhaustive search results: With a bit more (probably equally algorithmically interesting) work, you could emulate the browser search's "7/256" by tallying up the number of UUIDs satisfying the constraint, e.g. if the user has typed "1234567" then you could display "1234567 (158456324585806817418058661888 results)" and maybe even fake up a convincing position indicator like "1234567 (17415833585801881134805987465/158456324585806817418058661888)". I guess if you display it as "1234567 (1.742e28/1.585e29)" then you don't even have to cheat that much. :)

    • There's a button in the bottom right so you can access the search functionality from the UI that allows you to use their search on mobile.

      This may have been added after your comment.

A great example of Teller's observation that "sometimes magic is just someone spending more time on something than anyone else might reasonably expect."

  • My favorite of these was a trick where someone picked a card out of a deck and then Teller revealed a large version of that same card in an unexpected area in the vicinity, It turns out that what he had done was hide a complete set of large cards in the area before the trick and memorized the location of every one of them so, e.g., the king of hearts would be at the top of a palm tree, the three of spades under a drink tray, etc.

    • The best part is that that kind of trick usually becomes more, rather than less, impressive when its inner working is revealed. I recently got to see them perform live, and my favorite trick by far was one of that kind.

      6 replies →

  • > Ninety per cent of most magic merely consists of knowing one extra fact.

    -- Night Watch by Terry Pratchett

    • Thanks for this reference to one of my favorite books of all time, out of one of my favorite series of all time

The explanation for full-text search was actually slightly more intelligent than what I initially assumed. I figured it just generated UUIDs until it found one that was in the correct direction of search (for the next/previous button), since I had observed that walking forwards and backwards in search results was giving different results each time, but in fact the author did the slightly better thing which is to just generate a bunch of possible results and then pick the best (I wonder how many results it generates for this?).

The full text search is a little confusing because it doesn't actually search them in order, though it appears to at first. And if you click "next" a few times and then "prev" the same number of times, you don't necessarily end up back at the same UUID you were at before. It's a neat-seeming trick though.

  • It’s an interesting question whether that could be fixed. I think the answer is Yes. If the author didn’t do any scrambling, and just displayed UUIDs in numeric order, then it’s trivial to enumerate search results in order. Likewise, if you do something like adding a constant mod 16 to each hex digit, you could do the same thing when you generate UUIDs matching a substring. So the question becomes whether you could find something like that that gives a sufficiently convincing illusion of entropy but is still reversibile when you hold a subset of the digits constant. And it seems like it should be.

    • FWIW I am super interested in this question but feel like I don't know how to derive a satisfying answer, maybe because the one of my goals here (add "enough" entropy) is a real fuzzy "I know it when I see it" sort of thing.

      But I'm gonna try to get a few more crypto-knowledgeable friends to chat with me about this and write up what I learn!

      12 replies →

    • Yes, it's possible with certain restrictions on the function. Here's an example:

          u128 uuid_iter(u128 x) {
              b = (x * x | 0x5) + x;
              c = prf(x);
              return (b ^ (c << 2)) & u122_mask;
          }
      

      This is a T-function, a function where the Kth bit only depends on the K-1 lower bits. prf() is a pseudo-random function that can be anything you like as long as it's another T-function. A standard LCRNG like (48271 * x) % 2147483647 works just fine for instance.

      You can invert this by just running the function N times to test each bit from LSB to MSB against the result. If you know certain bits, you don't have to run those tests and you can order the matching UUIDs by the values of the K unknown bits from 0 to 2^(N - K) - 1.

      You also don't need to keep track of the position with this method either, only a stopping point. Unlike the feistel network, this function also produces a full cycle when iterated as x_i+1 = f(x_i), only repeating after all numbers are produced. That means you could run this without a counter at all, just generating until the first value is produced again, for any bit length.

Searching is very similar to a common approach for building a naïve spellchecker: given an input, generate all the possible matches it can be part of. You're not searching in a corpus, you're using the input to generate indices into the corpus (list of UUIDs here, list of words in the dictionary in a spellchecker).

The blog write-up is incredible -- technically interesting, hilarious, and perfect in both tone and scope. Well done!

Love both the project idea and the writing!

The way that post explains each step in a unique laconic tone is very enjoyable to read.

> Or maybe the site could feature “trending UUIDs” that are particular popular across the world right now.

For the even more curious, UUID has 5316 decillion 911 nonillion 983 octillion 139 septillion 663 sextillion 491 quintillion 615 quadrillion 228 trillion 214 billion 121 million 397 thousand 304 values. Imagine the fact that there aren't as many kms to reach GN-Z11 (farthest known galaxy in the Universe I think) as there are digits above

  • yet I was able to scroll through them like scrolling through a 800-word web page

The UIA (Universal Internet Authority) is worried that by using UUIDs we are left with only around 34 trillion UUIDs per star and planet in the observable Universe. So the cosmic router might become DHCP-leasing dark matter.

I searched for 1337, and then 13371337, and then 133713371337, and I was flabbergasted they've got a search setup for this (which ctrl-f opens up). Thanks for posting the blog post!

> The fact that the search works impressed me more than anything. Of course, like every great magic trick, it seems so simple once it is explained.

> Edit to add: I'd only tried searching for an exact UUID when I wrote this comment. I didn't realize it supports full text search! Now I'm even more impressed.

But the trick to the full-text search is that it doesn't work.

This is neat, although I pushed it later with incremental search and it seems to be skipping results as it only found ~50 when searching for `-000000000000`.

Yeah, I at first, I though I knew exactly how it worked. Then I saw the search field, and I suddenly had no idea what the hell was going on. Now, the big question, do I want to spoil the magic trick and read how this was done, or should I keep being astonished and flabbergasted?

  • I disagree with my sibling comment. The trick is beautiful. If you generate UUIDs such that each bit in the result can be reliably traced back to a single bit in the input, then you can take a substring of the UUID and use that to infer which bits of the input integer must be set to produce that substring. So you can produce a whole list of input bytes that meet the criteria and those become your search results.

    The real magic trick here is that the uuids on the page only look random to us because of some bit twiddling and XOR trickery. If we had a better intuition for such things we would notice that successive UUIDs are just as correlated as successive integers.

    Elegant stuff

    • > I disagree with my sibling comment. The trick is beautiful. If you generate UUIDs such that each bit in the result can be reliably traced back to a single bit in the input, then you can take a substring of the UUID and use that to infer which bits of the input integer must be set to produce that substring.

      ...but that has nothing to do with what the website is doing. The accompanying article specifically calls out the fact that it can't be done while maintaining the appearance of an unordered list, and therefore it isn't attempted.

  • As with any magic trick, reading the explanation might leave you a little disappointed here.

    • I'm the opposite. The magic is not in what the magician shows, but in what they elegantly manage to hide.

full-text search? you see the int next to the str on the left, no such thing..

reminds of me this daniel dennett quote

Real magic, in other words, refers to the magic that is not real, while the magic that is real, that can actually be done, is not real magic.

Awesome - this will be my new coding interview question

  • Agreed. Everyone love puzzles that are worthy of an entire section in a blogpost for their interview questions, rather than stuff actually relevant to the job.

    • So I just brute forced every UUID in existence on my RTX GPU and loaded the dataset into a HA opensearch cluster on AWS. It took about 5 years of calling ‘uuid.Random()’ to effectively cover about 64% of the keyspace which is good enough.

      To facilitate full-text search I created a langchain application in python, hosted on kubernetes, that takes your search query and generates synonymous UUIDs via GPT o1-preview before handing over to opensearch.

      Opensearch returns another set of UUIDs, which I look up in my postgres database: “SELECT uuid FROM uuids WHERE id IN (…uuid_ids)”

      3 replies →