Comment by cyanmagenta
1 year ago
It’s an interesting question whether that could be fixed. I think the answer is Yes. If the author didn’t do any scrambling, and just displayed UUIDs in numeric order, then it’s trivial to enumerate search results in order. Likewise, if you do something like adding a constant mod 16 to each hex digit, you could do the same thing when you generate UUIDs matching a substring. So the question becomes whether you could find something like that that gives a sufficiently convincing illusion of entropy but is still reversibile when you hold a subset of the digits constant. And it seems like it should be.
FWIW I am super interested in this question but feel like I don't know how to derive a satisfying answer, maybe because the one of my goals here (add "enough" entropy) is a real fuzzy "I know it when I see it" sort of thing.
But I'm gonna try to get a few more crypto-knowledgeable friends to chat with me about this and write up what I learn!
You've sniped me and I'm going to try and tackle this over the weekend. What's the best way to exchange things for you? Also, have you explored modular multiplicative inverses at all?
eieiogames@gmail.com is great (or whatever's best for you on https://eieio.games/whats-my-deal)
I think the biggest rabbit hole I went down was trying to use FEAL (which is very breakable) and exploiting the things that make it breakable. But this is very much not my area of expertise - I was learning as I was working - so it's very possible either that that was a dumb idea or that it was a great idea that I didn't figure out.
I also considered things like "focus on add lots of entropy to groups of 200 or so UUIDs, but have obvious patterns beyond that" which I think would be a reasonable strategy; here I just kind of ran out of time (I told myself I'd make this in a week)
Did some light reading about a few other things but nothing substantial
My first thought was to use linear transformations over Z_2 as a field, as that would create a natural interpretation of fixing certain bits as taking a linear subspace. Interestingly this leads to the property that XOR is preserved.
I implemented this in a very quick and hacky way for 32 bits. I generated a random boolean matrix M invertible in Z_2. To turn an input number x into a corresponding number y in an N-bit space, I convert x to binary and turn that into a vector of 1s and 0s, then multiply it by that randomized matrix to get y. Here are the first few y's corresponding to x=0, 1, ...:
00000000000000000000000000000000
0xx0x000xx00x0xx000xxx00xxx0x000
x0xx000xxxx0xxx00x000xxx0xx0xx0x
xx0xx00x00x00x0x0x0xx0xxx0000x0x
x0xx0x00xxx0xx0x00xx0x0xx0xx00x0
xx0xxx0000x00xx000x0x00x0x0xx0x0
00000x0x000000xx0xxx00x0xx0xxxxx
0xx0xx0xxx00x0000xx0xxx000xx0xxx
...
(Hoping the non-monospace font doesn't ruin the alignment too much.)
Which looks... random-ish? I expect that turning these into UUIDs may result in more random-looking sequences.
Not sure how to solve the problem of search with this, but the hope would be that the linear structure gives you what you need, since fixing bytes on UUIDs should correspond to considering linear subspaces of the y vectors. Perhaps this can also be used to apply lexicographic order on the corresponding x vectors (i.e.: ordering the indexes), so that you could jump to each UUID matching the search in order.
I did some fooling around with this and it seems promising.
The first problem is that, when your enciphering function is a matrix, f(0) = 0. This looks bad, but we can easily solve that problem by starting the webpage sequence at an index higher than 0.
I tried to work through a much smaller version of the problem* by hand, and it looks like this:
We have our enciphering matrix N:
and our deciphering matrix D, the inverse of N:
We want to find the next index whose encipherment ends in -110. This sets up a system of equations Dx = y, where x_3 = 1, x_4 = 1, x_5 = 0, and y tells us the index of x. By multiplying that out, we get:
So we can freely choose any values for y_1 and y_5, and the rest will be filled in by constraints.
Assuming we want the least possible value for y, this means we will pick y_1 = 0 and y_5 = 0, which then tells us that we want index [0 1 0 0 0], and we can jump to there. If we wanted the least possible value for y above a threshold (such as the current viewport), we'd pick y_n values accordingly.
Instinct tells me that libraries should exist for quickly solving systems of linear equations like this.
(For full full-text search, we'd need to do this several times, also finding solutions for the enciphered values 110xx, and x110x. This multiplies the work we need to do and the storage we need to use by an amount that is linear in the difference in length between the search string and the full UUID, which is still a lot better than trial-and-error.)
* I ended up doing it in 5 bits because every random 4x4 matrix I generated was noninvertible.
3 replies →
Hi! I do not know enough of the relevant math to appreciate what you're doing here (yet); I am going to try to do some reading to understand this better, but if you have any advice on where I should start I will gladly take it.
1 reply →
Would it be easier if only the lowest bits of the index contributed to the "entropy"? Like if the 2^16th UUID was the one right after the first and n + 2^16 was the one after the nth. You wouldn't notice the pattern but it'd be easier for the computer to handle. I guess if you were searching for a substring you might notice the ones surrounding the matches look almost the same from match to match...
Let me cook something up for you. It's an interesting puzzle.
I think you can get pretty far, if you compromise on your entropy: it only has to look random, not actually be random. (I mean it doesn't have to be cryptographically secure randomness.)
Yes, it's possible with certain restrictions on the function. Here's an example:
This is a T-function, a function where the Kth bit only depends on the K-1 lower bits. prf() is a pseudo-random function that can be anything you like as long as it's another T-function. A standard LCRNG like (48271 * x) % 2147483647 works just fine for instance.
You can invert this by just running the function N times to test each bit from LSB to MSB against the result. If you know certain bits, you don't have to run those tests and you can order the matching UUIDs by the values of the K unknown bits from 0 to 2^(N - K) - 1.
You also don't need to keep track of the position with this method either, only a stopping point. Unlike the feistel network, this function also produces a full cycle when iterated as x_i+1 = f(x_i), only repeating after all numbers are produced. That means you could run this without a counter at all, just generating until the first value is produced again, for any bit length.