Comment by jxmorris12
2 years ago
First author here! I thought it was interesting that so many people thought this was obvious after-the-fact. What we're trying to convey is that although inversion turns out to be possible, it's not trivial.
Here are the three levels of inversion model described in the paper:
(1) Training a decoder to invert embeddings gives you some similar text, but only with some overlapping words.
(2) Training an encoder-decoder with the embedding projected up to a 'sequence' for the encoder works better, and gets more words back, but almost never gets the right text back exactly, especially when sequences are long.
(3) Training this fancy architecture as a 'correction' model that recursively corrects and re-embeds text works really well. It turns out that search is necessary for this kind of thing, and this approach can get many texts back exactly.
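For intuition, here's a minimal toy sketch of the recurse-correct-and-re-embed idea in (3). Everything below is a hypothetical stand-in, not the paper's method: `embed` mimics a black-box embedder with a simple character-frequency vector, and `propose_corrections` mimics the learned correction model with naive word swaps; the real system uses trained transformers on both sides. The search structure, though, is the same: re-embed each candidate and keep whichever lands closest to the target embedding.

```python
import math

def embed(text):
    # Toy stand-in for a black-box embedding model: character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def distance(u, v):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def propose_corrections(hypothesis, vocab):
    # Toy stand-in for the learned correction model: propose single-word swaps.
    words = hypothesis.split()
    for i in range(len(words)):
        for w in vocab:
            yield " ".join(words[:i] + [w] + words[i + 1:])

def invert(target_emb, initial_guess, vocab, steps=10):
    # Recursively correct and re-embed, greedily keeping the closest candidate.
    best = initial_guess
    best_d = distance(embed(best), target_emb)
    for _ in range(steps):
        improved = False
        for cand in propose_corrections(best, vocab):
            d = distance(embed(cand), target_emb)
            if d < best_d:
                best, best_d = cand, d
                improved = True
        if not improved:  # converged: no candidate gets closer
            break
    return best
```

In this toy setting, starting from a rough hypothesis like "the dog sat" with target "the cat sat", the loop swaps in "cat" because that candidate's re-embedding exactly matches the target. The point of the sketch is only that access to the embedder turns inversion into a guided search, which is why the correction model recovers exact text where a single-shot decoder can't.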
A fourth option would be to simply scale the inversion model up by 10x or 100x, which would give us predictably better performance. We didn't try this because it's just not that interesting.
Hey man, this is going to sound really weird, but when I just saw your name, it struck me as super familiar. Like I know I've seen your name on GitHub somewhere. I started going through your public repos and I figured it out!
The first commit I ever made was on a fork of one of your projects over 5 years ago. I wasn't a software engineer then, I was just some guy learning to code in his free time.
Anyways I just wanted to say thank you. I know it may seem silly. But your code gave me the scaffolding I needed to take some important first steps towards where I am today. I'm now a senior software engineer/team lead!
If you were curious, here's the fork: https://github.com/brendenriggs/GottaSlackEmAll
I've got to say: I don't play any more, but I definitely miss those early days when Pokémon Go first came out. Good times.
Wow, that’s really wild! Yeah, I remember this — 8 years ago Pokémon Go came out and I was working on some projects to reverse engineer the API. Now I’m reverse engineering neural networks. Go figure.
These rare encounters are one of the things that make me love the internet.
> A fourth option would be to simply scale the inversion model up by 10x or 100x, which would give us predictably better performance. We didn't try this because it's just not that interesting.
Scaling is always both important and interesting.
To train a larger inversion model, I think we'll just need 16 A100s for a month or two. We can circle back in December, once the bigger model finishes training, to see that we've gotten better reconstruction performance. Fascinating!
Great paper! I have a question on how much this can be attributed to portions of text which GPT has already seen. Like, if the same approach were followed on a new sliver of Wikipedia that GPT had not seen, would the results be the same?