Comment by simonw
2 years ago
If you want to play with a demo of this kind of thing, I suggest this Colab notebook: https://linus.zone/contra-colab - via https://twitter.com/thesephist/status/1711597804974739530
It demonstrates a neat model that's specifically designed to let you embed text, manipulate the embeddings (combine them, average them or whatever) and then turn them back into text again. It's fascinating.
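To make the embed → manipulate → decode idea concrete, here's a minimal sketch in Python. The encode and decode callables are hypothetical stand-ins supplied by the caller (whatever interface the notebook's model exposes), not its actual API:

    from typing import Callable
    import numpy as np

    Encoder = Callable[[str], np.ndarray]   # text -> embedding vector
    Decoder = Callable[[np.ndarray], str]   # embedding vector -> text

    def interpolate(encode: Encoder, decode: Decoder,
                    text_a: str, text_b: str, alpha: float = 0.5) -> str:
        """Blend two sentences by mixing their embeddings, then decode the mix."""
        vec_a = encode(text_a)
        vec_b = encode(text_b)
        blended = (1.0 - alpha) * vec_a + alpha * vec_b  # linear mix in embedding space
        return decode(blended)

Setting alpha to 0.5 is the averaging trick mentioned above; sweeping it from 0 to 1 walks between the two sentences.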
vec2text has a pretty nice demo notebook too! You just need to provide your OpenAI API key, since we use their API to get embeddings. Here's a link: https://colab.research.google.com/drive/14RQFRF2It2Kb8gG3_YD... and a link to vec2text on GitHub: https://github.com/jxmorris12/vec2text.
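For anyone who'd rather skim than open the notebook, the core vec2text workflow looks roughly like this. This is a sketch based on the project's README-style API; exact function names and the OpenAI client interface may have changed since:

    import os
    import torch
    import vec2text
    from openai import OpenAI  # assumes the openai>=1.0 client

    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def get_ada_embeddings(texts: list[str]) -> torch.Tensor:
        """Fetch text-embedding-ada-002 embeddings for a batch of strings."""
        response = client.embeddings.create(input=texts, model="text-embedding-ada-002")
        return torch.tensor([item.embedding for item in response.data])

    # Pretrained corrector that inverts ada-002 embeddings back to text.
    corrector = vec2text.load_pretrained_corrector("text-embedding-ada-002")

    target = get_ada_embeddings(["The quick brown fox jumps over the lazy dog."])

    # Iteratively refine a hypothesis until its embedding matches the target.
    recovered = vec2text.invert_embeddings(
        embeddings=target,
        corrector=corrector,
        num_steps=20,
    )
    print(recovered)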
This is an awesome notebook.
Just want to note that the difference in this paper is that it works without direct access to the embedding model (the encoder), so it can't design the embedding space itself.
Thanks for sharing, Simon! I will note that by training an adapter layer between this autoencoder's embedding space and OpenAI's, it's possible to recover a significant amount of detail from text-embedding-ada-002's embeddings with this model too[0]. But as the paper author's reply in a different thread points out, their iterative refinement approach recovers much more detail with a smaller model. A rough sketch of what such an adapter might look like follows the link below.
[0] https://twitter.com/thesephist/status/1698095739899974031
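For the curious, here's a sketch of what an adapter between the two embedding spaces could look like: a small MLP trained with MSE on paired embeddings of the same texts. The hidden and target dimensions (and the toy random data) are placeholder assumptions, not thesephist's actual setup; ada-002 embeddings are 1536-dimensional.

    import torch
    import torch.nn as nn

    ADA_DIM = 1536      # text-embedding-ada-002 output size
    LATENT_DIM = 1024   # placeholder for the autoencoder's bottleneck size (assumption)

    # Small MLP mapping OpenAI embeddings into the autoencoder's latent space.
    adapter = nn.Sequential(
        nn.Linear(ADA_DIM, 2048),
        nn.GELU(),
        nn.Linear(2048, LATENT_DIM),
    )

    def train_adapter(ada_embs: torch.Tensor, latent_embs: torch.Tensor,
                      epochs: int = 100, lr: float = 1e-3) -> None:
        """Fit the adapter on paired embeddings of the same texts from both models."""
        opt = torch.optim.Adam(adapter.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(adapter(ada_embs), latent_embs)
            loss.backward()
            opt.step()

    # Toy run with random stand-in data; real training would use paired
    # embeddings of the same sentences from both models.
    train_adapter(torch.randn(256, ADA_DIM), torch.randn(256, LATENT_DIM))

At inference time you'd pass adapter(ada_embedding) to the autoencoder's decoder to get text back out.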