Comment by WhitneyLand
2 years ago
It’s definitely not obvious, no matter how smart you are! The common metaphor is that it’s like a conversation.
Imagine you read one comment in some forum, posted partway through a long conversation thread. It wouldn’t be obvious what’s going on unless you read more of the thread, right?
A single paper is like a single comment, in a thread that goes on for years and years.
For example, why don’t papers explain what tokens/vectors/embedding layers are? Well, they did already, except that comment in the thread came in 2013 with the word2vec paper!
You might think, wth? To keep up with this, someone would have to spend a huge part of their time just reading papers. So yeah, that’s kind of what researchers do.
The alternative is to try to find where people have distilled down the important information or summarized it. That’s where books/blogs/youtube etc come in.
Is there a way of finding interesting "chains" of such papers, short of scanning the references / "cited by" page?
(For example, Google Scholar lists 98,797 citations for Attention Is All You Need!)
As a prerequisite to the attention paper, one to check out is:
A Survey on Contextual Embeddings https://arxiv.org/abs/2003.07278
Embeddings are sort of what all this stuff is built on, so it should help demystify the newer papers (it’s actually newer than the attention paper, but it’s a better overview than starting with the older word2vec paper).
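If it helps to see the core idea before reading the survey: an embedding layer is basically just a lookup table that maps discrete token ids to learned dense vectors. Here’s a toy sketch (the vocabulary and dimensions are made up for illustration; real models learn the matrix during training rather than sampling it randomly):

```python
# Toy sketch of an embedding lookup: each token id indexes a row
# in a matrix, turning discrete tokens into dense vectors.
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}        # hypothetical tiny vocabulary
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 4))  # 3 tokens x 4-dim vectors (random stand-in for learned weights)

token_ids = [vocab[w] for w in ["the", "cat", "sat"]]
vectors = embedding[token_ids]                # shape (3, 4): one vector per token
print(vectors.shape)
```

Everything downstream (attention included) operates on those vectors, not on the raw tokens.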
Then after the attention paper an important one is:
Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165
I’m intentionally trying not to give a big list because they’re so time-consuming. I’m sure you’ll quickly branch out based on your interests.