Comment by bigyabai
8 days ago
If your definition of "competitive" is loose enough, you can write your own Markov chain in an evening. Transformer models rely on a lot of prior art that has to be learned incrementally.
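For context, "write your own Markov chain in an evening" is not an exaggeration; here's a minimal sketch of a word-level bigram generator in Python (the toy corpus and helper names are illustrative, not from the thread):

```python
import random
from collections import defaultdict

def train(text):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def generate(chain, start, length=20):
    """Walk the chain, sampling each next word from its observed successors."""
    word, out = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

# Toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug"
print(generate(train(corpus), "the"))
```

It "generates text" in the loosest sense, which is the point: the gap between this and a trained transformer is where all the incrementally learned prior art lives.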
Reply, 8 days ago:
> If your definition of "competitive" is loose enough, you can write your own Markov chain in an evening. Transformer models rely on a lot of prior art that has to be learned incrementally.
Not that loose lol.
I’m thinking it’s still Llama / a dense decoder-only transformer.