Comment by joefourier
21 hours ago
You can actually generate surprisingly coherent text from BERT with minimal fine-tuning by reinterpreting it as a diffusion model: https://nathan.rs/posts/roberta-diffusion/
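For anyone curious what the sampling loop looks like, here's a rough sketch of the iterative-unmasking idea against the off-the-shelf Hugging Face `roberta-base` masked-LM checkpoint. To be clear, this is my illustration of the general technique, not the linked post's exact recipe: the post fine-tunes RoBERTa on variable masking ratios first, so raw `roberta-base` output will be noticeably rougher, and the prompt, sequence length, and step count below are arbitrary.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Illustrative sketch only: hyperparameters are assumptions, and the
# variable-masking fine-tuning step from the linked post is skipped.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")
model.eval()

prompt_ids = tokenizer("The weather today is", add_special_tokens=False).input_ids
gen_len, steps = 20, 10  # masked slots to fill, and unmasking steps

# Start with the continuation fully masked, then reveal it step by step.
ids = torch.tensor([[tokenizer.bos_token_id] + prompt_ids
                    + [tokenizer.mask_token_id] * gen_len
                    + [tokenizer.eos_token_id]])

for step in range(steps):
    mask_pos = (ids[0] == tokenizer.mask_token_id).nonzero().squeeze(-1)
    if mask_pos.numel() == 0:
        break
    with torch.no_grad():
        probs = model(ids).logits.softmax(-1)
    # Model's best guess and its confidence at each still-masked position.
    conf, pred = probs[0, mask_pos].max(-1)
    # Commit only the k most confident guesses this step (MaskGIT-style),
    # so later steps can condition on the tokens revealed earlier.
    k = max(1, mask_pos.numel() // (steps - step))
    for i in conf.topk(k).indices:
        ids[0, mask_pos[i]] = pred[i]

print(tokenizer.decode(ids[0], skip_special_tokens=True))
```

Unmasking by confidence rather than left to right is what makes this feel like a discrete diffusion sampler: each pass denoises the easiest remaining positions and re-conditions on them.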
I don’t see a useful definition of LLM that doesn’t include BERT, especially given its historical importance. 340M parameters (BERT-large) is only “small” in the sense that a baby whale is small.