
Comment by mootothemax

20 hours ago

> We had good small language models for decades. (E.g. BERT)

BERT isn’t an SLM, and the original was released in 2018.

The whole new era kicked off with Attention Is All You Need (2017); we haven’t reached even a single decade of work on it.

> BERT isn’t an SLM

Huh? BERT is literally a language model that's small and uses attention.

And we had good language models before BERT too.

They were a royal bitch to train properly, though. Nowadays you can get the same with just 30 minutes of prompt engineering.
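
For illustration, a minimal sketch of what that prompt-engineering replacement might look like - a sentiment classifier, the kind of task that once meant fine-tuning BERT on labelled data, done with a single prompt instead. The OpenAI Python client and the gpt-4o-mini model name here are illustrative assumptions, not anything claimed in the thread:

    # Hypothetical sketch: a task that used to require fine-tuning a model
    # like BERT, handled by prompting a hosted LLM instead.
    # Uses the OpenAI Python client (openai>=1.0).
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def classify_sentiment(text: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system",
                 "content": "Classify the sentiment of the user's text. "
                            "Reply with exactly one word: positive, "
                            "negative, or neutral."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content.strip().lower()

    print(classify_sentiment("The battery died after two hours."))  # negative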

  • > > BERT isn’t an SLM
    > Huh? BERT is literally a language model that's small and uses attention.

    Astute readers will note what’s been missed here.

    Fascinating, really. Your confidently stated yet factually void comments I’d have previously put down to one of the classic programmer mindsets. Nowadays, though - where do I see that kind of thing most often? Curious.

    • After some research, I think I understand what you're getting at here - BERT is a model for encoding text, and it isn't architecturally feasible to generate text with it, which is what "LLMs" (the lack of a definition here is why you two are talking past each other), maybe more accurately referred to as GPTs, can do. There's a quick sketch of the difference below.

      Also, the irony was not missed that your comment was itself confidently stated yet void of any content - consider dropping the superiority complex next time.
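
      For concreteness, a minimal sketch of that architectural split, assuming the Hugging Face transformers library and its stock bert-base-uncased and gpt2 checkpoints (illustrative choices, not anything from this thread):

          # BERT is an encoder trained for masked-token prediction: it fills
          # in a blank using context from both sides, rather than extending
          # a prompt left to right the way a generative decoder does.
          from transformers import pipeline

          fill = pipeline("fill-mask", model="bert-base-uncased")
          print(fill("Paris is the [MASK] of France.")[0]["token_str"])
          # -> most likely "capital"

          # GPT-2 is a decoder trained autoregressively: it extends a prompt
          # one token at a time, which is what "generating text" means here.
          gen = pipeline("text-generation", model="gpt2")
          print(gen("Paris is the", max_new_tokens=10)[0]["generated_text"])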

      3 replies →

    • > Astute readers will note what’s been missed here.

      I’m not astute enough to see what was missed here. Could you explain?