Comment by mbesto

11 hours ago

I think this tweet sums it up correctly, doesn't it?

   A +6 jump on a 0.6B model is actually more impressive than a +2 jump on a 100B model. It proves that 'intelligence' isn't just parameter count; it is context relevance. You are proving that a lightweight model with a cheat sheet beats a giant with amnesia. This is the death of the 'bigger is better' dogma

Which is essentially the bitter lesson that Richard Sutton talks about?

Nice ChatGPT-generated response in that tweet. Anyone too lazy to deslop their tweet shouldn't be listened to.