Comment by mbesto

11 hours ago

I think this tweet sums it up correctly, doesn't it?

   A +6 jump on a 0.6B model is actually more impressive than a +2 jump on a 100B model. It proves that 'intelligence' isn't just parameter count; it is context relevance. You are proving that a lightweight model with a cheat sheet beats a giant with amnesia. This is the death of the 'bigger is better' dogma

Which is essentially the bitter lesson that Richard Sutton talks about?

Nice ChatGPT-generated response in that tweet. Anyone too lazy to deslop their tweet shouldn't be listened to.