Comment by Amekedl
14 days ago
Ever heard about benchmark contamination?
Ever tried to explain a new concept, like a new state management store for a web frontend?
Most fail spectacularly there. I had reasonable "success" with Sonnet 3.7, but not with 4.5. It faltered completely.
Let’s not get ahead of ourselves. Looking at training efficiency now, along with all the other factors, it really is difficult to paint a favorable picture at the moment.
You sound like Gary Marcus.
Didn't know him, but he seems overly skeptical. Honestly, I was just expecting more from Llama 4 than this, hence mentioning the wall. I hope it's still too early to tell, because new ideas are inevitably going to change things. Maybe Anthropic opens up more, or Chinese labs keep overdelivering...