Comment by Amekedl

1 year ago

Ever heard about benchmark contamination?

Ever tried to explain a new concept, like a new state management store for web frontend?

Most models fail spectacularly there. I had reasonable "success" with Sonnet 3.7, but not with 4.5; it faltered completely.

Let’s not get ahead of ourselves. Looking at training efficiency and all the other factors, it really is difficult to paint a favorable picture atm.

2 comments


killerstorm 1 year ago

You sound like Gary Marcus.

Amekedl 1 year ago

Didn't know him, but he seems overly skeptical. Honestly, I was just expecting more from Llama 4 than this, hence mentioning the wall. I hope it's still too early to tell, because new ideas are inevitably going to change things. Maybe Anthropic opens up more, or Chinese labs keep overdelivering...