Comment by dcl

10 months ago

I like this bit:

> Personally, when I want to get a sense of capability improvements in the future, I'm going to be looking almost exclusively at benchmarks like Claude Plays Pokemon.

Definitely interested to see how the best models from Anthropic's competitors do at this.