Comment by jaggs

1 month ago

It's a great benchmark. Don't listen to the haters. This one is especially interesting.

3 comments

jaggs

This one's even more interesting

Who knew Anthropic was this far behind???

jaggs 1 month ago
Yeah, but actually that's not a good look. Anyone who's used Gemini will know how random it is when it comes to getting anything serious done, compared to the rock-solid Opus experience.
- BoorishBears 1 month ago
  
  Their benchmark is chock-full of things like that. It's deeply flawed and essentially rates how LLMs perform when you exert yourself trying to hold them entirely the wrong way.