Comment by andai
5 days ago
I've been testing some models that score higher than Opus 4.6.
They:
- hallucinate constantly
- can't follow basic instructions
- think they're Claude for some reason ;)
The only models I've seen that think they're Claude, other than Claude itself, are the GLM series.
I have screenshots of DeepSeek V4 doing this too, in a non-Claude-Code harness.
Also MiMo...