Comment by merlindru

17 hours ago

it may be the agent features in my case. now that i think about it, i also forgot that my CLAUDE.md is different from my AGENTS.md

either way, all one can really rely on is the benchmarks, and those are easily gamed or overfitted to.

i think it's all very hard to quantify, so take my previous comment with a massive rock of salt