Comment by theptip
14 hours ago
I agree that providers are in some sense incentivized to juice the numbers here. But they are in an incredibly competitive 3-way knife fight, so they are also heavily incentivized to be honest with themselves about quality gaps.
I think I better understand your point now. I was mostly arguing for this as an internal metric inside the model user’s company; I agree it’s less useful when it comes directly from Anthropic’s measurements.
What I meant by “agent / skill infra layer” is the case where you’re a big company trying to write skills that are widely shared, build common tooling so thousands of engineers can use agents within a big repo, etc.
RE “creepy”, I dunno, this case doesn’t bother me, but I can see why it might. It’s definitely being done though.