Comment by theptip

14 hours ago

I agree that providers are in some sense incentivized to juice the numbers here. But they are in an incredibly competitive 3-way knife fight, so they are also heavily incentivized to be honest with themselves about quality gaps.

I think I better understand your point now. I was mostly arguing for this as an internal metric inside the model user’s company; I agree it’s less useful coming directly from Anthropic’s measurements.

What I meant by “agent / skill infra layer” is the case where you’re a big company trying to write skills that are widely shared, build common tooling for thousands of engineers to use agents within a big repo, etc.

RE “creepy”, I dunno, this case doesn’t bother me, but I can see why it might. It’s definitely being done though.
