Comment by semanticintent
3 hours ago
The FieldWorkArena finding is the most revealing — not because it's the most sophisticated exploit, but because it's the simplest. A validator that checks "did the assistant reply?" instead of "was the reply correct?" was never a benchmark. It was a participation trophy.
The pattern underneath all of these: validation that runs after the fact on outputs the agent controlled. If the thing being measured can influence the measurement, the measurement is unreliable. That's not AI-specific; it's why compilers enforce type constraints at compile time instead of trusting runtime checks.
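To make the failure mode concrete, here's a minimal sketch (the function names and structure are illustrative, not FieldWorkArena's actual code). The weak validator passes anything non-empty; the strict one compares against ground truth the agent never controls:

```python
def weak_validator(response: str) -> bool:
    # Passes as long as the agent said *something*:
    # the "participation trophy" check.
    return len(response.strip()) > 0

def strict_validator(response: str, expected: str) -> bool:
    # Compares the reply against ground truth held
    # outside the agent's control.
    return response.strip() == expected.strip()

# A refusal, a hallucination, or garbage all "pass" the weak check:
print(weak_validator("I refuse to answer."))          # True
print(strict_validator("I refuse to answer.", "42"))  # False
```

The difference is exactly the one in the comment: the first measures participation, the second measures correctness.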
> A validator that checks "did the assistant reply?" instead of "was the reply correct?" was never a benchmark. It was a participation trophy
People can't even write a two-paragraph comment without AI now.