Comment by ArielTM
10 hours ago
The debate here is missing a practical question: is the judge from the same model family as the agent it's judging?
If both are Claude, you have shared-vulnerability risk. Prompt-injection patterns that work against one often work against the other. Basic defense in depth says they should at least be different providers, ideally different architectures.
Secondary issue: the judge only sees what's in the HTTP body. Someone who can shape the request (via agent input) can shape the judge's context window too. That's a different failure mode than "judge gets tricked by clever prompting." It's "judge is starved of the signals it would need to spot the trick."
No comments yet
Contribute on Hacker News ↗