Comment by ArielTM

10 hours ago

The debate here is missing a practical question: is the judge from the same model family as the agent it's judging?

If both are Claude, you have shared-vulnerability risk. Prompt-injection patterns that work against one often work against the other. Basic defense in depth says they should at least be different providers, ideally different architectures.

Secondary issue: the judge only sees what's in the HTTP body. Someone who can shape the request (via agent input) can shape the judge's context window too. That's a different failure mode than "judge gets tricked by clever prompting." It's "judge is starved of the signals it would need to spot the trick."

0 comments

ArielTM

No comments yet

Contribute on Hacker News ↗