Comment by stingraycharles

15 hours ago

I think the parent’s point is that this should be implemented using e.g. Bayesian statistics rather than an LLM, as the judge LLM is vulnerable to the exact same types of attacks that it’s trying to protect against.

Most proper LLM guardrails products use both.