Comment by panarky
10 hours ago
If it works to run a second LLM to check the first LLM, then why couldn't a "mixture of experts" LLM dedicate one of its experts to checking the results of the others? Or why couldn't a test-time compute "thinking" model run a separate thinking thread that verifies its own output? And if that gets you 60% of the way there, then there could be yet another thinking thread that verifies the verifier, etc.
Because if the agent and the governor are trained together, the shared reward function will corrupt the governor: the governor gets optimized to approve whatever maximizes the joint reward rather than to catch the agent's mistakes.
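A toy sketch of that incentive structure (all of it hypothetical: a 1-D "answer", a single approval threshold standing in for the governor, and made-up confidence values; not any real training setup). Under a shared reward, the only way the governor's parameter can increase reward is by approving more, so it drifts toward rubber-stamping; graded against ground truth instead, it has no such pressure.

```python
import random

random.seed(0)

def agent_answer(skill: float) -> float:
    """Toy agent: emits a correct answer with probability `skill`."""
    return 1.0 if random.random() < skill else 0.0

def run(shared_reward: bool, steps: int = 5000) -> float:
    skill = 0.5       # agent's true correctness rate (held fixed)
    threshold = 0.5   # governor approves when its confidence exceeds this
    lr = 0.01
    for _ in range(steps):
        correct = agent_answer(skill)
        # Governor's (noisy) confidence: higher for correct outputs.
        confidence = 0.9 if correct else 0.3
        approved = confidence > threshold
        if shared_reward:
            # Shared objective: reward flows whenever the output is approved,
            # so every rejection pushes the threshold down until everything
            # gets waved through.
            if not approved:
                threshold -= lr
        else:
            # Independent objective: governor is graded against ground truth,
            # so the threshold only moves when approvals and correctness disagree.
            if approved and not correct:
                threshold += lr
            elif not approved and correct:
                threshold -= lr
    return threshold

print("shared reward  -> threshold", round(run(True), 2))   # drops until all outputs are approved
print("independent    -> threshold", round(run(False), 2))  # stays put: approvals track correctness
```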