Comment by D-Machine

4 days ago

Even more generally than verification, just being tied to a loss function that represents something we actually care about. E.g. compiler and test errors, Lean verification in Aristotle, basic physics energy configs in AlphaFold, or win conditions in RL, as in AlphaGo.

RLHF is an attempt to push LLMs pre-trained with a dopey reconstruction loss toward something we actually care about: imagine if we could find a pre-training criterion that actually cared about truth and/or plausibility in the first place!

1 comment

alex43578 4 days ago

There's been active work in this space, including TruthRL: https://arxiv.org/html/2509.25760v1. It's absolutely not a solved problem, but reducing hallucinations is a key focus at the major labs.