Comment by hoc

10 days ago

That is also similar, in a sense, to the typical human behavior of "rounding" a "logical" argument and then building the next ones on top of it, rounding again at each step (or at least at many steps) in succession and basically ending up at arbitrary (or intended) conclusions.

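As a toy illustration of that compounding (my own hypothetical numbers, not a claim about any specific model): if each step only slightly "rounds" the one before it, the drift that accumulates over the chain is what ends up dominating the conclusion.

    # Toy model of "rounding" at each reasoning step: every step keeps only
    # a fraction of the previous step's fidelity, so small per-step errors
    # compound into a large drift by the end of the chain.
    # The 0.95 per-step fidelity and 20 steps are arbitrary illustrative numbers.

    per_step_fidelity = 0.95  # each step preserves 95% of the previous step
    steps = 20

    fidelity = 1.0
    for step in range(1, steps + 1):
        fidelity *= per_step_fidelity
        print(f"after step {step:2d}: fidelity ~ {fidelity:.2f}")

    # After 20 steps the chain retains only about 0.95**20 ~ 0.36 of its
    # original fidelity, even though no single step looked obviously wrong.
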
This is hard to correct with global training, because you would instead need to correct each step, even the most basic ones. It's like trying to convince someone that their result is wrong: you actually have to show them the errors in the steps that led there.

For LLMs it feels even trickier when you consider that complex paths are somehow encoded dynamically in simple steps, rather than there being some clearer/deeper path that could be activated and corrected. Correcting one complex "truth" seems much more straightforward (sic) than effectively targeting those basic assumptions enough that they won't build up into something strange again.

I wonder what effective ways exist to correct these reasoning models. Something like activating the full context and then retraining the faulty steps, or even "overcorrecting" the most basic ones?