Comment by energy123

5 months ago

Even though this was a scam, it's somewhat plausible. You finetune on synthetic data with lots of common reasoning mistakes followed by self-correction. You also finetune on synthetic data without reasoning mistakes, where the "reflection" says that everything is fine. The model then learns to recognize output with subtle mistakes/hallucinations due to having been trained to do that.
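As a rough illustration of that data-generation scheme, here's a minimal sketch in plain Python. Everything here is an assumption for illustration: the helper `corrupt_reasoning`, the `<reflection>` tags, and the formatting are hypothetical placeholders, not the actual pipeline anyone used.

```python
import random
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    clean_reasoning: str  # a correct chain of thought
    answer: str

def corrupt_reasoning(reasoning: str) -> str:
    """Hypothetical: inject a plausible-looking mistake into a reasoning trace
    (in practice via another model or a rule-based perturbation)."""
    return reasoning + "\n[step containing a subtle error]"

def build_training_text(ex: Example, with_mistake: bool) -> str:
    """Format one finetuning sample. Corrupted traces get a reflection that
    catches and fixes the error; clean traces get a reflection saying all is fine."""
    if with_mistake:
        reasoning = corrupt_reasoning(ex.clean_reasoning)
        reflection = ("<reflection>The previous step contains an error; "
                      f"correcting it: {ex.clean_reasoning}</reflection>")
    else:
        reasoning = ex.clean_reasoning
        reflection = "<reflection>The reasoning above looks correct.</reflection>"
    return f"{ex.question}\n{reasoning}\n{reflection}\nFinal answer: {ex.answer}"

def build_dataset(examples: list[Example], mistake_rate: float = 0.5) -> list[str]:
    # Mix corrupted and clean traces so the model sees both cases.
    return [build_training_text(ex, random.random() < mistake_rate)
            for ex in examples]
```

Note that the `mistake_rate` mix is exactly where the concern below comes in: the model is also exposed to the corrupted reasoning steps themselves, not just their corrections.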

But wouldn't the model then also learn to make reasoning mistakes in the first place, mistakes that in some cases could have been avoided by not training the model on incorrect reasoning at all?

Of course, if all mistakes are corrected before the final output tokens, this is fine, but I could see this method introducing new errors altogether.