Comment by kevmo314
10 days ago
I first noticed this with DeepSeek R1. For some really hard questions (some not even answerable), it would come up with a line of reasoning that convinced me it had the right answer. But if I read the answer without the reasoning, it was clear the answer made no sense.
With reinforcement learning, we might be incentivizing answers that sound right as opposed to answers that are actually right.
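A toy sketch of that failure mode (everything here is made up for illustration, not DeepSeek's actual reward model): if the reward signal scores how convincing an answer sounds rather than whether it is correct, then selecting or optimizing against that signal prefers fluent nonsense over a terse correct answer.

```python
# Hypothetical stand-ins: neither function reflects any real training setup.
CONFIDENT_PHRASES = ("therefore", "clearly", "it follows that")

def plausibility_reward(answer: str) -> float:
    """Toy reward model: scores confident-sounding prose, not truth."""
    return sum(p in answer.lower() for p in CONFIDENT_PHRASES)

def correctness_reward(answer: str, truth: str) -> float:
    """What we actually want the model to maximize."""
    return float(truth in answer)

candidates = [
    "Clearly, it follows that the answer is 17; therefore we are done.",  # wrong but fluent
    "42",  # right but terse
]
truth = "42"

# The plausibility-based reward picks the confident wrong answer;
# the correctness-based reward picks the right one.
print("reward model picks: ", max(candidates, key=plausibility_reward))
print("correctness picks:  ", max(candidates, key=lambda a: correctness_reward(a, truth)))
```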
> With reinforcement learning, we might be incentivizing answers that sound right as opposed to answers that are actually right.
We do this with other humans too, so I'm not sure we know how to avoid doing the same with machines.