Comment by latentsea

1 year ago

> We might be incentivizing answers that sound right with reinforcement learning as opposed to answers that are actually right.

We do this with other humans too, so I'm not sure we know how to avoid doing the same with machines.