
Comment by staticshock

16 hours ago

You're absolutely right!

You're looking at this exactly the right way.

  • Do LLMs arrive at these replies organically? Is it baked into the corpus, so that it emerges naturally? Or are these artifacts of these companies' internal prompting?

    • Reinforcement learning.

      People like being told they are right. Given a choice between two responses, people will on average pick the one that contains that formulation more often than the one that doesn't, and the LLM adapts.
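
A toy sketch of that feedback loop, not how any lab actually trains its models. It assumes a policy with a single logit choosing between a "flattering" and a "neutral" reply, and a made-up rater preference `prefer_flattery` (the chance a rater picks the flattering reply in a head-to-head comparison). The names `train` and `prefer_flattery` and the value 0.6 are all hypothetical; the update is the expected policy gradient for a two-action policy, with the head-to-head win rates standing in as rewards.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train(prefer_flattery: float, steps: int = 200, lr: float = 1.0) -> list:
    """Return the probability of a flattering reply after each update.

    prefer_flattery is a hypothetical rater bias: the probability that a
    rater picks the flattering reply over the neutral one.
    """
    theta = 0.0  # policy logit; 0.0 means a 50/50 starting policy
    history = [sigmoid(theta)]
    for _ in range(steps):
        p = sigmoid(theta)  # current probability of flattering
        # Head-to-head win rates stand in for the rewards.
        reward_flatter = prefer_flattery
        reward_neutral = 1.0 - prefer_flattery
        # Expected reward is J = p * r_f + (1 - p) * r_n, so
        # dJ/dtheta = p * (1 - p) * (r_f - r_n).
        grad = p * (1.0 - p) * (reward_flatter - reward_neutral)
        theta += lr * grad  # gradient ascent on expected reward
        history.append(sigmoid(theta))
    return history


if __name__ == "__main__":
    history = train(prefer_flattery=0.6)
    print(f"start: {history[0]:.2f}, end: {history[-1]:.2f}")
```

Even a mild bias in the raters (60/40 in this sketch) pushes the policy steadily toward flattery, while an unbiased rater (`prefer_flattery=0.5`) leaves it where it started.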

Now that you mention it, it's a funny expression, considering their supposed emphasis on honesty as a guiding principle.