Comment by apsurd

1 month ago

Do LLMs arrive at these replies organically? Is it baked into the corpus, so that it emerges naturally? Or are these artifacts of these companies' internal prompting?

GuB-42 1 month ago

Reinforcement learning.

People like being told they are right. Given a choice between two responses, people will on average pick the one that contains that formulation more often than the one that doesn't, and the LLM adapts to that preference.
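
To make that feedback loop concrete, here is a toy sketch, not how any lab actually trains its models. It assumes a "model" with a single knob, the probability of producing the flattering response, and a hypothetical `pick_rate`: the chance a person prefers the flattering response when shown one flattering and one plain response. The names `train`, `pick_rate`, and `reward_gap` and all the numbers are made up for illustration. The update is the expected policy-gradient step for a two-action policy.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train(pick_rate: float, steps: int = 500, lr: float = 1.0) -> float:
    """Return the final probability that the toy model flatters the user.

    pick_rate is an assumed value: how often a person prefers the
    flattering response over the plain one when shown both.
    """
    theta = 0.0  # logit of "produce the flattering response"; starts at 50/50
    for _ in range(steps):
        p = sigmoid(theta)
        # Treat each response type's expected reward as how often it gets picked.
        reward_gap = pick_rate - (1.0 - pick_rate)
        # Expected policy-gradient step for a two-action policy:
        # d/dtheta [p * r_flatter + (1 - p) * r_plain] = p * (1 - p) * reward_gap
        theta += lr * p * (1.0 - p) * reward_gap
    return sigmoid(theta)


if __name__ == "__main__":
    for rate in (0.5, 0.55, 0.6):
        print(f"pick_rate={rate:.2f} -> P(flatter)={train(rate):.3f}")
```

With no preference at all (`pick_rate=0.5`) the toy model stays at 50/50. With even a modest preference, the reward gap is always positive, so the probability of flattering only ever goes up. That is the "on average" part: no single rater has to demand flattery for the optimizer to drift toward it.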