Comment by csomar

9 days ago

Unless LLM architecture has changed, that is exactly what they are doing. You might need to learn more about how LLMs work.

Unless the LLM is a base model or just a finetuned base model, it definitely doesn't predict words just based on how likely they are in similar sentences it was trained on. Reinforcement learning is a thing and all models nowadays are extensively trained with it.

If anything, they predict words based on a heuristic ensemble of which word is most likely to come next in similar sentences and which word is most likely to lead to a higher final reward.
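One way to picture the "ensemble" described above, as a minimal sketch rather than how any particular model is actually trained: under KL-regularized reward maximization, the optimal policy is the base distribution reweighted by exp(reward / beta). The tokens, logits, rewards, and the `beta` value below are made-up numbers for illustration, and real RL fine-tuning changes the model weights instead of reweighting at inference time.

```python
import math

def softmax(logits):
    """Turn a dict of token -> logit into a dict of token -> probability."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    z = sum(exps.values())
    return {tok: e / z for tok, e in exps.items()}

def reward_tilted(base_probs, rewards, beta=1.0):
    """Reweight base probabilities by exp(reward / beta), then renormalize.

    Large beta stays close to the base model; small beta follows the reward.
    """
    scores = {
        tok: math.log(p) + rewards.get(tok, 0.0) / beta
        for tok, p in base_probs.items()
    }
    return softmax(scores)

# Hypothetical next-token candidates (assumed values, not from any real model).
base = softmax({"Paris": 2.0, "London": 1.5, "idk": 0.0})
rewards = {"Paris": 1.0, "London": -1.0, "idk": 0.0}

tuned = reward_tilted(base, rewards, beta=0.5)
for tok in base:
    print(f"{tok:>6}: base={base[tok]:.3f} tuned={tuned[tok]:.3f}")
```

In this toy setup the likelihood term and the reward term are simply added in log space, so a token that is common in the training data can still lose probability if it tends to be penalized.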

  • > If anything, they predict words based on a heuristic ensemble of which word is most likely to come next in similar sentences and which word is most likely to lead to a higher final reward.

    So... "finding the most likely next word based on what they've seen on the internet"?

    • Reinforcement learning is not done with random data found on the internet; it's done with curated, high-quality labeled datasets. There have been attempts to apply reinforcement learning to pre-training[1] (learning a predict-the-next-sentence objective in an unsupervised way), but as far as I know they don't scale.

      [1] https://arxiv.org/pdf/2509.19249

  • You know that when A. Karpathy released NanoLLM (or whatever it was called), he said it was mainly coded by hand because the LLMs were not helpful, as "the training dataset was way off". So yeah, your argument actually "reinforces" my point.

    • No, your opinion is wrong, because the reason some models don't seem to have a "strong opinion" on anything is not related to predicting words based on how similar they are to other sentences in the training data. It's most likely related to how the model was trained with reinforcement learning and, more specifically, to recent efforts by OpenAI to reduce hallucination rates by penalizing guessing under uncertainty[1].

      [1] https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...
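The "penalizing guessing under uncertainty" idea in the last reply can be sketched with a toy scoring rule. This is an illustration under assumed numbers, not the scheme from the linked paper: a correct answer scores +1, a wrong answer scores minus a hypothetical `wrong_penalty`, and abstaining scores 0.

```python
def expected_score(p_correct, wrong_penalty):
    """Expected score of answering: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

def should_answer(p_correct, wrong_penalty):
    """Answer only when it beats abstaining, which always scores 0."""
    return expected_score(p_correct, wrong_penalty) > 0.0

def break_even_confidence(wrong_penalty):
    """Confidence at which answering and abstaining score the same."""
    return wrong_penalty / (1.0 + wrong_penalty)

# With no penalty for wrong answers, even a 30% guess is worth making.
print(should_answer(0.3, wrong_penalty=0.0))   # True
# With a penalty of 1, the same guess is worse than abstaining.
print(should_answer(0.3, wrong_penalty=1.0))   # False
# With a penalty of 3, the break-even confidence is 75%.
print(break_even_confidence(3.0))              # 0.75
```

Under accuracy-only grading (penalty 0) guessing is never worse than abstaining, so a model optimized against such a rule would be pushed toward confident answers. Once wrong answers cost something, hedging or declining becomes the higher-reward behavior below the break-even confidence.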
