Comment by csomar

3 months ago

Unless LLM architecture has changed, that is exactly what they are doing. You might need to learn more about how LLMs work.

andy12_ 3 months ago

Unless the LLM is a base model or just a finetuned base model, it definitely doesn't predict words just based on how likely they are in similar sentences it was trained on. Reinforcement learning is a thing and all models nowadays are extensively trained with it.

If anything, they predict words based on a heuristic ensemble of what word is most likely to come next in similar sentences and what word is most likely to give a final higher reward.
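
To make the "ensemble" idea concrete, here is a toy sketch in Python. It assumes a single decoding step, a made-up base distribution, and made-up per-token rewards; none of these names or numbers come from a real model. It tilts the base next-token probabilities by exp(reward / beta), which is the closed-form optimum of a KL-regularized reward objective for one step. Real RL training works on whole sequences and only approximates this, so treat it as an illustration rather than a description of how any particular model is trained.

```python
import math

def reweight(base_probs, rewards, beta=1.0):
    """Tilt a base next-token distribution by exp(reward / beta), then renormalize.

    base_probs: hypothetical pretrained next-token probabilities.
    rewards: hypothetical per-token rewards (missing tokens get 0).
    beta: how strongly the result is tied to the base distribution.
    """
    scores = {
        tok: p * math.exp(rewards.get(tok, 0.0) / beta)
        for tok, p in base_probs.items()
    }
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}

# Made-up numbers: the base model prefers "maybe", the reward prefers a clear answer.
base = {"maybe": 0.5, "yes": 0.3, "no": 0.2}
rewards = {"yes": 1.0, "no": 1.0}
tuned = reweight(base, rewards, beta=0.5)
```

With a large `beta` the result stays close to the base distribution; with a small `beta` the reward dominates.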

csomar 3 months ago
> If anything, they predict words based on a heuristic ensemble of what word is most likely to come next in similar sentences and what word is most likely to give a final higher reward.
So... "finding the most likely next word based on what they've seen on the internet"?
- andy12_ 3 months ago
  
  Reinforcement learning is not done with random data found on the internet; it's done with curated, high-quality labeled datasets. There have been approaches that try to apply reinforcement learning to pre-training[1] (learning a predict-the-next-sentence objective in an unsupervised way), but as far as I know they don't scale.
  [1] https://arxiv.org/pdf/2509.19249
hansmayer 3 months ago
You know that when A. Karpathy released NanoLLM (or whatever it was called), he said it was mainly coded by hand because the LLMs were not helpful, as "the training dataset was way off". So yeah, your argument actually "reinforces" my point.
- andy12_ 3 months ago
  
  No, your opinion is wrong: the reason some models don't seem to have a "strong opinion" on anything is not related to predicting words based on how similar they are to other sentences in the training data. It's most likely related to how the model was trained with reinforcement learning and, more specifically, to recent efforts by OpenAI to reduce hallucination rates by penalizing guessing under uncertainty[1].
  [1] https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...
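
As a rough illustration of why penalizing wrong guesses changes behavior, here is a toy scoring sketch in Python. The scoring rule (1 point for a correct answer, 0 for abstaining, a configurable penalty for a wrong answer) and all numbers are assumptions for the example, not necessarily the scheme used in the linked work.

```python
def expected_score(confidence, wrong_penalty):
    """Expected score of answering: +1 if right, -wrong_penalty if wrong."""
    return confidence * 1.0 - (1.0 - confidence) * wrong_penalty

def should_answer(confidence, wrong_penalty):
    """Answer only if it beats abstaining, which scores 0 in this toy setup."""
    return expected_score(confidence, wrong_penalty) > 0.0

# With no penalty, guessing is always worth it, even at 10% confidence.
guess_when_free = should_answer(0.1, wrong_penalty=0.0)

# With a penalty of 3, answering only pays off above 75% confidence.
guess_at_60 = should_answer(0.6, wrong_penalty=3.0)
guess_at_80 = should_answer(0.8, wrong_penalty=3.0)
```

Under this assumed rule the break-even confidence is `wrong_penalty / (1 + wrong_penalty)`, so a model optimized against it would be pushed to abstain or hedge more often when unsure.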
  