Comment by Rexxar
12 hours ago
> Someone didn't get the memo that for LLMs, tokens are units of thinking.
Where do you get this memo? It seems completely wrong to me. More computation does not translate to more "thinking" if it is spent computing the wrong things (i.e., things that contribute significantly to the final sentence's meaning).
That's why you need filler words: they contribute little to the sentence's meaning but give the model a chance to compute/think. This is part of why humans do the same when speaking.
The LLM has no accessible state beyond its own output tokens; each pass generates a single token and does not otherwise communicate with subsequent passes. Therefore all information calculated in a pass must be encoded into the entropy of the output token. If the only output of a thinking pass is a dumb filler word with hardly any entropy, then all the thinking for that filler word is forgotten and cannot be reconstructed.
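A minimal sketch of that point, assuming the Hugging Face transformers API and gpt2 as a stand-in model (nothing here is specific to any particular LLM): each forward pass collapses everything it computed into a single token id, and that id is the only thing handed to the next pass.

```python
# Minimal greedy decoding loop, assuming Hugging Face transformers and gpt2 as a stand-in.
# Note what survives between passes: only the growing list of token ids. The hidden
# states computed inside each pass are discarded (a KV cache is just a re-derivable
# function of those same token ids, so it adds no extra memory of "thoughts").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The square root of 256 is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(5):
        logits = model(ids).logits        # one full pass over the tokens so far
        next_id = logits[0, -1].argmax()  # everything computed collapses to one token id
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # only that id is carried forward

print(tok.decode(ids[0]))
```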
Do you have any evidence at all for this? I know how LLMs are trained, and this makes no sense to me. Otherwise you'd just put filler words in every input,
e.g., instead of "The square root of 256 is" you'd enter "errr The er square um root errr of 256 errr is" and it would miraculously get better? The model can't differentiate between words you entered and words it generated itself...
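If anyone wants to check, the claim is cheap to test. A rough sketch, assuming Hugging Face transformers and gpt2 as a stand-in model (the helper name below is made up for illustration): compare the log-probability the model assigns to the answer with and without the fillers.

```python
# Rough experiment sketch, assuming Hugging Face transformers and gpt2 as a stand-in.
# If fillers were "free computation", the padded prompt should score the answer higher.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def answer_logprob(prompt: str, answer: str) -> float:
    """Log-probability the model assigns to `answer` as the continuation of `prompt`."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    answer_ids = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logprobs = torch.log_softmax(model(ids).logits, dim=-1)
    total = 0.0
    for i in range(answer_ids.shape[1]):
        pos = prompt_ids.shape[1] + i - 1  # logits at pos predict the token at pos + 1
        total += logprobs[0, pos, answer_ids[0, i]].item()
    return total

print(answer_logprob("The square root of 256 is", " 16"))
print(answer_logprob("errr The er square um root errr of 256 errr is", " 16"))
```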
That's why it starts with "You're absolutely right!" It's not to flatter the user; it's a cheap way to steer the response into a space where it's actually utilizing the correction.
People have researched pause tokens for this exact reason.
What do you think chain-of-thought reasoning is doing, exactly?
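Roughly the contrast, as a quick sketch assuming transformers' generate API and gpt2 as a stand-in (a small base model won't actually get the arithmetic right; the point is the shape of the computation): chain-of-thought spends extra tokens too, but unlike "errr"-style fillers, those tokens carry intermediate results forward into later passes.

```python
# Sketch of direct answering vs. chain-of-thought prompting, assuming Hugging Face
# transformers and gpt2 as a stand-in model. The step-by-step tokens are extra
# computation whose results are written back into the context, unlike filler tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

direct = "Q: What is 17 * 24? A:"
cot = "Q: What is 17 * 24? A: Let's think step by step."

for prompt in (direct, cot):
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=60, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    print(tok.decode(out[0]))
    print("---")
```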
You’re conflating training and inference