Comment by estearum

10 hours ago

Can't you know that tokens are units of thinking just by... like... thinking about how models work?

Can't you just know that the earth is the center of the universe by... like... just looking at how the world works?

  • Actually you'd trivially disprove that claim if you're starting from mechanistic knowledge of how orbits work, like how we have mechanistic knowledge of how LLMs work.

    • We have empirical observations — e.g., that replicating a fixed set of inner layers makes a model "think" longer, or that models seem to develop encode and decode layers. But exactly why those layers are the way they are, or how they come together to produce emergent behaviour... Do we have mechanistic knowledge of that?

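The "replicating a fixed set of inner layers" observation above can be sketched as a toy loop: one weight-tied layer applied more times gives more computation without adding any parameters, in the spirit of looped/recurrent-depth transformer ideas. Everything here (the function names, the transformation itself) is invented purely for illustration — it is not any real model's code.

```python
# Toy sketch, not a real model: "thinking longer" by re-applying the
# same shared (weight-tied) layer more times.

def tied_layer(state, weight=0.5):
    # One shared inner layer: a fixed nonlinear map reused every step.
    return [weight * s + (1 - weight) * (s * s) for s in state]

def run(state, loops):
    # More loops = the same layer applied more times ("more thinking"),
    # with zero new parameters introduced.
    for _ in range(loops):
        state = tied_layer(state)
    return state

shallow = run([0.9, 0.5], loops=1)
deep = run([0.9, 0.5], loops=8)   # same weights, more compute
print(shallow, deep)
```

The point of the sketch is just that depth-via-repetition is an empirical knob: we can observe that looping longer changes the output, without that observation amounting to a mechanistic account of *why* the learned layer behaves as it does.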

> Can't you know that tokens are units of thinking just by... like... thinking about how models work?

Seems reasonable, but this doesn't settle probably-empirical questions like: (a) to what degree is 'more' better? (b) how important are filler words? (c) how important are words that signal connection, causality, influence, or reasoning?

  • Right, there's probably something more subtle going on, like "semantic density within tokens is how models think."

    So it's probably true that "Great question!"-style preambles aren't helpful, but there's definitely a lower bound on how primitive a caveman language we can push toward.