Comment by estearum

10 hours ago

Can't you know that tokens are units of thinking just by... like... thinking about how models work?

Can't you just know that the earth is the center of the universe by... like... just looking at how the world works?

  • Actually you'd trivially disprove that claim if you're starting from mechanistic knowledge of how orbits work, like how we have mechanistic knowledge of how LLMs work.

    • We have empirical observations — e.g., that replicating a fixed set of inner layers makes a model "think" longer, or that models seem to develop encode and decode layers. But exactly why those layers are the way they are, or how they come together to produce emergent behaviour... Do we have mechanistic knowledge of that?

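The "replicating a fixed set of inner layers" observation above can be sketched as a toy loop: one weight-tied layer applied more times gives more computation without adding any parameters, in the spirit of looped/recurrent-depth transformer ideas. Everything here (the function names, the transformation itself) is invented purely for illustration — it is not any real model's code.

```python
# Toy sketch, not a real model: "thinking longer" by re-applying the
# same shared (weight-tied) layer more times.

def tied_layer(state, weight=0.5):
    # One shared inner layer: a fixed nonlinear map reused every step.
    return [weight * s + (1 - weight) * (s * s) for s in state]

def run(state, loops):
    # More loops = the same layer applied more times ("more thinking"),
    # with zero new parameters introduced.
    for _ in range(loops):
        state = tied_layer(state)
    return state

shallow = run([0.9, 0.5], loops=1)
deep = run([0.9, 0.5], loops=8)   # same weights, more compute
print(shallow, deep)
```

The point of the sketch is just that depth-via-repetition is an empirical knob: we can observe that looping longer changes the output, without that observation amounting to a mechanistic account of *why* the learned layer behaves as it does.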

> Can't you know that tokens are units of thinking just by... like... thinking about how models work?

Seems reasonable, but this doesn't settle probably-empirical questions like: (a) to what degree is 'more' better? (b) how important are filler words? (c) how important are words that signal connection, causality, influence, or reasoning?

  • Right, there's probably something more subtle going on, like "semantic density within tokens is how models think."

    So it's probably true that "Great question!"-style preambles aren't helpful, but there's definitely a lower bound on how primitive a caveman language we can push toward.