
Comment by weitendorf

1 year ago

> There are problems that are easy for human beings but hard for current LLMs (and maybe impossible for them; no one knows). Examples include playing Wordle and predicting cellular automata (including Turing-complete ones like Rule 110). We don't fully understand why current LLMs are bad at these tasks.
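For reference, Rule 110 is an elementary (one-dimensional) automaton. Each cell's next state depends only on itself and its two neighbours, and successive generations stack into a 2D space-time diagram. Below is a minimal sketch of one update step. The function name and the zero-padded boundary are illustrative assumptions, not anything from the quoted article.

```python
from typing import List

RULE = 110  # 0b01101110: bit k is the new state for neighbourhood value k


def rule110_step(cells: List[int]) -> List[int]:
    """Advance one generation of Rule 110.

    Assumption: cells beyond either edge are treated as 0 (fixed boundary).
    """
    padded = [0] + cells + [0]
    nxt = []
    for i in range(1, len(padded) - 1):
        # Read (left, center, right) as a 3-bit number in 0..7.
        idx = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        # The matching bit of the rule number is the cell's next state.
        nxt.append((RULE >> idx) & 1)
    return nxt


row = [0, 0, 0, 0, 1]
for _ in range(3):
    row = rule110_step(row)
    print(row)
```

Starting from a single live cell on the right, the pattern grows leftward. Predicting it many steps ahead means tracking every cell's neighbourhood at once.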

Wordle and cellular automata are very 2D, and LLMs are fundamentally 1D. You might think "but what about chess?", except chess games are extremely often written down as a 1D stream of tokens (move notation), so they are bound to be well represented in LLMs' training sets. Wordle and cellular automata are rarely, if ever, encoded as 1D streams of tokens, so it's not something an LLM would have experience with, even if it had a reasonable "understanding" of the concepts. Imagine being an OK chess player who is asked to play blindfolded, dictating your moves purely in notation, and then being told you suck.
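To make the 1D point concrete, here is a rough sketch of what happens when a grid such as a Wordle board is serialized row by row. It assumes one token per character, which real tokenizers do not guarantee. Horizontal neighbours stay adjacent in the stream, but vertical neighbours end up `width + 1` positions apart (the `+ 1` is the newline). The helper names are hypothetical.

```python
from typing import List


def flatten_grid(grid: List[str]) -> str:
    """Serialize a 2D character grid row by row, one newline between rows."""
    if any(len(row) != len(grid[0]) for row in grid):
        raise ValueError("all rows must have the same width")
    return "\n".join(grid)


def stream_index(row: int, col: int, width: int) -> int:
    """Position of cell (row, col) in the flattened stream.

    Each row occupies width characters plus one newline.
    """
    return row * (width + 1) + col


board = ["CRANE", "SLOTH"]  # hypothetical two-guess Wordle board
stream = flatten_grid(board)
width = len(board[0])

# Horizontal neighbours: 1 position apart in the stream.
print(stream_index(0, 1, width) - stream_index(0, 0, width))
# Vertical neighbours: width + 1 positions apart.
print(stream_index(1, 0, width) - stream_index(0, 0, width))
```

On a wider grid such as a cellular-automaton board, the gap between a cell and the one directly below it grows with the width. A model reading the stream has to learn that long-range relationship rather than get it for free.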

> Providing an LLM with examples and step-by-step instructions in a prompt means the user is figuring out the "reasoning steps" and handing them to the LLM, instead of the LLM figuring them out by itself. We have "reasoning machines" that are intelligent but seem to be hitting fundamental limits we don't understand.

You have probably heard of this really popular game called bridge, right? You might even remember tons of advice your grandma gave you based on her experience playing it, except she never let you watch her play. Is Grandma "figuring out the game" for you when she finally sits down and teaches you the rules?

I'm not an authority on the matter, but AFAIK, with positional encodings (part of the Transformer architecture), they can handle dimensionality just fine. In fact, some people have tried 2D Transformers, and the results were the same.
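One common way to give a Transformer 2D position information is to split the embedding in half: one half encodes the row and the other encodes the column, each using the 1D sinusoidal scheme from "Attention Is All You Need". This is only a sketch of that general idea. It is not necessarily what the 2D Transformer experiments mentioned above used, and the function names are hypothetical.

```python
import math
from typing import List


def sinusoid_1d(pos: int, dim: int) -> List[float]:
    """1D sinusoidal encoding: index 2i is sin(pos / 10000^(2i/dim)), 2i+1 is cos."""
    enc: List[float] = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(pos * freq))
        enc.append(math.cos(pos * freq))
    return enc


def sinusoid_2d(row: int, col: int, dim: int) -> List[float]:
    """Concatenate a row encoding and a column encoding, dim // 2 entries each."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    half = dim // 2
    return sinusoid_1d(row, half) + sinusoid_1d(col, half)


# Cells in the same row share the first half of their encoding,
# and cells in the same column share the second half.
a = sinusoid_2d(2, 3, 8)
b = sinusoid_2d(2, 7, 8)
print(a[:4] == b[:4])
```

Because each axis gets its own slice of the vector, "directly below" is as easy for the model to express as "directly to the right", regardless of how the tokens are ordered in the sequence.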

Vision transformers are gaining traction, and they are focused 100% on 2D data.

Since when can LLMs play chess? They don't understand it at all. You would have to filter out all the invalid moves until one spits out a valid one.
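The filtering described above amounts to rejection sampling: keep asking the model for a move and discard anything the rules don't allow. Here is a minimal sketch. It assumes a hypothetical `propose` callable standing in for the LLM and a set of legal moves supplied by some separate rules engine. It also caps the retries so a model that never produces a legal move can't loop forever.

```python
from typing import Callable, Iterable, Optional


def first_legal_move(
    propose: Callable[[], str],
    legal: Iterable[str],
    max_tries: int = 10,
) -> Optional[str]:
    """Return the first proposed move found in `legal`, or None after max_tries."""
    legal_set = set(legal)
    for _ in range(max_tries):
        move = propose()  # hypothetical: one sampled move from the model
        if move in legal_set:
            return move
        # Illegal move: discard it and sample again.
    return None


# Stubbed "model" that emits two illegal moves before a legal one.
proposals = iter(["Ke9", "e5", "Nf3"])
print(first_legal_move(proposals.__next__, {"e4", "d4", "Nf3", "c4"}))
```

Note that the legality check lives entirely outside the model. The loop only guarantees a legal move, not a good one.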