Comment by ActorNightly
11 days ago
In theory, you could have a large enough Markov chain that mimics an LLM; you would just need it to be exponentially larger in width.
After all, it's just matrix multiplies from start to finish.
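A minimal toy sketch of that claim, in Python with NumPy (the state count, matrix values, and seed are all made up for illustration; a chain truly matching an LLM would need a state per possible context):

    import numpy as np

    # Toy Markov chain whose states stand in for whole contexts.
    # One sampling step is a single matrix multiply: a one-hot state
    # vector times a row-stochastic transition matrix yields the
    # next-state distribution, much as an LLM's forward pass yields
    # a next-token distribution.
    rng = np.random.default_rng(0)

    n_states = 4                       # tiny; a chain matching an LLM
                                       # would need exponentially many states
    T = rng.random((n_states, n_states))
    T /= T.sum(axis=1, keepdims=True)  # rows sum to 1: stochastic matrix

    state = np.zeros(n_states)
    state[0] = 1.0                     # one-hot: "currently in context 0"

    next_dist = state @ T              # one matmul = one sampling step
    print(next_dist)                   # distribution over next states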
A lot of the other data operations (like normalization) can be represented as matrix multiplies, just less efficiently, in the same way that a transformer can be represented, inefficiently, as a set of fully connected deep layers.
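To make the normalization point concrete: the mean-subtraction half of normalization really is a fixed matrix multiply (the centering matrix), though the variance-scaling step of e.g. LayerNorm depends on the input, so it is not a single constant matrix. A sketch with an arbitrary size n:

    import numpy as np

    # Mean-centering an n-vector equals multiplying by the centering
    # matrix C = I - (1/n) * ones((n, n)): an O(n^2) matmul doing what
    # direct mean subtraction does in O(n), i.e. the same operation,
    # "just less efficiently".
    n = 5
    x = np.arange(n, dtype=float)

    C = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    assert np.allclose(C @ x, x - x.mean())
    print(C @ x)                            # mean-subtracted x, via one matmul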
True. But the considerations re: practicability are not to be ignored.