
Comment by getnormality

15 hours ago

> So far it seems to me that self-attention really brought new capabilities to a network

Do we have a layman's explanation for what makes self-attention so uniquely powerful? Something more than "it lets you do self-attention".

Computational power. Without self-attention, you have a sloppy implementation of something called a PDA (pushdown automaton), like an old HP calculator. With it, you have an even sloppier implementation of a Turing machine.

So (modulo a _lot_ of details) it increases the power from that of a "calculator" to that of a "computer".
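For anyone who hasn't met the term: a pushdown automaton reads its input once, left to right, and its only memory is a stack. Below is a minimal illustrative sketch (not from the comment, and not a claim about how any neural network works internally). It uses only a stack to recognize the textbook context-free language a^n b^n. The classic limit is that a^n b^n c^n is not context-free, so no single-stack machine can accept it. A Turing machine, with its read/write tape, can. That gap is the "calculator" vs. "computer" distinction. The function name `accepts_anbn` is hypothetical.

```python
def accepts_anbn(s: str) -> bool:
    """Deterministic pushdown recognizer for a^n b^n (n >= 0), using only a stack."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:          # an 'a' after any 'b' is out of order
                return False
            stack.append("A")   # push one marker per 'a'
        elif ch == "b":
            seen_b = True
            if not stack:       # more b's than a's
                return False
            stack.pop()         # match this 'b' against one earlier 'a'
        else:
            return False        # symbol outside the alphabet {a, b}
    return not stack            # accept only if every 'a' was matched
```

For example, `accepts_anbn("aaabbb")` returns `True` and `accepts_anbn("aab")` returns `False`. Once the b's have popped the stack empty, the count of a's is gone. That is why the same trick cannot also check a matching run of c's.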