Comment by foobarqux
2 years ago
There is a real difference between the computational power of different computing systems even if in practice nothing is really Turing complete because nothing is infinite. In practice there really is a difference between something that is DFA-like and something that is Turing-machine-like.
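To make the distinction concrete, here's a toy sketch (illustrative only, nothing to do with any particular model): a bounded-state recognizer for balanced parentheses fails past a fixed nesting depth, while one with unbounded memory handles any depth.

```python
# Toy illustration of DFA-like vs. Turing-machine-like computation, using
# balanced-parenthesis checking. A DFA with a fixed number of states can only
# track nesting up to a hard-coded depth; unbounded memory removes that cap.

def dfa_balanced(s: str, max_depth: int = 3) -> bool:
    """DFA-like: only states 0..max_depth exist, so deeper nesting is rejected."""
    depth = 0
    for ch in s:
        if ch == "(":
            if depth == max_depth:
                return False          # ran out of states: this depth isn't representable
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False
            depth -= 1
    return depth == 0

def unbounded_balanced(s: str) -> bool:
    """Turing-machine-like: an unbounded counter, so any nesting depth works."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                return False
            depth -= 1
    return depth == 0

print(dfa_balanced("((((()))))"))        # False: exceeds the fixed state budget
print(unbounded_balanced("((((()))))"))  # True
```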
> It's not important how Turing complete transformers are, only that they could potentially learn any class of computations necessary via training.
There is no real argument for this, and there is persuasive evidence against it, as I linked to above.
> Memory or not, all the computation is still being performed just in the forward pass.
This was clearly not the meaning of "feed forward" in the discussion above (especially since no one was talking about transformers with memory because those aren't really in use).
> It also doesn't mean it couldn't be.
You were making claims about current systems which are false. Now you've switched from "it's present in current systems" to "we haven't theoretically ruled out that we could invent such a system one day" but there is no reason to believe even this claim is likely.
> I have no proof you are performing any 'higher order computation' either
Humans perform recursive computation when they count or add and probably when they form sentences.
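By "recursive computation" I mean something like grade-school addition, which works on inputs of any length because the procedure keeps recursing as long as digits remain. A rough sketch (my own illustration):

```python
# Grade-school addition as a recursive procedure: digit by digit with a carry.
# It handles numbers of arbitrary length because the recursion continues until
# the digits run out, rather than being a fixed-depth computation.

def add_digits(a: str, b: str, carry: int = 0) -> str:
    if not a and not b:
        return str(carry) if carry else ""
    da = int(a[-1]) if a else 0
    db = int(b[-1]) if b else 0
    s = da + db + carry
    # recurse on the remaining higher-order digits, then append this digit
    return add_digits(a[:-1], b[:-1], s // 10) + str(s % 10)

print(add_digits("987654321987654321", "123456789123456789"))  # 1111111111111111110
```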
Nothing I said about current systems is false. "It's one big equation" isn't an accurate description of a forward pass.
There's no theory barring a forward pass from performing any particular computation. That line doesn't exist.
I didn't switch to anything. You're just arguing something that was never being argued in the first place. Never did I mention anything about being Turing complete or having memory until you brought it up.
>Humans perform recursive computation when they count or add and probably when they form sentences.
Okay?
GPT-4 can count and add. Anything it outputs gets fed back. The transformations of previous layers are used for later ones. It can perform recursive computation just fine.
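To spell out what I mean (a toy sketch, not GPT-4's actual interface): the loop lives in token space, with each forward pass consuming everything generated so far.

```python
# Autoregressive generation: each output token is appended to the input, so
# the computation unrolls across generated tokens rather than inside a single
# fixed-depth forward pass.

def generate(model, tokens, steps):
    for _ in range(steps):
        next_token = model(tokens)       # one fixed-depth forward pass
        tokens = tokens + [next_token]   # fed back as input to the next pass
    return tokens

# toy "model": returns the count of tokens seen so far, so each step depends
# on all previously generated outputs
toy_model = lambda tokens: len(tokens)
print(generate(toy_model, [0], 5))  # [0, 1, 2, 3, 4, 5]
```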
> Nothing I said about current systems is false. "It's one big equation" isn't an accurate description of a forward pass.
It is, and it places limits on what can be computed, as does gradient descent.
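To be clear about what "one big equation" means here (a toy sketch, using a plain feed-forward stack as a stand-in): the number of sequential steps is fixed by the architecture, not by the input.

```python
# A forward pass is a fixed composition of layer functions: every input goes
# through exactly the same number of sequential steps, unlike a loop that runs
# until some condition is met.

import numpy as np

def forward(x, layers):
    # layers is a fixed list of weight matrices; the depth never depends on x
    for W in layers:
        x = np.tanh(W @ x)
    return x

rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 4)) for _ in range(3)]  # always exactly 3 steps
print(forward(rng.standard_normal(4), layers))
```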
> There's no theory barring what computation a forward pass can or can't do. That line doesn't exist.
It does, and I linked to a survey paper describing computational limits. The only sense in which there are no limits is, as I said originally, the same sense in which a finite state machine has no limits.
> I didn't switch to anything. You're just arguing something that was never being argued in the first place. Never did I mention anything about being Turing complete or having memory until you brought it up.
Yes, I brought up Turing machines to show that there are computational limitations to transformers (I didn't bring up memory, you did). Because that claim can't be refuted about the current widely deployed systems, you switched to talking about the fact that it hasn't been theoretically ruled out that we could design such systems in the future. If you didn't want to switch claims, then you shouldn't have done it.
> GPT-4 can count and add. Anything it outputs gets fed back. It can perform recursive computation just fine.
It's a well-known problem that it cannot consistently add arbitrary numbers. You are probably the only person who claims otherwise. As for feedback: This is obviously not part of the training process so can't be part of what is "learned".
>It's a well known problem that it cannot consistently add arbitrary numbers.
Neither can humans. Calculators aren't just for saving time. Humans regularly mess up arithmetic.
Can you add arbitrary numbers any better than GPT-4? Test it out yourself.
>As for feedback: This is obviously not part of the training process so can't be part of what is "learned".
A token that is part of an ongoing recursive computation in the text being fed to a transformer is definitely part of training. Training directly incentivizes recursion in token space.
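As a toy sketch of what I mean: if the training text contains the intermediate steps of a computation, next-token training targets each of those steps directly.

```python
# When the training text spells out the intermediate steps of a computation,
# standard next-token training makes each step a prediction target, so
# producing the next step of that computation is part of what gets learned.

text = "Count to 5: 1, 2, 3, 4, 5"
tokens = text.split()

# next-token training pairs: predict each step from everything before it
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs[-3:]:
    print(context, "->", target)
```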