Comment by famouswaffles
2 years ago
How many passes there are is irrelevant. You can perform any computation you like in a single pass if you have enough compute time.
Trained transformers have limited compute per token, so each query is compute-limited, but this is trivially increased: by generating more tokens, or by increasing dimensions in the next training round so that each token permits more compute.
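To make the "more tokens or more dimensions" point concrete, here's a rough back-of-envelope sketch. The constants (2·n·d for attention, 12·d² for the MLP and projections) are common approximations, not figures from this thread, and `flops_per_token` is a name I made up:

```python
# Rough sketch (assumed constants): per-token compute in a transformer layer
# grows with context length n_ctx and with model width d_model, so both knobs
# increase the computation available to a single forward pass.
def flops_per_token(n_ctx, d_model, n_layers):
    attn = 2 * n_ctx * d_model       # attention reads ~n_ctx keys of width d_model
    mlp = 12 * d_model ** 2          # MLP + projections scale quadratically in width
    return n_layers * (attn + mlp)

base = flops_per_token(1024, 768, 12)
wider = flops_per_token(1024, 1536, 12)    # doubling width ~quadruples the MLP term
longer = flops_per_token(2048, 768, 12)    # more tokens grows the attention term
```

Either lever buys more compute per pass, which is the claim being made above.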
A forward pass is not one big equation, and I have no clue why you think it is. It's a series of computations, computations that depend on the query awaiting prediction. It's not even the same series of computations for each query, because not all neurons get activated, period; and even when the same neurons do get activated, they are not necessarily activated in the same way.
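A toy illustration of the input-dependent-path point, with hypothetical hand-picked weights (nothing from a real model): with ReLU, which units fire depends on the input, so two queries through the exact same weights take different effective computation paths.

```python
# Hypothetical 2-unit layer; weights chosen so the two inputs below
# activate disjoint subsets of units.
W = [[1.0, -1.0],
     [-1.0, 1.0]]

def active_units(x):
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [1 if p > 0 else 0 for p in pre]   # ReLU: a unit "fires" iff pre-activation > 0

active_units([1.0, 0.0])  # -> [1, 0]: only the first unit fires
active_units([0.0, 1.0])  # -> [0, 1]: a different subset fires
```

Same weights, same "equation" on paper, but the realized computation differs per query.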
> You can perform any computation you like in a single pass if you have enough compute time.
You can't perform _any_ computation. A single forward pass through a neural network can perform many classes of computation, and it can _approximate_ all... but that's no guarantee that the approximation will be good (and there are classes for which the approximation is pretty much guaranteed to be bad).
You're right, but I didn't say anything about guarantees, since that wasn't really the point of the argument. Yes, you can guarantee nothing, but the point of discussion was whether a forward pass would rule out specific classes of computation simply because it was just a forward pass. It won't.
Yes in the same way that a lookup table can compute anything if you make it large enough (and where you need to know the size beforehand).
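The lookup-table analogy can be made literal. A sketch (my own toy example, not anyone's implementation): any function on a fixed finite input space can be "computed" by tabulating it, but the table size must be fixed up front and grows exponentially with input width, which is exactly the "if you make it large enough" caveat.

```python
# Tabulate any function over n-bit inputs: 2**n_bits entries, fixed in advance.
def tabulate(f, n_bits):
    return {i: f(i) for i in range(2 ** n_bits)}

# Example: 3-bit parity as a pure lookup, no "computation" at query time.
parity = tabulate(lambda i: bin(i).count("1") % 2, n_bits=3)

len(parity)      # -> 8 entries for 3-bit inputs
parity[0b101]    # -> 0 (two set bits)
parity[0b111]    # -> 1 (three set bits)
```

Universality in this sense is cheap; the cost is that the table's size (and hence what it can handle) is baked in beforehand.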
"Transformers are not Turing complete" https://www.lifeiscomputation.com/transformers-are-not-turin...
See here for example for a survey of work on the limits of computational expressivity of transformers https://www.semanticscholar.org/paper/Transformers-as-Recogn...