Comment by ImScared1234

3 years ago

Hey,

https://en.wikipedia.org/wiki/Universal_approximation_theore...

This theorem explains the limits. Putting it in simple terms, most architectures are universal approximators that are constrained by the inductive bias we give them. So far, the approximator architecture least constrained by inductive bias is the transformer, so it should be able to approximate any mathematical function. The current problem is that the attention mechanism has quadratic scaling, so while it is easy to scale in text, it is pretty hard to scale it in anything else to the same performance. Even if no further discoveries are made, just the compute power of the future should let it scale in every field, and even with today's techniques it gives pretty good performance on a lot of tasks.
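To see why the scaling is quadratic: self-attention compares every token with every other token, so the score matrix has n x n entries for a sequence of length n. Here's a minimal NumPy sketch (a simplified illustration where queries and keys are the same embeddings, not a full transformer layer):

```python
import numpy as np

def attention_weights(x):
    """Scaled dot-product self-attention weights for n token
    embeddings of dimension d. The score matrix is n x n, so
    time and memory grow quadratically with sequence length."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)  # n x n pairwise comparisons
    # Softmax over each row to get attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights

# Doubling the sequence length quadruples the score matrix size.
for n in (512, 1024):
    x = np.random.randn(n, 64)
    print(n, attention_weights(x).shape)
```

Doubling n from 512 to 1024 takes the matrix from ~262k entries to ~1M, which is why long sequences (images, audio, video) are so much harder to attend over than short text.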

This review of the paper "An Image is Worth 16x16 Words" by Yannic Kilcher explains it better if you are interested.

https://youtu.be/TrdevFK_am4?t=1314