Comment by choeger

3 years ago

I am waiting for someone to publish the theoretical limits of these "AI" systems. They're certainly impressive language models - don't get me wrong on that. But every algorithm and every model has its limits. Knowing the limits turns their application from hype into engineering. And of course, the hype-sellers will try to keep that from happening as long as possible.

Hey,

https://en.wikipedia.org/wiki/Universal_approximation_theore...

This theorem explains the limits. Putting it in simple terms: most architectures are universal approximators constrained by the inductive bias we give them. So far, the approximator architecture least constrained by inductive bias is the transformer, so it should be able to approximate any mathematical function. The current problem is that the attention mechanism has quadratic scaling, so while it is easy to scale on text, it is pretty hard to scale it to the same performance on anything else. Even if no further discoveries are made, the computing power of the future alone should let it scale in every field, and even with today's techniques it gives pretty good performance on a lot of tasks.
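To make the quadratic scaling concrete: every token attends to every other token, so a sequence of n tokens produces an n×n score matrix. A minimal NumPy sketch of scaled dot-product attention (illustrative only, not any particular model's implementation):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention. The (n, n) score matrix is
    where the quadratic cost in sequence length n comes from."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # shape (n, n): n^2 entries
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # shape (n, d)

rng = np.random.default_rng(0)
for n in (128, 256, 512):
    X = rng.standard_normal((n, 64))
    out = attention(X, X, X)
    # doubling n quadruples the number of attention scores
    print(n, out.shape, n * n)
```

Doubling the context length quadruples both the memory for the score matrix and the work to compute it, which is why long-context scaling is the hard part.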

This review of the paper "An Image Is Worth 16x16 Words" by Yannic Kilcher explains it better, if you are interested.

https://youtu.be/TrdevFK_am4?t=1314

Hype sellers, despite being annoying and noisy, are not the reason why it's hard to figure out the theoretical limits.

To put it in the form of a rhetorical question: many of these models are public, so why "wait" when you could do the research yourself?

> I am waiting for someone to publish the theoretical limits of these "AI" systems.

> Knowing the limits turns their application from hype into engineering.

It would be helpful to know how the models actually work under the hood.

But we made very good use of metals for thousands of years before we understood things like atoms, chemical bonds, lattices, etc.

Some engineering disciplines can be made up largely of empirical knowledge.

Engineering to me is "make the things we want out of the things we have", and not necessarily "design based on complete scientific theories".

  • I, as a Real Engineer, REFUSE to use ChatGPT until we have a working theory of quantum gravity. Enough of this bullshit where no one knows the fundamentals of what they’re working with.

What are the fundamental limits of language itself? Is English somehow more "emergent" than Korean? Isn't this more interesting than the actual execution mechanism?

The business of these new LLMs is next-token prediction with context. It is also now a mission, because it clearly works to some large extent. Where most would not have been willing to take a leap of faith before, many can see some path now. I've been able to suspend my disbelief around language-as-computation long enough to discover new options.
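"Next-token prediction with context" reduces to: given the tokens so far, produce a distribution over the next token and pick one. A toy sketch with a hand-made bigram table (the table and tokens are invented purely for illustration; real LLMs learn these probabilities with a transformer over far longer contexts):

```python
# Invented bigram "model": P(next | current) as a lookup table.
bigram = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def predict_next(token):
    """Greedy decoding: return the most probable next token, or None."""
    dist = bigram.get(token, {})
    return max(dist, key=dist.get) if dist else None

def generate(start, max_len=5):
    """Repeatedly append the predicted next token until stuck or full."""
    out = [start]
    while len(out) < max_len:
        nxt = predict_next(out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out

print(generate("the"))  # ['the', 'cat', 'sat', 'down']
```

The leap of faith is that scaling this loop up, with a learned model conditioned on the whole context instead of one token, gets you the behavior we now see.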

You're looking for the universal approximation theorem. It's one of those cases where they can do anything in theory, so the question is more whether we're chasing a Turing tarpit, where everything is possible but nothing is easy.
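The "anything in theory" part can be seen in miniature: a single hidden layer of random ReLU features, with only the output weights fit by least squares, already approximates a smooth function well on an interval. This is a sketch of the flavor of the theorem, not a proof of it; all the sizes and the target function are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sin(x).ravel()

# Hidden layer: fixed random ReLU features. Only the output-layer
# weights are fit (linear least squares), yet the piecewise-linear
# combination already tracks sin closely on this interval.
W = rng.standard_normal((1, 200))
b = rng.standard_normal(200)
H = np.maximum(x @ W + b, 0.0)
coef, *_ = np.linalg.lstsq(H, y, rcond=None)
err = np.max(np.abs(H @ coef - y))
print(err)  # small max error on the training grid
```

The tarpit point is exactly that this guarantees existence of a good approximator, while saying nothing about how to find it efficiently or how it behaves off the grid.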