
Comment by HarHarVeryFunny

1 day ago

It's unfortunate that Schmidhuber, who has made many seminal contributions to the field, also engages in "retroactive flag planting", whereby he claims credit for any current success that is remotely related to anything he has worked on, even if the connection is only a hand-wavy similarity in problem approach rather than work that actually builds on his own.

It's obvious that things like memory on various timescales (incl. working memory), selective attention, and surprise (i.e. prediction failure) as a learning/memorization signal are going to be part of any AGI solution, but the question is how do you combine and realize these functionalities in an actual cognitive architecture?
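To make just one of those ingredients concrete, here is a toy sketch of "surprise as a memorization signal": a linear associative memory that only writes when its own prediction is wrong enough. This is an illustration under my own assumptions (surprise measured as the norm of the prediction error, a delta-rule write, a fixed threshold, and the hypothetical names `predict` and `surprise_gated_write`), not the method from the "Titans" paper or from any of Schmidhuber's work. It also says nothing about the hard part, which is combining pieces like this into a working architecture.

```python
# Toy linear associative memory: matrix M maps a key vector to a value vector.
# Assumption: "surprise" is the Euclidean norm of the prediction error.

def predict(M, key):
    """Recall the value currently associated with `key` (matrix-vector product)."""
    return [sum(m_ij * k_j for m_ij, k_j in zip(row, key)) for row in M]

def surprise_gated_write(M, key, value, lr=1.0, threshold=0.1):
    """Delta-rule update of M in place, applied only if the surprise exceeds `threshold`."""
    error = [v - p for v, p in zip(value, predict(M, key))]
    surprise = sum(e * e for e in error) ** 0.5
    if surprise > threshold:
        # Outer-product write: M += lr * error * key^T
        for i, e_i in enumerate(error):
            for j, k_j in enumerate(key):
                M[i][j] += lr * e_i * k_j
    return surprise

M = [[0.0, 0.0], [0.0, 0.0]]
key, value = [1.0, 0.0], [0.5, -0.5]
first = surprise_gated_write(M, key, value)   # novel pair: high surprise, memory is written
second = surprise_gated_write(M, key, value)  # same pair again: no surprise, no write
```

With a unit-length key and `lr=1.0`, a single write stores the pair exactly, so the second presentation produces no surprise and leaves the memory unchanged.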

Schmidhuber (or in this case you, on his behalf!) effectively saying "I worked on that problem, years ago" is irrelevant. He also worked on LSTMs, which learned to memorize and forget, and the reference section of the "Titans" paper leads to many more recent attempts (different proposed architectures) addressing the same problems around, broadly speaking, learning how best to use working memory. Lots of people are suggesting alternatives, but it would seem that no compelling solution has been published.

If it's one of the commercial frontier model labs that does discover the next piece of the architectural puzzle in moving beyond transformers towards AGI, I very much doubt they'll be in any hurry to publish it!

> I like the idea of a meta-mechanism that learns to update an associative memory based on how surprising the data is.

Just pointing out that that idea was in some of Schmidhuber's earlier work.

> Schmidhuber (or in this case you, on his behalf!) effectively saying "I worked on that problem, years ago" is irrelevant.

Ok. People do read his work and get ideas from it, even if this paper didn't necessarily. He had a lot of good stuff.

> but the question is how do you combine and realize these functionalities into an actual cognitive architecture?

I believe Schmidhuber gave one at the time?

  • Does it work out-of-the-box today?

    Execution is what matters. We can smoke a blunt and come up with some nice-sounding ideas, but building something that works on data at scale is what actually counts.

    • I think it's widely agreed that a lot of useful stuff came out of Schmidhuber's lab. The example I gave was one of the first things that scaled in lots of ways, especially in depth, and it shares some characteristics with this. I doubt it outperforms the Titans architecture or is equivalent to it. That's not the same as him just putting out random ideas while high.