Comment by bob1029
1 year ago
> Transformers required ~2.5x more training steps to achieve comparable performance, overfitting eventually.
> RNNs are particularly suitable for sequence modelling settings such as those involving time series, natural language processing, and other sequential tasks where context from previous steps informs the current prediction.
I would like to draw an analogy to digital signal processing. If you think of the recurrent-style architectures as IIR filters and feedforward-only architectures as FIR filters, you will likely find many parallels.
The most obvious to me is that IIR filters typically require far fewer elements to produce the same response as an equivalent FIR filter. Granted, the FIR filter is often easier to implement/control/measure in practical terms (fixed-point arithmetic hardware == ML architectures that can run on GPUs).
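To make that concrete, here's a rough sketch in plain Python (illustrative only, with made-up parameter values): a single-pole IIR low-pass filter gets by with one feedback coefficient, while an FIR approximation of the same impulse response needs on the order of a hundred taps.

    # Single-pole low-pass IIR: y[n] = a*y[n-1] + (1-a)*x[n].
    # Its impulse response (1-a)*a^n never truly ends, so an "equivalent"
    # FIR filter needs many taps to approximate it.

    def iir_lowpass(x, a=0.95):
        y, state = [], 0.0
        for xn in x:
            state = a * state + (1 - a) * xn   # one feedback coefficient
            y.append(state)
        return y

    def fir_approx(x, a=0.95, taps=128):
        h = [(1 - a) * a**k for k in range(taps)]  # truncated impulse response
        return [sum(h[k] * x[n - k] for k in range(min(taps, n + 1)))
                for n in range(len(x))]

    impulse = [1.0] + [0.0] * 63
    print(iir_lowpass(impulse)[:4])  # 1 coefficient of recurrent "memory"
    print(fir_approx(impulse)[:4])   # ~128 taps for nearly the same response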
I don't think we get to the exponential scary part of AI without some fundamentally recurrent architecture. I think things like LSTM are kind of an in-between hack in this DSP analogy; you could look at it as FIR with dynamic coefficients. Neuromorphic approaches seem like the best long-term bet to me in terms of efficiency.
Again from signal processing: depending on the position of the poles in the z-transformed filter transfer function, an IIR filter has a narrow stability region that is typically carefully designed for. Otherwise, IIR filters either exponentially decay to zero or exponentially grow to infinity. RNN cells like LSTM are "decaying filters" with non-linear gates introduced to stop the decay and to "remember" things.
FIR filters are way simpler to design and can capture memory without hacks.
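A tiny numerical sketch of that stability picture (plain Python, made-up numbers): the same scalar recurrence decays or explodes depending on the pole magnitude, and holding the multiplier near 1 is effectively what a gate does when it decides to "remember".

    # y[n] = a * y[n-1] has a single pole at z = a.
    # |a| < 1: exponential decay, |a| > 1: exponential growth.

    def run(pole, steps=50, y0=1.0):
        y = y0
        for _ in range(steps):
            y = pole * y
        return y

    print(run(0.9))  # ~0.005: decays, the filter "forgets"
    print(run(1.1))  # ~117:   blows up
    print(run(1.0))  # 1.0:    the knife-edge; gates push the effective
                     #         pole toward 1 when a value should be kept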
ELI5: Could you explain what neuromorphic approaches mean, and how they contribute to AI/AGI? My first impression as a layperson (probably wrong) is that this approach resembles ideas from the book "The Society of Mind", where the system isn't just simulating neurons but involves a variety of methods and interactions across "agents" or sub-systems.
Neuromorphic mostly just means "like how the brain works". It encompasses a variety of software & hardware approaches.
The most compelling and obvious one to me is hardware purpose-built to simulate spiking neural networks. In the happy case, SNNs are extremely efficient, consuming basically no energy. You could fool yourself into thinking we can just do this on the CPU due to the sparsity of activations, and I think there is even a set of problems this works well for. But in the unhappy cases, SNNs are impossible to simulate on existing hardware. Neuronal avalanches follow a power-law distribution, and meaningfully large ones would require very clever techniques to simulate with any reasonable fidelity.
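For a feel of the happy case, here's a toy leaky integrate-and-fire neuron in plain Python (all parameters made up): the membrane potential leaks, integrates input, and only occasionally crosses threshold, so most time steps generate no event at all, which is where the efficiency of event-driven hardware comes from.

    # Toy leaky integrate-and-fire (LIF) neuron.

    def lif(currents, leak=0.9, threshold=1.0):
        v, spikes = 0.0, []
        for i in currents:
            v = leak * v + i          # leaky integration of input current
            if v >= threshold:
                spikes.append(1)      # emit a spike (the only "event")
                v = 0.0               # reset after spiking
            else:
                spikes.append(0)
        return spikes

    inputs = [0.1] * 30 + [0.6] * 5 + [0.1] * 30
    spikes = lif(inputs)
    print(sum(spikes), "spikes over", len(spikes), "steps")  # sparse output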
> the system isn't just simulating neurons but involves a variety of methods and interactions across "agents" or sub-systems.
I think the line between "neuron" and "agent" starts to get blurry in this arena.
So we want a network that is neuromorphic in structure, but we don't want it to be like the brain and take 20 years or more to train?
Secondly, how do we get to claim that a particular thing is neuromorphic when we have such a rudimentary understanding of how a biological brain works, or how it generates things like a model of the world, an understanding of self, etc.?
My take, for pragmatic reasons rather than how the brain actually works, is that an agent-based architecture is great because some tasks can be solved more effectively by specific algorithms or workflows than by operating at the low level of neural networks (NNs).
Neuromorphic has been an ongoing failure (for general-purpose processors or even AI accelerators) ever since Carver Mead introduced (and quickly abandoned) them nearly half a century ago. Bill Dally (NVidia CTO) concurs: "I keep getting those calls from those people who claim they are doing neuromorphic computing and they claim there is something magical about it because it's the way that the brain works ... but it's truly more like building an airplane by putting feathers on it and flapping with the wings!" From: Hardware for Deep Learning, HotChips 2023 keynote.
We have NO idea how the brain produces intelligence, and as long as that doesn't change, "neuromorphic" is merely a marketing term, like Neurotypical, Neurodivergent, Neurodiverse, Neuroethics, Neuroeconomics, Neuromarketing, Neurolaw, Neurosecurity, Neurotheology, Neuro-Linguistic Programming: the "neuro-" prefix suggests deep scientific insight to fool the audience. There is no hope of us cracking the question of how the human brain produces high-level intelligence in the next decade or so.
Neuromorphic does work for some special purpose applications.
I like the feather analogy. Early on, everything humans knew about flight came from biology (watching birds fly), but trying to make a flying machine modeled after a bird would never work. We can fly today, but plane designs are nothing like biological flying machines. In the same way, all we know about intelligence comes from biology, and trying to invent an AGI modeled on biological intelligence may be just as impossible as a plane designed around how birds fly.
/way out of my area of expertise here
I love this book and have it sitting on my shelf right now! I read it when I was a kid and was amazed at the ideas in it; nowadays it's clearer to me that the author only had a rough grasp of how things like that would actually be built, but it's still cool nonetheless.
I would highly recommend it to people who love a good "near future" scifi book.
I'm sure you know this, but I think "the author" Marvin Minsky should be mentioned by name since he was one of the foundational theorists in the field of AI in general, but particularly in NNs.
> I don’t think we get to the exponential scary part of AI without some fundamentally recurrent architecture
I’ve been thinking the same for a while, but I’m starting to wonder if giant context windows are good enough to get us there. I think recurrency is more neuromorphic, and possibly important in the longer run, but maybe not required for SI.
I'm also just a layman with a surface-level understanding of these things, so I may be completely ignorant and wrong.
Can we even implement IIR filters with good performance and scaling at large scale on current architectures like GPUs?
I don't think so. FIR filters can be unrolled and parallelized over the data; these are definitely possible to run on a GPU to great effect. But an IIR filter's output at each time step depends on the output of the prior step, so you can't unroll anything. These would probably be faster to simulate on the CPU.
See my comment above. It's definitely doable and very fast.
Yes. See this paper: http://cs.txstate.edu/~mb92/papers/asplos18.pdf
And things have improved a lot since then.
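For anyone curious what the trick looks like, here's a rough Python sketch of the standard approach (not necessarily exactly what the paper does): a first-order linear recurrence y[n] = a[n]*y[n-1] + b[n] composes associatively, so it can be evaluated as a prefix scan, which parallelizes well on a GPU. The scan below is written serially for clarity.

    # y[n] = a[n] * y[n-1] + b[n]. Each step is an affine map, and two
    # steps compose into one: (a2, b2) after (a1, b1) = (a1*a2, a2*b1 + b2).
    # Because composition is associative, the whole sequence can be computed
    # with a log-depth parallel prefix scan instead of a sequential loop.

    def sequential(a, b):
        y, out = 0.0, []
        for an, bn in zip(a, b):
            y = an * y + bn
            out.append(y)
        return out

    def combine(first, second):
        (a1, b1), (a2, b2) = first, second
        return (a1 * a2, a2 * b1 + b2)

    def scan(a, b):
        acc, out = (1.0, 0.0), []     # identity map
        for step in zip(a, b):
            acc = combine(acc, step)  # associative, so prefixes can be
            out.append(acc[1])        # combined in parallel on a GPU;
        return out                    # b-component is y[n] given y[-1] = 0

    a = [0.5, 0.9, 1.1, 0.7]
    b = [1.0, 0.0, 2.0, 1.0]
    print(sequential(a, b))
    print(scan(a, b))                 # same result, but scan-parallelizable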