Comment by m_ke

6 days ago

Yes, it would require completely new hardware and most likely ditching gradient descent for alternative optimization methods, though I'm not convinced we'd need to turn to discrete optimization.
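
For a sense of what "alternative optimization methods" can look like in practice, here is a minimal sketch of a basic evolution strategy, the family of gradient-free methods behind the first link below. It optimizes parameters using only forward evaluations of the objective, no backprop. The toy quadratic objective and all hyperparameters are illustrative assumptions, not taken from any of the linked papers.

```python
# Minimal evolution-strategy sketch: estimate a search direction from
# reward-weighted Gaussian perturbations, using only forward evaluations.
import numpy as np

def fitness(theta):
    # Toy objective (an assumption for this sketch): maximize
    # -||theta - target||^2, which has its optimum at `target`.
    target = np.array([3.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)

def evolution_strategy(theta, sigma=0.1, lr=0.05, pop_size=50, iters=300):
    rng = np.random.default_rng(0)
    for _ in range(iters):
        # Sample a population of Gaussian perturbations around theta.
        eps = rng.standard_normal((pop_size, theta.size))
        rewards = np.array([fitness(theta + sigma * e) for e in eps])
        # Standardize rewards so the update is invariant to reward scale.
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Update theta along the reward-weighted sum of the noise:
        # no gradients of `fitness` are ever computed.
        theta = theta + lr / (pop_size * sigma) * eps.T @ rewards
    return theta

theta = evolution_strategy(np.zeros(3))
print(theta)  # converges toward [3.0, -2.0, 0.5]
```

Because the update only needs scalar fitness values, each population member can be evaluated independently, which is what makes this family of methods attractive for the kind of massively parallel hardware discussed above.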

Some recent works that people might find interesting:

- Evolution Strategies at the Hyperscale - https://eshyperscale.github.io/

- Introducing Nested Learning: A new ML paradigm for continual learning - https://research.google/blog/introducing-nested-learning-a-n...

- Less is More: Recursive Reasoning with Tiny Networks - https://arxiv.org/abs/2510.04871

- Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs - https://arxiv.org/abs/2511.16664