Comment by m_ke
6 days ago
Yes, it would require completely new hardware and most likely ditching gradient descent for alternative optimization methods, though I'm not convinced that we'd need to turn to discrete optimization.
Some recent works that people might find interesting:
- Evolution Strategies at the Hyperscale - https://eshyperscale.github.io/ (a minimal sketch of the basic ES update follows this list)
- Introducing Nested Learning: A new ML paradigm for continual learning - https://research.google/blog/introducing-nested-learning-a-n...
- Less is More: Recursive Reasoning with Tiny Networks - https://arxiv.org/abs/2510.04871
- Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs - https://arxiv.org/abs/2511.16664
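For anyone unfamiliar with the gradient-free family behind the first link, here's a toy sketch of the basic evolution-strategies update in plain NumPy. It is only an illustration of the general idea (perturb the parameters with Gaussian noise, score each perturbation, move in the reward-weighted direction of the noise, no backprop needed); the objective, hyperparameters, and function names are mine, not from the ES-at-hyperscale work, which operates at a completely different scale.

```python
import numpy as np

def fitness(theta):
    # Toy objective (illustrative only): negative squared distance to a hidden target.
    target = np.array([0.5, -1.2, 3.0])
    return -np.sum((theta - target) ** 2)

def evolution_strategies(n_steps=200, pop_size=50, sigma=0.1, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)  # current parameter estimate
    for _ in range(n_steps):
        # Sample a population of Gaussian perturbations around theta.
        noise = rng.standard_normal((pop_size, theta.size))
        rewards = np.array([fitness(theta + sigma * eps) for eps in noise])
        # Standardize rewards so the update is invariant to reward scale.
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Reward-weighted sum of perturbations estimates the search gradient.
        theta = theta + lr / (pop_size * sigma) * noise.T @ advantages
    return theta

if __name__ == "__main__":
    print(evolution_strategies())  # should approach [0.5, -1.2, 3.0]
```

The whole loop only ever calls the objective as a black box, which is why this family pairs naturally with hardware that can't (or shouldn't have to) support backpropagation.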