Comment by giancarlostoro
6 hours ago
I wonder how long it will be until some breakthrough produces a new architecture that runs efficiently and cheaply on basic hardware. If you could train and run inference locally at lower cost, that's what would really pop the AI bubble. Microsoft had a model that is supposed to run fine on regular CPUs, though I'm not sure how far that approach can reasonably go. They say our brains can store 2.5 PB, yet we use drastically less "RAM" to reason about things (though I can't find a ballpark figure), which makes you wonder just how efficient we can make things. Our bodies use drastically less power too.
How long? We already have that. Qwen3.6 has 35B and 27B models that beat ChatGPT-4o, and you can run them at home on a single GPU. DeepSeekV4 just introduced a new way to support very long contexts with a KV cache an order of magnitude smaller than before. It's already happening!
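To put the KV cache point in numbers: a standard transformer caches one key vector and one value vector per layer, per KV head, per token, so cache memory grows linearly with context length. Below is a back-of-the-envelope calculator using that textbook formula. The model shape (48 layers, 8 KV heads, head dimension 128, fp16) is made up for illustration and is not the configuration of any Qwen or DeepSeek model, and "an order of magnitude smaller" is modeled as a flat 10x division, not whatever DeepSeek actually does.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Memory for a standard attention KV cache holding one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


def gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Hypothetical model shape, NOT taken from any specific released model.
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128

for ctx in (32_000, 128_000, 400_000):
    full = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, ctx)
    # Crude stand-in for "an order of magnitude smaller": divide by 10.
    print(f"{ctx:>7,} tokens: {gib(full):5.1f} GiB (fp16), "
          f"~{gib(full) / 10:4.1f} GiB if 10x smaller")
```

With these hypothetical numbers each token costs 192 KiB of cache, so a 400k-token context needs roughly 73 GiB for the cache alone. That is why shrinking the cache by an order of magnitude matters so much for long contexts on a single consumer GPU.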
I've been experimenting with running a few models for local inference. Some of them get "stuck" in a loop, repeating the same attempt endlessly, which is weird; others are really good. If they can ever handle about 400k tokens without going batcrap crazy, I'll be impressed (maybe less would do, but from my experience with Claude after the 1-million-token increase, 400k seemed to be a good sweet spot), mostly because I'd like them to read more of the codebase instead of just making assumptions. Meanwhile, I've been building a custom harness, and I'm just about to start on its tool-building features. I already have a task-tracking system similar to Beads (I didn't like some things about Beads, so I made my own), so the context window doesn't need to be massive just for task tracking.
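One way a custom harness might catch a model stuck repeating itself is to keep a short window of recent actions and flag any action that recurs too often. The sketch below is a hypothetical helper, not part of Beads or any existing harness; the `LoopDetector` name, the string form of an "action" (e.g. tool name plus arguments), and the default `window`/`threshold` values are all assumptions for illustration.

```python
from collections import deque


class LoopDetector:
    """Hypothetical harness helper that flags repeated agent actions.

    Remembers the last `window` actions and reports a loop once any
    single action appears `threshold` times within that window.
    """

    def __init__(self, window: int = 6, threshold: int = 3) -> None:
        # deque with maxlen silently drops the oldest action when full.
        self.recent = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, action: str) -> bool:
        """Record an action; return True if it now looks like a loop."""
        self.recent.append(action)
        return self.recent.count(action) >= self.threshold


detector = LoopDetector(window=6, threshold=3)
for step in ["read a.py", "grep foo", "read a.py", "read a.py"]:
    if detector.observe(step):
        print("loop detected at:", step)
```

Counting exact repeats is deliberately simple; a real harness would probably normalize arguments first and decide what to do on detection (inject a nudge, change temperature, or stop the run).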
> Our bodies use drastically less power too.
To be fair, we compute a lot slower too. No way in hell are you (or I) able to produce 'tokens' at the same speed as current models.
It'd be interesting to see an actual comparison of humans and AI performing the same (cognitive) task and measuring the amount of energy that was used.