Comment by vadimf
10 hours ago
I’m 100% sure the future consists of many models running on device. LLMs will be the mobile apps of the future (or a different architecture, but still intelligence).
Right now the future looks more like everything running in remote datacenters, with no autonomous capability and no control by the user. But I like yours better.
I don't mind the remote datacenters, I just don't like the lack of control.
This is the path forward, with some overhead.
1. A generic model that calls other highly specific, smaller, faster models.
2. Models loaded on demand, some black box and some open.
3. A Rust model specifically for Rust (or whatever language) tasks.
In about 5-8 years we will have personalized models built on all our previous social, medical, and financial data that respond as we would: a clone, capable of making decisions aligned with our desired outcomes.
The big remaining blocker is that generic model that can be imprinted with specifics and rebuilt nightly: not the training material itself, but the decision-making, recall, and evaluation machinery. I'm curious whether someone is working on that extracted portion, something that could serve as just a 'thinking' interface.
If anything, memory ain't getting cheaper, disks aren't either, and as for graphics cards, forget it.
People won't be competing with even a current 2026 SOTA from their home LLM anytime soon. Even the actual SOTA LLM providers aren't competing: they're losing money on energy and compute, hoping to make it up on market capture and win the IPO race.
I don’t think anyone needs to compete with the LLM SOTA to get the benefits of these technologies on-device.
Consumers don’t need a 100k-context-window oracle that knows everything about both T-cells and the ancient Welsh royal lineage. We need small, focused models that are specialised, plus a good query router.
We need them for what? Specialized models seem to provide value comparable to what we've been doing with machine learning for eons, just less efficient to train and to run.