Comment by imoverclocked
6 months ago
It’s pretty great that despite having large data centers capable of doing this kind of computation, Apple continues to make things work locally. I think there is a lot of value in being able to hold the entirety of a product in hand.
Google has a family of local models too! https://ai.google.dev/gemma/docs
Gemma and Llama can’t be bundled commercially, which sucks because they're two of the leading small LLMs. Qwen3 might be the last one with an Apache license.
You can bundle and use Gemma commercially[1].
[1] https://ai.google.dev/gemma/terms
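For context on the "local models" point above, here's a minimal sketch of loading a small Gemma checkpoint locally with the Hugging Face transformers library. The model ID, prompt, and token limit are illustrative assumptions, and Gemma weights are gated, so you'd need to accept the terms linked above and authenticate with a Hugging Face token first:

    # Minimal sketch: run a small Gemma checkpoint locally via transformers.
    # Model ID, prompt, and max_new_tokens are assumptions for illustration;
    # the weights are gated behind the license acceptance linked above.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="google/gemma-2-2b-it")
    out = pipe("Why does on-device inference matter?", max_new_tokens=64)
    print(out[0]["generated_text"])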
It's very convenient for Apple to do this: less spent on costly AI chips, and more excuses to ask customers to buy their latest hardware.
Users have to pay for the compute somehow. Maybe by paying for models run in datacenters. Maybe by paying for hardware that's capable enough to run models locally.
I can upgrade to a bigger LLM I use through an API with one click. If it runs on my device, I need to buy a new phone.
If iPhones were the efficient/smart way to pay for compute, then Apple's datacenters would be built with them instead of servers.
But also: even if Apple's way works, it's incredibly wasteful.
Server side means shared resources, shared upgrades and shared costs. The privacy aspect matters, but at what cost?
It takes about a $400 graphics card to comfortably run something like a 3B-8B model. Comfortable as in fast inference and a good-sized context. 3B-5B models are roughly what devices can fit today. That means for us to get good locally running models, we'd have to shrink one of those $400 graphics cards down to a phone.
I don’t see this happening in the next 5 years.
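For a rough sense of the numbers behind that claim, here's a back-of-the-envelope sketch; the quantization width, layer count, hidden size, and context length are assumptions for illustration, not the specs of any particular model:

    # Back-of-the-envelope memory estimate for on-device LLM inference.
    # Assumed: 4-bit weights, fp16 KV cache, 32 layers, 4096 hidden dim, 8k context.

    def weights_gb(params_billion, bits_per_weight=4):
        """Weight memory in GB for a quantized model."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    def kv_cache_gb(context_tokens, layers=32, hidden=4096, bytes_per_val=2):
        """KV cache in GB: one K and one V vector per layer per token.
        Real models often use grouped-query attention, which shrinks this
        several-fold, so treat it as an upper bound."""
        return 2 * layers * hidden * context_tokens * bytes_per_val / 1e9

    for size in (3, 5, 8):
        print(f"{size}B @ 4-bit: ~{weights_gb(size):.1f} GB weights "
              f"+ ~{kv_cache_gb(8192):.1f} GB KV cache at 8k context")

Even at 4-bit, an 8B model plus a long-context cache pushes toward the total memory of a flagship phone, which is roughly the point about needing desktop-class hardware shrunk down.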
The Mac mini being shrunk down to phone size is probably the better bet. We'd also have to bring power consumption down by a lot. Edge hardware is a ways off.
Gemma 3n E4B runs at 35 tk/s prompt processing and 7-8 tk/s decode on my three-generations-old flagship Android.
I doubt this. What kind of tk/s are you getting once your context window is reasonably saturated? It probably slows to a crawl, making it not good enough yet (the hardware, that is).
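To put the quoted speeds in perspective, here's a quick hypothetical calculation of what a saturated context means for latency; the context sizes and the 256-token reply are assumptions:

    # Rough latency math for the quoted speeds (35 tk/s prefill, ~7.5 tk/s decode).
    # Prefill throughput also tends to drop as context grows, so these
    # estimates are on the optimistic side.
    prefill_tps = 35
    decode_tps = 7.5

    for context in (512, 4096, 8192):
        time_to_first_token = context / prefill_tps
        reply_time = 256 / decode_tps
        print(f"{context:>5}-token context: ~{time_to_first_token:.0f}s before "
              f"the first token, ~{reply_time:.0f}s for a 256-token reply")

At those speeds, a 4k-token context already means roughly two minutes before the first output token, which is the "crawl" in question.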
With no company having a clear lead in everyday AI for the non-technical mainstream user, there is only going to be a race to the bottom on subscription and API pricing.
Local doesn't cost the company anything, and increases the minimum hardware customers need to buy.
> Local doesn't cost the company anything, [...]
Not completely true: those models are harder to develop. The logistics are a hassle.