Comment by imoverclocked
6 months ago
It’s pretty great that despite having large data centers capable of doing this kind of computation, Apple continues to make things work locally. I think there is a lot of value in being able to hold the entirety of a product in hand.
Google has a family of local models too! https://ai.google.dev/gemma/docs
Gemma and Llama can’t be bundled commercially, which sucks because they're two of the leading small LLMs. Qwen3 might be the last one with an Apache license.
You can bundle and use Gemma commercially[1].
[1] https://ai.google.dev/gemma/terms
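For context on the "local models" point above, here's a minimal sketch of loading a small Gemma checkpoint locally with the Hugging Face transformers library. The model ID, prompt, and token limit are illustrative assumptions, and Gemma weights are gated, so you'd need to accept the terms linked above and authenticate with a Hugging Face token first:

    # Minimal sketch: run a small Gemma checkpoint locally via transformers.
    # Model ID, prompt, and max_new_tokens are assumptions for illustration;
    # the weights are gated behind the license acceptance linked above.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="google/gemma-2-2b-it")
    out = pipe("Why does on-device inference matter?", max_new_tokens=64)
    print(out[0]["generated_text"])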
It's very convenient for Apple to do this: less spent on costly AI chips, and more excuses to ask customers to buy their latest hardware.
Users have to pay for the compute somehow. Maybe by paying for models run in datacenters. Maybe by paying for hardware that's capable enough to run models locally.
I can upgrade to a bigger LLM I use through an API with one click. If it runs on my device, I need to buy a new phone.
If iPhones were the efficient/smart way to pay for compute, then Apple's datacenters would be built with them instead of servers.
But also: even if Apple's way works, it's incredibly wasteful.
Server side means shared resources, shared upgrades and shared costs. The privacy aspect matters, but at what cost?
It takes about a $400 graphics card to comfortably run something like a 3B-8B model. Comfortable as in fast inference and a good-sized context. 3B-5B models are roughly what devices can fit today. That means for us to get good locally running models, we'd have to shrink one of those $400 graphics cards down to a phone.
I don’t see this happening in the next 5 years.
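For a rough sense of the numbers behind that claim, here's a back-of-the-envelope sketch; the quantization width, layer count, hidden size, and context length are assumptions for illustration, not the specs of any particular model:

    # Back-of-the-envelope memory estimate for on-device LLM inference.
    # Assumed: 4-bit weights, fp16 KV cache, 32 layers, 4096 hidden dim, 8k context.

    def weights_gb(params_billion, bits_per_weight=4):
        """Weight memory in GB for a quantized model."""
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9

    def kv_cache_gb(context_tokens, layers=32, hidden=4096, bytes_per_val=2):
        """KV cache in GB: one K and one V vector per layer per token.
        Real models often use grouped-query attention, which shrinks this
        several-fold, so treat it as an upper bound."""
        return 2 * layers * hidden * context_tokens * bytes_per_val / 1e9

    for size in (3, 5, 8):
        print(f"{size}B @ 4-bit: ~{weights_gb(size):.1f} GB weights "
              f"+ ~{kv_cache_gb(8192):.1f} GB KV cache at 8k context")

Even at 4-bit, an 8B model plus a long-context cache pushes toward the total memory of a flagship phone, which is roughly the point about needing desktop-class hardware shrunk down.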
The Mac mini being shrunk down to phone size is probably the better bet. We'd also have to bring power consumption down by a lot. Edge hardware is a ways off.
Gemma 3n E4B runs at 35 tk/s prompt processing and 7-8 tk/s decode on my three-generations-old flagship Android.
I doubt this. What kind of tk/s are you getting once your context window is reasonably saturated? It probably slows to a crawl, making it not good enough yet (the hardware, that is).
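To put the quoted speeds in perspective, here's a quick hypothetical calculation of what a saturated context means for latency; the context sizes and the 256-token reply are assumptions:

    # Rough latency math for the quoted speeds (35 tk/s prefill, ~7.5 tk/s decode).
    # Prefill throughput also tends to drop as context grows, so these
    # estimates are on the optimistic side.
    prefill_tps = 35
    decode_tps = 7.5

    for context in (512, 4096, 8192):
        time_to_first_token = context / prefill_tps
        reply_time = 256 / decode_tps
        print(f"{context:>5}-token context: ~{time_to_first_token:.0f}s before "
              f"the first token, ~{reply_time:.0f}s for a 256-token reply")

At those speeds, a 4k-token context already means roughly two minutes before the first output token, which is the "crawl" in question.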
With no company having a clear lead in everyday AI for the non-technical mainstream user, there is only going to be a race to the bottom on subscription and API pricing.
Local doesn't cost the company anything, and increases the minimum hardware customers need to buy.
> Local doesn't cost the company anything, [...]
Not completely true: those models are harder to develop. The logistics are a hassle.