Comment by scosman

10 hours ago

I think that's what they are trying to avoid. If you need on-device intelligence, their pitch was "The model the device already has is best", and if you need something more specific, an adapter (aka a fine-tune/LoRA) is best.

They were wrong when their on-device model was way behind. They still might be right in the long term.

While multiple apps I use might need Gemma 4 E4B, I use dozens of apps and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists: if each app chooses its own model, disk usage and memory swapping explode.

It's probably better for device manufacturers to bake in a default. I'm not proposing they limit you from using others, but one shared default might be the best developer/user experience for 99% of apps.

- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.

- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device, but Apple can/will.

- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.

- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above).
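
To put rough numbers on the disk argument, here's a back-of-the-envelope sketch. Only the 30MB adapter size comes from the point above; the 3GB base model size and the 20-app count are placeholders I made up for illustration, not measurements.

```python
# Back-of-the-envelope disk comparison. Only ADAPTER_MB comes from the
# comment above; BASE_MODEL_MB and NUM_APPS are assumed for illustration.
BASE_MODEL_MB = 3_000   # assumed size of one on-device model
ADAPTER_MB = 30         # LoRA adapter size quoted above
NUM_APPS = 20           # assumed number of apps that need a model


def per_app_models_mb(num_apps: int, model_mb: int) -> int:
    """Worst case: every app ships its own distinct model."""
    return num_apps * model_mb


def shared_default_mb(num_apps: int, model_mb: int, adapter_mb: int) -> int:
    """One shared default model plus one LoRA adapter per app."""
    return model_mb + num_apps * adapter_mb


bundled = per_app_models_mb(NUM_APPS, BASE_MODEL_MB)
shared = shared_default_mb(NUM_APPS, BASE_MODEL_MB, ADAPTER_MB)
print(f"per-app models: {bundled} MB")
print(f"shared default + adapters: {shared} MB")
```

With those placeholder numbers the gap is more than an order of magnitude, and that's before counting the memory-swapping cost.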

You could say the default should be swappable, but that's more a Linux ideal than an Apple one, so I doubt we'll ever see that. Plus there are real downsides: intentional or not, prompts end up optimized to the model they are developed for, so swapping the default system model would degrade every app.

But models aren't universally best, especially small ones. For text Gemma is great. For vision qwen3.6 is amazing.