Comment by scosman

10 hours ago

I think that's what they are trying to avoid. If you need on-device intelligence, their pitch was "The model the device already has is best", and if you need something more specific, an adapter (i.e., a fine-tune/LoRA) is best.

They were wrong when their on-device model was way behind. They still might be right in the long term.

While multiple apps I use might need Gemma 4 E4B, I use dozens of apps and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists. If each app chooses its own model, disk usage and memory swapping explode.

It's probably better for device manufacturers to bake in a default. I'm not proposing they limit you from using others, but one shared default might be the best developer/user experience for 99% of apps.

- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.

- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will.

- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.

- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above)
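
Rough back-of-envelope for the disk side of this, as a sketch only. The app count and the base model size are numbers I made up for illustration; the only figure taken from above is the ~30MB adapter. The function names are hypothetical too.

```python
def per_app_models_mb(num_apps: int, model_mb: float) -> float:
    """Disk used when every app ships its own full model (no overlap)."""
    return num_apps * model_mb


def shared_default_mb(num_apps: int, base_mb: float, adapter_mb: float = 30.0) -> float:
    """Disk used with one shared system model plus one LoRA adapter per app."""
    return base_mb + num_apps * adapter_mb


if __name__ == "__main__":
    # Assumed values, not measurements: 24 apps, 3000 MB per full model.
    apps, model_mb = 24, 3000.0
    own = per_app_models_mb(apps, model_mb)
    shared = shared_default_mb(apps, model_mb)
    print(f"every app ships a model: {own:.0f} MB")
    print(f"shared default + adapters: {shared:.0f} MB")
```

With those assumed inputs it comes out to 72000 MB versus 3720 MB. The exact numbers don't matter much; the point is that the per-app case grows with the full model size while the shared case grows with the adapter size.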

You could say the default should be swappable, but that's more a Linux ideal than an Apple one, so I doubt we'll ever see that. Plus there are real downsides: intentionally or not, prompts end up optimized for the model they were developed against, so swapping the default system model would degrade every app.

1 comment

scotty79 5 hours ago

But no model is universally best, especially among small ones. For text, Gemma is great. For vision, qwen3.6 is amazing.