Comment by ddxv
2 months ago
Mind blowing they couldn't get this to work. It's struck me lately that the models don't seem to matter anymore, they're all equally good.
The UX and integration with regular phone features is what makes the tool shine, and by now there should be plenty of open-source models and know-how for Apple to create their own.
What is Google offering that Apple can't figure out on their own?
Maybe people don't use personal assistant AI enough to justify the investment? My phone probably has 6 or 7 AI tools with talking features that I never explore.
The LLM business is not a one-shot "figure it out and then collect easy money" affair; it's constant work and expense just to keep the LLM functionality running. So if Apple analyzed this and decided they would rather rent such capability, it seems quite logical. Google already has ties to Apple, and they may even strike a deal where search on iOS is bartered (maybe partially) for Gemini service. Win-win. And Google is not going out of business any time soon, so it's more reliable than any pure-LLM corporation.
Another, less likely possibility is that Apple may be reluctant to steal enough data to train their own LLM to a competitive level and then keep doing so in perpetuity. They have this notion that they are the privacy-oriented FAANG company, and may want to keep that image up.
Maybe it is a sum total of a lot of factors, which in the end tilted the decision to a rental model.
I don't know. Gemini 2.5 has been the only model that hasn't consistently made fundamental mistakes with my project as I've worked with it over the last year. Claude 3.7, 4.0, and 4.5 are not nearly as good. I gave up on ChatGPT a couple of years ago, so I have no idea how those models perform now; they were bad when I quit using them.
Do you find that Gemini results are slightly different when you ask the same question multiple times? I found it to have the least consistently reproducible results compared to others I was trying to use.
Sometimes it will alternate between different design patterns for implementing the same feature on different generations.
If it gets the answer wrong and I notice it, often just regenerating will get past it rather than having to reformulate my prompt.
So, I'd say yeah...it is consistent in the general direction or understanding, but not so much in the details. Adjusting temp does help with that, but I often just leave it default regardless.
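For anyone wondering why temperature and seeds matter here: this is a minimal sketch of temperature-scaled sampling, not anything specific to how Gemini or any other product actually implements it. The function name and logit values are made up for illustration. Lowering the temperature sharpens the distribution toward the top token (more reproducible); a fixed seed makes a single draw deterministic.

```python
import math
import random

def sample_token(logits, temperature=1.0, seed=None):
    """Sample an index from logits after temperature scaling.

    Lower temperature concentrates probability on the largest logit,
    so repeated runs agree more often; a fixed seed pins down one run.
    """
    rng = random.Random(seed)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                             # one uniform draw in [0, 1)
    cum = 0.0
    for i, p in enumerate(probs):                # inverse-CDF sampling
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]
# Same seed, same settings: identical pick every time.
assert sample_token(logits, temperature=1.0, seed=0) == \
       sample_token(logits, temperature=1.0, seed=0)
```

At temperature near zero this collapses to picking the argmax regardless of seed, which is why turning the temperature down trades variety for reproducibility.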
I use all of them about equally, and I don't really want to argue the point, as I've had this conversation with friends, and it really feels like it is becoming more about brand affiliation and preference. At the end of the day, they're random text generators and asking the same question with different seeds gives different results, and they're all mostly good.