Comment by cush
2 days ago
I'll bite
1. Have a user interface. Sometimes I'll ask a question and Siri actually provides a good enough answer, and while I'm reading it, the Siri response window just disappears. Siri is a modal popup with no history, no app, and really no UI at all. It should have one, so that I can go back to sessions, resume or reference them later, and interact with Siri in more meaningful ways.
2. Answer questions like a modern LLM does. Siri often responds with very terse web links. I find this useful for basic fact-checking when I'm sitting with friends and we don't remember if Liam Neeson is alive or not. That's the only use case I've found where it's useful: when I want to peel my attention away for the shortest possible time. If ChatGPT could be bound to a power-button long-press, I'd stop using Siri even for that. Otherwise Siri isn't good for longer questions, because it doesn't have the intelligence and, as mentioned, has no user interface.
3. Be able to do things conversationally, based on my context. Today, when I say "Add to my calendar Games at Dave's house", it creates a calendar entry called "Games" and sets the location to a restaurant called "Dave's House" in a different country. My baseline expectation is that I should be able to work with Siri, build up its memory and my context, and over time have it become smarter about the things I like to do. The day Siri responds with "Do you mean Dave's House, the restaurant in another country, or Dave, from your contacts?" I'll be happy.
For 1, I think we are getting farther away from this.
Siri's current architecture now feeds context into the prompt, such as the app/window that has focus and the content loaded into it. In that sense, Siri is more like the macOS menu bar than an app. A consolidated view of Siri history could look disjointed, since a lot of context is hidden if all it shows is a query like "when was this building built?".
Worse, it might not provide the functionality you want if you revisit a historic chat and ask "who was the architect?", unless all that context was actually captured. But that context was never formatted to be clearly displayed to the user. That in itself creates a lot of challenges around things like user consent, since Siri can farm queries off to other (online) tools and world-knowledge AI services.
There is at least a UX paradigm for this: clipboard history. Coincidentally, Tahoe built clipboard history into Spotlight. But a clipboard entry lends itself to being a more complete and self-contained snapshot. I'm not sure Siri is being built to work this way, because of all the implicit context.
For 2, at a certain point this gets farmed off to other tools or other AI services. The Gemini agreement covers the foundational model, not large "world knowledge" models or backing databases. Today, Siri answers this question by presenting biographical information inline from Wikipedia, using internal tools. The model itself just isn't able to answer the actual question (e.g. it will just state his birthday).
For 3, the model already has substantial personal context (as much as apps are willing to give it) and does keep state between requests. This is actually one of the issues with Siri today: that context changes the behavior of the command-and-control engine in interesting ways, phone to phone and sometimes moment to moment.
Unfortunately, I think stopping and asking for clarification is not something generative AI currently excels at.
Thanks for sharing. 1. Could be fixed today. 2./3. need a good enough LLM.
btw: I hope you will visit Dave's House someday in the future.
My wife and I got a kick out of your “Games at Dave's house” example. Thanks for sharing
Burnie Burns of Rooster Teeth once made a massage appointment with the head of Xbox programming (luckily Siri wasn't all that competent, so it just put "Mr. Appointment" in the invite):
https://www.youtube.com/watch?v=r499DeN770M
>If ChatGPT could be bound to a power button long-press, then I'd cease to use Siri for this use case
This should be possible: go to Settings -> Action Button -> Controls and search for ChatGPT.
Isn’t its voice the UI? It should respond in the same mode as the request: voice and natural language.
If you ask for a website it should open a browser.
Edit: everything else spot on
> Isn’t its voice the UI? It should respond in the same mode as the request: voice and natural language.
Yeah, it’s an interesting idea, but visuals are sometimes required. Even the simple task of “List the highest rated Mexican restaurants near me” works well enough with old crappy Siri: you get a list of the highest rated Mexican restaurants near you. But as soon as you open the first restaurant, Siri closes and the list is gone. You can’t view the second restaurant. To get the list back, you have to ask Siri again.
There’s no world in which that user experience makes a viable product. It’s a completely broken user experience no matter how smart the Gemini model is.
Yes, the lack of context or history has annoyed me in the past too.
Also, Liam Neeson just catching strays over here