I'll bite
1. Have a user interface. Sometimes I'll ask a question and Siri actually provides a good enough answer, and while I'm reading it, the Siri response window just disappears. Siri is this modal popup with no history, no App, and no UI at all really. Siri doesn't have a user interface, and it should have one so that I can go back to sessions and resume them or reference them later and interact with Siri in more meaningful ways.
2. Answer questions like a modern LLM does. Siri often responds with very terse web links. I find this useful when I'm sitting with friends and we don't remember if Liam Neeson is alive or not - for basic fact-checking. This is the only use case I've found where it's useful, when I want to peel my attention away for the shortest period of time. If ChatGPT could be bound to a power button long-press, then I'd cease to use Siri for this use case. Otherwise Siri isn't good for long questions because it doesn't have the intelligence, and as mentioned before, has no user interface.
3. Be able to do things conversationally, based on my context. Today, when I say "Add to my calendar Games at Dave's house" it creates a calendar entry called "Games" and sets the location to a restaurant called "Dave's House" in a different country. My baseline expectation is that I should be able to work with Siri, build its memory and my context, and over time it becomes smarter about the things I like to do. The day Siri responds with "Do you mean Dave's House the restaurant in another country, or Dave, from your contacts?" I'll be happy.
For 1, I think we are getting farther away from this.
Siri's current architecture now injects context into the prompt, such as the app/window that has focus and the content loaded into it. In that sense, Siri is more like the macOS menu bar than an app. A consolidated view of Siri history may look disjointed, in that a lot of context is hidden if all it shows is a query like "when was this building built?".
Worse, history might not provide the desired functionality: if you go back to a historic chat and ask "who was the architect?", that only works if all the original context was actually captured. However, that context was never formatted in a way that was intended to be clearly displayed to the user. That in itself creates a lot of challenges around things like user consent, since Siri can farm queries out to other (online) tools and world-knowledge AI services.
There is at least a UX paradigm for this - clipboard history. Coincidentally, Tahoe built clipboard history into Spotlight. But each clipboard entry is a complete, self-contained snapshot; I'm not sure Siri is being built to work this way, because so much of its context is implicit.
For 2, at a certain point this gets farmed off to other tools or other AI services. The Gemini agreement is for the foundation model, not large "world knowledge" models or backing databases. Today, Siri answers this question by providing biographical information inline from Wikipedia, using internal tools. The model itself just isn't able to answer the actual question (e.g. it will just say his birthday).
For 3, the model already has substantial personal context (as much as apps are willing to give it) and does have state in between requests. This is actually one of the issues with Siri today - that context changes the behavior of the command and control engine in interesting ways, phone to phone and sometimes moment to moment.
Unfortunately, I think stopping and asking for clarification is not something generative AI currently excels at.
Thanks for sharing. 1. Could be fixed today. 2./3. need a good enough LLM.
btw: I hope you will visit Dave's House someday in the future.
My wife and I got a kick out of your “Games at Dave's house” example. Thanks for sharing
Burnie Burns of Rooster Teeth made a massage appointment with the head of Xbox programming (luckily Siri was not so competent that it put "Mr. Appointment" in the invite)
https://www.youtube.com/watch?v=r499DeN770M
>If ChatGPT could be bound to a power button long-press, then I'd cease to use Siri for this use case
This should be possible: go to Settings -> Action Button -> Controls and search for ChatGPT.
Isn’t its voice the ui? It should respond using the same context of the request. Voice and natural language.
If you ask for a website it should open a browser.
Edit: everything else spot on
> Isn’t its voice the ui? It should respond using the same context of the request. Voice and natural language.
Yeah, it’s an interesting idea, but visuals are sometimes required. Even the simple task of “List the highest rated Mexican restaurants near me” works well enough with old crappy Siri: you’ll get a list of the highest rated Mexican restaurants near you. But as soon as you open the first restaurant, Siri closes and the list is gone. You can’t view the second restaurant. To get the list back you need to ask Siri again.
There’s no world in which that user experience makes a viable product. It’s a completely broken user experience no matter how smart the Gemini model is.
Yes, the lack of context or history has annoyed me in the past too.
Also, Liam Neeson just catching strays over here
I'm sorry, I can't answer that right now.
Would you like to click this button which takes what you said and executes it as a Google search in Safari?
Now playing You're Missing, by Bruce Springsteen on Apple Music
You don't have a subscription for Apple Music. Here is 1 month free trial.
Here is Missing by Everything But The Girl on Apple Music
Siri to function above the level of Dragon NaturallySpeaking '95
Fantastic reference. I remember pirating this from microcrap.com in about 1996.
> 1996
I'm chuckling at the idea of pirating software in 1996.
iirc even in 1999, I couldn't figure out why Windows update required me to use internet exploder. It would take forever to download updates over dialup.
ANY ability to answer simple questions without telling me to open Safari and read a webpage for myself...?
I should be able to completely control my phone with voice and ask it to do anything it is capable of and it should just work:
"Hi Siri, can you message Katrina on WhatsApp that Judy is staying 11-15th Feb and add it to the shared Calendar, confirm with me the message to Kat and the Calendar start and end times and message."
They will never do this, and the lack of it can be marketed as a security feature.
Well that’s what they sold people in June 2024
Could it just fucking work? "Hey Siri turn on the [room name] room lights" and it gives me a positive chime and ... doesn't turn any lights on? In any of my rooms?
Judging by another comment, it probably turned the lights on in a restaurant in a different country.
Somewhere deep inside your iPhone, an LED is toggling.
Apple’s insistence on not ever displaying error messages is infuriating.
Elementary anti-spam.