Comment by raw_anon_1111
2 days ago
AI assistants can’t magically “do stuff” without “tools” exposed. A tool is always an API that someone has to write and expose to the orchestrator, whether that orchestrator is an AI or just a dumb intent system.
And ChatGPT can’t really “do anything” without access to tools.
You don’t want an LLM to have access to your entire system without deterministic guardrails limiting what the tools can do, just like you wouldn’t expose your entire database with admin privileges to the web.
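Concretely, here's a minimal sketch of what I mean by a deterministic guardrail (every name in it is made up): the model can *request* any tool call it likes, but a plain, non-AI layer decides what actually runs.

```python
# Hypothetical permission gate between the LLM's tool calls and the
# real system. The model can ask for anything; this layer decides.

ALLOWED_TOOLS = {
    "read_calendar": {"requires_confirmation": False},
    "send_message":  {"requires_confirmation": True},
}

def execute_tool_call(name, args, user_confirmed=False):
    policy = ALLOWED_TOOLS.get(name)
    if policy is None:
        # Tool was never exposed; the model asking for it changes nothing.
        raise PermissionError(f"tool {name!r} is not exposed")
    if policy["requires_confirmation"] and not user_confirmed:
        raise PermissionError(f"tool {name!r} needs user confirmation")
    # ...dispatch to the real, narrowly scoped implementation here...
    return f"ran {name} with {args}"
```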
You also don’t want to expose too many tools to the system. For every tool you expose, you also have to include a description of what the tool does, the parameters it takes, etc. That will both blow up your context window and make the model start hallucinating. I suspect that’s why Alexa and Google Assistant got worse when they became LLM-based, and why my narrow use cases didn’t suffer those problems when I started implementing LLM-based solutions.
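To make the context-bloat point concrete, here's roughly what one tool definition costs. The JSON shape below mimics common tool-calling APIs (exact field names vary by vendor), and the ~4-characters-per-token math is a crude rule of thumb, not a real tokenizer:

```python
import json

# One tool's name, description, and parameter schema, all of which
# get serialized into the prompt on every single request.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

approx_tokens = len(json.dumps(weather_tool)) // 4
print(f"~{approx_tokens} tokens for ONE tool; 50 such tools eat "
      f"~{50 * approx_tokens} tokens before the user says a word.")
```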
And I am purposefully yada-yada-yada-ing over some of the technical complexity, and I hate the whole “appeal to authority” thing. But I worked at AWS for 3.5 years until 2 years ago, and I was at one point the second-highest contributor to a popular open source “AWS Solution” dealing with voice automation that almost everyone in the niche had heard of. I really do know about this space.
yeah, there's just really nothing left to discuss. Apple could have been a real leader in the AI space had they hired the right researchers to implement LLMs and beaten OpenAI to the punch.
I understand that AI assistants need access to tools in order to do anything on a computer. I've been working with AI-augmented development for a few months now, and every time a prompt needs to run a tool it asks for permission first, or just straight up gives me the command to paste into a terminal.
Ideally this would have been abstracted away if Siri were an LLM, with Apple controlling which APIs Siri has access to and bypassing user confirmation altogether.
It would have been neat if I were able to say, "Hey, Siri: send a text to John Smith in playfully angry prose thanking him for not inviting me to the party," which would have the LLM automatically craft the message and send it upon confirmation, perhaps with a "made with AI" disclaimer at the bottom of the text or something along those lines.
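Something like this flow, as a rough sketch; draft_with_llm() and send_text() are stand-ins I made up, not real Apple APIs:

```python
def draft_with_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return "Wow, thanks SO much for that party invite I never got. Truly touched."

def send_text(contact: str, body: str) -> None:
    # Stand-in for the actual messaging API.
    print(f"[pretend this went to {contact}]\n{body}")

def handle_request(contact: str) -> None:
    draft = draft_with_llm(
        f"Write a playfully angry text thanking {contact} "
        "for not inviting me to the party."
    )
    print(f"Draft to {contact}:\n{draft}\n")
    # The send is gated on an explicit yes, and the disclaimer is
    # appended deterministically, not by the model.
    if input("Send it? [y/N] ").strip().lower() == "y":
        send_text(contact, draft + "\n\n(made with AI)")
```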
"Hey, Siri: What's the weather in Los Angeles, California" would fallback to a web api endpoint.
"Hey, Siri: How do I compile my C# application without Visual Studio" would provide step-by-step instructions on working with MSBUILD.
Different prompts would fall back on different APIs that only Apple would expose, obviously not allowing the user to gain root access to the system, which is what you would expect from Apple.
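Roughly like this, as a sketch; the intents and handlers are placeholders I made up, not anything Apple actually ships:

```python
# Each class of prompt falls back to a different pre-approved handler.
ROUTES = {
    "weather": lambda q: f"call the weather web API for {q!r}",
    "how_to":  lambda q: f"answer {q!r} from the model itself, step by step",
    "message": lambda q: "hand off to the messaging tool (with confirmation)",
}

def route(intent: str, query: str) -> str:
    handler = ROUTES.get(intent)
    if handler is None:
        # No matching route means no action: there is no escape hatch
        # to root or to arbitrary system APIs.
        return "Sorry, I can't help with that."
    return handler(query)

print(route("weather", "Los Angeles, California"))
```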
I guess from a purely technical standpoint, you'd train two models, one as "Safe" and the other as "Unsafe". "Safe" is what would be used by the end user, allowing them to access safe data, APIs, messaging, web... you name it. "Unsafe" would be used internally at Apple and would have system-wide access: unlimited user data, root privileges, perhaps unrestricted image generation and web search... basically no limit to what an LLM could achieve.
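As a sketch, the same split could also be expressed as capability tiers enforced by tool gating rather than two separately trained models; everything here is illustrative:

```python
# Two profiles over one model: what each is allowed to touch.
CAPABILITIES = {
    "safe":   {"tools": {"weather", "messaging", "web_search"}, "root": False},
    "unsafe": {"tools": {"*"}, "root": True},  # internal-only profile
}

def can_use(profile: str, tool: str) -> bool:
    caps = CAPABILITIES[profile]
    return "*" in caps["tools"] or tool in caps["tools"]

assert can_use("safe", "messaging")
assert not can_use("safe", "filesystem")
assert can_use("unsafe", "filesystem")
```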
And spend billions and billions of dollars to get - a better Siri?
That's my point: Apple has invested so much into Siri that there's no reason it shouldn't be the most advanced LLM-based assistant in the world. They missed the mark completely. Why? If Jobs were still in charge, the entire team would have been gone years ago.