Comment by mittermayr

6 days ago

All I can say is, I asked Siri today (verbatim): What is 75 degrees fahrenheit in celsius, and what is 85 degrees in fahrenheit — and it offered a web search about fahrenheit. The "and" completely disabled its most basic ability to do metric conversions.

So, it's nice to see Apple is doing research and talking about it, but we're out here waiting, still waiting, for anything useful to make of it all on our thousand-dollar devices that literally connect us to the world and contain our entire life data. It's what I would've expected from one of the most valuable companies in the world.

You asked 2 questions in a system made for 1 question at a time. Split these up and Siri answers them fine. You’re holding it wrong.

  • A tool that can handle more than one question at a time is useful. Modern LLMs handle that with ease. So it's completely reasonable to be critical of that limitation.

    • Sure, what’s not reasonable is expecting Siri to be a modern LLM, when they know it’s not. They asked a question they knew Siri couldn’t handle just to slam it. I’m not critical of a 5-function calculator for not one-shotting complex equations like a computer.

      While Siri only does one thing at a time, I trust the answer more, because it’s doing the actual math and not just guessing what the most likely answer is, like an LLM. We need to pick the right tool for the right job. Frankly, I don’t think an LLM is the right tool for conversations like this. Jumbling multiple questions into a single question is something people do with LLMs to get more use out of them during the day; it is an adaptation to a limitation of the free tier (and sometimes the speed) of the LLM.


    • Why is Siri being discussed in the context of LLMs and Apple Intelligence? Have they already released Siri 2.0 or am I missing something?


  • Never mind that Infocom games running on my Apple ][+ could handle that sort of command in 1983.

    (Well, with multiple direct objects, anyway.)

  • "holding it wrong" was exactly the right phrase given how that phrase was used with the iPhone antenna bridging problem. This is an Apple product failing.

  • "You haven't contorted your comically simple query enough to make the brittle tool work. Throw the chicken bones better next time."

    • It’s been this way for over a decade. If someone hasn’t figured it out by now, that’s kind of on them.

      I’m not even sure why those two things would be asked as a single question. It seems like a very unnatural way to pose those two questions. Most humans would trip on that, especially if it was asked verbally.


> What is 75 degrees fahrenheit in celsius, and what is 85 degrees in fahrenheit

Err, what? As a native English speaker human that's a pretty confusing question to me, too!

  • First, most of the English speaking world is not native.

    "As of 2022, there were about 400 million native speakers of English. Including people who speak English as a second language, estimates of the total number of Anglophones vary from 1.5 billion to 2 billion."

    Second, all popular models I tested did well with that query, including Gemini on Android (aka "ok Google"), except Apple's.

    https://en.m.wikipedia.org/wiki/English-speaking_world

    • I am not sure why you bring up the English-speaking world. Anyway, about the models you tested with that query (which I am not sure why we think is a good benchmark): are they local models running on a wireless device, or do they use a datacenter and only convey the text back and forth?


I just tried this on my phone and got two pop-ups with the conversions appearing in quick succession.

>> What is 75 degrees fahrenheit in celsius, and what is 85 degrees in fahrenheit

Probably wouldn't have made a difference but the second half of that statement isn't exactly clear. 85 degrees what?

I also think when you're chaining these two separate calculations together you get a problem when it comes to displaying the results.

  • That exact phrase "What is 75 degrees fahrenheit in celsius, and what is 85 degrees in fahrenheit" given to ChatGPT produces the correct result (it infers that the second degrees must be Celsius) and ChatGPT gives me a nicely laid out formula for the math of the conversion.

    So yeah, Apple is way behind on this stuff.

  • The fact is that Gemini responds with this: 75 degrees Fahrenheit is 23.89 degrees Celsius, and 85 degrees Celsius is 185.00 degrees Fahrenheit.

  • Meanwhile, users have been conditioned to expect a system that understands multiple queries and answers them appropriately.

    • True. But for most of us, only in the past year. I have a few friends/relatives who have still never conversed with an LLM.
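For anyone who wants to check the figures quoted in this thread (23.89 and 185.00), here is a minimal sketch of the two conversions. The function names are my own, and it assumes, as ChatGPT and Gemini did, that the unlabeled "85 degrees" means Celsius.

```python
def fahrenheit_to_celsius(f: float) -> float:
    """Convert Fahrenheit to Celsius: subtract 32, then scale by 5/9."""
    return (f - 32) * 5 / 9


def celsius_to_fahrenheit(c: float) -> float:
    """Convert Celsius to Fahrenheit: scale by 9/5, then add 32."""
    return c * 9 / 5 + 32


# The two halves of the compound question from the thread.
# Assumption: the second value (85) is in Celsius.
print(f"75 F = {fahrenheit_to_celsius(75):.2f} C")  # 75 F = 23.89 C
print(f"85 C = {celsius_to_fahrenheit(85):.2f} F")  # 85 C = 185.00 F
```

Both results match what Gemini reported above.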

Your usage of Siri today (probably on an old version of iOS) frankly has nothing to do with the article we are discussing. Sorry to say this, but it is going to take time. You are comparing the performance of ChatGPT running in a big data center with a model running locally on a phone... give it a few years.

  • People have been giving Siri a few years for a decade now. Siri used to run in a data center (and still does for older hardware and things like HomePods) and it has never supported compound queries.

    Siri needs to be taken out back and shot. The problem with “upgrading” it is the pull to maintain backwards compatibility for every little thing Siri did, which leads them to try and incorporate existing Siri functionality (and existing Siri engineers) to work alongside any LLM. Which leads to disaster, and none of it works and just made it all slower. They’ve been trying to do an LLM assisted Siri for years now and it’s the most public facing disaster the company has had in a while. Time to start over.

    • As a user, I'd gladly opt into a slightly less deeply integrated Siri that understands what I want from it.

      Build a crude router in front of it, if you must, or give it access to "the old Siri" as a tool it can call, and let the LLM decide whether to return its own or a Siri-generated response!

      I bet even smaller LLMs would be able to figure out, given a user input and Siri response pair, whether the request was reasonably answered, whether the model itself could do better, or at least explain that the request is beyond its capabilities for now.


    • Those little things have been broken for a while now. It's best to bite the bullet and integrate an LLM into Siri now.

  • > Your usage of Siri today (probably on an old version of iOS) frankly has nothing to do with the article we are discussing.

    Yes, but isn't that infuriating? The technology exists! It even exists, as evidenced by this article, in the same company that provides Siri!

    At least I feel that way every time I interact with it – or for that matter my Google Home speaker, ironically made and operated by the company that invented transformer networks.
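
The "crude router" suggested earlier in this thread could look roughly like the sketch below. Everything here is hypothetical: `route`, the three callables, and the stub implementations are made-up names for illustration, not any real Siri or Apple API. It only shows the control flow of asking the old assistant first, letting a judge (in the proposal, a small LLM) look at the input/response pair, and falling back to the LLM otherwise.

```python
from typing import Callable

# Hypothetical types: none of these correspond to a real Siri or Apple API.
Assistant = Callable[[str], str]    # takes a query, returns an answer
Judge = Callable[[str, str], bool]  # (query, answer) -> was it answered reasonably?


def route(query: str, legacy: Assistant, llm: Assistant, judge: Judge) -> str:
    """Try the legacy assistant first; fall back to the LLM if the judge
    decides the legacy answer did not reasonably address the query."""
    legacy_answer = legacy(query)
    if judge(query, legacy_answer):
        return legacy_answer
    return llm(query)


# Stub implementations, purely for demonstration.
def fake_legacy(query: str) -> str:
    # Pretend the old assistant gives up on compound questions.
    if " and " in query:
        return "Here's what I found on the web."
    return "75 F is 23.89 C."


def fake_llm(query: str) -> str:
    return "LLM answer covering both parts."


def fake_judge(query: str, answer: str) -> bool:
    # A real judge would be a model; this stub only spots the web-search fallback.
    return "found on the web" not in answer


print(route("What is 75 F in C", fake_legacy, fake_llm, fake_judge))
print(route("What is 75 F in C and what is 85 C in F", fake_legacy, fake_llm, fake_judge))
```

The first call keeps the legacy answer; the second falls through to the LLM because the stub judge rejects the web-search response.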