← Back to context

Comment by martinald

20 days ago

I just do not understand this attitude. ChatGPT alone has hundreds of millions of active users that are clearly getting value from it, despite any mistakes it may make.

To me the almost unsolvable problem Apple has is wanting to do as much as possible on device, but also have been historically very stingy with RAM (on iOS and Mac devices - iOS more understandably, given it doesn't really need huge amounts of RAM until LLMs came along). This gives them a real real problem, having to use very small models which hallucinate a lot more than giant cloud hosted ones.

Even if they did manage to get 16GB of RAM on their new iPhones that is still only going to be able to fit a 7b param model at a push (leaving 8GB for 'system' use).

In my experience even the best open source 7B local models are close to unusable. They'd have been mindblowning a few years ago but when you are used to "full size" cutting edge models it feels like an enormous downgrade. And I assume this to always be the case; while small models are always improving, so are the full size ones, so there will always be a big delta between them, and people are already used to the large ones.

So I think Apple probably needs to shift to using cloud services more like their Private Compute idea, but they have an issue there in so much that they have 1b+ users and it is not trivial at all to be able to handle that level of cloud usage for core iOS/Mac features (I suspect this is why virtually nothing uses Private Compute at the moment). Even if each iOS user only did 10 "cloud LLM" requests a day, that's over 10b/requests a day (10x the scale that OpenAI currently handles). And in reality it'd ideally be orders of magnitude more than that given how many possible integration options they are for mobile devices alone.

> ChatGPT alone has hundreds of millions of active users that are clearly getting value from it

True, but it’s been years now since the debut of the chat-interface-AI to the general public and we have yet to figure out another interface that would work for generative AI for the general public. I’d say the only other example is Adobe and what they are doing with generative AI in their photo editing tools, but thats a far cry from a “general public” type thing. You have all the bumbling nonsense coming out of Microsoft and Google trying to shove AI into whatever tools they are selling while still getting 0 adoption. The copilot and Gemini corporate sales teams have been both “restructured“ this year because they managed to sign up so many clients in 2023/2024 and all those clients refused to renew.

When it comes to the general public, we have yet to find a better application of AI than a chat interface. Even outside of the general public, I oversee few teams that are building “agentic AI tools/workflows” and the amount of trouble they have to go through to make something slightly coherent is insane. I still believe that the right team with the right architecture and design can probably achieve things that are incredible with LLMs, but it’s not as easy as the term “AI” makes it sound.

  • Putting generative AI inside tools without giving deep understanding of those tools to the AI generally made me more confused and frustrated than outside of it:

    for example Gemini forced itself on me on my SMS app, so I thought I ask it to search something simple inside the messages, and it just started generating some random text about searching and saying that it doesn't have access to the messages themselves.

    When I use ChatGPT, of course I know they don't have access to my SMSs (it would be weird).

    I can give ChatGPT the exact context I want to, and I know it will work with it as long as the context is not too big.

    • I think that’s fundamentally an important part of the value of ChatGPT compared to all the “integrated” AI approaches. Like you, I like that ChatGPT doesn’t have access to my SMSs, emails, documents, code, etc. When I use it, I like to choose the context it’s exposed to and then evaluate the output based on how much context I exposed it to.

      For example, one of my main use cases for ChatGPT is crafting SQL queries. I never liked the SQL language. I understand how powerful it is, but I was never good at it. I find it to be structured in such an awkward and confusing way. I’m not smart enough to express what I’m looking for in SQL format. But ChatGPT is perfect for that. I give it a brief explanation of tables schema then ask it for a query and it spits out a SQL query for me. 70% of the time it looks good, 30% I’m skeptical it solves what I want.

      I have tried many “Database/postgres AI services” that promise much better interface. They ask you for creds to your database, then inject all context/schema into the LLM so you just ask questions directly. They have all been too underwhelming. I ask a question about joining and reducing tables A and B, and it insists to involves tables C and D. I have to say “don’t worry about C and D”. So it involves E and F. I say “fuck, only consider A and B” and it generated an invalid query that assumes A and B have some columns from C, D, E or F. I say “please don’t consider any schema other that A and B schema. I’m looking for a result of joining A and B, reduces by columns X/Y giving the average of those” then feel stupid having to argue with a machine about what I’m looking for.

> ChatGPT alone has hundreds of millions of active users that are clearly getting value from it, despite any mistakes it may make.

You assume hundreds of millions of users could identify serious mistakes when they see them.

But humans have demonstrated repeatedly that they can't.

I don't think it can ever be understated how dangerous this is.

> I think Apple probably needs to shift to using cloud services more

You ignore lessons from the the recent spat between Apple and the UK.

  • > You assume hundreds of millions of users could identify serious mistakes when they see them. But humans have demonstrated repeatedly that they can't.

    same is true for humans whether they're interacting with LLMs or other humans. so I'm inclined to take statements like

    > I don't think it can ever be understated how dangerous this is.

    as hysteria

    • > > I don't think it can ever be understated how dangerous this is.

      > as hysteria

      It's not hysteria. Humans haven been trained, for better or worse, that computers are effectively infallible. Computed answers are going from near-100% correct to not-even-80% correct in an extremely short time.

      It's not hysteria to say this is a dangerous combination.

      2 replies →

    • while some humans are confidently wrong occasionally, if this seems to be a pattern with someone we move them to a different role in the organization, we stop trusting them, or stop asking their opinion on that subject, or remove them from the org entirely.

      far more often than being confidently wrong, the human will say, i’m not positive on the answer, let me double check and get back to you.

      1 reply →

There are thresholds for every technology where it is "good enough", same with LLMs or SLMs (on-device). Machine learning is already running on-device for photo classification/search/tagging, and even 1.5b models are getting fast really good, as long as they are well trained and used for the right task. Something like email writing, TTS and rewriting and other tasks should be easily doable, the "semantic search aspect" of chatbots are basically a new way of "google/web search" and probably stay in the cloud, but that's not their most crucial use.

Not a big fan of Apple's monopoly, but I like their privacy on-device handling. I don't care for Apple but on-device models are definitely the way to go from a consumer point of view.

The very fact that Apple thought they were going to run AI on iPhones says that leadership doesn't understand AI technology and simply mandated requirements to engineers without wanting to be bothered by details. In other words, Apple seems to be badly managed

  • I disagree. I think targeting running models on end user devices is a good goal, and it's the ideal case for user privacy and latency.

    The human brain consumes around 20 watts, while of course there are substantial differences with implementation I think it's reasonable to draw a line and say that eventually we should expect models to hit similar levels of performance per watt. We see some evidence now that small models can achieve high levels of performance with better training techniques, and it's perfectly conceivable that acceptable levels of performance for general use will eventually be baked into models small enough to run on end hardware. And at the speed of development here, "eventually" could mean 1-2 years.

  • Actually, it's more of a sad capitulation to lazy armchair "analysts" and "pundits" who whined incessantly that Apple was "behind on AI," without taking stock of the fact that Apple does not NEED "AI." It does not serve their core businesses, product line, or users.

    Instead of loudly jumping on this depressing bandwagon, Apple should have quietly improved Siri and then announced it when it was WORKING.

I suspect an issue at least as big is that they're running into a lot of prompt injection issues (even totally accidentally) with their attempts at personal knowledge base/system awareness stuff, whether remotely processed or not. Existing LLMs are already bad at this even with controlled inputs; trying to incorporate broad personal files in a Spotlight-like manner is probably terribly unreliable.

  • This is my experience as pretty heavy speech-to-text user (voice keyboard) - as they’ve introduced more AI features, I’ve started to have all sorts of nonsense from recent emails or contacts get mixed into simple transcriptions

    It used to have no problem with simple phrases like “I’m walking home from the market” but now I’ll just as often have it transcribe “I’m walking home from the Mark Betts”, assuming Mark Betts was a name in my contacts, despite that sentence making much less structural sense

    It’s bad enough that I’m using the feature much less because I have to spend as much time copyediting transcribed text before sending as I would if I just typed it out by hand. I can turn off stuff like the frequently confused notification summaries, but the keyboard has no such control as far as I know

> In my experience even the best open source 7B local models are close to unusable. They'd have been mindblowning a few years ago but when you are used to "full size" cutting edge models it feels like an enormous downgrade

Everything has limits - the only differences is where they are, and therefore how often you meet them.

If you are working with AI, using local models shows you where the problems can (and will) happen, which helps you write more robust code because you will be aware of these limits!

It's like how you write more efficient code if you have to use a resource constrained system.

Its just another tool (or toy), great at some stuff, almost useless or worse for another, and its fucking downed our throats at every corner, from every direction. I start to hate everything AI-infused with passion. Even here on HN, many people are not rational. I am willing to pay less for AI-anything, not the same and f_cking definitely not more.

Cargo culting of clueless managers which make long term usability of products much worse, everything requiring some stupid cloud, basic features locked up and you will be analyzed, this is just another shit on top.

You have any massive hype, you normally get this shit. Once big wave dies down with unavoidable sad moments for some, and tech progresses further (as it will) real added value for everybody may show up.

As for work - in my corporation, despite having pure dev senior role, coding is 10-20% of the work, and its part I can handle just fine on my own, I don't need babysitting from almost-correct statistical models. In fact I learn and keep fresh much better when still doing it on my own. You don't become or stay senior when solutions are handed down to you. Same reason I use git in command line and not clicking around. For code sweatshops I can imagine much more added value, but not here in this massive banking corporation. Politics, relationships, knowing processes and their quirks and limitations is what progresses stuff and gets it done. AI won't help here, if anybody thinks differently they have f_cking no idea what I talk about. In 10 years it may be different, lets open the discussion again then.

> ChatGPT alone has hundreds of millions of active users that are clearly getting value from it

Idk about that, wouldn't pay for it.

  • What do you mean? Lot of people pay, (me included) and are getting value. If you use it but don't pay, you still get value, otherwise you would be wasting your time. If you don't use it at all, that's your choice to make.

ChatGPT is mostly a tool which prints words on the screen; what the user does with those words is outside the domain and the area of responsibility of OpenAI. With iOS the expectation is that it will also do actions. It's almost a blessing that it hallucinates a lot and in obvious ways. It's going to get worse when it starts hallucinating in ways, and doing actions on user's behalf, that are subtle, almost unnoticeable.

With the current state of LLMs they should stay within the bounds of writing out random, but statistically likely, words. However, I think we are already at a point where will be paying price later down the road for all the hallucinations we have unleashed to the world in the past few years.

> ChatGPT alone has hundreds of millions of active user that are clearly getting value from it

So does OG Siri or Alexa, Letdown does not mean completely useless, it just means what the users are getting is far less than what they were promised, not that they get nothing.

In this context AI will be a letdown regardless of improvements in offline or even cloud models. It is not only because of additional complexity of offline model Apple will not deliver, their product vision just does not look achievable in the current state of tech in LLMs [1].

Apple itself while more grounded compared to peers who regularly talk about building AGI, or God etc, has been still showing public concept demos akin to what gaming studios or early stage founders do. Reality usually fall short when you run ahead of product development in marketing, it will be no different for Apple.

This is a golden rule of brand and product development - never show what have not built fully to the public if you want them to trust your brand.

To be clear, it is not bad for the company per se to do this, top tier AAA gaming studios do just fine as businesses despite letting down fans game after game with oversell and under deliver, but suffer as brands nobody will have good thing to say about Blizzard or EA or any other major studio.

Apple monetizes its brand very well by being able to price their products at premium compared to peers that will be at risk if users feel letdown.

[1] Perhaps new innovations will make radical improvements even in the near future, regardless that will not change Apple can ship in 2025 or even 2026 so still a letdown for users being promised things for last 2 years already.

Private compute cloud is apples solution. It doesn’t matter what specs your phone has because the inference is sent to a data center.

They literally have data centers worth of devices running inferences anonymously

> clearly getting value from it

Or clearly thinking they might get value from it. I personally agree they're likely getting value, but it's pretty easy to dupe otherwise smart people when handing them something with cabilities far outside their realm of expertise, so I'd caution against using a large user base as anything more than a suggestive signal when determining whether people are "clearly getting value."

For an example from a different domain, consider a lot of generic market-timing stock investment advice. It's pretty easy to sell predictions where you're right a significant fraction of the time, but the usual tradeoff is that the magnitude of your errors is much greater than the magnitude of your successes. Users can be easily persuaded that your advice is worth it because of your high success rate, but it's not possible for them to actually get any net value from the product.

Even beginning data scientists get caught in that sort of trap in their first forays into the markets [0], and people always have a hard time computing net value from products with a high proportion of small upsides and a small proportion of huge downsides [1].

It's kind of like the various philosophical arguments about micro murders. 10 murders per year is huge in a town of 40k people, but nobody bats an eye at 10 extra pedestrian deaths per year from routinely driving 35+ in a 25. Interestingly, even if that level of speeding actually saves you the maximal amount of time (rarely the case for most commutes, where light cycles and whatnot drastically reduce the average speedup from "ordinary" reckless driving), you'll on average cause more minutes of lost life from the average number of deaths you'll cause than you'll save from the speeding. It's a net negative behavior for society as a whole, but almost nobody is inclined to even try to think about it that way, and the immediate benefit of seemingly saving a few minutes outweighs the small risk of catastrophic harm. Similarly with rolling through stop signs (both from the immediate danger, and from the habit you're developing that makes you less likely to be able to successfully stop in the instances you actually intend to).

[0] Not a source, those are a dime a dozen if you want to see a DS lose a lot of money, but XKCD is always enjoyable: https://xkcd.com/1570/

[1] Also not a source, just another great XKCD: https://xkcd.com/937/

Do you also judge crack cocaine's value by its number of users?

I don't think most people are capable of doing a cost/benefit ratio calculation on how what they do affects the rest of the world, and the wealthy are far and away the worst abusers of this sadass truth.