Local AI is driving the biggest change in laptops in decades

2 months ago (spectrum.ieee.org)

272 comments

barqawiz

I was in the market for a laptop this month. Many new laptops now advertise AI features like this "HP OmniBook 5 Next Gen AI PC" which advertises:

"SNAPDRAGON X PLUS PROCESSOR - Achieve more everyday with responsive performance for seamless multitasking with AI tools that enhance productivity and connectivity while providing long battery life"

I don't want this garbage on my laptop, especially when it's running off its battery! Running AI on your laptop is like playing Starcraft Remastered on the Xbox or Factorio on your Steam Deck. I hear you can play DOOM on a pregnancy test too. Sure, you can, but it's just going to be a tedious, inferior experience.

Really, this is just a fine example of how overhyped AI is right now.

Legend2440 2 months ago
Laptop manufacturers are desperate to cash in on the AI craze. There's nothing special about an 'AI PC'. It's just a regular PC with Windows Copilot... which is a standard Windows feature anyway.
> I don't want this garbage on my laptop, especially when it's running off its battery!
The one bit of good news is it's not going to impact your battery life because it doesn't do any on-device processing. It's just calling an LLM in the cloud.
- 14113 1 month ago
  
  That's not quite correct. Snapdragon chips that are advertised as being good for "AI" also come with the Hexagon DSP, which is now used for (or targeted at) AI applications. It's essentially a separate vector processor with large vector sizes.
- marcus_holmes 2 months ago
  
  Doesn't this lead to a lot of tension between the hardware makers and Microsoft?
  MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
  Laptop manufacturers are making laptops that can run an LLM locally, but there's no point in that unless there's a local LLM to run (and Windows won't have that because Copilot). Are they going to be pre-installing Llama on new laptops?
  Are we going to see a new power user / normal user split? Where power users buy laptops with LLMs installed, that can run them, and normal folks buy something that can call Copilot?
  Any ideas?
  
- zamadatix 2 months ago
  
  > It's just a regular PC with Windows Copilot... which is a standard Windows feature anyway.
  "AI PC" branded devices get "Copilot+" and additional crap that comes with that due to the NPU. Despite desktops having GPUs with up to 50x more TOPs than the requirement, they don't get all that for some reason https://www.thurrott.com/mobile/copilot-pc/323616/microsoft-...
  
- bitwize 2 months ago
  
  AI PCs also have NPUs which I guess provide accelerated matmuls, albeit less accelerated than a good discrete GPU.
- eleventyseven 1 month ago
  
  There's nothing special about it: Intel has lowered the bar for what counts as an AI PC so vendors can market it. Ollama can run a 4B model plenty fine on Tiger Lake with 8GB of classic RAM.
  But unified memory IS truly what makes an AI ready PC. Apple Silicon proves that. People are willing to pay the premium, and I suspect unified memory will still be around and bringing us benefits even if no one cares about LLMs in 5 years.
- autoexec 2 months ago
  
  Even collecting and sending all that data to the cloud is going to drain battery life. I'd really rather my devices only do what I ask them to than have AI running in the background all the time trying to be helpful or just silently collecting data.
  
neves 1 month ago
I have a Snapdragon laptop and it is the best I've ever had. But the NPU is really almost useless.
This is a nice companion to the article: https://www.pcworld.com/article/2965927/the-great-npu-failur...
- dijit 1 month ago
  
  Agreed, I have the ARM based T14s for work.
  The thing is nowhere near the performance of a MacBook, but it's silent and the battery lasts ages, which is a far cry from the same laptop with an Intel CPU, which is what many are running.
  Company removes a lot of the AI bloat though.
dpedu 1 month ago

> Running AI on your laptop is like playing Starcraft Remastered on the Xbox
A great analogy because there is Starcraft for a console - Nintendo 64 - and it is quite awkward. Split-screen multiplayer included.
layer8 1 month ago

It’s true that the AI marketing is largely nonsense, but the NPUs also don’t hurt, and you don’t have to make use of them.
pluralmonad 1 month ago

Factorio runs really well on the deck though...
But yeah, fresh install of OS is a must for any new computer.

jwr 1 month ago

The author seems unaware of how well recent Apple laptops run LLMs. This is puzzling and puts into question the validity of anything in this article.

gcanyon 1 month ago
If Apple offered a reasonably-priced laptop with more than 24gb of memory (I'm writing this on a maxed-out Air) I'd agree. I've been buying Apple laptops for a long time, and buying the maximum memory every time. I just checked, and I see that now you can get 32gb. But to get 64gb I think you have to spend $3700 for the MBMax, and 128gb starts at $4500, almost 3x the 32gb Air's price.
And as far as I understand it, an Air with an M3 would be perfectly capable of running larger models (albeit slower) if it had the memory.
- mft_ 1 month ago
  
  You’re not wrong that Apple’s memory prices are unpleasant, but also consider the competition - in this context (running LLMs locally) laptops with large amounts of fast memory that can be purposed for the GPU. This limits you to Apple or one specific AMD processor at present.
  An HP ZBook with an AMD 395+ and 128GB of memory apparently lists for $4049 [0]
  An ASUS ROG Flow z13 with the same spec sells for $2799 [1] - so cheaper than Apple, but still a high price for a laptop.
  [0] https://hothardware.com/reviews/hp-zbook-ultra-g1a-128gb-rev...
  [1] https://www.hidevolution.com/asus-rog-flow-z13-gz302ea-xs99-...
  
- jdprgm 1 month ago
  
  The trick here is buying used. Especially for something like the M1 series, there is tremendous value to be had on high memory models, where the memory hasn't changed significantly over generations compared to the CPUs, and even M1s are quite competent for many workloads. Got an M1 Max with 64GB of RAM recently for I think $1400.
- jwr 1 month ago
  
  I think pricing is just one dimension of this discussion — but let's dive into it. I agree it's a lot of money. But what are you comparing this pricing to?
  From what I understand, getting a non-Apple solution to the problem of running LLMs in 64GB of VRAM or more has a price tag that is at least double of what you mentioned, and likely has another digit in front if you want to get to 128GB?
- fnord77 1 month ago
  
  It's astonishing how Apple gouges on the memory and SSD upgrade prices (I'm on an M1 w/ 64GB/4TB).
  That said they have some elasticity when it comes to the DRAM shortage.
  
fancyfredbot 1 month ago
I think the author is aware of Apple silicon. The article mentions the fact Apple has unified memory and that this is advantageous for running LLMs.
- dangus 1 month ago
  
  Then idk why they say that most laptops are bad at running LLMs, Apple has a huge marketshare in the laptop market and even their cheapest laptops are capable in that realm. And their PC competitors are more likely to be generously specced out in terms of included memory.
  > However, for the average laptop that’s over a year old, the number of useful AI models you can run locally on your PC is close to zero.
  This straight up isn’t true.
  
whazor 1 month ago
But economically, it is still much better to buy a lower spec't laptop and to pay a monthly subscription for AI.
However, I agree with the article that people will run big LLMs on their laptop N years down the line. Especially if hardware outgrows best-in-class LLM model requirements. If a phone could run a 512GB LLM model fast, you would want it.
- m4rtink 1 month ago
  
  Are you sure the subscription will still be affordable after the venture capital flood ends and the dumping stops?
  
- seanmcdirmid 1 month ago
  
  Running an LLM locally means you never have to worry about how many tokens you've used, and also it allows for a lot of low latency interactions on smaller models that can run quickly.
  I don't see why consumer hardware won't evolve to run more LLMs locally. It is a nice goal to strive for, which consumer hardware makers have been missing for a decade now. It is definitely achievable, especially if you just care about inference.
  
- ignoramous 1 month ago
  
  > economically, it is still much better to buy a lower spec't laptop and to pay a monthly subscription for AI
  Uber is economical, too; but folks prefer to own cars, sometimes multiple.
  And just as there's a market for all kinds of vanity cars, fast sportscars, expensive supercars... I imagine PCs & laptops will have such a market, too: in probably less than a decade, maybe a £20k laptop running a 671b+ LLM locally will be the norm among pros.
  
- NooneAtAll3 1 month ago
  
  any "it's cheaper to rent than to own" arguments can be (and must be) completely disregarded due to experience of the last decade
  so stop it
azuanrb 1 month ago
You still need ridiculously high spec hardware, and at Apple’s prices, that isn’t cheap. Even if you can afford it (most won't), the local models you can run are still limited and they still underperform. It’s much cheaper to pay for a cloud solution and get significantly better results. In my opinion, the article is right. We need a better way to run LLMs locally.
- onion2k 1 month ago
  
  > You still need ridiculously high spec hardware, and at Apple’s prices, that isn’t cheap.
  You can easily run models like Mistral and Stable Diffusion in Ollama and Draw Things, and you can run newer models like Devstral (the MLX version) and Z Image Turbo with a little effort using LM Studio and ComfyUI. It isn't as fast as using a good nVidia GPU or a cloud GPU but it's certainly good enough to play around with and learn more about it. I've written a bunch of apps that give me a browser UI talking to an API that's provided by an app running a model locally and it works perfectly well. I did that on an 8GB M1 for 18 months and then upgraded to a 24GB M4 Pro recently. I still have the M1 on my network for doing AI things in the background.
  
- jki275 1 month ago
  
  I bought my M1 Max w/ 64gb of ram used. It's not that expensive.
  Yes, the models it can run do not perform like chatgpt or claude 4.5, but they're still very useful.
  
- almosthere 1 month ago
  
  749 for an M4 air at Amazon right now
  
- whitehexagon 1 month ago
  
  I was pleasantly surprised at the speed and power of my second hand M1 Pro 32GB running Asahi & Qwen3:32B. It does all I need, and I don't mind the reading pace output, although I'd be tempted by M2 Ultra if the secondhand market hadn't also exploded with the recent RAM market manipulations.
  Anyway, I'm on a mission to have no subscriptions in the New Year. Plus it feels wrong to be contributing towards my own irrelevance (GAI).
dangus 1 month ago

Yeah, any Mac system specced with a decent amount of RAM since the M1 will run LLMs locally very well. And that’s exactly how the built-in Apple Intelligence service works: when enabled, it downloads a smallish local model. Since all Macs since the M1 have very fast memory available to the integrated GPU, they’re very good at AI.
The article kinda sucks at explaining how NPUs aren’t really even needed, they just have potential to make things more efficient in the future rather than depending on the power consumption involved with running your GPU.
terafo 1 month ago

This article specifically talks about PC laptops and discusses changes in them.
cmxch 1 month ago
Only if you want to take all the proprietary baggage and telemetry that comes with Apple platforms by default.
A Lenovo T15g with a 16gb 3080 mobile doesn’t do too badly and will run more than just Windows.
- pimeys 1 month ago
  
  I just got a Framework desktop with 128 GB of shared RAM just before the memory prices rocketed, and I can comfortably run many even bigger oss models locally. You can dedicate 112GB to the GPU and it runs Linux perfectly.
selinkocalar 1 month ago

The M-series chips really changed the game here
reactordev 1 month ago

This article is to sell more laptops.

seunosewa 1 month ago

"How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly."

What's he talking about? It's trivial to calculate that.

RobotToaster 1 month ago
Isn't the ability to run it more dependent on (V)RAM? With TOPS just dictating the speed at which it runs?
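A rough way to see the "does it fit" half of that question. This is only a sketch: it assumes the weights dominate memory use and ignores KV cache, activations, and runtime overhead, and the function names and example sizes are illustrative rather than anything from the article.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_memory(n_params: float, bits_per_weight: int, mem_gb: float) -> bool:
    """True if the weights alone fit; a real runtime also needs room for context."""
    return weight_footprint_gb(n_params, bits_per_weight) <= mem_gb


if __name__ == "__main__":
    # Illustrative: an 8B-parameter model at common precisions.
    for bits in (16, 8, 4):
        size = weight_footprint_gb(8e9, bits)
        print(f"8B params at {bits}-bit: ~{size:.0f} GB of weights")
```

Under these assumptions capacity decides whether the model loads at all, and compute or bandwidth only decide how fast it runs once it does.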
- zozbot234 1 month ago
  
  Strictly speaking, you don't need that much VRAM or even plain old RAM - just enough to store your context and model activations. It's just that as you run with less and less (V)RAM you'll start to bottleneck on things like SSD transfer bandwidth and your inference speed goes down to a crawl. But even that may or may not be an issue depending on your exact requirements: perhaps you don't need your answer instantly and can wait while it gets computed in the background. Or maybe you're running with the latest PCIe 5 storage which overall gives you comparable bandwidth to something like DDR3/DDR4 memory.
- NitpickLawyer 1 month ago
  
  A good rule of thumb is that PP (Prompt Processing) is compute bound while TG (Token Generation) is (V)RAM speed bound.
fny 1 month ago

It's also been done before...[0]
[0]: https://www.edge-ai-vision.com/2024/05/2024-edge-ai-and-visi...
cramcgrab 1 month ago

It’s trivial to ask an AI to answer that. Well, I guess we know it’s not an AI generated article!
swyx 1 month ago

> state-of-the-art models
> hundreds of millions of parameters
lol
lmao, even

mattas 1 month ago

See: "3D TVs are driving the biggest change in TVs in decades"

eleventyseven 1 month ago
A lazy, easy cheap shot. But do you deny that these aspects from the article are coming? Or that they will still be here in 5 years?
- Addition of more—and faster—memory.
- Consolidation of memory.
- Combination of chips on the same silicon.
All of these are also happening for non AI reasons. The move to SoC that really started with the M1 wasn't because of AI, but unified memory being the default is something we will see in 5 years. Unlike 3D TV.
- technion 1 month ago
  
  We just had a series of articles and sysadmin outcry that major vendors were bringing 8gb laptops back to standard models because of the ram prices. In the short term, we're seeing a reduction.
  
- estimator7292 1 month ago
  
  Memory is absolutely not coming in the near future. Nobody can afford it.
- MisterTea 1 month ago
  
  > The move to SoC that really started with the M1
  No it did not. There were numerous SoCs that came before it, and the move was inevitable in this space.
  
- blibble 1 month ago
  
  > Addition of more—and faster—memory.
  probably not after scam altman bought up half the world's supply for his shit company
- ToucanLoucan 1 month ago
  
  In order:
  - People wanting more memory is not a novel feature. I am excited to find out how many people immediately want to disable the AI nonsense to free up memory for things they actually want to do.
  - Same answer.
  - I think the drive towards SOCs has been happening already. Apple's M-series utterly demolishes every PC chip apart from the absolute bleeding-edge available, includes dedicated memory and processors for ML tasks, and it's mature technology. Been there for years. To the extent PC makers are chasing this, I would say it's far more in response to that than anything to do with AI.
- heavyset_go 1 month ago
  
  The move to SoC happened long before the M1, it was the state of things in the ARM space for over a decade, and most x86 laptops have been SoCs for quite some time.
m4rtink 1 month ago

Blockchain is making money obsolete.
j45 1 month ago
This article is just saying more laptops will have power efficient GPUs in them. A bit better than 3D TVs.
They might not use Apple silicon often. Other options are encouraging.

tengbretson 1 month ago

Outside of Apple laptops (and arguably the Ryzen AI MAX 390), an "AI ready" laptop is simply marketing speak for "is capable of making HTTP requests."

tracerbulletx 1 month ago

This mostly just shows you how far behind the M1 (which came out 5 years ago) all the non Apple laptops are.

properbrew 1 month ago
Was never really into Apple hardware (mainly the price), however I recently got an M1 Mac Mini and an iPhone for app development, and the inference speed for, as you say, a 5 year old chip is actually crazy.
If they made the M series fully open for Linux (I know Asahi is working away) I probably would never buy another non-M series processor again.
- dpedu 1 month ago
  
  I got an M1 Mac Mini somewhat recently as well, to replace my ~2012 Mac Mini that I use as a media center PC. And frankly, it's overkill. Used ones can be had for $200-$300 USD, lower side with cosmetic damage. An absolute steal, IMO.
  
jeffbee 1 month ago
You can still get an M1 Macbook Air at retail for $599 ($300 for refurbs), which is a Chromebook price for a laptop that is better in pretty much every respect than any Chromebook.
- nyarlathotep_ 1 month ago
  
  https://slickdeals.net/f/19004236-select-micro-center-stores...
  MicroCenter has (had? OOS near me) M4 Minis for $400!
  A remarkable bargain, even more so considering the recent hardware price hikes.
- heavyset_go 1 month ago
  
  If you're going for refurbs, you can get a device with an AMD 7000/8000/9000 APU, at the same or lower price point, and the iGPU itself will perform better than an M1 for prompt processing and generation, even with SODIMM memory.

aappleby 2 months ago

I predict we will see compute-in-flash before we see cheap laptops with 128+ gigs of ram.

14113 1 month ago

There was a company that did compute-in-dram, which was recently acquired by Qualcomm: https://www.emergentmind.com/topics/upmem-pim-system
znpy 2 months ago
You could get 128gb ram laptops from the time ddr4 came around: workstation class laptops with 4 ram slots would happily take 128gb of memory.
The fact that nowadays there are few to no laptops with 4 RAM slots is entirely artificial.
- mhitza 1 month ago
  
  I was musing this summer about whether I should get a refurbed Thinkpad P16 with 96GB of RAM to run VMs purely in memory. Now that 96GB of RAM costs as much as a second P16.
  
zamadatix 2 months ago

I can't tell if this is optimism for compute-in-flash or pessimism with how RAM has been going lately!
ajb 1 month ago

The thing that is supposed to happen next is high-bandwidth flash. In theory, it could allow laptops to run the larger models without being extortionately costly, by loading directly from flash into the GPU (not by executing in flash). But I haven't seen figures for the actual bandwidth yet, and no doubt to start with it will be expensive. The underlying technology of flash has much higher read latency than DRAM, so it's not really clear (to me, at least) if they can deliver the speeds needed to remove the need to cache in VRAM just by increasing parallelism.
p1esk 2 months ago

We’ve had “compute in flash” for a few years now: https://mythic.ai/product/
wkat4242 2 months ago
Yeah, especially given what is happening in the memory market
- noosphr 2 months ago
  
  Feast and famine.
  In three years we will be swimming in more ram than we know what to do with.
  
aitchnyu 2 months ago
Memristors are (IME) missing from the news. They promised to act as both persistent storage and fast RAM.
- ACCount37 1 month ago
  
  If only memristors weren't vaporware that has "shown promise" for 3 decades now and went nowhere.
112233 2 months ago
By "we" do you mean consumers? No, "we" will get neither. This is an unexpected, irresistible opportunity to create a new class, by controlling the technology that people are required and are desiring to use (large genAI) with a comprehensive moat: financial, legislative and technological. Why make affordable devices that enable at least partial autonomy? Of course the focus will be on better remote operation (networking, on-device secure computation, advancing a narrative that equates local computation with extremism and sociopathy).
- cmxch 1 month ago
  
  Push Washington to grill the foundries and their customers. Repeat until prices drop.

socketcluster 2 months ago

I feel like there's no point to get a graphics card nowadays. Clearly, graphics cards are optimized for graphics; they just happened to be good for AI but based on the increased significance of AI, I'd be surprised if we don't get more specialized chips and specialized machines just for LLMs. One for LLMs, a different one for stable diffusion.

With graphics processing, you need a lot of bandwidth to get stuff in and out of the graphics card for rendering on a high-resolution screen, lots of pixels, lots of refreshes, lots of bandwidth... With LLMs, a relatively small amount of text goes in and a relatively small amount of text comes out over a reasonably long amount of time. The amount of internal processing is huge relative to the size of input and output. I think NVIDIA and a few other companies already started going down that route.

But graphics cards will probably still be useful for stable diffusion, especially AI-generated videos, as the input and output bandwidth is much higher.

ACCount37 1 month ago
Nah, that's just plain wrong.
First, GPGPU is powerful and flexible. You can make an "AI-specific accelerator", but it wouldn't be much simpler or much more power-efficient - while being a lot less flexible. And since you need to run traditional graphics and AI workloads both in consumer hardware? It makes sense to run both on the same hardware.
And bandwidth? GPUs are notorious for not being bandwidth starved. 4K@60FPS seems like a lot of data to push in or out, but it's nothing compared to how fast modern PCIe 5.0 x16 goes. AI accelerators are more of the same.
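To put rough numbers on that comparison, here is a back-of-the-envelope sketch. It assumes uncompressed 24-bit frames and treats the PCIe figure as the theoretical per-direction rate (32 GT/s per lane with 128b/130b encoding), not something you would measure in practice; the function names are made up for illustration.

```python
def frame_stream_gb_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Raw, uncompressed video stream rate in GB/s (1 GB = 1e9 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e9


def pcie_gb_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Theoretical one-direction PCIe throughput in GB/s.

    gt_per_s is the per-lane transfer rate; each transfer carries one bit,
    so divide by 8 for bytes after applying the line-encoding efficiency.
    """
    return gt_per_s * encoding / 8 * lanes


if __name__ == "__main__":
    video = frame_stream_gb_s(3840, 2160, 3, 60)  # 4K @ 60 FPS, 24-bit colour
    link = pcie_gb_s(32, 16)                      # PCIe 5.0 x16
    print(f"4K@60 raw frames: {video:.2f} GB/s")
    print(f"PCIe 5.0 x16:     {link:.1f} GB/s")
    print(f"headroom:         {link / video:.0f}x")
```

With these assumptions the raw frame stream uses only a few percent of the link, which is the gap the comment is pointing at.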
- djsjajah 1 month ago
  
  GPUs might not be bandwidth starved most of the time, but they absolutely are when generating text from an llm. It’s the whole reason why low precision floating point numbers are being pushed by nvidia.
  
Legend2440 2 months ago
LLMs are enormously bandwidth hungry. You have to shuffle your 800GB neural network in and out of memory for every token, which can take more time/energy than actually doing the matrix multiplies. GPUs are almost not high bandwidth enough.
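A minimal sketch of why that bites, reading "in and out of memory" as "every weight has to be streamed from memory to the compute units once per generated token". That holds roughly for a dense model at batch size 1; mixture-of-experts models and batching change the picture. The bandwidth tiers below are illustrative round numbers, not measurements of any specific device.

```python
def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on generation speed if every weight is read once per token."""
    return bandwidth_bytes_per_s / model_bytes


if __name__ == "__main__":
    model_bytes = 800e9  # the 800GB network from the comment above
    # Hypothetical memory bandwidth tiers in GB/s (assumptions for illustration).
    for bandwidth_gb_s in (100, 500, 1000, 4000):
        tps = max_tokens_per_second(model_bytes, bandwidth_gb_s * 1e9)
        print(f"{bandwidth_gb_s:>5} GB/s -> at most {tps:.2f} tokens/s")
```

Even the fastest tier here caps out at a handful of tokens per second for a model that size, before any compute time is counted.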
- socketcluster 2 months ago
  
  But even so, for a single user, the output rate for a very fast LLM would be like 100 tokens per second. With graphics, we're talking like 2 million pixels, 60 times a second; 120 million pixels per second for a standard high res screen. Big difference between 100 tokens vs 120 million pixels.
  24 bit pixels gives 16 million possible colors... For tokens, it's probably enough to represent every word of the entire vocabulary of every major national language on earth combined.
  > You have to shuffle your 800GB neural network in and out of memory
  Do you really though? That seems more like a constraint imposed by graphics cards. A specialized AI chip would just keep the weights and all parameters in memory/hardware right where they are and update them in-situ. It seems a lot more efficient.
  I think that it's because graphics cards have such high bandwidth that people decided to use this approach but it seems suboptimal.
  But if we want to be optimal; then ideally, only the inputs and outputs would need to move in and out of the chip. This shuffling should be seen as an inefficiency; a tradeoff to get a certain kind of flexibility in the software stack... But you waste a huge amount of CPU cycles moving data between RAM, CPU cache and Graphics card memory.
  
- Zambyte 2 months ago
  
  This doesn't seem right. Where is it shuffling to and from? My drives aren't fast enough to load the model every token that fast, and I don't have enough system memory to unload models to.
  
zamadatix 2 months ago

> Clearly, graphics cards are optimized for graphics; they just happened to be good for AI
I feel like the reverse has been true since after the Pascal era.
autoexec 2 months ago

I don't doubt that there will be specialized chips that make AI easier, but they'll be more expensive than the graphics cards sold to consumers, which means that a lot of companies will just go with graphics cards, either because the extra speed of specialized chips won't be worth the cost, or because they'll be flat out too expensive and priced for the small number of massive spenders who'll shell out insane amounts of money for any/every advantage (whatever they think that means) they can get over everyone else.

Groxx 1 month ago

re NPUs: they've been a marketing thing for years now, but I really have no idea how many of them are actually used when you run [whatever]. particularly after a year or two of software updates.

anyone have numbers? are they just an added expense that is supported for first party stuff for 6 months before they need a bigger model, or do they have staying power? clearly they are capable of being used to save power, but does anything do that in practice, in consumer hardware?

spullara 2 months ago

I'm running GPT-OSS 120B on a MacBook Pro M3 Max w/128 GB. It is pretty good, not great, but better than nothing when the wifi on the plane basically doesn't work.

scotty79 1 month ago

I'm running it on a PC laptop with a mobile 5090 and 64GB of RAM. Start is a bit rough, but once it gets going it is perfectly serviceable when I'm on a bad connection.

seanmcdirmid 2 months ago

I’ve been running LLMs on my laptop (M3 Max 64GB) for a year now and I think they are ready, especially with how good mid sized models are getting. I’m pretty sure unified memory and energy efficient GPUs will be more than just a thing on Apple laptops in the next few years.

noman-land 1 month ago
You doing code completion and agentic stuff successfully with local models? Got any tips? I've been out of the game for [checks watch] a few months and am behind on the latest. Is Cline the move?
- seanmcdirmid 1 month ago
  
  I haven't bothered doing code completion locally yet, though it's something I want to try with the Qwen model. I'm mostly using it to generate/fix code CLI style.
  
allovertheworld 2 months ago
Only because of Apple's unified memory architecture. The groundwork is there, we just need memory to be cheaper so we can fit 512+GB now ;)
- seanmcdirmid 2 months ago
  
  Memory prices will rise short term and generally fall long term. Even with the current supply hiccup, the answer is to just build out more capacity (which will happen if there is healthy competition). I mean, I expect the other mobile chip providers to adopt unified architecture, beefy on-chip GPU cores, and lots of bandwidth to connect them to memory (at the max or ultra level, at least). I think AMD is already doing UM at least?
  
- zmmmmm 1 month ago
  
  There's not in the end all that much point having more memory than you can compute on in a reasonable time. So I think probably the useful amount tops out in the 128GB range where you can still run a 70b model and get a useful token rate out of it.

wkat4242 2 months ago

This article is so dumb. It totally ignores the memory price explosion that will make large fast memory laptops unfeasible for years and states stuff like this:

> How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly. It’s not possible to run these models on today’s consumer hardware, so real-world tests just can’t be done.

We know exactly the performance needed for a given responsiveness. TOPS is just a measurement independent from the type of hardware it runs on.

The fewer TOPS, the slower the model runs, so the user experience suffers. Memory bandwidth and latency play a huge role too. And context: increase context and the LLM becomes much slower.

We don't need to wait for consumer hardware to know how much is needed. We can calculate that for given situations.
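
For example, a rough sketch of that calculation. It uses the common rule of thumb of about 2 operations (one multiply, one add) per parameter per token for a dense transformer forward pass, and it ignores attention cost over long contexts and memory bandwidth limits. The `utilization` knob and the example figures are assumptions for illustration, not numbers from the article.

```python
def required_tops(n_params: float, tokens_per_s: float, utilization: float = 1.0) -> float:
    """Estimate the TOPS needed to process tokens at a target rate.

    Assumes ~2 ops per parameter per token; `utilization` is the fraction
    of peak throughput the hardware actually sustains (hypothetical knob).
    """
    ops_per_token = 2 * n_params
    return ops_per_token * tokens_per_s / utilization / 1e12


if __name__ == "__main__":
    # The article's "hundreds of millions of parameters", taken as 500M,
    # processed at 1000 tokens/s.
    print(required_tops(5e8, 1000))        # ideal hardware
    print(required_tops(5e8, 1000, 0.25))  # hardware sustaining 25% of peak
```

The point is only that the estimate is a few multiplications once you pick a model size and a target speed.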

It also pretends small models are not useful at all.

I think the massive cloud investments will push the market away from local AI, unfortunately. That trend makes local memory expensive, and all those cloud billions have to be made back, so all the vendors are pushing for their cloud subscriptions. I'm sure some functions will be local but the brunt of it will be cloud, sadly.

dcreater 1 month ago

Horrible article. Low effort, low knowledge. Had no idea the bar was so low for an IEEE publication
vegabook 2 months ago
also, state of the art models have hundreds of _billions_ of parameters.
- omneity 2 months ago
  
  It tells you about their ambitions..
layer8 1 month ago

The article is from mid-November (and probably was written even earlier), where the RAM price explosion wasn’t as striking yet.

juancn 1 month ago

The price of RAM is going to throw a wrench at that

kristianp 1 month ago

"Local AI" could be many different things. NPUs are too puny to run many recent models, such as image generation and LLMs. The article seems to gloss over many important details like this. For example, the creative agency: what AI work are they doing?

> marketing firm Aigency Amsterdam, told me earlier this year that although she prefers macOS, her agency doesn’t use Mac computers for AI work.

openquery 1 month ago

For 99% of people I don't see the use case (except for privacy, but that ship sailed a decade ago for the aforementioned 99%). If the argument is offline inference, the modern computing experience is basically all done through the browser anyway, so I don't buy it.

GPUs for video games where you need low latency makes sense. Nvidia GeForce Now works but not for any serious gaming. But when it comes to LLMs at least, the 100ms latency between you and the Gemini API or whichever provider you use is negligible compared to the inference time.

What am I missing?

mginszt 1 month ago

I'm sure giants like Microsoft would like to add more AI capabilities, and I'm also sure they would like to avoid running them on their own servers.
Another thing is that I wouldn’t expect LLMs to be free forever. One day, CEOs will decide that everyone has become accustomed to them - and that will be the first day of a subscription-based model and the last day of AI companies reporting financial losses.

bfrog 2 months ago

I suppose it depends on the model; for code it was useless. As a lossy, interactive copy of Wikipedia it could be OK: not good or great, just OK.

Maybe for creative suggestions and editing it’d be ok.

fwipsy 2 months ago

Seems like wishful thinking.

> How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly.

Why not extrapolate from open-source AIs which are available? The most powerful open-source AI (which I know of) is Kimi K2, at >600GB. Running this at acceptable speed requires 600+GB of GPU/NPU memory. Even $2000-3000 AI-focused PCs like the DGX Spark or Strix Halo typically top out at 128GB. Frontier models will only run on something that costs many times a typical consumer PC, and that is only going to get worse with RAM pricing.

In 2010 the typical consumer PC had 2-4gb of RAM. Now the typical PC has 12-16gb. This suggests RAM size doubling perhaps every 5 years at best. If that's the case, we're 25-30 years away from the typical PC having enough RAM to run Kimi K2.
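
The arithmetic behind that estimate can be sketched as follows, assuming the 5-year doubling period holds indefinitely (which the next paragraph doubts) and taking 600GB as the target; the function name is just for illustration.

```python
import math


def years_until(target_gb: float, current_gb: float, years_per_doubling: float = 5.0) -> float:
    """Years for typical RAM to grow from current_gb to target_gb at a fixed doubling period."""
    doublings = math.log2(target_gb / current_gb)
    return doublings * years_per_doubling


if __name__ == "__main__":
    for current in (12, 16):
        print(f"from {current}GB: ~{years_until(600, current):.0f} years to reach 600GB")
```

Both starting points need a bit over five doublings, which is where the 25-30 year range comes from.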

But the typical user will never need that much RAM for basic web browsing, etc. The typical computer RAM size is not going to keep growing indefinitely.

What about cheaper models? It may be possible to run a "good enough" model on consumer hardware eventually. But I suspect that for at least 10-15 years, typical consumers (HN readers may not be typical!) will prefer capability, cheapness, and especially reliability (not making mistakes) over being able to run the model locally. (Yes AI datacenters are being subsidized by investors; but they will remain cheaper, even if that ends, due to economies of scale.)

The economics dictate that AI PCs are going to remain a niche product, similar to gaming PCs. Useful AI capability is just too expensive to add to every PC by default. It's like saying flying is so important, everyone should own an airplane. For at least a decade, likely two, it's just not cost-effective.

sipjca 2 months ago
> It may be possible to run a "good enough" model on consumer hardware eventually
10-15 years?!!!! What is the definition of good enough? Qwen3 8B or 30B-A3B are quite capable models which run on a lot of hardware even today. SOTA is not just getting bigger, it's also getting more intelligent and running more efficiently. There have been massive gains in intelligence at the smaller model sizes. It is just highly task dependent. Arguably some of these models are "good enough" already, and the level of intelligence and instruction following is much better than even 1 year ago. Sure, not Opus 4.5 level, but still much could be done without that level of intelligence.
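
For a sense of why an 8B model fits on a lot of hardware, a back-of-envelope sketch of weight memory: parameters times bits per weight. The precision levels below are illustrative assumptions, and this ignores KV cache, activations, and runtime overhead, so real usage is higher.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# An 8B-parameter model at a few common precisions (assumed, not from the thread).
for bits in (16, 8, 4):
    print(f"8B model at {bits}-bit: ~{weight_memory_gb(8, bits):.0f} GB of weights")
```

At 4-bit that is about 4 GB of weights, which is within reach of a typical 16gb machine, while 16-bit would take the whole 16 GB before counting anything else.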
- fwipsy 1 month ago
  
  "Good enough" has to mean users won't be frequently frustrated if they transition to it from a frontier model.
  > it is highly task dependent... much could be done without that level of intelligence
  This is an enthusiast's glass-half-full perspective, but casual end users are gonna have a glass-half-empty perspective. Qwen3-8B is impressive, but how many people use it as a daily driver? Most casual users will toss it as soon as it screws up once or twice.
  The phrase you quoted in particular was imprecise (sorry) but my argument as a whole still stands. Replace "consumer hardware" with "typical PCs" - think $500 bestseller laptops from Walmart. AI PCs will remain niche luxury products, like gaming PCs. But gaming PCs benefit from being part of gaming culture and from the input latency that cloud gaming adds. Neither of these affects AI much.
  
epicureanideal 2 months ago
You may be correct, but I wonder if we'll see Mac Mini sized external AI boxes that do have the 1TB of RAM and other hardware for running local models.
Maybe 100% of computer users wouldn't have one, but maybe 10-20% of power users would, including programmers who want to keep their personal code out of the training set, and so on.
I would not be surprised though if some consumer application made it desirable for each individual, or each family, to have local AI compute.
It's interesting to note that everyone owns their own computer, even though a personal computer sits idle half the day, and many personal computers hardly ever run at 80% of their CPU capacity. So the inefficiency of owning a personal AI server may not be as much of a barrier as it would seem.
- saltcured 1 month ago
  
  But will it ever lead to a Mac Mini-priced external AI box? Or will this always be a premium "pro" tier that seems to rival used car prices?
- seanmcdirmid 2 months ago
  
  > but I wonder if we'll see Mac Mini sized external AI boxes that do have the 1TB of RAM
  Isn't that the Mac Studio already? Ok, it seems to max at 512 GB.
marcus_holmes 2 months ago

> In 2010 the typical consumer PC had 2-4gb of RAM. Now the typical PC has 12-16gb. This suggests RAM size doubling perhaps every 5 years at best. If that's the case, we're 25-30 years away from the typical PC having enough RAM to run Kimi K2.
Part of the reason that RAM isn't growing faster is that there's no need for that much RAM at the moment. Technically you can put multiple TB of RAM in your machine, but no-one does that because it's a complete waste of money [0]. Unless you're working in a specialist field, 16GB of RAM is enough, and adding more doesn't make anything noticeably faster.
But give it a decent use-case, like running an LLM locally, and you'd find demand for lots more RAM, and that would drive supply and new technology developments, and in ten years it'll be normal to have 128TB of RAM in a baseline laptop.
Of course, that does require that there is a decent use-case for running an LLM locally, and your point that that is not necessarily true is well-made. I guess we'll find out.
[0] apart from a friend of mine working on crypto who had a desktop Linux box with 4TB of RAM in it.

TrackerFF 1 month ago

With the wild RAM prices, which btw are probably going to last through 2026, I expect 8 GB of RAM to be the new standard going forward.

32 GB ram will be for enthusiasts with deep pockets, and professionals. Anything over that, exclusively professionals.

The conspiracy theorist inside me is telling me that big AI companies like OpenAI would rather see that people are using their puny laptops as terminals / shells only, to reach sky-based models, than to let them have beefy laptops and local models.

cmxch 1 month ago

Not if a few investigations into the foundries and their datacenter deals stop that.
andy99 1 month ago
> The conspiracy theorist inside me is telling me that big AI companies...
I don’t believe in conspiracies but I do believe in incentives sometimes lining up. Now that there is a RAM heavy cloud application, cloud providers are suddenly in direct competition with consumers for scarce resources, with the winner being able to control where people run their models.

rldjbpin 1 month ago

if you look beyond local LLMs (also served using dedicated apps), the title holds a lot of promise. case in point: WASM and WebGPU

the edge/on-device AI use cases on smartphones can also extend without user friction through web apps built on the above standards. perhaps one day there will be a "WebNPU", or NPUs will just get supported through existing standards.

there are already some use cases in apps but they usually fall back to the cpu. perhaps this could be the hw-accelerated moment that we saw with video on the web.

meisel 1 month ago

I think only a small percentage of users care enough about running LLMs locally to pay for extra hardware for it, put up with slower and lower-quality responses, etc. It’ll never be as good as non-local offerings, and is more hassle.

chnmig 1 month ago

The power and resource consumption of local large models are problems that laptops have to solve, and new versions of models are constantly being released, which means that laptop configurations will soon become outdated.

superkuh 1 month ago

The problem with this is that NPUs have terrible, terrible support in the various software ecosystems because they are unique to their particular SoC or whatever. No consistency even within particular companies.

rcarmo 1 month ago

Kind of ironic that it is a factor that 95% of regular users don’t care about or actively avoid.

j45 2 months ago

This must be referring mostly to windows, or non-Apple laptops

0xbadcafebee 1 month ago

Wirth's Law in action. Eventually it's going to take an entire datacenter to read the news.

ge96 1 month ago

Wonder if this relates to/overlaps those Coral Accelerator devices.

xrd 1 month ago

The takeaway from these comments is that you really can run local models if you use M-series devices from Apple.

But, can you do that if you install Linux on that hardware?

I hate to admit Apple hardware is incredible. But, I can't say the same about macOS anymore.

Can I run Linux and reap the benefits of M-series chips with local inference?

Or, are there any alternatives where I can use LLMs on Linux on a laptop?

bad_haircut72 1 month ago

My recent shower thought was the idea that Moore's law hasn't slowed at all, we just went multi-core. It's crazy that the Intel folks were so interested in optimizing for single-thread CPU design that they completely misunderstood where the best effort would be spent - if I had been around back then (speaking as an Elixir dev) I would have been way more interested in having 500-thread CPUs than getting down to nanometer-scale dies. That's what you get when everyone on the team is a bunch of C programmers.

ip26 1 month ago

Before LLMs, the use of parallelism on your typical laptop was limited to application level parallelism, e.g. one thread for Outlook and one for each tab in Chrome.
astrange 1 month ago

Intel designed a super high threaded CPU like that, Xeon Phi (Knights Corner / Knights Landing). It was useless. Single threaded programs are good.

tehjoker 1 month ago

I mean, having a more powerful laptop is great, but at the same time, these guys are calling for a >10x increase in RAM and a far more powerful NPU. How will this affect pricing? How will it affect power management? It made it seem like most of the laptop will be dedicated to gen AI services, which I'm still not entirely convinced are quite THAT useful. I still want a cheap laptop that lasts all day and I also want to be able to tap that device's full power for heavy compute jobs!

esses 2 months ago

I spent a good 30 seconds trying to figure out what DDS was an acronym for in this context.

lucb1e 1 month ago
Care to share the answer?
- esses 1 month ago
  
  Turns out the first word of the article was odds.

suprjami 1 month ago

Extremely cringe article.

The biggest thing to affect laptops in "decades" is solid state storage. No longer do you need to worry about killing your entire device simply by putting it down on a solid surface.

There are also plenty of other things like modern dense lithium ion batteries with 12+ hour runtimes, super power friendly CPUs of all architectures, the ultra-thin body and metal body popularised by Apple, LCD panels without ghosting, external power bricks instead of literally a PC power supply in a briefcase.

But yeah sure, the infinite slop plagiarism machine is coming. Gotta get some clicks!

zkmon 1 month ago

You don't understand the needs of a common laptop user. Define the use cases that require reaching for the laptop instead of the phone that is nearby. Those use cases don't need an LLM for a common laptop user.

gguncth 2 months ago

I have no desire to run an LLM on my laptop when I can run one on a computer the size of six football fields.

theshrike79 1 month ago
The point is that when you run it on your own hardware you can feed the model your health data, bank statements and private journals and can be 5000% sure they’re not going anywhere
- dboreham 1 month ago
  
  Regular people don't understand nor care about any of that. They'll happily take the Faustian bargain.
  
sandworm101 2 months ago
I've been playing around with my own home-built AI server for a couple months now. It is so much better than using a cloud provider. It is the difference between drag racing in your own car, and renting one from a dealership. You are going to learn far more doing things yourself. Your tools will be much more consistent and you will walk away with a far greater understanding of every process.
A basic last-generation PC with something like a 3060 (12GB) is more than enough to get started. My current rig pulls less than 500W with two cards (3060+5060). And, given the current temperature outside, the rig helps heat my home. So I am not contributing to global warming, water consumption, or any other datacenter-related environmental evil.
- DamonHD 1 month ago
  
  Unless you normally use electric resistance heating (or some kind of fossil fuel with higher gCO2/kWh), you don't necessarily get a free pass on the global warming thing!
  Our whole home is heated with <500W on average: at this moment the heat pump is drawing 501W (H4 boundary) at close to freezing outside, and its demand is intermittent.
- HelloUsername 1 month ago
  
  > I am not contributing to global warming
  lol

gamblor956 1 month ago

The "AI laptop" boom is already fading. It turns out that LLMs, local or otherwise, just aren't very useful.

Like Big Data, LLMs are useful in a small niche of areas, like poorly summarizing meeting notes, or grammar check at a middle-school level.

On LLMs for coding tasks: I asked a programmer why they loved Claude and he showed me the output. Twenty years ago, that kind of code would have gotten someone PIP'd. Today it's considered better than most junior programmers...which is a sign of how far programming standards have fallen, and explains why most programs and apps are such buggy pieces of sh$t these days.