Comment by ethbr1

10 hours ago

If 256GB of RAM enables them to run on-device AI models that (for reasons) are a key feature differentiator?

Personally, I think there's no way memory-heavy inference moves on-device (vs cloud) due to the economics, but it's not impossible that technology + platforms go that way for currently unforeseeable reasons.

I think there’s a realistic chance consumer inference moves on-device. I think it really depends on marketing.

My non-tech friends and family would probably be served perfectly fine by local models today, if they had a working web search tool. Their queries are often “soft” and don’t have an exact answer. My mom and aunt used it to pick a hairstyle, my mom used it to get an image of what a room would look like with particular drapes in it, etc. Stuff I think mid-sized local models like Gemma or smaller Qwens could do without issue. They just don’t have a device that will run them.

Businesses won’t move. They need a huge context so they can stuff a bunch of Confluence pages in it and 300 tools and it needs to read an entire codebase and yada yada. The hardware depreciation and electricity will probably make it a net zero or even cost more than paying for API access.

  • The economic argument in favor of cloud inference: higher utilization is always going to mean a better ROI on inference hardware.

    But maybe that hardware becomes so commoditized that it's not difficult to obtain / stuff in a box.

    • My argument is predicated on the assumption that mainstream hardware manufacturers will copy the way Apple and Framework have made system memory usable for inference.

      In that world, a) we are already at or close to having enough memory in local devices to do inference locally, and b) that memory isn't inference-specific and can be utilized for other things. Most devices come with enough memory to do some level of inference, and some come with plenty (eg a gaming desktop probably has 32GB+ of RAM in it).

      You aren't going to run Kimi on it, but I think the reality for a lot of consumer inference is that it doesn't need that. It's going to be a lot of things that are soft, and easily answered by a search API, so the LLM really just needs to be able to skim and summarize. Going a step further, we may even see some kind of hybrid approach where a local OpenRouter kind of thing decides whether the task is soft enough to do locally with models that fit in RAM or if it needs to be farmed out to a PaaS provider.
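
A rough sketch of that hybrid idea, to make the "fits in RAM" part concrete. Everything here is an assumption for illustration: the `LocalModel` class, the `route` function, the 4GB headroom, and the idea that something upstream has already labeled a task as "soft" are all hypothetical, not any real router's API. The only hard fact used is that weight memory is roughly parameter count times bits per weight divided by 8, which ignores KV cache and runtime overhead.

```python
from dataclasses import dataclass


@dataclass
class LocalModel:
    name: str               # hypothetical label, not a real model identifier
    params_billions: float  # parameter count, in billions
    bits_per_weight: int    # e.g. 4 for a 4-bit quantized model

    def weight_gb(self) -> float:
        # Weights only: (params * bits / 8) bytes, expressed in GB.
        # Ignores KV cache, activations, and runtime overhead.
        return self.params_billions * self.bits_per_weight / 8


def route(task_is_soft: bool, model: LocalModel,
          free_ram_gb: float, headroom_gb: float = 4.0) -> str:
    # Assumed policy: stay local only if the task is "soft" AND the
    # weights plus some headroom fit in currently free memory.
    fits = model.weight_gb() + headroom_gb <= free_ram_gb
    return "local" if (task_is_soft and fits) else "remote"


if __name__ == "__main__":
    small = LocalModel("small-4bit", params_billions=12, bits_per_weight=4)
    print(small.weight_gb())        # weights-only estimate in GB
    print(route(True, small, 16.0)) # soft task, 16GB free
    print(route(False, small, 16.0))
```

The hard part in practice would be deciding what counts as "soft"; the sketch just takes that as an input rather than pretending to solve it.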

Right. I’m not arguing that Apple wouldn’t offer a 256GB model if they could make money doing it; I’m puzzled as to why they wouldn’t offer several lower-spec models as the entry level into that line, with progressive upgrades within it, since only some people need that 256GB feature differentiator of running frontier-level models on their MacBook Pro.

  • And I'm saying they might not, if 256GB of memory is a requirement for running the local models customers expect (and local models are preferred for some reason).

Think past on-device inference... imagine what on-device training could do. And that would need a lot of RAM.