Comment by bottlepalm
1 day ago
What use are weights without the hardware to run them? That's the gate. Local AI right now is a toy in comparison.
Nukes are actually a great example of something also gated by resources. Just having the knowledge/plans isn't good enough.
Scaling has hit a wall and will not get us to AGI. Open-source models are only a couple of months behind closed models, and the same level of capability will fit into smaller and smaller models over time. This is where open research can help: make the models smaller ASAP. I think it's likely that we'll be able to get something human-level to run on a single 16GB GPU before the end of the decade.
> Scaling has hit a wall and will not get us to AGI.
That was never the aim. LLMs are not designed to be generally intelligent, just to be really good at producing believable text.
> human-level to run on a single 16GB GPU before the end of the decade.
That's apparently about 6k books' worth of data.
For the weights and temporary state, yes. It doesn't sound like a lot until you remember that your DNA is about 600 books' worth of data by the same metric.
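A quick back-of-the-envelope that makes both figures line up (assuming ~2.7 MB of plain text per book, and counting the diploid genome at ~6.4 billion base pairs at 2 bits each; both are my assumptions, not figures from the thread):

```python
BYTES_PER_BOOK = 2.7e6   # assumed ~2.7 MB of plain text per long book

# 16 GB of weights and temporary state on a single GPU
gpu_bytes = 16e9
print(round(gpu_bytes / BYTES_PER_BOOK))     # ~5926, i.e. "about 6k books"

# Diploid human genome: ~6.4 billion base pairs, 2 bits per base pair
genome_bytes = 6.4e9 * 2 / 8                 # = 1.6e9 bytes (~1.6 GB)
print(round(genome_bytes / BYTES_PER_BOOK))  # ~593, i.e. "about 600 books"
```

The comparison is order-of-magnitude only; the result scales directly with whatever per-book size you assume.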
How many humans do you know who can recite 6000 books, word for word, exactly?
> Open-source models are only a couple of months behind closed models
Oh, come on, surely not just a couple months.
Benchmarks may boast some fancy numbers, but I just tried to save some money by trying out Qwen3-Next 80B and Qwen3.5 35B-A3B (since I recently got a machine that can run those at a tolerable speed) to generate some documentation from a messy legacy codebase. The results were nowhere close, in either output quality or performance, to any of the current models the SaaS LLM behemoths offer. Just an anecdote, of course, but that's all I have.
> hardware to run them
It costs a few hundred thousand per server: a huge expense if you want it at home, but a rounding error for most organizations.
You're buying what exactly for a few hundred thousand? And running what model on it? To support how many users? At what tps?
Not every use case is a cloud provider or tech giant.
Newer Blackwell does 200+ tokens per second on the largest models and tens of thousands on the smaller models. Most military applications require fast smaller models, I'd imagine.
Also, custom chips are reportedly approaching an order of magnitude more performance for the price. It's a matter of availability right now, but that will be solved at some point.
I run local models on Mac Studios and they are more than capable. Don't spread FUD.
You're the one spreading FUD. There's nothing you can run locally that's on par with the speed or intelligence of a SOTA model.
You may be right about the level of models you can actually run on consumer hardware, but it's not FUD, and you're being needlessly aggressive here.
Incorrect as of a couple of days ago, when Qwen 3.5 came out. It's a GPT-5-class model that you can run at full strength on a small DGX Spark or Mac cluster, and it still works pretty well after quantization.