Comment by mark_l_watson

1 month ago

> in 5-10 years, most entry-level laptops will be able to manage a local 30B sized model

I suspect most of the excitement and value will be on edge devices. Models sized 1.7B to 30B have improved incredibly in capability in just the last few months and are unrecognizably better than a year ago. With improved science, new efficiency hacks, and new ideas, I can’t even imagine what a 30B model with effective tooling available could do on a personal device in two years' time.

sigbottle 1 month ago

Very interested in this! I'm mainly a ChatGPT user; for me, o3 was the first sign of true "intelligence" (not 'sentience' or anything like that, just actual, genuine usefulness). Are these models at that level yet? Or are they o1? Still GPT4 level?

logicprog 1 month ago

Not nearly o3 level. Much better than GPT4, though! For instance, Qwen 3 30b-a3b 2507 Reasoning scores 46 vs GPT4's 21 and o3's 60-something on Artificial Analysis's benchmark aggregation score. Small local models (~30b params and below) tend to benchmark far better than they actually work, too.

torginus 1 month ago

I think having massive amounts of high-bandwidth memory on consumer grade hardware could become a reality via flash.

The way flash in SSDs works is that you have tens to hundreds of dies stacked on top of each other in the same package, and their outputs are multiplexed so that only one of them can talk at a time.

We do it like this because we can still get 1-2 GB/s out of a chip this way, and the ability to read hundreds of times faster is not justified for storage use.

But if we connected these chips to high-speed transceivers, we could get all the hundreds of GB/s of bandwidth out at the same time.

I'm probably oversimplifying things, and it's not that simple IRL, but I'm sure people are already working on this (I didn't come up with the idea), and it might end up working out and turn into a commercial product.
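
For anyone who wants to play with the numbers, here is a rough back-of-envelope sketch of the arithmetic above. It assumes each die could sustain the 1-2 GB/s figure on its own once it is no longer multiplexed, that decoding is purely bandwidth-bound (every token reads all the weights once), and that a 30B model is quantized to 4 bits per parameter. The function names and the die counts are made up for illustration; none of this models latency, controller overhead, or MoE models that only read a fraction of their weights per token.

```python
def aggregate_bandwidth_gbps(num_dies: int, per_die_gbps: float) -> float:
    """Total read bandwidth if every die streams in parallel (assumption:
    no multiplexing and no shared-bus bottleneck)."""
    return num_dies * per_die_gbps


def tokens_per_second(bandwidth_gbps: float,
                      params_billions: float,
                      bytes_per_param: float) -> float:
    """Upper bound on decode speed for a dense model, assuming each
    generated token requires reading all weights exactly once."""
    model_gb = params_billions * bytes_per_param
    return bandwidth_gbps / model_gb


if __name__ == "__main__":
    # Hypothetical die counts; 1-2 GB/s per die is the figure from the comment.
    for dies in (16, 64, 128):
        for per_die in (1.0, 2.0):
            bw = aggregate_bandwidth_gbps(dies, per_die)
            # 30B params at 4-bit quantization = 0.5 bytes per parameter.
            tps = tokens_per_second(bw, params_billions=30, bytes_per_param=0.5)
            print(f"{dies:>3} dies x {per_die} GB/s = {bw:>5.0f} GB/s "
                  f"-> ~{tps:.1f} tok/s")
```

Under these assumptions, 100 dies at 1-2 GB/s each gives 100-200 GB/s, which is where the "hundreds of GB/s" figure comes from.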