Comment by 01HNNWZ0MV43FF
6 days ago
Damn. Well, I'll spend a few bucks trying it out, and I'll ask my employer if they're okay with me using agents on company time.
But I'm not thrilled about centralized, paid tools. I came into software during a huge FOSS boom. Like a huge do-it-yourself, host-it-yourself, Publish (on your) Own Site, Syndicate Elsewhere, all-the-power-to-all-the-people, borderline anarchist-communist boom.
I don't want it to be like other industries where you have to buy a dog shit EMR and buy a dog shit CAD license and buy a dog shit tax prep license.
Maybe I lived through the whale fall and Moloch is catching us. I just don't like it. I rage against dying lights as a hobby.
Yeah, I'm ready to jump in, but I need an agent running on my hardware at home without internet access.
How far away are we from that? How many RTX 5090s do I need?
This is a serious question btw.
It's unfortunate that AMD isn't really in on the AI stuff, because they're releasing a 96GB card (about $10k, so it's pricey for now) that would cut down the number of cards you'd need.
I mean, it depends on the model; some people running DeepSeek report better performance at home on a CPU with lots of RAM (think a few hundred gigabytes). Even when running locally, VRAM capacity matters more than the raw performance of the GPU. That said, I'm really not the person to ask about this, as I don't have AI agents running amok on my machine yet.
You can self-host an open-weights LLM. Some of the AI-powered IDEs are open source. It does take a little more work than just using VSCode + Copilot, but that's always been the case for FOSS.
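For instance, once you have a local server going (Ollama, llama.cpp, etc.), most tooling can talk to it over the OpenAI-compatible API. A minimal sketch, assuming an Ollama server on its default port; the model tag is a placeholder for whatever you've actually pulled:

```python
# Minimal sketch: talk to a locally hosted model through the
# OpenAI-compatible API. Assumes an Ollama server on its default
# port 11434; a llama.cpp server would use its own port instead.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local endpoint, nothing leaves your machine
    api_key="unused",                      # local servers ignore it, but the client requires one
)

response = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # placeholder: any model tag you've pulled locally
    messages=[
        {"role": "user", "content": "Write a function that parses an ISO 8601 date."}
    ],
)
print(response.choices[0].message.content)
```

Most of the open-source IDE plugins accept a custom base URL like this, which is how you keep the whole loop offline.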
An important note is that the models you can host at home (i.e. without buying ten(s-of-)thousand-dollar rigs) won't be as effective as the proprietary models. A realistic size limit is around 32 billion parameters with quantisation, which will fit on a 24GB GPU or a sufficiently large MBP. These models are roughly on par with the original GPT-4 - that is, they will generate snippets, but they won't pull off the magic that Claude in an agentic IDE can do. (There's the recent Devstral model, but that requires a specific harness, so I haven't tested it.)
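To put numbers on the "32B on a 24GB card" claim, here's the back-of-envelope version (weights only; the KV cache and runtime overhead eat into whatever is left):

```python
# Back-of-envelope memory math for the "32B on a 24GB GPU" claim.
# Weights only; KV cache and runtime overhead use whatever remains.
PARAMS = 32e9

def weights_gb(bits_per_param: float) -> float:
    return PARAMS * bits_per_param / 8 / 1e9

print(f"FP16:  {weights_gb(16):5.1f} GB")  # 64.0 GB -> far too big
print(f"8-bit: {weights_gb(8):5.1f} GB")   # 32.0 GB -> still over 24 GB
print(f"4-bit: {weights_gb(4):5.1f} GB")   # 16.0 GB -> fits, with room for the KV cache
```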
DeepSeek-R1 is on par with frontier proprietary models, but requires an 8xH100 node to run efficiently. You can use extreme quantisation and CPU offloading to run it on an enthusiast build, but it will be closer to seconds-per-token territory.
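Same arithmetic for DeepSeek-R1's 671B total parameters (it's a mixture-of-experts, so only ~37B are active per token, but every expert still has to sit in memory somewhere):

```python
# Why DeepSeek-R1 (671B total parameters) needs a node, not a GPU.
PARAMS = 671e9
for bits in (16, 8, 4, 1.58):  # 1.58 ~ the "extreme" dynamic quants
    gb = PARAMS * bits / 8 / 1e9
    print(f"{bits:>5}-bit: {gb:6.0f} GB")
# 16-bit ~1342 GB, 8-bit ~671 GB (roughly what an 8xH100 node's 640 GB
# can hold with some squeezing), 4-bit ~336 GB, ~1.58-bit ~132 GB.
# The last two are where "a few hundred GB of system RAM plus CPU
# offloading" comes from, and why it lands in seconds-per-token land.
```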