
Comment by 01HNNWZ0MV43FF

6 days ago

Damn. Well, I'll spend a few bucks trying it out, and I'll ask my employer if they're okay with me using agents on company time.

But I'm not thrilled about centralized, paid tools. I came into software during a huge FOSS boom. Like a huge do-it-yourself, host-it-yourself, Publish (on your) Own Site, Syndicate Elsewhere, all-the-power-to-all-the-people, borderline anarchist-communist boom.

I don't want it to be like other industries where you have to buy a dog shit EMR and buy a dog shit CAD license and buy a dog shit tax prep license.

Maybe I lived through the whale fall and Moloch is catching us. I just don't like it. I rage against dying lights as a hobby.

Yeah, I'm ready to jump in, but I need an agent running on my hardware at home without internet access.

How far away are we from that? How many RTX 5090s do I need?

This is a serious question btw.

  • It's unfortunate that AMD isn't in on the AI stuff, because they are releasing a 96GB card ($10k, so it's pricey currently) which would drop the number of cards you'd need.

  • I mean, it depends on the model; some people running DeepSeek report better performance at home on a CPU with lots of RAM (think a few hundred gigabytes). Even when running locally, VRAM capacity matters more than raw GPU speed. That said, I'm really not the person to ask about this, as I don't have AI agents running amok on my machine yet.

You can self-host an open-weights LLM. Some of the AI-powered IDEs are open source. It does take a little more work than just using VSCode + Copilot, but that's always been the case for FOSS.
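
For the curious, here's a minimal sketch of what self-hosting can look like in practice, assuming a local Ollama server on its default port and an example model name (both are assumptions; any OpenAI-compatible local server works the same way):

```python
# Minimal sketch: query a locally hosted open-weights model through an
# OpenAI-compatible endpoint. Assumes an Ollama server on its default
# port (11434); no internet access is needed once the weights are pulled.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local server, not a cloud API
    api_key="unused",  # local servers ignore this, but the client requires a value
)

response = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # example local model; pick one that fits your VRAM
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
print(response.choices[0].message.content)
```

Editor plugins and agent harnesses that speak the OpenAI API can usually be pointed at the same local endpoint.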

  • An important note is that the models you can host at home (i.e. without buying ten(s of) thousand-dollar rigs) won't be as effective as the proprietary models. A realistic size limit is around 32 billion parameters with quantisation, which will fit on a 24GB GPU or a sufficiently large MBP (there's a back-of-the-envelope check on that below). These models are roughly on par with the original GPT-4 - that is, they will generate snippets, but they won't pull off the magic that Claude in an agentic IDE can do. (There's the recent Devstral model, but that requires a specific harness, so I haven't tested it.)

    DeepSeek-R1 is on par with frontier proprietary models, but requires an 8xH100 node to run efficiently. You can use extreme quantisation and CPU offloading to run it on an enthusiast build (a sketch of what that looks like follows below), but it will be closer to seconds-per-token territory.
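
    As a back-of-the-envelope check on the 24GB figure above (my own arithmetic, not exact; real usage also depends on context length and runtime overhead), quantised weights take roughly params × bits ÷ 8 bytes:

    ```python
    # Rough estimate of quantised model weight size (a sketch; the KV cache
    # and runtime buffers add several more GiB on top of this).
    def weight_gib(params_billions: float, bits_per_weight: float) -> float:
        """Approximate size of the weights alone, in GiB."""
        return params_billions * 1e9 * bits_per_weight / 8 / 2**30

    # A 32B model at ~4.5 bits/weight (typical 4-bit quantisation) leaves
    # headroom on a 24GB card; at 16-bit precision it wouldn't come close.
    print(f"32B @ 4.5 bpw: {weight_gib(32, 4.5):.1f} GiB")  # ~16.8 GiB
    print(f"32B @ 16 bpw:  {weight_gib(32, 16):.1f} GiB")   # ~59.6 GiB
    ```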
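
    And for what "CPU offloading" means concretely: runtimes like llama.cpp let you keep only some transformer layers on the GPU and serve the rest from system RAM, which is exactly where the tokens/sec collapses. A sketch with llama-cpp-python, where the model path, quantisation, and layer count are all placeholders:

    ```python
    # Sketch of partial GPU offload with llama-cpp-python. The GGUF path is a
    # placeholder: an extreme quantisation of a frontier-scale model is still
    # on the order of 100+ GB, so most layers end up on the CPU.
    from llama_cpp import Llama

    llm = Llama(
        model_path="/models/some-extreme-quant.gguf",  # placeholder path
        n_gpu_layers=20,  # keep only what fits in VRAM on the GPU; the rest runs on CPU
        n_ctx=4096,       # modest context window, since the KV cache also eats memory
    )

    out = llm("Explain borrow checking in one paragraph.", max_tokens=256)
    print(out["choices"][0]["text"])
    ```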