Comment by clvx
7 days ago
In a related subject, what's the best hardware to run local LLMs for this use case, assuming a budget of no more than $2.5K?
And is there an open source implementation of an agentic workflow (search tools and others) to use with local LLMs?
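(To be concrete about what I mean by agentic workflow: a bare-bones loop against a local model served by Ollama would look something like the sketch below. The model name is just an example and search_web is a hypothetical stub you'd back with a real search tool.)

    # Minimal "agentic" loop against a local model via Ollama's /api/chat endpoint.
    # Assumes `ollama serve` is running on the default port; search_web is a stub.
    import json, requests

    OLLAMA_URL = "http://localhost:11434/api/chat"
    MODEL = "llama3.1:8b"  # any local model you have pulled

    def search_web(query):
        # hypothetical placeholder -- wire this to SearXNG, a search API, etc.
        return f"(search results for: {query})"

    SYSTEM = ('You can call a tool by replying with JSON like '
              '{"tool": "search_web", "query": "..."}. '
              'Otherwise reply with a plain-text final answer.')

    def ask(messages):
        resp = requests.post(OLLAMA_URL,
                             json={"model": MODEL, "messages": messages, "stream": False})
        resp.raise_for_status()
        return resp.json()["message"]["content"]

    def run(user_prompt, max_steps=5):
        messages = [{"role": "system", "content": SYSTEM},
                    {"role": "user", "content": user_prompt}]
        for _ in range(max_steps):
            reply = ask(messages)
            messages.append({"role": "assistant", "content": reply})
            try:
                call = json.loads(reply)      # model asked for a tool
                messages.append({"role": "user",
                                 "content": "Tool result: " + search_web(call["query"])})
            except (ValueError, KeyError, TypeError):
                return reply                  # plain text = final answer
        return reply

    print(run("Who maintains the Linux kernel? Search if unsure."))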
You can get used RTX 3090s for $750-800 each. Pro tip: look for 2.5-slot models like the EVGA XC3 or the older blower cards. Then you can get two for $1600, fit them in a full-size case, add 128GB of DDR5 for $300, a Ryzen CPU like the 9900X, and a mobo, case, and PSU to fill out the rest of the budget. If you want to skimp you can drop one of the GPUs (until you're sure you need 48GB of VRAM) and some of the RAM, but you really don't save that much. Just make sure you get a case that can fit multiple full-size GPUs and a mobo that can support them as well. The slot configurations on the AM5 generation are pretty bad for multi-GPU; you'll probably end up with a board like the Asus ProArt.
Also, none of this is worth the money, because it's simply not possible to run the kinds of models you pay for online on a standard home system. Models like GPT-4o use more VRAM than you'll ever be able to scrounge up unless your budget is closer to $10,000-25,000+; think multiple RTX A6000 cards or similar. So ultimately you're better off just paying for the hosted services.
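For a rough sense of what actually fits, a back-of-the-envelope estimate is weights ≈ parameters × bits/8, plus some overhead for KV cache and activations; the 20% overhead factor below is a guess, and real usage varies with quantization format and context length:

    # Very rough VRAM estimate: weights ~= params_b * bits/8 (in GB), plus
    # overhead for KV cache/activations. Real usage depends on quant and context.
    def vram_gb(params_b, bits, overhead=1.2):
        return params_b * bits / 8 * overhead

    for params_b, bits in [(8, 4), (32, 4), (70, 4), (70, 16)]:
        print(f"{params_b}B @ {bits}-bit: ~{vram_gb(params_b, bits):.0f} GB")

    # prints roughly: 5 GB, 19 GB, 42 GB, 168 GB

A 4-bit 70B model lands right around the 48GB of a two-3090 build, while anything much larger or at higher precision climbs quickly into multi-A6000 territory.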
I think this highlights one of the pain points of AI: there are clearly certain things that the smaller models should be fine at... but there don't seem to be frameworks that constantly analyze, simulate, and evaluate what you could be doing with smaller and cheaper models.
Of course the economics are completely at odds with any real engineering: nobody wants you to use smaller local models, and nobody wants you to consider cost or efficiency savings.
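The kind of thing I mean is simple enough to sketch: answer with the local model first and only escalate when a cheap check says the draft isn't good enough. Everything here besides the Ollama call is a placeholder (the good_enough heuristic and escalate_to_hosted_api especially):

    # Route to the small local model first; escalate only when a cheap check fails.
    import requests

    def local_answer(prompt, model="llama3.1:8b"):
        r = requests.post("http://localhost:11434/api/generate",
                          json={"model": model, "prompt": prompt, "stream": False})
        r.raise_for_status()
        return r.json()["response"]

    def good_enough(answer):
        # stand-in heuristic; replace with a real evaluator (judge model, tests, ...)
        return len(answer.strip()) > 40 and "I don't know" not in answer

    def escalate_to_hosted_api(prompt):
        raise NotImplementedError("wire up your paid API of choice here")

    def answer(prompt):
        draft = local_answer(prompt)
        if good_enough(draft):
            return draft, "local"
        return escalate_to_hosted_api(prompt), "hosted"

The interesting (and missing) part is the evaluation layer that tells you, per task, when the small model really is fine.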
> but there don't seem to be frameworks that constantly analyze, simulate, and evaluate what you could be doing with smaller and cheaper models
This is more of a social problem. Read through r/LocalLlama every so often and you'll see how people are optimizing their usage.
I've wondered about this also. I have a MacBook Air and like that it's lightweight and relatively cheap. I could buy a MacBook Pro and max out the RAM, but I think getting a Mac mini with lots of RAM could actually make more sense. Has anyone set up something like this to make it available to their laptop/iPhone/etc.?
Seems like there would be cost advantages and always-online advantages. And the risk of a desktop computer getting damaged/stolen is much lower than for laptops.
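A minimal version of that setup, assuming Ollama on the Mac mini: start it bound to the LAN and query it over HTTP from the laptop. The IP and model name below are just examples; from a phone you'd hit the same endpoint from any HTTP-capable app or shortcut.

    # On the Mac mini, expose Ollama to the local network first:
    #   OLLAMA_HOST=0.0.0.0 ollama serve
    # Then from the laptop (replace with the mini's actual LAN address):
    import requests

    MINI = "http://192.168.1.50:11434"   # hypothetical LAN address of the Mac mini

    def ask(prompt, model="llama3.1:8b"):
        r = requests.post(f"{MINI}/api/generate",
                          json={"model": model, "prompt": prompt, "stream": False})
        r.raise_for_status()
        return r.json()["response"]

    print(ask("One-line test of the home inference box."))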
I'm using Zed, which supports Ollama, on my M4 Macs.
https://zed.dev/blog/fastest-ai-code-editor
You can build a pretty good PC with a used 3090 for that budget, and it will outperform anything else in terms of speed. Otherwise, you can get something like an M4 Pro Mac with 48GB of RAM.
I got an M3 Max MacBook Pro (the higher-end one) with 64GB of RAM a while back for $3k; it might be cheaper now that the M3 Ultra is out.