Comment by ryandrake

6 days ago

I'd like to know this, too. I'm just getting my feet wet with ollama and local models on CPU only, and it's terribly slow even with 24 cores and 128GB of DRAM. It's hard to gauge how much GPU money I'd need to plonk down to get acceptable performance for coding workflows.
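
For what it's worth, one way to put a number on "terribly slow" is tokens per second, which ollama reports in its API responses. A minimal sketch using the ollama Python client (assumes the `ollama` package is installed, the server is running locally, and a model has been pulled; `llama3` below is just a placeholder for whatever you're running):

    # pip install ollama -- assumes a local ollama server with a pulled model
    import ollama

    # Request a short completion; the response includes timing fields
    resp = ollama.chat(
        model="llama3",  # placeholder: substitute the model you actually pulled
        messages=[{"role": "user", "content": "Write a binary search in Python."}],
    )

    # eval_count = generated tokens; eval_duration is in nanoseconds
    tps = resp["eval_count"] / resp["eval_duration"] * 1e9
    print(f"{tps:.1f} tokens/sec")

Running that on CPU versus a borrowed/cloud GPU gives you a concrete ratio to budget against, rather than guessing from spec sheets.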

I tried building a similar local stack recently to save on API costs. In practice, the savings turned out to be a bit of a mirage for coding workflows: the local models hallucinate just enough that you end up losing more in debugging time than you would have paid for Sonnet or Opus to get it right the first time.