Comment by arjie

9 hours ago

Not “local” and not interactive coding but sharing since it might be helpful. I have 2x RTX Pro 6000 Blackwell running DeepSeek V4 Flash. I get 160 tok/s raw but it’s a reasoning model. For my use case, I have it auto-write code and another system auto-review the code.

I occasionally use it with pi to write some code, and it’s blazing fast, but it’s mostly habit that keeps me with CC and Codex.

> I have 2x RTX Pro 6000 Blackwell

Where did you find/order these? All the sites I can find are either out of stock, only sell to businesses, or are otherwise sketchy...

  • Microcenter is the easiest place, but almost any vendor will sell to you once you email them, provided you have an LLC.

  • I run a small business (https://technologybrother.com) that runs a few small SaaS products, so I ordered the GPUs through corporate sales. If the barrier is getting an LLC, those are relatively cheap. The nice thing is that if you've got a legitimate business with a use for GPUs, you can get into the Nvidia Inception Program, which has a pretty solid discount.

Have you measured your electricity consumption for this rig? I have to wonder how much it would cost you per month.

  • Not nearly as much as you might think. 1.2 kW where I live translates to about $0.12/hr, and that's when running full clip. If you have a decent solar hookup, it's a small fraction of that on a sunny day.

    The expensive part is the upfront hardware cost and the electrical system upgrade you'll need to give your house.
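
    For anyone who wants to plug in their own numbers, here is a rough sketch of that arithmetic. It only uses the two figures quoted above (1.2 kW and about $0.12/hr); the 720-hour month and the function name are assumptions for illustration.

```python
# Figures quoted in the comment above.
POWER_KW = 1.2          # draw at full load
COST_PER_HOUR = 0.12    # USD per hour at that draw

# Implied electricity rate in USD per kWh.
rate_per_kwh = COST_PER_HOUR / POWER_KW


def monthly_cost(power_kw: float, rate: float,
                 duty_cycle: float = 1.0, hours: float = 720.0) -> float:
    """Electricity cost for one month.

    duty_cycle is the fraction of the month spent at power_kw;
    hours defaults to an assumed 30-day month.
    """
    return power_kw * rate * duty_cycle * hours


if __name__ == "__main__":
    print(f"implied rate: ${rate_per_kwh:.2f}/kWh")
    print(f"24/7 at full load: ${monthly_cost(POWER_KW, rate_per_kwh):.2f}/month")
    print(f"8 h/day at full load: ${monthly_cost(POWER_KW, rate_per_kwh, 8 / 24):.2f}/month")
```

    At those figures the implied rate is about $0.10/kWh, so running flat out all month comes to roughly $86. This ignores idle draw during the off hours.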

    • I'm paying about $0.19/hr and using half that power just for a large spinning RAID, running some VMs and security cameras. And I'm reconsidering my digital extravagance because of the electric bill. You probably make way more money than I do.

  • Here's a DeepSeek-V4-Flash benchmark on 2X RTX Pro 6000:

      - Prefill: ~10K tok/s
      - Decode: 190 | 375 | 980 tok/s (for 1 | 4 | 16 concurrent requests)
      - GPU power draw during benchmark: Average: 585W | Max: 849W | Limit: 1200W with undervolt. Idle PC is 125W.
    

    I asked it to calculate the following, assuming a realistic blend of cached prompts and decode for an agentic dev scenario.

    Electricity-only (@ USD $0.08/kWh)

      Usage          | IN price  | OUT price | Monthly cost
      Concurrency=1  | $0.040/M  | $0.080/M  | $8.65 to $38.88 (5% to 100% active)
      Concurrency=4  | $0.024/M  | $0.044/M  | up to $48.67 (cheaper per token but higher power draw)
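
    The monthly range in the Concurrency=1 row can be roughly reproduced with a simple idle/active blend. This is only a sketch: the 675 W active figure is not stated in the comment but back-calculated from $38.88 at 100% active over an assumed 720-hour month, and the 125 W idle figure is the idle PC draw from the benchmark. With these inputs the 5% case lands near, but not exactly on, the table's $8.65.

```python
HOURS_PER_MONTH = 720        # assumption: 30-day month
RATE_USD_PER_KWH = 0.08      # rate quoted in the comment
IDLE_W = 125.0               # idle PC draw quoted in the benchmark
ACTIVE_W = 675.0             # assumption: back-calculated from $38.88/month at 100% active


def monthly_electricity(active_fraction: float,
                        active_w: float = ACTIVE_W,
                        idle_w: float = IDLE_W) -> float:
    """Monthly electricity cost in USD for a machine that is busy
    active_fraction of the time and idle for the rest."""
    avg_w = active_fraction * active_w + (1.0 - active_fraction) * idle_w
    kwh = avg_w / 1000.0 * HOURS_PER_MONTH
    return kwh * RATE_USD_PER_KWH


if __name__ == "__main__":
    for fraction in (0.05, 0.25, 0.50, 1.00):
        cost = monthly_electricity(fraction)
        print(f"{fraction:>4.0%} active: ${cost:.2f}/month")
```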
    

    Total cost of ownership over 3 years is electricity + USD $20K (pre-hike pricing). In a production scenario, how much would I have to charge my users to break even, aiming for 4 concurrent requests 24/7?

    A) Breakeven API pricing (est. 2B IN + 1B OUT throughput/month):

                            IN price    OUT price
      Self-hosted           $0.121/M    $0.363/M
      OpenRouter (budget)   $0.098/M    $0.196/M
      OpenRouter (DeepSeek) $0.140/M    $0.280/M
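
    The self-hosted row appears to follow from spreading the $20K over 36 months, adding the Concurrency=4 electricity figure, and dividing by the estimated token volume. A minimal sketch of that reading, where the 3:1 output-to-input price ratio is an assumption inferred from the $0.363 and $0.121 figures, not something the comment states:

```python
HARDWARE_USD = 20_000                # from the comment (pre-hike pricing)
MONTHS = 36                          # 3-year ownership period
ELECTRICITY_USD_PER_MONTH = 48.67    # Concurrency=4 upper bound from the table above
IN_MTOK_PER_MONTH = 2_000            # 2B input tokens, expressed in millions
OUT_MTOK_PER_MONTH = 1_000           # 1B output tokens, expressed in millions
OUT_TO_IN_PRICE_RATIO = 3.0          # assumption: inferred from 0.363 / 0.121


def breakeven_prices(ratio: float = OUT_TO_IN_PRICE_RATIO) -> tuple[float, float]:
    """Return (input, output) price in USD per million tokens that
    exactly covers amortized hardware plus electricity each month."""
    monthly_cost = HARDWARE_USD / MONTHS + ELECTRICITY_USD_PER_MONTH
    # Solve: in_price * IN + (ratio * in_price) * OUT = monthly_cost
    in_price = monthly_cost / (IN_MTOK_PER_MONTH + ratio * OUT_MTOK_PER_MONTH)
    return in_price, in_price * ratio


if __name__ == "__main__":
    in_price, out_price = breakeven_prices()
    print(f"IN:  ${in_price:.3f}/M")
    print(f"OUT: ${out_price:.3f}/M")
```

    Under those assumptions the result rounds to the same $0.121/M and $0.363/M shown in the table. Changing the ratio or the monthly token volume shifts both prices accordingly.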
    

    B) Breakeven subscription (users active ~1.5h/day):

        1 user: $563/mo (oh, hai)
        25 users: $23/mo
        100 users: $6/mo

    • Vouched your comment. Very cool. What are you running on to get 190 tok/s? I get 400 tok/s at c=4 but c=1 is slower than you.