Comment by arjie

9 hours ago

Not “local” and not interactive coding but sharing since it might be helpful. I have 2x RTX Pro 6000 Blackwell running DeepSeek V4 Flash. I get 160 tok/s raw but it’s a reasoning model. For my use case, I have it auto-write code and another system auto-review the code.

I occasionally use it with pi to write some code and it’s blazing fast but it’s mostly habit that keeps me with CC and Codex.

akersten 7 hours ago

> I have 2x RTX Pro 6000 Blackwell

Where did you find/order these? All the sites I can find are either out of stock, only sell to businesses, or are otherwise sketchy...

zackify 12 minutes ago

Microcenter is the easiest place, but almost any vendor will sell to you if you email them and have an LLC.

arjie 4 hours ago

I run a small business (https://technologybrother.com) that runs a few small SaaS so I ordered the GPUs through corporate sales. If the barrier is getting an LLC, those are relatively cheap. The nice thing is that if you've got a legitimate business with use for GPUs you can get into the Nvidia Inception Program which has a pretty solid discount.

leptons 8 hours ago

Have you measured your electricity consumption for this rig? I have to wonder how much it would cost you per month.

ux266478 7 hours ago

Not nearly as much as you might think. 1.2 kW where I live translates to about $0.12/hr, and that's when running full clip. If you have a decent solar hookup, it's a small fraction of that on a sunny day.
The expensive part is the upfront hardware cost and the electrical system upgrade you'll need to give your house.

leptons 1 hour ago

I'm paying about $0.19/hr and using half that power just for a large spinning RAID, running some VMs and security cameras. And I'm reconsidering my digital extravagance because of the electric bill. You probably make way more money than I do.

mtone 4 hours ago

Here's a DeepSeek-V4-Flash benchmark on 2X RTX Pro 6000:

  - Prefill: ~10K tok/s
  - Decode: 190 | 375 | 980 tok/s (for 1 | 4 | 16 concurrent requests)
  - GPU power draw during benchmark: Average: 585W | Max: 849W | Limit: 1200W with undervolt. Idle PC is 125W.

I've asked it to calculate the following considering a realistic blend of cached prompts and decode for agentic dev scenario.

Electricity-only (@ USD $0.08/kWh)

  Usage          | IN price  | OUT price | Monthly cost
  Concurrency=1  | $0.040/M  | $0.080/M  | $8.65 to $38.88 (5% to 100% active)
  Concurrency=4  | $0.024/M  | $0.044/M  | up to $48.67 (cheaper per token but higher power draw)
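
For anyone who wants to plug in their own rate, here is a rough sketch of the electricity arithmetic. The 720-hour month and the 675 W whole-system draw under load are my assumptions, not measured values (675 W is simply what reproduces the $38.88 figure at $0.08/kWh), and the function names are made up for illustration. The per-token prices in the table come from a blend of cached prompts and decode, so this simple decode-only estimate will land near them but not exactly on them.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month


def monthly_cost_usd(active_w, idle_w, active_fraction, usd_per_kwh):
    """Monthly electricity cost for a box that is busy only part of the time."""
    # Time-weighted average power draw in watts.
    avg_w = active_fraction * active_w + (1 - active_fraction) * idle_w
    return avg_w / 1000 * HOURS_PER_MONTH * usd_per_kwh


def usd_per_million_tokens(power_w, tok_per_s, usd_per_kwh):
    """Electricity cost to generate one million tokens at a steady rate."""
    hours = 1_000_000 / tok_per_s / 3600
    return power_w / 1000 * hours * usd_per_kwh


# Hypothetical inputs: 675 W under load (assumed), 125 W idle (from the benchmark).
full = monthly_cost_usd(675, 125, 1.0, 0.08)
light = monthly_cost_usd(675, 125, 0.05, 0.08)
decode = usd_per_million_tokens(675, 190, 0.08)
print(f"100% active: ${full:.2f}/mo, 5% active: ${light:.2f}/mo")
print(f"decode at 190 tok/s: ${decode:.3f} per million tokens")
```

With those inputs it gives $38.88/mo at 100% active and about $8.78/mo at 5%, close to but not identical to the low end of the range above.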

Total cost of ownership over 3 years is electricity + USD $20K (pre-hike pricing). In a production scenario, how much would I have to charge my users to break even, aiming for 4 concurrent requests 24/7?

A) Breakeven API pricing (est. 2B IN + 1B OUT throughput/month):

                        IN price    OUT price
  Self-hosted           $0.121/M    $0.363/M
  OpenRouter (budget)   $0.098/M    $0.196/M
  OpenRouter (DeepSeek) $0.140/M    $0.280/M

B) Breakeven subscription (users active ~1.5h/day):

    1 user: $563/mo (oh, hai)
    25 users: $23/mo
    100 users: $6/mo
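
A minimal sketch of how the breakeven numbers can be rebuilt, in case you want to try other hardware prices. It assumes the $20K is amortized linearly over 36 months, that electricity is the $48.67/mo worst case for concurrency=4, and that OUT tokens are priced at 3x IN tokens. That 3:1 ratio is inferred from the self-hosted row, not something stated anywhere, and the function names are hypothetical.

```python
def monthly_total_usd(hardware_usd, months, electricity_usd_per_month):
    """Hardware amortized linearly, plus electricity."""
    return hardware_usd / months + electricity_usd_per_month


def breakeven_api_prices(total_usd, in_mtok, out_mtok, out_to_in_ratio):
    """Per-million-token IN/OUT prices that exactly cover total_usd.

    Solves in_price * in_mtok + (ratio * in_price) * out_mtok = total_usd.
    """
    in_price = total_usd / (in_mtok + out_to_in_ratio * out_mtok)
    return in_price, in_price * out_to_in_ratio


def breakeven_subscription(total_usd, users):
    """Flat monthly fee per user that covers total_usd."""
    return total_usd / users


total = monthly_total_usd(20_000, 36, 48.67)
# 2B IN + 1B OUT per month, expressed in millions of tokens.
in_price, out_price = breakeven_api_prices(total, 2_000, 1_000, 3.0)
print(f"total ${total:.2f}/mo, IN ${in_price:.3f}/M, OUT ${out_price:.3f}/M")
print(f"100 users: ${breakeven_subscription(total, 100):.2f}/mo each")
```

This lands on $0.121/M IN, $0.363/M OUT and roughly $6/mo for 100 users. The 1-user and 25-user figures presumably assume a lower electricity bill at light load, so this sketch does not reproduce them exactly.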

arjie 8 minutes ago

Vouched your comment. Very cool. What are you running on to get 190 tok/s? I get 400 tok/s at c=4 but c=1 is slower than you.