Comment by wolttam

3 hours ago

I self-host DeepSeek V4 Flash on 2 DGX Sparks (approx. $10k).

I expect DeepSeek V4 Flash (or an equivalently sized model) to reach parity with GLM 5.2 sometime this year. This is based on DeepSeek V4 Flash launching at GLM 5.0 parity[0] and on GLM 5.2 being freely available to distill from.

GLM 5.2 is within spitting distance of Opus 4.8 and is at least as good as Opus 4.6[1], which some devs were willing to pay hundreds to single-digit thousands of dollars a month for just a few months ago.

[0]: https://artificialanalysis.ai/models/comparisons/deepseek-v4...

[1]: https://artificialanalysis.ai/models/comparisons/claude-opus...
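
For a rough sense of the payback period, here's a back-of-envelope sketch. The monthly spend figures are illustrative picks from the "hundreds to single-digit thousands" range, not measured numbers, and it ignores power, maintenance, and any quality gap between the models.

```python
def breakeven_months(hardware_cost: float, monthly_api_spend: float) -> float:
    """Months of avoided API spend needed to cover a one-time hardware cost."""
    if monthly_api_spend <= 0:
        raise ValueError("monthly_api_spend must be positive")
    return hardware_cost / monthly_api_spend


HARDWARE_COST = 10_000  # approx. figure quoted above for the DGX Sparks

# Hypothetical monthly API bills, chosen only for illustration.
for spend in (200, 1_000, 5_000):
    months = breakeven_months(HARDWARE_COST, spend)
    print(f"${spend}/month -> break-even after {months:.0f} months")
```

At the low end of that range the hardware takes years to pay for itself; at the high end it takes a couple of months.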

3 comments

ipsod 3 hours ago

How fast is it?

wolttam 3 hours ago

2000 t/s prompt processing and 40-50 t/s generation. We should see 60-70 t/s generation once DSpark support solidifies in vLLM in a few days.
Recent discussion on DSpark: https://news.ycombinator.com/item?id=48696585
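
To put those rates in perspective, here's a rough sketch of wall-clock time per request. The 20,000-token prompt and 1,000-token reply are made-up sizes, and it assumes prompt processing and generation simply run back to back at the quoted rates, ignoring batching, prefix caching, and network overhead.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Estimated time: prompt processing phase followed by generation phase."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical request size, chosen only for illustration.
PROMPT_TOKENS = 20_000
OUTPUT_TOKENS = 1_000
PREFILL_TPS = 2_000  # prompt processing rate quoted above

scenarios = [
    ("current, low end", 40),
    ("current, high end", 50),
    ("expected, low end", 60),
    ("expected, high end", 70),
]

for label, decode_tps in scenarios:
    secs = request_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, PREFILL_TPS, decode_tps)
    print(f"{label} ({decode_tps} t/s): {secs:.1f} s")
```

With these sizes, generation dominates: the prompt takes about 10 s either way, so the faster generation rates are where most of the savings would come from.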