Comment by dist-epoch

6 days ago

What good is an open-weights DeepSeek model if you have nowhere to run it?

OpenAI / Google / Anthropic / xAI also have a ton of compute. That is the real moat.

It's quite expensive to self-host, but you have many places to run it. OpenRouter alone lists a dozen different providers for DeepSeek V4 Pro: https://openrouter.ai/deepseek/deepseek-v4-pro/providers

So long as there is demand, there are always going to be providers competing to offer it at a low cost. My understanding is that the median price on there is in the ballpark of what it costs to run the inference. This is very different from e.g. Opus, which you can basically only buy from Anthropic at the price they set.

antirez running (quantized) DeepSeek V4 Pro on a Mac Studio M3 Ultra with 512GB of RAM:

https://bsky.app/profile/antirez.bsky.social/post/3mlzwmvlov...

It's much closer than you think. We're going to see specialized hardware in the next 24 months capable of running 2025-era frontier models. That's big.

  • 2-bit quantization? That's a lot of signal being removed. Considering how quickly AI models are progressing in their capabilities (still an exponential curve), I won't want to use a 2025 model in two years' time, just as I don't want to use Llama 3 or an old Anthropic model from 2023 or 2024 today. Newer models are so much better that they are very difficult to ignore.

    If and when advances in AI models slow down, only then, IMHO, will it become feasible to design specialized HW for general-purpose consumption and general-purpose workloads.

    • Opus 4.6 was a 2025 model and many people (myself included) feel that if that's where models peaked, we won't be disappointed.

      Even at 2-bit quantization, DS4 is probably on par with a 2024 frontier model. You can run that today on local hardware, and at a minimum, local models are going to keep pace over the next 12-24 months. Even if they don't close the gap with frontier models, they'll still play an important role in the overall pipeline for cost, speed and privacy reasons.

      That's without even mentioning the additional capability that something like a Taalas chip churning out 17k tokens/sec could unlock.

  • It's big because it may take a big swath of the people who will actually pay for LLMs out of the market. But the average consumer is primarily going to use a phone/tablet, and we're far away from running these models on those devices.

    Even if it were possible, LLMs are such a gold mine of user data that it's really hard to see that opportunity being passed up.
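On the "2-bit quantization removes a lot of signal" point above, here is a rough sketch of why fewer bits mean more rounding error. It is a toy: the weights are random Gaussian stand-ins, and `quantize_uniform` is a naive single-range quantizer made up for illustration. It is not the scheme used for any actual DeepSeek quant, and real methods (per-group scales, mixed precision, and so on) typically lose far less than this.

```python
import random

def quantize_uniform(values, bits):
    """Snap each value to the nearest of 2**bits evenly spaced levels
    spanning [min(values), max(values)] and return the restored floats."""
    levels = 2 ** bits
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1)
    if step == 0:
        return list(values)  # all values identical, nothing to round
    return [lo + round((v - lo) / step) * step for v in values]

def rms_error(original, restored):
    """Root-mean-square difference between two equal-length lists."""
    n = len(original)
    return (sum((a - b) ** 2 for a, b in zip(original, restored)) / n) ** 0.5

random.seed(0)
# Hypothetical stand-in for a weight tensor, not real model data.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for bits in (8, 4, 2):
    err = rms_error(weights, quantize_uniform(weights, bits))
    print(f"{bits}-bit: {2 ** bits} levels, RMS error {err:.4f}")
```

At 2 bits every weight has to land on one of only four values, so the error grows sharply compared with 4 or 8 bits. How much that hurts a real model's output quality is a separate, empirical question that this toy does not answer.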

I just got into self-hosting DeepSeek V4 Flash on a single DGX Spark via antirez's DwarfStar 4 project.

It feels great to finally have access to something local.

There are myriad compute providers. I suspect the inference market is hard to monopolize. But given our antitrust track record over the past 40 years, I suppose it's possible.