Comment by jjcm

1 day ago

I did an estimate of that if you're interested: https://x.com/pwnies/status/2028831699736637912

The TL;DR, though, is that a 10-15B-parameter model baked into an ASIC on the latest process node would draw around 62 W when active. At ~10k+ tokens/s it would likely only be active in short bursts, so it'd fit comfortably within a laptop's thermal envelope.
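A quick back-of-envelope sketch of what those two figures imply, assuming the chip really does sustain 10,000 tokens/s at a constant 62 W while active (both numbers come from the estimate above; the 2,000-token response length is a hypothetical example):

```python
# Back-of-envelope energy and duty-cycle math for the ASIC estimate above.
# Assumptions: constant 62 W draw while active, sustained 10,000 tokens/s.
ACTIVE_POWER_W = 62.0
TOKENS_PER_SECOND = 10_000.0


def joules_per_token(power_w: float, tokens_per_s: float) -> float:
    """Energy spent per generated token while the chip is active."""
    return power_w / tokens_per_s


def active_seconds(num_tokens: int, tokens_per_s: float) -> float:
    """How long the chip must stay active to emit num_tokens."""
    return num_tokens / tokens_per_s


if __name__ == "__main__":
    e = joules_per_token(ACTIVE_POWER_W, TOKENS_PER_SECOND)
    print(f"{e * 1000:.1f} mJ per token")  # 6.2 mJ per token
    # A hypothetical 2,000-token response keeps the chip busy for only 0.2 s.
    print(f"{active_seconds(2_000, TOKENS_PER_SECOND):.2f} s active")
```

At roughly 6 mJ per token and sub-second bursts, the average draw over a session is far below the 62 W peak, which is why the laptop thermal envelope is plausible.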

The approach makes a lot of sense. At those speeds, network latency becomes one of the bigger bottlenecks, so local inference has a real advantage over a subscription.

wmf 20 hours ago

You're not counting the capex which could be the same cost as 5-10 years of Claude.
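The capex-vs-subscription comparison is a simple break-even calculation. A minimal sketch, where the $1,200 hardware cost and $20/month subscription are purely hypothetical numbers, not real prices for any product:

```python
def breakeven_years(capex: float, monthly_subscription: float) -> float:
    """Years of subscription fees equal to the up-front hardware cost.

    Ignores electricity, depreciation, and price changes; all inputs
    here are hypothetical.
    """
    if monthly_subscription <= 0:
        raise ValueError("monthly_subscription must be positive")
    return capex / (monthly_subscription * 12)


if __name__ == "__main__":
    # Hypothetical: $1,200 of extra hardware vs. a $20/month subscription.
    print(breakeven_years(1_200, 20))  # 5.0
```

If the subscription price rises (the point raised in the reply below), the break-even horizon shrinks accordingly.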

giantrobot 20 hours ago
This assumes Claude's price doesn't change, which isn't a great assumption considering inference providers are moving to usage-based billing. Also, the VC money isn't going to last indefinitely; current inference providers are being subsidized with VC money at this point.
- nl 6 hours ago
  
  > Current inference providers are being subsidized with VC money at this point.
  This isn't true.
  Anthropic is making an operating profit even including the loss-making subsidised subscriptions (but excluding training).
  Your typical inference provider is doing great. Do the math on an H100 rental and you can see the margins.

IanCal 21 hours ago

Is latency of the network that noticeable? Aren’t we talking low hundreds of ms at worst here? Much lower for something close regionally.
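One way to see why a ~100-200 ms round trip can matter once generation is fast: compare it with the generation time itself. A minimal sketch, assuming the same 10,000 tokens/s on both sides, a hypothetical 150 ms RTT, and a hypothetical workflow of 20 sequential 500-token calls (each call waits for the previous one, so every call pays the round trip):

```python
def total_latency_s(calls: int, tokens_per_call: int,
                    tokens_per_s: float, rtt_s: float) -> float:
    """Wall-clock time for `calls` sequential requests.

    Each call pays one network round trip plus its generation time.
    All parameters are illustrative assumptions.
    """
    return calls * (rtt_s + tokens_per_call / tokens_per_s)


if __name__ == "__main__":
    local = total_latency_s(20, 500, 10_000.0, rtt_s=0.0)
    remote = total_latency_s(20, 500, 10_000.0, rtt_s=0.150)
    print(f"local:  {local:.2f} s")   # local:  1.00 s
    print(f"remote: {remote:.2f} s")  # remote: 4.00 s
```

Under these assumptions each call spends 50 ms generating and 150 ms on the network, so the round trip, not the model, dominates. At slower generation speeds the same RTT would be a much smaller fraction of the total.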