Sure, that's fair if you're aiming for state-of-the-art performance. Otherwise, you can get close, and do it on reasonably priced hardware, by using smaller distilled and/or quantized variants of llama/r1.
Really though I just meant "it's a no-brainer that they are popular here on HN".
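For what it's worth, here's a minimal sketch of what "close on cheap hardware" can look like, using llama-cpp-python with a quantized GGUF checkpoint. The file path and model choice are placeholders, not a specific recommendation:

    # Minimal sketch: running a distilled + quantized model via llama-cpp-python.
    # The GGUF path is a placeholder for whichever checkpoint you download.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,  # context window; adjust to taste and available RAM
    )
    resp = llm.create_completion("Explain quantization in one sentence.", max_tokens=128)
    print(resp["choices"][0]["text"])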
I pay 78 cents an hour to host Llama.
Vast? Specs?
Runpod, 2xA40.
Not sure why you think buying an entire inference server is a necessity to run these models.
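For reference, a rough sketch of how serving on a rented 2x A40 instance might look with vLLM. The model ID and sampling settings are assumptions for illustration, not necessarily what anyone in this thread runs:

    # Rough sketch: tensor-parallel serving across two rented GPUs with vLLM.
    # The model ID is a placeholder; pick a checkpoint that fits ~96 GB of combined VRAM.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        tensor_parallel_size=2,                    # shard weights across both A40s
    )
    out = llm.generate(
        ["Why rent GPUs instead of buying an inference server?"],
        SamplingParams(max_tokens=128, temperature=0.7),
    )
    print(out[0].outputs[0].text)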