Comment by beastman82

5 months ago

> No brainer if you're sitting on a >$100k inference server.

Sure, that's fair if you're aiming for state-of-the-art performance. Otherwise, you can get close on reasonably priced hardware by using smaller distilled and/or quantized variants of llama/r1.
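
For example (just a sketch, not from the original thread), something like this runs on a single consumer GPU or even CPU, assuming you've grabbed a quantized GGUF of one of the distilled R1 variants and the llama-cpp-python bindings; the model path and parameters below are placeholders:

```python
# Rough sketch: load a quantized, distilled R1-style model via llama-cpp-python.
# The model path is a placeholder; any GGUF build of a distilled variant works.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm(
    "Briefly explain why quantized models fit on cheaper hardware.",
    max_tokens=200,
)
print(out["choices"][0]["text"])
```

Roughly speaking, a 4-bit 14B model fits in around 10 GB of memory, which is a long way from a >$100k box.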

Really though, I just meant "it's a no-brainer that they are popular here on HN".