Sure, that's fair if you're aiming for state-of-the-art performance. Otherwise, you can get close, and do it on reasonably priced hardware, by using smaller distilled and/or quantized variants of llama/r1.
Really though I just meant "it's a no-brainer that they are popular here on HN".
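For what it's worth, here's a minimal sketch of what "close on cheap hardware" can look like, using llama-cpp-python with a quantized GGUF checkpoint. The file path and model choice are placeholders, not a specific recommendation:

    # Minimal sketch: running a distilled + quantized model via llama-cpp-python.
    # The GGUF path is a placeholder for whichever checkpoint you download.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",  # placeholder path
        n_ctx=4096,  # context window; adjust to taste and available RAM
    )
    resp = llm.create_completion("Explain quantization in one sentence.", max_tokens=128)
    print(resp["choices"][0]["text"])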
I pay 78 cents an hour to host Llama.
Vast? Specs?
Runpod, 2xA40.
Not sure why you think buying an entire inference server is a necessity to run these models.
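For reference, a rough sketch of how serving on a rented 2x A40 instance might look with vLLM. The model ID and sampling settings are assumptions for illustration, not necessarily what anyone in this thread runs:

    # Rough sketch: tensor-parallel serving across two rented GPUs with vLLM.
    # The model ID is a placeholder; pick a checkpoint that fits ~96 GB of combined VRAM.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        tensor_parallel_size=2,                    # shard weights across both A40s
    )
    out = llm.generate(
        ["Why rent GPUs instead of buying an inference server?"],
        SamplingParams(max_tokens=128, temperature=0.7),
    )
    print(out[0].outputs[0].text)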