Comment by intothemild
12 hours ago
Since I started running my own inference server, I've had zero degradation that I didn't cause myself. Basically the only time I see output get worse is when I drop down a quant level.
I suspect that's what the providers are doing to fit more inference on the same amount of hardware over time.
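
To put rough numbers on that tradeoff, here's a minimal sketch of what dropping a quant level does. It assumes made-up Gaussian "weights" and plain symmetric round-to-nearest quantization; the function names are hypothetical, and this is not what any provider or real quant format actually does. It only shows the general shape: fewer bits means less memory and more rounding error.

```python
import random

def quantize_roundtrip(weights, bits):
    """Symmetric round-to-nearest quantization to `bits` bits, then dequantize."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit, 1 for 2-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    # Snap each weight to the nearest integer level, clamp, and scale back.
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]

def mean_abs_error(weights, bits):
    """Average absolute difference between original and round-tripped weights."""
    restored = quantize_roundtrip(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)

def size_mib(n_params, bits):
    """Storage for n_params weights at `bits` bits each, ignoring scale metadata."""
    return n_params * bits / 8 / 2**20

random.seed(0)
# Assumption: synthetic weights, not taken from any real model.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

errors = {bits: mean_abs_error(weights, bits) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit: mean abs error {err:.6f}, "
          f"{size_mib(len(weights), bits):.4f} MiB")
```

Halving the bit width halves the storage, which is the "more inference on the same hardware" part, while the rounding error grows with every step down. How much that error shows up in actual output depends on the model and the quant scheme, which this sketch doesn't try to capture.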