Comment by torginus

6 days ago

Which has become largely true? People flip-flop between whatever the hottest AI model of the day is. After a flagship AI model ships, distillations appear that offer slightly degraded performance at a fraction of the cost.

For inference, the difference between expensive data center hardware and consumer GPUs largely comes down to the amount of RAM. That's a limitation being actively worked around (unfortunately, the well-funded orgs don't seem that interested in this).
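To put the RAM point in numbers: a back-of-the-envelope sketch of weight memory at different quantization levels (illustrative only, ignoring KV cache and runtime overhead; the 70B model size is a hypothetical example).

```python
# Rough VRAM estimate for LLM inference: model weights dominate.
# bytes_per_param: 2.0 for fp16, 1.0 for int8, 0.5 for 4-bit quant.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    # params_billion * 1e9 params * bytes each, expressed in GB
    return params_billion * bytes_per_param

for fmt, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    print(f"70B @ {fmt}: ~{weight_vram_gb(70, bpp):.0f} GB")
```

At fp16 a 70B model needs ~140 GB just for weights, which is data-center territory, while a 4-bit quant fits in ~35 GB, within reach of a couple of consumer cards. That gap is exactly why RAM, not compute, is the dividing line.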