Comment by gf000

16 hours ago

My understanding is that inference models can absolutely scale down, we are only at the beginning of these getting minimized, and they are trivial to parallelize. That's not a good combo to be against them, their price/performance/efficiency will quickly drop/grow/grow.