Comment by gf000

3 months ago

My understanding is that inference models can absolutely scale down, we are only at the beginning of these getting minimized, and they are trivial to parallelize. That's not a good combo to be against them, their price/performance/efficiency will quickly drop/grow/grow.