Comment by Gigachad

18 hours ago

Limiting token quotas would be fine. Encourage developers to use efficient models, plan the work first, and not burn thousands of GPU hours on waste.

It's much like when developers would waste tons of money on AWS by spinning up massive test VMs and leaving them running without care, until the finance people cracked down on it.