Comment by make3
14 hours ago
It would be very easy for them to switch the various (compute) cost vs performance knobs down depending on load to maintain a certain latency; you would see oscillations like this, especially if the benchmark is not always run exactly at the same time every day.
& it would be easy for them to start with a very costly inference setup for a marketing / reputation boost, and slowly turn the knobs down (smaller model, more quantized model, less thinking time, fewer MoE experts, etc)
No comments yet
Contribute on Hacker News ↗