Comment by moffkalast

4 hours ago

Especially when 200ms is the rule of thumb for things still feeling "instant" to users in terms of UX, this is like a rounding error in terms of latency when I regularly wait for actual minutes for an LLM to finish its bloody thinking and have to refresh through several "we're experiencing heavy load" errors.