Comment by scottcha
20 hours ago
Two reasons, as I understand it: 1. not all of these AIs are LLMs, and many have much stricter latency SLAs than chat; and 2. these are just one part of a service architecture, and when you have multiple latencies across the stack, they tend to have multiplicative effects.
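(To illustrate the multiplicative effect the comment describes: if a request fans through several services in sequence, the fraction of requests that stay fast end-to-end is roughly the product of each hop's fast-path probability. A minimal sketch, with hypothetical numbers:)

```python
# Hypothetical sketch of tail-latency compounding across a service stack.
# Assumes k independent sequential calls, each meeting its latency target
# with probability p (e.g. p = 0.99 for a p99 SLA).

def prob_all_fast(p: float, k: int) -> float:
    """Probability that every one of k sequential hops is fast."""
    return p ** k

# Five services, each individually meeting a 99th-percentile target:
# only ~95% of end-to-end requests avoid hitting at least one slow hop.
print(round(prob_all_fast(0.99, 5), 3))  # → 0.951
```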