Comment by recognity
5 hours ago
The insight about TTFT dominating everything resonates. We're seeing the same pattern in CLI tools — the perceived speed of AI features comes down to how fast you get the first useful output, not total processing time.
Curious about your semantic end-of-turn detection: are you using a separate lightweight model for that, or is it baked into the main LLM inference? That seems like the hardest part to get right without adding latency.
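To make the "separate lightweight detector" option concrete, here's a minimal sketch of what a cheap end-of-turn check over the incremental ASR transcript might look like. This is purely illustrative and not from the original post: the function name, the filler-word list, and the silence thresholds are all assumptions standing in for a small trained classifier.

```python
import re

def is_end_of_turn(partial_transcript: str, silence_ms: int) -> bool:
    """Heuristic end-of-turn check on an incremental ASR transcript.

    A real system would swap this for a small trained classifier;
    the thresholds and word list here are illustrative, not tuned.
    """
    text = partial_transcript.strip()
    if not text:
        return False
    # Sentence-final punctuation from the ASR plus a short pause
    # is a strong signal the speaker is done.
    if text[-1] in ".?!" and silence_ms >= 300:
        return True
    # Trailing fillers and conjunctions usually mean the speaker
    # will continue, so keep waiting.
    if re.search(r"\b(um+|uh+|and|but|because)\W*$", text, re.IGNORECASE):
        return False
    # Without punctuation, fall back to a longer silence threshold.
    return silence_ms >= 700

# Example: punctuated sentence plus a short pause triggers early.
print(is_end_of_turn("Can you book that for me?", silence_ms=350))  # True
print(is_end_of_turn("and um", silence_ms=400))                     # False
```

Something this cheap can run on every ASR update without touching the main LLM's latency budget, which is presumably the appeal of the separate-model route over baking detection into the primary inference.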