Comment by ac29
6 days ago
The whole point of thinking is to throw more compute/tokens at a problem, so it will always add latency over non thinking modes/models. Many models do support variable thinking levels or thinking token budgets though, so you can set them to low/minimal thinking if you want only a minimal increase in latency versus no thinking.
No comments yet
Contribute on Hacker News ↗