Comment by alansaber

6 days ago

Maybe a reductive question, but are there any thinking models that don't add much latency (relatively speaking)?

The whole point of thinking is to throw more compute/tokens at a problem, so it will always add latency over non-thinking modes/models. Many models do support variable thinking levels or thinking token budgets, though, so you can set them to low/minimal thinking if you only want a small increase in latency versus no thinking.
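
For example, here's a minimal sketch of what those knobs look like as request payloads. The parameter names mirror real provider APIs (Anthropic's `thinking.budget_tokens`, OpenAI's `reasoning_effort`), but the model names are illustrative and you should check your provider's docs before relying on the exact shapes:

```python
def token_budget_request(prompt: str, budget_tokens: int) -> dict:
    """Anthropic-style body: cap extended thinking at a token budget."""
    return {
        "model": "claude-sonnet-4-20250514",  # illustrative model name
        "max_tokens": 2048,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

def effort_level_request(prompt: str, effort: str) -> dict:
    """OpenAI-style body: coarse low/medium/high reasoning knob."""
    return {
        "model": "o4-mini",  # illustrative model name
        "reasoning_effort": effort,  # "low" trades some accuracy for latency
        "messages": [{"role": "user", "content": prompt}],
    }

# A small budget or "low" effort keeps the latency hit close to a
# non-thinking call while still allowing some deliberation.
low_latency = token_budget_request("Summarize this diff.", budget_tokens=1024)
print(low_latency["thinking"]["budget_tokens"])
```

So you don't get thinking for free, but the cost is tunable rather than all-or-nothing.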