Comment by port3000
10 hours ago
The 'flash' / no- or low-thinking versions of those models are crazy fast. We often receive the full response (not just the first token) in less than 1 second via the API.