Comment by port3000

2 months ago

The 'flash' / no or low-thinking versions of those models are crazy fast. We often receive full response (not just first token) in less than 1 second via API.

0 comments