Comment by SXX

5 hours ago

I think your demo needs more realistic thinking logs, because thinking usually burns at least 2x to 3x as many tokens as the code itself, and much more for harder tasks.

unglaublich 4 hours ago

Indeed, at 30 tok/s make it pause for 20 seconds while "thinking" is streaming (and hidden); that's the real experience.
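A rough way to size that pause, taking the 2x to 3x thinking-to-code ratio from the parent comment and the 30 tok/s rate above, and assuming a hypothetical 200-token code answer (the function name and the linear-scaling assumption are mine, for illustration only):

```python
def thinking_pause_seconds(code_tokens: int, ratio: float, tokens_per_second: float) -> float:
    """Seconds spent streaming hidden thinking tokens before any code appears.

    Assumes (hypothetically) that thinking length scales linearly with code length.
    """
    thinking_tokens = code_tokens * ratio
    return thinking_tokens / tokens_per_second


# Hypothetical 200-token code answer streamed at 30 tok/s
for ratio in (2.0, 3.0):
    print(f"{ratio}x -> {thinking_pause_seconds(200, ratio, 30.0):.1f} s")
```

With these numbers the 3x case lands on the 20-second pause mentioned above; harder tasks push the ratio well past 3x, so the pause grows proportionally.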

sig_kill 1 hour ago

You should check out https://tokey.ai. I made it a few months ago, and it has all of these suggestions.

redox99 3 hours ago

Yes, it should use actual output from some of the open models.