Comment by SXX
5 hours ago
I think your demo needs more realistic thinking logs, because thinking usually burns at least 2x to 3x as many tokens as the code itself, and much more for harder tasks.
Indeed, at 30 tok/s, make it pause for 20 seconds while the "thinking" is streaming (and hidden); that's the real experience.
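To put rough numbers on that, here is a minimal sketch. The function name and the 300-token code size are hypothetical, picked only for illustration; the 2x ratio and the 30 tok/s rate come from the comments above.

```python
def thinking_pause_seconds(code_tokens: int,
                           thinking_ratio: float = 2.0,
                           tokens_per_second: float = 30.0) -> float:
    """Estimate how long a demo should stall while hidden thinking tokens stream.

    thinking_ratio is thinking tokens per token of visible code (assumed 2x here).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    thinking_tokens = code_tokens * thinking_ratio
    return thinking_tokens / tokens_per_second


# Hypothetical 300-token snippet: 600 thinking tokens at 30 tok/s.
print(thinking_pause_seconds(300))  # 20.0
```

At the 3x end of the range the same snippet would stall for 30 seconds, and harder tasks would push it further.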
You should check out https://tokey.ai. I made it a few months ago, and it incorporates all of these suggestions.
Yes, it should use actual output from some of the open models.