Comment by lloyd-christmas
2 days ago
I thought the same thing when I started running local models, but in practice, for a given context depth, token generation speed doesn't change whether you generate 128 tokens or 8000. The longer run just takes more time to finish.
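The arithmetic behind this is simple. The sketch below assumes a hypothetical constant rate of 40 tok/s at one fixed context depth; the rate and the function names are illustrative only and don't come from any particular benchmark tool.

```python
# Hypothetical generation rate, assumed constant at a fixed context depth.
TOKENS_PER_SECOND = 40.0


def run_time_seconds(tokens_generated: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Wall-clock time to generate `tokens_generated` tokens at a fixed rate."""
    return tokens_generated / tokens_per_second


def measured_rate(tokens_generated: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Rate a benchmark would report: tokens generated divided by elapsed time."""
    return tokens_generated / run_time_seconds(tokens_generated, tokens_per_second)


for n in (128, 8000):
    print(f"{n:>5} tokens: {run_time_seconds(n):7.1f} s, {measured_rate(n):.1f} tok/s")
```

Both lengths report the same 40.0 tok/s. Only the run time grows, from 3.2 s to 200.0 s.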