Comment by NitpickLawyer
21 hours ago
Not exactly N times, no. In a traditional transformer arch, token 1 is cheaper to generate than token 1,000, which in turn is cheaper than token 10,000, and so on, because each new token attends over everything before it. So running 10 sessions of 1,000 tokens each concurrently would be cheaper than generating 10,000 tokens in one session.
You also run into context issues and quality degradation the longer you go.
(this is assuming Gemini uses a traditional arch, and not something special regarding attention)
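A rough back-of-the-envelope sketch of the attention part of that argument, assuming a plain decoder with a KV cache where generating the token at position t costs work proportional to t. The function name `attention_cost` is hypothetical, and the model deliberately ignores per-token work that does not depend on position (MLP, projections), which is identical in both scenarios.

```python
# Toy cost model (an assumption, not a measurement of any real system):
# with a KV cache, the token at position t attends over t positions,
# so its attention work is taken to be proportional to t.

def attention_cost(num_tokens: int) -> int:
    """Total attention work for generating positions 1..num_tokens (= 1 + 2 + ... + n)."""
    return num_tokens * (num_tokens + 1) // 2


if __name__ == "__main__":
    ten_short = 10 * attention_cost(1_000)  # ten independent 1,000-token sessions
    one_long = attention_cost(10_000)       # one 10,000-token session
    print(ten_short, one_long, one_long / ten_short)
```

Under this model the single long session does just under 10x the attention work of the ten short ones, since the sum grows roughly with n²/2. Because position-independent per-token work is the same in both cases, the end-to-end gap would be smaller than that attention-only ratio.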