Comment by NitpickLawyer

21 hours ago

Not exactly N times, no. In a traditional transformer arch, token 1 is cheaper to generate than token 1,000, which is cheaper than token 10k, and so on. So running 10 sessions of 1,000 tokens each concurrently would be cheaper than running 10,000 tokens in one session.
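A rough back-of-the-envelope sketch of that comparison, assuming plain causal attention where generating token i costs something proportional to i (it attends to every token so far). The function name `attention_cost` is made up for illustration, and this counts only the attention term, not the per-token work that stays constant regardless of context length:

```python
def attention_cost(num_tokens: int) -> int:
    """Relative attention cost of generating num_tokens one at a time.

    Assumption: token i attends to all i tokens so far (itself included),
    so its cost is proportional to i. Costs that do not grow with context
    (MLP layers and so on) are ignored.
    """
    return sum(range(1, num_tokens + 1))


# Ten independent sessions of 1,000 tokens each.
ten_short = 10 * attention_cost(1_000)

# One session of 10,000 tokens.
one_long = attention_cost(10_000)

print(f"10 x 1,000 tokens: {ten_short:,}")
print(f"1 x 10,000 tokens: {one_long:,}")
print(f"ratio: {one_long / ten_short:.2f}")
```

Under this simplified model the attention work for the single long session comes out around 10x that of the ten short ones. In practice the gap is smaller, since the context-independent per-token cost is a big share of the total at short lengths.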

You also run into context issues and quality degradation the longer you go.

(This is assuming Gemini uses a traditional arch, and not something special for attention.)