Comment by vikramkr

2 months ago

That's not remotely correct: how would you be able to ignore tokens? Tokens are literally what defines the context size of an LLM; the larger the context, the more memory you need. Generating each token is the compute you're doing, and your hardware limits how many tokens per second you can produce with a particular model. Tokens are what you're consuming electricity to produce. That's like saying you can drive your own car for free without worrying about how many miles you've driven.
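
For concreteness, here's a rough sketch of both points using assumed numbers (Llama-2-7B-ish dimensions, a ~300 W GPU doing ~30 tokens/s, $0.15/kWh; none of these are measurements): KV-cache memory grows linearly with the number of tokens in context, and every generated token burns a measurable amount of electricity even on your own hardware.

```python
# Back-of-envelope numbers for the two claims above; every figure here
# is an assumption (roughly Llama-2-7B-ish dims, a hobbyist GPU), not a measurement.

# 1) Context size -> memory: the KV cache grows linearly with tokens in context.
num_layers = 32
hidden_size = 4096          # num_heads * head_dim
bytes_per_value = 2         # fp16
kv_bytes_per_token = 2 * num_layers * hidden_size * bytes_per_value  # K and V
context_tokens = 8192
print(f"KV cache at {context_tokens} tokens: "
      f"{kv_bytes_per_token * context_tokens / 2**30:.1f} GiB")

# 2) Tokens -> electricity: each generated token costs real watt-seconds.
gpu_watts = 300             # assumed draw during inference
tokens_per_second = 30      # assumed local throughput
price_per_kwh = 0.15        # assumed USD per kWh
kwh_per_token = (gpu_watts / tokens_per_second) / 3.6e6
print(f"Electricity: ~${kwh_per_token * 1e6 * price_per_kwh:.2f} per million tokens")
```

The per-token cost comes out tiny, but it isn't zero; that's the point of the car analogy: the odometer still moves whether or not you're watching it.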