Comment by maxloh

17 hours ago

Gemini 3 seems to have a much smaller token output limit than 2.5. I used to use Gemini to restructure essays into an LLM-style format to improve readability, but the Gemini 3 release was a huge step back for that particular use case.

Even when the model is explicitly instructed to pause due to insufficient tokens rather than generating an incomplete response, it still truncates the source text too aggressively, losing vital context and meaning in the restructuring process.

I hope the 3.1 release includes a much larger output limit.

9 comments

esafak 17 hours ago

People did find Gemini very talkative, so it might be a response to that.

NoahZuniga 17 hours ago

The output limit has consistently been 64k tokens (including 2.5 Pro).

jayd16 17 hours ago

> Even when the model is explicitly instructed to pause due to insufficient tokens

Is there actually a chance it has the introspection to do anything with this request?

maxloh 16 hours ago

Yeah, it does. It was possible with 2.5 Flash.
Here's a similar result with Qwen Qwen3.5-397B-A17B: https://chat.qwen.ai/s/530becb7-e16b-41ee-8621-af83994599ce?...

jayd16 16 hours ago

OK, it prints some stuff at the end, but does it actually count the output tokens? Was that part already built in somehow? Or is it just retrying until it has enough space to add the footer?

verdverm 16 hours ago

No, the model doesn't have visibility into this, afaik.
I'm not even sure what "pausing" means in this context, or why it would help when there are insufficient tokens. Models should just stop when they reach the limit, default or manually specified, but in practice it's typically a hard cutoff.
You can see what happens by setting the output token limit much lower.
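To make the cutoff concrete, here is a minimal sketch of a decoding loop with an output cap. Everything in it is an assumption for illustration: `toy_model` and `generate` are hypothetical stand-ins, not any vendor's API, and the "tokens" are just whitespace-separated words. The point is only that the cap is enforced by the loop around the model, so the model gets no chance to wrap up gracefully.

```python
from typing import Iterator, List, Tuple


def toy_model(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a model: yields one 'token' (a word) at a time."""
    for word in prompt.split():
        yield word.upper()


def generate(prompt: str, max_output_tokens: int) -> Tuple[List[str], str]:
    """Collect tokens until the model finishes or the cap is reached.

    Returns the tokens plus a finish reason: "stop" if the model ended
    on its own, "length" if the harness cut it off at the cap.
    """
    tokens: List[str] = []
    for token in toy_model(prompt):
        if len(tokens) >= max_output_tokens:
            # The harness, not the model, decides to stop here.
            return tokens, "length"
        tokens.append(token)
    return tokens, "stop"


if __name__ == "__main__":
    print(generate("a b c d e", max_output_tokens=3))
    print(generate("a b c d e", max_output_tokens=10))
```

With a low cap the output simply ends mid-stream with a "length"-style finish reason; nothing in the loop tells the model how much budget it has left.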

otabdeveloper4 16 hours ago

No.

MallocVoidstar 17 hours ago

> Even when the model is explicitly instructed to pause due to insufficient tokens rather than generating an incomplete response

AI models can't do this, at least not with just an instruction. Maybe if you're writing some kind of custom 'agentic' setup.
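As a rough sketch of what such a setup might look like, the loop below has the calling code, not the model, track progress and ask for a continuation whenever a call ends because of the output cap. All names here (`toy_generate`, `generate_with_continuation`, the "length"/"stop" finish reasons) are hypothetical, and the toy model just echoes words from the source; a real harness would call an actual model and would need its own way of deciding where to resume.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (remaining source words, cap) -> (chunk, finish reason)
Generate = Callable[[List[str], int], Tuple[List[str], str]]


def toy_generate(remaining: List[str], max_output_tokens: int) -> Tuple[List[str], str]:
    """Hypothetical model call: emits at most max_output_tokens words."""
    chunk = remaining[:max_output_tokens]
    reason = "length" if len(remaining) > max_output_tokens else "stop"
    return chunk, reason


def generate_with_continuation(
    source: List[str],
    generate: Generate,
    max_output_tokens: int,
    max_rounds: int = 10,
) -> List[str]:
    """Call the model repeatedly, resuming where the last chunk ended."""
    output: List[str] = []
    for _ in range(max_rounds):
        # The harness knows how much has been produced so far; the model does not.
        chunk, reason = generate(source[len(output):], max_output_tokens)
        output.extend(chunk)
        if reason == "stop":
            break
    return output


if __name__ == "__main__":
    words = "one two three four five six seven".split()
    print(generate_with_continuation(words, toy_generate, max_output_tokens=3))
```

The bookkeeping lives entirely outside the model, which is the difference between this and a plain instruction in the prompt.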

maxloh 16 hours ago

Yeah, it does. It was possible with 2.5 Flash.
Here's a similar result with Qwen Qwen3.5-397B-A17B: https://chat.qwen.ai/s/530becb7-e16b-41ee-8621-af83994599ce?...