Comment by simonw

1 year ago

I included that note because output limits are a personal interest of mine.

Until recently most models capped out at around 4,000 tokens of output, even as they grew to handle 100,000 or even a million input tokens.

For most use-cases this is completely fine - but there are some edge-cases that I care about. One is translation: if you feed in a 100,000-token document in English and ask for it to be translated into German, you want about 100,000 tokens of output, rather than a summary.
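To put rough numbers on that edge-case, here's a minimal Python sketch (my illustration, not something from the comment) that estimates whether a translation could even fit under a ~4,000-token output cap; the tiktoken encoding name and the cap value are assumptions:

```python
# Hypothetical illustration: tiktoken is assumed as the tokenizer,
# and 4,096 stands in for the kind of output cap discussed above.
import tiktoken

MAX_OUTPUT_TOKENS = 4096


def translation_fits(document: str) -> bool:
    """A translation is roughly as long as its source, so compare the
    input token count against the model's output cap."""
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(document)) <= MAX_OUTPUT_TOKENS
```

A 100,000-token English document fails this check by more than an order of magnitude, which is why a capped model can only summarize it.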

The second is structured data extraction: I like being able to feed in large quantities of unstructured text (or images) and get back structured JSON/CSV. This can be limited by low output token counts.
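As a sketch of that second use-case, here's what detecting a truncated extraction might look like, assuming the OpenAI Python client; the model name, prompt, and 4,096 cap are illustrative, not from the comment:

```python
# Minimal sketch: extract structured JSON and detect when the output
# token cap truncated it. Model name and max_tokens are illustrative.
import json
from openai import OpenAI

client = OpenAI()


def extract_people(text: str) -> list:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract every person mentioned in the text as a JSON "
                    'array of objects like [{"name": "...", "role": "..."}]. '
                    "Return only JSON."
                ),
            },
            {"role": "user", "content": text},
        ],
        max_tokens=4096,  # the output cap under discussion
    )
    choice = response.choices[0]
    if choice.finish_reason == "length":
        # The model ran out of output tokens mid-answer, so the JSON is
        # almost certainly incomplete and will fail to parse.
        raise ValueError("extraction truncated by the output token limit")
    return json.loads(choice.message.content)
```

The `finish_reason == "length"` check is the practical symptom of the limit being described: past the cap, the JSON simply stops mid-array.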

Sure, your cases are perfectly reasonable. I just wish LLMs had a "feel" for when to output long or short text. Having to always think about adding something like "be as concise as possible" is kinda tedious.