Comment by simonw
1 year ago
I included that note because output limits are a personal interest of mine.
Until recently most models capped out at around 4,000 tokens of output, even as they grew to handle 100,000 or even a million input tokens.
For most use cases this is completely fine - but there are some edge cases that I care about. One is translation: if you feed in a 100,000 token document in English and ask for it to be translated to German, you want about 100,000 tokens of output, not a summary.
The second is structured data extraction: I like being able to feed in large quantities of unstructured text (or images) and get back structured JSON/CSV. This can be limited by low output token counts.
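For the translation case, the usual workaround when the output cap is smaller than the document is to chunk the input so each chunk's translation fits under the cap. Here's a minimal sketch of that idea - the 4,000 token cap comes from the note above, and the 4-characters-per-token figure is a crude heuristic of mine (a real tokenizer such as tiktoken would be more accurate):

```python
# Workaround sketch: split a long document on paragraph boundaries so that
# each chunk's translation should fit under the model's output cap.
# Assumes translation length is roughly proportional to input length, and
# ~4 characters per token as a rough estimate (not exact for any model).

OUTPUT_CAP_TOKENS = 4000   # typical output limit mentioned above
CHARS_PER_TOKEN = 4        # crude heuristic, not a real tokenizer

def chunk_for_translation(text, cap_tokens=OUTPUT_CAP_TOKENS):
    """Split text into paragraph-aligned chunks, each estimated to
    stay under cap_tokens when translated."""
    max_chars = cap_tokens * CHARS_PER_TOKEN
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Start a new chunk if adding this paragraph would blow the budget.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own translation request and the results concatenated - with the obvious caveat that cross-paragraph context gets lost at chunk boundaries.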
Sure, your cases are perfectly reasonable. I just wish the LLMs had a "feel" for when to output long or short text. Having to remember to add something like "be as concise as possible" every time is kinda tedious