Comment by guerrilla
2 days ago
So, is the output price there why most models are extremely verbose? Is it just a ploy to make extra cash? It's super annoying that I have to constantly tell it to be more and more concise.
2 days ago
So, is the output price there why most models are extremely verbose? Is it just a ploy to make extra cash? It's super annoying that I have to constantly tell it to be more and more concise.
> It's super annoying that I have to constantly tell it to be more and more concise.
While system promting is the easy way of limiting the output in a somewhat predictable manner, have you tried setting `max_tokens` when doing inference? For me that works very well for constraining the output, if you set it to 100 you get very short answers while if you set it to 10,000 you can very long responses.