Comment by diggan

2 days ago

> It's super annoying that I have to constantly tell it to be more and more concise.

While system prompting is the easy way of limiting the output in a somewhat predictable manner, have you tried setting `max_tokens` when doing inference? For me that works very well for constraining the output: if you set it to 100 you get very short answers, while if you set it to 10,000 you can get very long responses.
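As a rough sketch, here is what that looks like as a request payload for an OpenAI-compatible chat completions endpoint (the model name is a placeholder; adjust for whatever API you're calling). `max_tokens` is a hard cap on how many tokens the model may generate, so the reply stops once the budget is spent:

```python
# Hedged sketch: building a chat completions request with a token cap.
# "gpt-4o-mini" is a placeholder; any OpenAI-compatible model name works.
import json

def build_request(prompt: str, max_tokens: int) -> dict:
    return {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": prompt}],
        # Hard limit on generated tokens: ~100 gives terse answers,
        # ~10,000 allows long-form responses.
        "max_tokens": max_tokens,
    }

print(json.dumps(build_request("Summarize the topic briefly.", 100), indent=2))
```

One caveat worth knowing: this is a truncation mechanism, not a "be concise" instruction, so a response that hits the cap is simply cut off mid-thought (the API typically reports `finish_reason: "length"`). For genuinely short-but-complete answers, combining a low cap with a brevity instruction in the prompt tends to work better than either alone.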