Comment by sweaterkokuro

7 hours ago

In my experience it's in all language models' nature to maximize token generation. They have been natively incentivized to generate more where possible. So if you don't pin down your parameters tightly, they will let loose. I usually set hard requirements for efficient code (less is more), and the result gets close to how I would implement it. But like the previous comments say, it all depends on how deeply you integrate yourself into the loop.

>> They have been natively incentivized to generate more where possible

Do you have any evidence of this?

  • The cloud providers charge per output token, so aren't they then incentivized to generate as many tokens as possible? The business model is the incentive.

    • This is only true in some cases, though. With a Claude Pro plan, I'm billed monthly regardless of token usage, so maximizing token count just causes frustration when I hit the rather low usage limits. I've also observed quite the opposite problem when using GitHub's Copilot, which charges per prompt. In that world, I have to carefully structure prompts to be bounded in scope, or the agent will start taking shortcuts and half-assing work when it decides the prompt has gone on too long. It's not good at stopping and just saying "I need you to prompt me again so I can charge you for the next chunk of work".

      So my summary of the anecdata is that the model itself certainly isn't incentivized to do anything in particular here; it's the tooling that's putting its finger on the scale (and different tooling nudges things in different directions).