Comment by cousinbryce

3 days ago

I would guess they are trying to maximize training data

If I were being rewarded for using more tokens, I would feed LLM output back into the model. That's probably not very useful training data.

  • I personally know two people who are doing exactly that after a mandate rolled out at their workplace. The metric is "tokens spent," and since they weren't finding many tasks that actually required a lot of tokens, they simply started running agent loops that feed into each other.

    Absurdly wasteful, but Goodhart's Law almost never fails.