Comment by cousinbryce

3 days ago

I would guess they are trying to maximize training data

If I were being rewarded for using more tokens, I would feed LLM output back into the model. That's probably not very useful training data.

  • I personally know two people who are doing exactly that after a mandate rolled out at their workplace. The metric is "tokens spent," and since they weren't finding many tasks that actually required a lot of tokens, they simply started running agent loops that feed into each other.

    Absurdly wasteful, but Goodhart's Law almost never fails.