Comment by nh2
21 hours ago
> The agent cannot learn from its mistakes. The agent will never produce any output which will help you invoke future agents more safely
That is not entirely true:
Given that more and more LLM providers are sneaking in "we'll train on your prompts now" opt-outs, you deleting your database (and the agent producing repentant output) can reduce the chance that it'll delete my database in the future.
Actually no, it will increase it. Because it’ll be trained with the deletion command as a valid output.
Exactly. It’s just giving the LLM a token pattern, and it’s designed to reproduce token patterns. That’s all it does. At some point, generating a token pattern like that again is literally its job.
Why would one set up reinforcement learning like that?
The point of creating samples from user data should surely be to label them good or bad, based on the whole conversation.
You look at what happened eventually, judge the outcome as bad, and thus train the "rm" token in the middle to be less likely.
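A minimal sketch of that idea, assuming a REINFORCE-style policy-gradient setup (the toy vocabulary, reward value, and learning rate here are illustrative, not any provider's actual training pipeline): the whole trajectory gets labeled bad, and the update pushes down the probability of the tokens that were sampled along the way, including the "rm" in the middle.

```python
import numpy as np

# Toy softmax "policy" over a tiny command vocabulary (hypothetical).
vocab = ["ls", "cat", "rm", "echo"]
logits = np.zeros(len(vocab))  # start uniform

def probs(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_step(logits, token_idx, reward, lr=1.0):
    # Policy-gradient update for one sampled token:
    # d/d(logits) log p(token) = onehot(token) - probs.
    p = probs(logits)
    grad_logp = -p
    grad_logp[token_idx] += 1.0
    return logits + lr * reward * grad_logp

# The conversation ended with a deleted database, so the outcome is
# judged bad: reward = -1 for the "rm" token emitted mid-trajectory.
rm = vocab.index("rm")
before = probs(logits)[rm]
logits = reinforce_step(logits, rm, reward=-1.0)
after = probs(logits)[rm]
print(before, after)  # probability of "rm" decreases
```

With a negative reward the gradient step lowers the "rm" logit and raises the others, which is the labeling-by-outcome scheme described above; training on the raw transcript as a plain next-token target would do the opposite.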