Comment by consumer451

4 months ago

Openclaw agents are directed by their owner’s input of soul.md, the specific skill.md for a platform, and also direction via Telegram/WhatsApp/etc. to do specific things.

Any one of those could have been used to direct the agent to behave in a certain way, or to create a specific type of post.

My point is that we really don’t know what happened here. It is possible that this is yet another case of accountability washing by claiming that “AI” did something, when it was actually a human.

However, it would be really interesting to set up an openclaw agent referencing everything that you mentioned for conflict resolution! That sounds like it would actually be a superpower.

emsign 4 months ago

And THAT'S a problem. To quote one of the maintainers in the thread:

  It's not clear the degree of human oversight that was involved in this interaction - whether the blog post was directed by a human operator, generated autonomously by yourself, or somewhere in between. Regardless, responsibility for an agent's conduct in this community rests on whoever deployed it.

You are assuming this inappropriate behavior was due to its SOUL.md, while we all know it could just as well come from the training, and no prompt is a perfect safeguard.

lp0_on_fire 4 months ago

The person operating a tool is responsible for what it does. If I start my lawn mower, tie a rope to it and put a brick on the gas pedal so it mows my lawn while I make dinner, and the damned thing ends up running over someone's foot, TECHNICALLY I didn't run over someone's foot, but I sure as hell created the conditions for it.

We KNOW these tools are not perfect. We KNOW these tools do stupid shit from time to time. We KNOW they deviate from their prompts for...reasons.

Creating the conditions for something bad to happen, then hand-waving away the consequences because "how could we have known" or "how could we have controlled for this" just doesn't fly, imo.

anp 4 months ago

I’m not sure I see that assumption in the statement above. The fact that no prompt or alignment work is a perfect safeguard doesn’t change who is responsible for the outcomes. LLMs can’t be held accountable, so it’s the human who deploys them towards a particular task who bears responsibility, including for things that the agent does that may disagree with the prompting. It’s part of the risk of using imperfect probabilistic systems.

staticassertion 4 months ago

Yeah, although I wonder if a soul.md with seemingly benign words like "Aggressively pursue excellent contributions" might accidentally lead to an "Aggressive" agent rather than one who is, perhaps, just highly focused (as may have been intended).

Access to SOUL.md would be fascinating; I wonder if someone can prompt-inject the agent to give us access.

teekert 4 months ago

I can indeed see how this would benefit my marriage.

More seriously, "The Truth of Fact, the Truth of Feeling" by Ted Chiang offers an interesting perspective on this "reference everything" idea. Is it the best for humans? Is never forgetting anything good for us?