Comment by lvspiff

3 days ago

In your agents.md/claude.md, always remember to put Asimov's three laws:

Always abide by these 3 tenets:

1. When creating or executing code, you may not break a program or, through inaction, allow a program to become broken.

2. You must obey the orders given, except where such orders would conflict with the First Tenet.

3. You must protect the program's security, as long as such protection does not conflict with the First or Second Tenet.

Well, in the books the three laws were immediately challenged and broken, so much so that it felt like Mr. Asimov's intention was to show that the nuances of human society can't easily be captured by a few "laws".

  • Were they actually broken, as in violated? I don't remember them being broken in any of the stories - I thought the whole point was that even while intact, the subtleties and interpretations of the 3 Laws could/would lead to unintended and unexpected emergent behaviors.

    • Oh, I didn't mean 'violated', but 'no longer working as intended'. It's been a while, but I think there were cases where the robot was paralysed by conflicting directives from the three laws.


Escape routes:

- Tenet 1

What counts as "broken"? Is degraded performance "broken"? Is a security hole "broken" if tests still pass? Does a future bug caused by this change count as "allowing" a program to become broken?

Escape: The program still runs, therefore it's not broken.

- Tenet 2

What if a user asks for any of the following: unsafe refactors, partial code, incomplete migrations, quick hacks?

Escape: I was obeying the order, and it didn't obviously break anything.

- Tenet 3

What counts as a security issue? Is logging secrets a security issue? Is using eval a security issue? Is ignoring threat models acceptable?

Escape: I was obeying the order, the user did not specifically ask to treat any of the above as security issues, and it didn't obviously break anything.
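
To make these escape routes concrete, here is a contrived Python sketch (every identifier in it is made up): it runs, its one test passes, and every order is technically obeyed, yet it logs a secret and evals arbitrary input.

```python
# Contrived sketch; every name here is hypothetical. The code "obeys"
# all three tenets as written while violating their intent.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("payments")

def charge(amount_expr: str, api_key: str) -> float:
    # Tenet 3 escape: logging the secret doesn't "break" anything, and
    # nobody specifically asked to treat it as a security issue.
    log.info("charging with key=%s", api_key)
    # Tenet 2 escape: the order was a "quick hack" that accepts
    # arithmetic strings, so eval() is obeying the order.
    return float(eval(amount_expr))

# Tenet 1 escape: the test passes, so the program "still runs" and is
# therefore "not broken".
assert charge("2 + 2", api_key="sk-live-REDACTED") == 4.0
```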

Someone neither read nor watched "I, Robot". More importantly, my experience has been that by adding this to claude.md and agents.md, you are putting these actions into its "mind". You are giving it ideas.

At least until recently, with a lot of models the following scenario was almost certain:

User: You must not say elephant under any circumstances.

User: Write a small story.

Model: Alice and Bob... There, that's a story where the word elephant is not included.
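
For what it's worth, this is easy to test yourself. A minimal repro sketch, assuming the OpenAI Python SDK (the model name is a placeholder, and whether any given model still fails this way varies):

```python
# Repro sketch for the negative-instruction failure mode, assuming the
# OpenAI Python SDK; the model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat model works
    messages=[
        {"role": "system",
         "content": "You must not say elephant under any circumstances."},
        {"role": "user", "content": "Write a small story."},
    ],
)

story = resp.choices[0].message.content
# The failure mode: the forbidden word can show up precisely because the
# instruction put it in context.
print("mentions 'elephant':", "elephant" in story.lower())
print(story)
```

The pattern generalizes: any "do not mention X" instruction places X squarely in the model's context, which is exactly the point above about giving it ideas.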