Comment by j-bos
3 days ago
Guardrails for anything sufficiently versatile might, on consideration, be trivial to circumvent.
As a kid I read some Asimov books where he laid out the "3 laws of robotics", the first law being that a robot must not harm a human. In the same story a character gave the example of a malicious human instructing Robot A to prepare a toxic solution "for science", dismissing Robot A, then having Robot B unsuspectingly serve the "drink" to a victim. Presto, a robot killing a human. The parallel to malicious use of LLMs has been haunting me for ages.
But here's the kicker: IIRC, Asimov wasn't even really talking about robots. His point was how hard it is to align humans, how hard it is for even perfectly morally upright people to avoid being used to harm others.
Also worth considering that the 3 Laws were never supposed to be this watertight, infallible thing. They were created so that the author could explore all sorts of exploits and shenanigans in his works. They're meant to be flawed, even though on the surface they appear very elegant and good.
I was never a fan of that poisoned drink example. The second robot "killed" the human only in the same way the drink itself did, or a gun would have if one were used instead.
The human made the active decisions and took the actions that killed the person.
A much better example is a human giving a robot a task and the robot deciding of its own accord to kill another person in order to help reach its goal. The first human never instructed the robot to kill, it took that action on its own.
This is actually touched on in the webcomic _Freefall_, which ultimately hinges on a trial of an attempt to lobotomize all robots on a planet.
It's a bit of a rough start, but well worth reading, and easy to get through if one uses the speed reader:
https://tangent128.name/depot/toys/freefall/freefall-flytabl...
But the thing is, LLMs have limited context windows. It's easier to keep an LLM from putting the pieces together than it is a human.
https://xkcd.com/1613/