Comment by laffOr
13 hours ago
> What we have found: the model has strong internal constraints against harmful actions (consistently refuses things it flags as problematic), but the harder risk is subtler -- it can get into loops where it takes many small individually-reasonable actions that compound into something the operator did not intend.
Interestingly, this was anticipated by Asimov's laws of robotics decades ago. Quoting from Wikipedia:
> Furthermore, he points out that a clever criminal could divide a task among multiple robots so that no individual robot could recognize that its actions would lead to harming a human being
> Asimov, Isaac (1956–1957). The Naked Sun (ebook). p. 233. "... one robot poison an arrow without knowing it was using poison, and having a second robot hand the poisoned arrow to the boy ..."
https://en.wikipedia.org/wiki/Three_Laws_of_Robotics#cite_no...