Comment by Apocryphon

1 hour ago

Perhaps the end state is going to be from the last Hitchhiker's Guide to the Galaxy book, Mostly Harmless:

> Anything that thinks logically can be fooled by something else that thinks at least as logically as it does. The easiest way to fool a completely logical robot is to feed it with the same stimulus sequence over and over again so it gets locked in a loop. This was best demonstrated by the famous Herring Sandwich experiments conducted millennia ago at MISPWOSO (the MaxiMegalon Institute of Slowly and Painfully Working Out the Surprisingly Obvious).

> A robot was programmed to believe that it liked herring sandwiches. This was actually the most difficult part of the whole experiment. Once the robot had been programmed to believe that it liked herring sandwiches, a herring sandwich was placed in front of it. Whereupon the robot thought to itself, Ah! A herring sandwich! I like herring sandwiches.

> It would then bend over and scoop up the herring sandwich in its herring sandwich scoop, and then straighten up again. Unfortunately for the robot, it was fashioned in such a way that the action of straightening up caused the herring sandwich to slip straight back off its herring sandwich scoop and fall on to the floor in front of the robot. Whereupon the robot thought to itself, Ah! A herring sandwich...etc., and repeated the same action over and over again. The only thing that prevented the herring sandwich from getting bored with the whole damn business and crawling off in search of other ways of passing the time was that the herring sandwich, being just a bit of dead fish between a couple of slices of bread, was marginally less alert to what was going on than was the robot.

> The scientists at the Institute thus discovered the driving force behind all change, development and innovation in life, which was this: herring sandwiches. They published a paper to this effect, which was widely criticised as being extremely stupid. They checked their figures and realised that what they had actually discovered was “boredom”, or rather, the practical function of boredom. In a fever of excitement they then went on to discover other emotions, like “irritability”, “depression”, “reluctance”, “ickiness” and so on. The next big breakthrough came when they stopped using herring sandwiches, whereupon a whole welter of new emotions became suddenly available to them for study, such as “relief”, “joy”, “friskiness”, “appetite”, “satisfaction”, and most important of all, the desire for “happiness”. This was the biggest breakthrough of all.

> Vast wodges of complex computer code governing robot behaviour in all possible contingencies could be replaced very simply. All that robots needed was the capacity to be either bored or happy, and a few conditions that needed to be satisfied in order to bring those states about. They would then work the rest out for themselves.

Damn, I had forgotten this section of the book so completely that even while reading it, I only recognised the style as typical Adams.

Guess that means I'm overdue for a re-read! Yay!

I love that book. That said, the point is more subtle than that. Current LLM attention models are limited in the feedback they get. Adding a form of 'shame' feedback (the result is technically correct but morally bad, or some such) would help here, but I doubt the folks building these things would choose to do so.