Comment by andrewmutz

2 years ago

It is very unlikely that the development team will be able to build features that actually cause the model to act in the best interests of humanity on every inference.

What is far more likely is that the development team will build a model that often mistakes legitimate use for nefarious intent while at the same time failing to prevent a tenacious nefarious user from getting the model to do what they want.

I think the current level of caution in LLMs is pretty silly: while there are a few things I really don't want LLMs doing (telling people how to make pandemics is a big one) I don't think keeping people from learning how to hotwire a car (where the first google result is https://www.wikihow.com/Hotwire-a-Car) is worth the collateral censorship. One thing that has me a bit nervous about current approaches to "AI safety" is that they've mostly focused on small things like "not offending people" instead of "not making it easy to kill everyone".

(Possibly, though, this is worth it on balance as a kind of practice? If they can't even keep their models from telling you how to hotwire a car when you ask for a bedtime story like your car-hotwiring grandma used to tell, then they probably also can't keep it from disclosing actual information hazards.)

  • That reminds me of my last query to ChatGPT. A colleague of mine usually writes "Mop Programming" when referring to our "Mob programming" sessions. So as a joke I asked ChatGPT to render an image of a software engineer using a mop to clean up messy code spilling out of a computer screen. It told me it would not do this because it would depict someone in a derogatory manner.

    Another time I asked it to generate a very specific sci-fi helmet that covers the nose but not the mouth. When it continually left the nose visible, I told it to make that particular section similar to Robocop, which again caused it to refuse to render because it was immediately concerned about copyright. While I at least partially understand the concern in that last case, this all adds up to making the software very frustrating to use.