
Comment by jefftk

2 years ago

What's the issue with including some amount of "model is aligned to the interests of humanity as a whole"?

If someone asks the model how to create a pandemic, I think it would be pretty bad if it expertly walked them through the steps (including how to trick biology-for-hire companies into doing the hard parts for them).

It is very unlikely that the development team will be able to build features that actually cause the model to act in the best interests of humanity on every inference.

What is far more likely is that the development team will build a model that often mistakes legitimate use for nefarious intent while at the same time failing to prevent a tenacious nefarious user from getting the model to do what they want.

  • I think the current level of caution in LLMs is pretty silly: while there are a few things I really don't want LLMs doing (telling people how to make pandemics is a big one), I don't think keeping people from learning how to hotwire a car (where the first Google result is https://www.wikihow.com/Hotwire-a-Car) is worth the collateral censorship. One thing that has me a bit nervous about current approaches to "AI safety" is that they've mostly focused on small things like "not offending people" instead of "not making it easy to kill everyone".

    (Possibly, though, this is worth it on balance as a kind of practice? If they can't even keep their models from telling you how to hotwire a car when you ask for a bedtime story like your car-hotwiring grandma used to tell, then they probably also can't keep them from disclosing actual information hazards.)

    • That reminds me of my last query to ChatGPT. A colleague of mine usually writes "Mop Programming" when referring to our "Mob Programming" sessions. So as a joke I asked ChatGPT to render an image of a software engineer using a mop to clean up messy code spilling out of a computer screen. It told me that it would not do this because it would depict someone in a derogatory manner.

      Another time I tried to have it generate a very specific sci-fi helmet that covers the nose but not the mouth. When it continuously left the nose visible, I told it to make that particular section similar to RoboCop's, which again caused it to refuse, this time citing copyright concerns. While I at least partially understand the concern about the last request, it all adds up to make this software very frustrating to use.

for one, it requires the ability for the people who "own" the model to control how end users use it.

  • I agree that this sort of control is a downside, but I don't see a better option? Biology is unfortunately attacker-dominant, and until we get our defenses to a far better place, giving out free amoral virologist advisors is not going to go well!

IMO as long as it's legal.

  • The laws here are in pretty sad shape. For example, did you know that companies that synthesize DNA and RNA are not legally required to screen their orders for known hazards, and many don't? This is bad, but it hasn't been a problem yet in part because the knowledge necessary to interact with these companies and figure out what you'd want to synthesize if you were trying to cause massive harm has been limited to a relatively small number of people with better things to do. LLMs lower the bar for causing harm by opening this up to a lot more people. (For a sense of how simple even minimal screening would be, see the toy sketch at the end of this comment.)

    Long term, limiting LLMs isn't a solution, but while we get the laws and practices around risky biology into better shape, I don't see how else we avoid engineered pandemics in the meantime.

    (I'm putting my money where my mouth is: I left my big tech job to work on detecting engineered pathogens.)
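
    To make the screening point concrete, here's a toy sketch of roughly the simplest possible check: flag an order if it shares a long exact subsequence (a k-mer) with anything on a hazard list. Everything here (the k-mer length, the placeholder "hazard", the example order) is made up for illustration; real screening systems use curated databases and fuzzy homology search (BLAST-style tools), not exact matching.

    ```python
    # Toy screening check: flag an order that shares any exact 31-mer with a
    # known-hazard sequence. The hazard list and order below are placeholders.

    K = 31  # long enough that a random match is vanishingly unlikely

    def kmers(seq: str, k: int = K) -> set[str]:
        """All length-k windows of a DNA sequence, uppercased."""
        seq = seq.upper()
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def flag_order(order: str, hazards: list[str]) -> bool:
        """True if the order overlaps a known hazard; hold for human review."""
        order_kmers = kmers(order)
        return any(order_kmers & kmers(h) for h in hazards)

    # Hypothetical order embedding a 50-base fragment of a "hazard" sequence.
    hazard = "ATG" + "ACGT" * 20
    order = "TTTTTTTT" + hazard[10:60] + "GGGGGGGG"
    print(flag_order(order, [hazard]))  # True -> hold for review
    ```

    Even something this trivial would catch naive attempts; the genuinely hard parts are maintaining the hazard database and handling near-matches and orders split across companies, which is presumably where real screening effort has to go.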