Comment by janalsncm

3 days ago

It is worth pointing out that we got here despite all of the “alignment” research and safetyism surrounding the models. As it turns out, the models don’t wake up and start destroying things. We knew this all along, but every time a new article came along that anthropomorphized and exaggerated another experiment, it fed the clickbait machine.

The fundamental alignment issue is aligning the companies themselves with society, not the models with the companies. Widespread unemployment is not aligned with society, but it is aligned with Anthropic and OpenAI if it makes them rich.

Therefore the only “harms” the companies will take seriously are those which also harm the company. For example, reputational harms from enabling scams aren’t tolerated.

Perhaps all of this isn’t fair, since companies actively subverted safety research for profitability. But then I would go back to my earlier point about over-indexing on unintended behaviors and under-indexing on intended ones.