Comment by doe_eyes

1 year ago

Alignment started as a fairly nifty idea, but you can't meaningfully test for it. We don't have the tools to understand the internals of an LLM.

So yes, it morphed into the second-best thing, brand safety: "don't say racist / anti-vax stuff so that we don't get bad press or get in trouble with the regulators".