Comment by skissane
3 days ago
I think Western models are also aligned to ideologically massage facts to suit certain narratives, so I'm not sure they really have that big an advantage here.
I also think you overstate how resistant Beijing is to criticism. If you criticise the foundations of state policy, you may get in a lot of trouble (although the authorities will sometimes just ignore you: if nobody cares what you think anyway, persecuting you can paradoxically empower you in a way that simply ignoring you doesn't). But if you frame your criticism the right way (constructively, as trying to help the Party achieve its goals more successfully), I think its tolerance of criticism is much higher than you suggest. This matters especially because, while it is straightforward to RLHF an AI into alignment with the Party's macronarratives, alignment with micronarratives is technically much harder: they change much more rapidly, and it can be difficult to discern what they actually are. Yet it is the latter form of alignment that is most poisonous to capability.
Plus, you could argue that the “ideologically sensitive” topics for Chinese models (Taiwan, Tibet, Tiananmen, etc.) are highly particular historically and geographically, while the comparably sensitive topics for Western models (gender, sexuality, ethnoracial diversity) are far more foundational and universal, which might mean the “alignment tax” paid by Western models ultimately turns out to be higher.
I’m not saying this because I have any great sympathy for the CCP (I don’t), but I think we need to be realistic about the topic.
I'm not defending the original idea, to be clear, just pointing out the different argument.
I personally don't find the assumption that a smarter AI would be harder to tame convincing. In my experience, we can tell a model has improved precisely because it is better at following abstract instructions, and there is nothing fundamentally different between the instructions "format this in a corporate-friendly way" and "format this speech to be aligned with the interests of {X}".
Without that premise, the downstream question of whom this smarter, untamed AI would align with becomes moot.
Besides, we're also missing that if someone's goal is to police speech, a tool that can scrub user conversations and deduce intention or political leaning has obvious uses. As an authoritarian, you might be better off just letting everyone talk to the LLM and waiting for the intelligence to collect itself.
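To make that last point concrete, here is a deliberately crude, purely illustrative sketch of the aggregation side of such a surveillance tool. The topic categories, keywords, and function names are all invented; a real deployment would use an LLM classifier rather than keyword matching, but the point is that the intelligence value comes from aggregating what users keep asking about, not from any single answer.

```python
import re
from collections import Counter

# Hypothetical topic lexicon, invented for illustration only.
TOPIC_KEYWORDS = {
    "economic_policy": {"tax", "tariff", "subsidy", "inflation"},
    "civil_liberties": {"censorship", "surveillance", "protest", "rights"},
    "foreign_policy": {"sanctions", "treaty", "alliance", "border"},
}

def profile_user(messages):
    """Tally which sensitive topics a user keeps returning to."""
    counts = Counter()
    for msg in messages:
        words = set(re.findall(r"[a-z]+", msg.lower()))
        for topic, keywords in TOPIC_KEYWORDS.items():
            counts[topic] += len(words & keywords)
    return counts

# A user's chat history (fabricated examples).
history = [
    "Why does the government keep raising tariff rates?",
    "Is this new surveillance law compatible with basic rights?",
    "Another protest was broken up downtown yesterday.",
]

# The most-discussed sensitive topic surfaces from aggregation alone.
print(profile_user(history).most_common(1)[0][0])  # civil_liberties
```

Swap the keyword tally for "ask the LLM which topic this conversation is about" and the same pipeline scales to millions of users with no extra analyst effort, which is the sense in which letting everyone talk freely to the model can serve the censor better than muzzling it.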