Comment by ethbr1
1 day ago
The irony of implicit connections in training data is funny.
I.e. even if you create an explicit Tiananmen Square massacre-shaped hole in your training data... your other training data implicitly includes knowledge of the Tiananmen Square massacre, so might leak it in subtle ways.
E.g. the many posts that reference June 4, 1989 in Beijing with negative and/or horrified tones.
Which, at scale, an LLM might then rematerialize into existence.
More likely, SOTA censorship focuses on levels above the base model in the input/output flow (even if that means running cut-down censoring models on top of the base model for every query).
Would be fascinated to know what's currently being used for Chinese audiences, given that the consequences of a non-compliant model are more severe.
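The layered setup described above can be sketched roughly as follows. This is a hypothetical illustration, not any known deployment: the base model is a stub, and a keyword check stands in for the "cut-down censoring model" — a real system would presumably use a trained classifier on both the query and the reply.

```python
# Sketch of output-level censorship layered over a base model.
# Everything here is illustrative: BLOCKED_TOPICS, REFUSAL, and the
# keyword filter are stand-ins for a real moderation classifier.

REFUSAL = "I can't help with that."
BLOCKED_TOPICS = ["tiananmen"]  # placeholder blocklist


def output_filter(text: str) -> bool:
    """Return True if the reply should be suppressed."""
    lowered = text.lower()
    return any(topic in lowered for topic in BLOCKED_TOPICS)


def guarded_generate(query: str, base_model) -> str:
    """Run the base model, then censor its output before returning it.

    The base model itself may 'know' the suppressed fact (rematerialized
    from implicit connections in its training data); the filter only
    controls what reaches the user.
    """
    reply = base_model(query)
    return REFUSAL if output_filter(reply) else reply
```

The point of the sketch: even a training-data-shaped hole doesn't matter at this layer, because the filter runs on the surface text of every exchange.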