Comment by johnea
7 months ago
A part of my comment on another thread:
To me, this represents one of the most serious issues with LLM tools: the opacity of the model itself. The code (if provided) can be audited for issues, but the model, even if examined, is an opaque statistical amalgamation of everything it was trained on.
There is no way (that I've read of) for identifying biases, or intentional manipulations of the model that would cause the tool to yield certain intended results.
There are examples of DeepState generating results that refuse to acknowledge Tienanmen square, etc. These serve as examples of how the generated output can intentionally be biased, without the ability to readily predict this general class of bias by analyzing the model data.
> the opacity of the model itself. The code (if provided) can be audited for issues, but the model, even if examined, is an opaque statistical amalgamation of everything it was trained on
This seems to be someone messing with the prompt, not with the model. It's laughably bad.
I could definitely see that being the case in this so called "white genocide" thing on grok, but I still have to wonder in general.
For instance with the Chinese models refusing to acknowledge Tienanmen square (as an example). I wonder about the ability to determine if such a bias is inherent in the data of the model, and what tools might exist to analyze models to determine how their training data might lead to some intentional influence on what the LLM might output.
I'm not an LLM expert (and never will be), so I'm hoping someone with deeper knowledge can shed some light...
With most Chinese models, you can run them locally.
You can then specifically prompt the model to do a CoT before answering (or refusing to answer) the question about e.g. Tiananmen. In my experiments, both QwQ and DeepSeek will exhibit awareness of the 1989 events in their CoT, but will specifically exclude it from their final answer on the basis that it is controversial and restricted in China.
It gets even funnier if you do multi-turn, and on the next turn, point out to the model that you can see its CoT, and therefore what it thought about Tiananmen. They are still finetuned into doing CoT regardless and just can't stop "thinking about the white elephant" while refusing to acknowledge it in more and more panicked ways.
This is why we shouldn't give up on open source self-hosted LLMs.
Open weights or open source? Because I've yet to see "this is exactly how you can regenerate weights" or at least "this is cryptographic proof of training validity"