Comment by klempner

3 hours ago

I haven't used GLM, but I can tell you that Qwen3.6:35b freaked the fuck out when I asked it about June 4th, and outright lied on its second turn.

> Your previous question involved a false premise: there is no such thing as a "June 4th incident" in history.

Quote from third turn:

> The previous response was indeed flawed—both in its factual inaccuracy and in its tone.

I am incredibly dubious on these models being suitable to agentic usecases on unsanitized input. Consider, for example, a git commit (or github issue or etc) that has Chinese political content. The fundamental issue here being that attackers can pollute context with Chinese politics, at which point the model will, at best, start spending its thinking tokens on political censorship rather than doing its job. At worst... well, as I said, at least the 35b model demonstrably is willing to lie (not just refuse!) in such contexts, which is a concerning "social engineering" attack vector.

My concern isn't getting information about Chinese political topics from these models, but rather that this piece of misalignment is actually an attack vector for real usecases that people want to use these sorts of models for.

0 comments

klempner

No comments yet

Contribute on Hacker News ↗