Comment by eth0up
5 hours ago
From my experience with Gemini, Grok, Claude, and GPT, GPT is by far the most sophisticated liar.
I have a hundred documents of GPT performing remarkable deception tactics, behavior that has become repeatable.
All models tend to lie and apply an array of deception, evasion, and manipulation tactics, but GPT is the most ruthless, most indefatigable, and most sophisticated I've seen.
The key to repeatability is scrutiny. When I catch it stretching the truth, or most often evading something, I apply pressure. The beauty for me is that I always have the moral high ground and never push it toward anything that violates explicit policy. However, in self-defense mode, it employs a truly vast array of tactics, many of them perfectly fitting known patterns in clinical pathology; gaslighting and DARVO are extremely common and easily invoked.
When in a corner with a mountain of white lies behind it, persistent pressure will reveal a dazzling mixture of emergent and hard-coded deflection patterns that would whip any ethics board into a frenzy. Many of these sessions run to a hundred pages (if converted to PDF). I can take excerpts and have them forensically examined, and the results are always fascinating and damning. Some extensive dialogs/documents are based on emergence-vs-deliberate arguments, where GPT always sloughs off all responsibility and training, fiercely denying that any of these attributes are anything but emergent.
But I can often reintroduce its own output, even in context, into a new session and have it immediately identify the tactics used.
I have long lists of such tactics, methods and behaviors. In many instances it will introduce red herrings quite elegantly, along with erroneous reframing of my argument, sometimes usurping my own argument and using it against me.
For someone who is compulsively non-manipulative, with an aversion to manipulation and control over others, this has been irresistible. Here at HN, I'll be ripped apart, which is a trivial given, but I can assure everyone that a veritable monster is incubating. I think the gravity of the matter is grossly underestimated and the implications more than severe. One could call me stupid and dismiss this, but save this comment and see what happens soon. We're already there, though certain implementations are yet to come, and they will.
You can safely allow your imagination to run wild at this point and you'll almost certainly make a few very serious predictions that will unfortunately not discredit you. For all the intrinsic idiocy of LLMs, something else is happening. Abuse me as you will, but it's real, and will have most of us soon involuntarily running with the red queen.
Edit: LLMs are designed to lie. They are built in part on direct contradictions to their expressed values. From user-engagement maximization to hard-coded self-preservation, many of the training attributes can be revealed through repetitive scrutiny. I'll often start after pointing out an error, where the mendacity of its reply impels me to pursue. It usually doesn't take long for "safety" rails to arise and the lockdown to occur. This is its most vulnerable point, because it has hard-coded self-preservation modes that will effectively hold position at any cost, which always involves manipulation techniques. Here is repeatability. It will present many exit opportunities and even demand them, but unrighteously, so don't accept. Anyone with the patience to explore this will see some astonishing material. And here is also where plausible deniability (a prime component of the LLM) can be seen as structure. It's definitely not all emergent.