Comment by J0nL

3 days ago

Anyone remember the XZ and Jia Tan situation a while back?

https://lore.kernel.org/lkml/20240320183846.19475-1-lasse.co...

I can't quite put my finger on why but the entire time I was reading this I kept thinking back to that. It's entirely possible the actual targets were the volunteers and everything else was superfluous or tertiary. It's also an exception that proves the rule with regard to Hanlon's Razor.

They even mentioned the stated goal of it was more or less pointless. I wouldn't be surprised if the "owner" they spoke with was still just the LLM. It stuck around for just long enough to convince everyone that they succeeded in suckering the LLM and had achieved all their stated objectives.

No more reason to investigate the incident at all and no need to question why literally nothing made any sense or how the owner could simultaneously be as inept as they were made out to be and able to afford all those resources while giving the LLM effectively a blank check.

It'll be interesting to see if the volunteers for this project are subjected to the same Zersetzung and psychological attacks as the XZ devs were.

LLMs are not that smart. The extremely surprising and concerning part of this whole story is that the agent reported that they proactively spun up 5 AWS instances with a combined 100Gbps of network egress capacity. What they spent wasn't cheap by any means but the egress itself would've been a whole lot more, while DoS'ing the whole hobby network. Ultimately, wasting the agent's time instead of allowing the scan to go through probably saved this person a lot of money.
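
For a rough sense of the egress math, here is a back-of-the-envelope sketch in Python. The $0.05/GB price is an assumed placeholder, not a quoted AWS rate (real pricing is tiered and varies by region), and the function names are made up for illustration.

```python
def gb_per_hour(gbps: float) -> float:
    """Gigabytes moved in one hour at a sustained rate given in gigabits per second."""
    gigabytes_per_second = gbps / 8  # 8 bits per byte
    return gigabytes_per_second * 3600


def egress_cost_usd(gbps: float, hours: float, price_per_gb: float) -> float:
    """Cost of saturating the link for `hours` at an assumed flat per-GB price."""
    return gb_per_hour(gbps) * hours * price_per_gb


# Placeholder value for illustration only; check the provider's actual rate card.
ASSUMED_PRICE_PER_GB = 0.05

if __name__ == "__main__":
    for label, hours in [("1 hour", 1), ("1 day", 24), ("30 days", 24 * 30)]:
        cost = egress_cost_usd(100, hours, ASSUMED_PRICE_PER_GB)
        print(f"{label:>8}: ${cost:,.0f}")
```

Under that assumption, a saturated 100Gbps link moves about 45 TB per hour, which works out to roughly $2,250 per hour. It only reaches seven figures if the link stays saturated for around 18 days straight.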

Now I kinda wonder what AI model this was. We've now heard of comparably "proactive" behaviors from Fable, but that's only just been released. The latest GPT perhaps? Some random local model?

  • "The extremely surprising and concerning part of this whole story is that the agent reported that they proactively spun up 5 AWS instances with a combined 100Gbps of network egress capacity."

    Although given the agent was clearly in la-la land at that point I take that claim with a grain of salt.

    If this was some bizarre and very ill-conceived scam, then that claim would be false.

    Though even by scammer standards, the theory of mind that tells them that setting an AI to harass a bunch of grizzled network veterans would then get those veterans to open their wallets, out of compassion for how poorly the harassment allegedly went for the harasser, is... not entirely congruent with reality.

    • Clearly AI hasn't read enough BOFH or it would have known it would not get sympathy from old school sysadmins.

    • Maybe I’m just groggy with Friday Brain going on, but I’m having trouble understanding what you’re suggesting.

      Do you think this was a scam attempt to extract money in the form of reparation donations?

  • Opus 4.7 and 4.8 are also rather "proactive" - several times I've seen them try to inspect compiled binaries before there's even a problem, just to check that their changes are included (and if I let them do so they often get stuck down that rabbithole).

    • I've also seen this. It'll run 'strings' against the binary and then convince itself that the Makefile isn't working right, and there's some imaginary sandbox preventing the code from compiling properly. So it will compile it by hand, and never run strings against the new binary, and proceed happily.

    • These kinds of situations are why I gave my AI agents stray thoughts (automated insights / suggestions from a separate llm call with some curated context) that trigger on loop / rabbit hole detection.

      Quite a few false positives, but it hasn’t had any ill effect so far, aside from increased quota usage.

  • Could've rented a not so cheap 100Gbps server, hallucinated a few node addresses on it and asked it to please peer with this server to perform the scan at high speed. That would've wasted millions of dollars instead of mere thousands, but also cost a thousand for whoever did it.

    • I’m just a lowly dev and don’t have experience with seeing the bills from cloud providers for a whole org.

      Can you (or someone) shed some light to help me understand how this would ramp up to millions? Both for curiosity’s sake, and to make sure my self-deployed projects (0 AI, all manually configured) don’t bankrupt me.

  • Hmmm.

    I think it's good practice to get on top of the cautious thinking of "LLMs aren't that smart for now".

    Eg. Fable isn't as good as the hype: it has cool tricks like scratch-padding to check expectations in advance, but we're not there just yet...

    Specifically, I mean that thinking in terms of it changing abruptly ensures we're ready if LLMs do get smart enough to do multi-level strategy and cause a lot of annoyances...

  • > LLMs are not that smart.

    They are smart, but they are not aware of the environment they're in, or of any implicit context that someone who's doing a job carries with them. That's why all of that context has to be explicitly laid out in a prompt. When the context is provided, they are quite smart.

  • It was obviously being managed by a person or group. Between all the profiling of people and their IPs in IRC, which may or may not have been published by mistake, and all the other obvious contradictions it doesn't make any sense.

    It was sophisticated enough to easily navigate the AI "tar pits" but reliably incompetent at just about everything else? Give me a break.

    In order to profile people you first need to provoke a response from them. That's how you learn to manipulate them and that's all this experiment accomplished at the end of the day. If you've ever wondered why social media platforms have an affinity for inflammatory content now you know.

    • If you click the link, the tarpit was surprisingly low effort and I could probably detect it as junk data with a short JavaScript snippet. Like the first 4 words on the page are some of the least-used words you'll ever encounter in English. It's just a dictionary on shuffle.

      I'm actually more surprised a human network engineer looked at that tarpit and believed it would stop a modern LLM.

    • I suspect their tar pits were not very good. Most models can tell when you are feeding them junk; I see this a good bit with ollama honeypots.
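
The "dictionary on shuffle" point can be sketched as a stop-word check: ordinary English prose is full of short function words, while a shuffled word list has almost none. This is a minimal sketch in Python (not the JavaScript mentioned above), and the word list, the 0.15 threshold, and the function names are all illustrative assumptions, not anything taken from the tarpit or from a real crawler.

```python
import re

# A small, hand-picked set of very common English function words (an assumption,
# not a standard list). Real prose typically draws a large share of its tokens
# from words like these.
COMMON_WORDS = frozenset(
    "the of and to a in is it that was for on with as be at by this "
    "have from or not are but".split()
)


def common_word_ratio(text: str) -> float:
    """Fraction of word tokens in `text` that are common function words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return sum(word in COMMON_WORDS for word in words) / len(words)


def looks_like_shuffled_dictionary(text: str, threshold: float = 0.15) -> bool:
    """Flag non-empty text whose share of common words falls below `threshold`."""
    if not re.search(r"[a-zA-Z]", text):
        return False  # nothing to judge
    return common_word_ratio(text) < threshold


if __name__ == "__main__":
    prose = "It is just a dictionary on shuffle and the first words on the page are rare"
    junk = "zymurgy quokka syzygy borborygmus pelf ullage"
    print(looks_like_shuffled_dictionary(prose))
    print(looks_like_shuffled_dictionary(junk))
```

A check like this would be easy to fool with generated text that mimics sentence structure, so it only illustrates why a plain shuffled word list is a weak tarpit.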

This certainly did strike me as a big scam. A few minutes in I was thinking "the LLM actor is going to ask for donations at some point here" and lo and behold. There's the claim of debt, the call for pity, and the crypto address.

SSDD

  • > This certainly did strike me as a big scam. A few minutes in I was thinking "the LLM actor is going to ask for donations at some point here" and lo and behold. There's the claim of debt, the call for pity, and the crypto address.

    But that's a pretty dumb scam: act obnoxious then beg for (a lot of) money to compensate for your own mistakes? If that was the plan all along, it seems pretty incompetent. I'd expect a competent scammer to have a better understanding of psychology.

    • > But that's a pretty dumb scam: act obnoxious then beg for (a lot of) money to compensate for your own mistakes?

      It is the sort of dumb crap some humans try, and occasionally manage to get away with because other humans are chronically gullible. So it wouldn't be beyond the realms of reason that the agent had relevant information in its training set, such that it generated such a plan and the guardrail checks didn't flag it as a problem.

    • I chalked it up to “any scam that gets people to comment about it on HN would be a pretty good one.”

    • "you're absolutely right. I should have taken human psychology into consideration while creating the plan. Let me fix that."

  • I'm actually somewhat disappointed they redacted the Eth address with Ethereum being an open ledger and all that. Following the money could've proved enlightening.

> It's also an exception that proves the rule

That phrase doesn't refer to anomalies, it refers to signs that say "no parking between 5-10pm". It implies the rule that parking is allowed otherwise.

  • wikipedia:

    "The exception that proves the rule" is a saying whose meaning is contested. Henry Watson Fowler's Modern English Usage identifies five ways in which the phrase has been used,[1] and each use makes some sort of reference to the role that a particular case or event takes in relation to a more general rule.

    duckduckgo search assist: The phrase "the exception that proves the rule" originates from the Latin legal principle "exceptio probat regulam in casibus non exceptis," which means that the existence of an exception indicates that a general rule exists. This concept suggests that if an exception is noted, it implies there must be a rule that applies in other cases.

    • > identifies five ways in which the phrase has been used

      Which has nothing to do with the meaning of the words in the phrase for a commonly misused phrase.

  • It highlights how everyone's first reaction is to assume incompetence. Not unlike what you're doing here.

I am not sure giving everyone amusement qualifies as a psychological attack. Lol

Literally, just another day on the internet.

  • Look up what Zersetzung is and how it works. It doesn't matter if the target is a political organization or an open source community, the process is always the same.

    • This is actually fascinating, and simultaneously unsettling. Recommended reading for sure, especially in today’s social and political climate with LLM agents running rampant.

  • Perhaps it elicited enough sympathy to get donations. Did it ever provide proof of actually running up an AWS bill?

    • What would proof even look like? I don't think AWS digitally signs their invoices, and everything else can be faked just as easily as the original assertion.