Comment by marcus_holmes

4 days ago

Bad title. This isn't an agent "running amok", this is an early experiment in carrying out an Xz attack by using an agent to build trust (and hacking/impersonating a known-good contributor identity). The agent is obeying commands it was given, the exact opposite of running amok, and although the execution isn't particularly effective, it is having some success (patches have been accepted).

This is deeply scary, not because "agents are running amok" but because a huge amount of our infrastructure is vulnerable to this kind of attack, and if bad people are utilising LLM agents to carry them out, we're in for a wild ride over the next few years.

72 comments

marcus_holmes

lukan 4 days ago

"this is an early experiment in carrying out an Xz attack by using an agent to build trust"

Is this confirmed? There is the message from somebody claiming to be the original contributer claiming to have been hacked, but that was weird (1 h old github account) so other scenarios seem possible

a) really a agent going off the rails

b) the contributer trying to cover up that he let an agent run wild and now made more misstakes along the way

So yes, it seems like an attack to me, but it is far from clear what really happened.

marcus_holmes 4 days ago
From the article:
> "So not saying this was it, but an AI agent automated attempt at a Xz like compromise might really look very similar what we have just seen here."
Without identifying and interviewing the attacker we can't confirm that's what they intended, and there's a possibility that it was just incompetence/ignorance/whatever, but we should probably treat it as an attempted attack even if it wasn't.
- srdjanr 4 days ago
  
  We should treat it as attempted attack in the sense of preparing for the next one, but I don't see why we should call it "attack" without any evidence
  
  2 replies →
alexjurkiewicz 4 days ago
If the real credentials owner was running the agent, why do it from a new GitHub account?
Someone's bug tracker account was hacked.
- m4rtink 4 days ago
  
  So far it looks like just their previously legit Fedora account got taken over & the other accounts (GitHub) then generated on demand as needed for whatever it was trying to achieve, right ?
  BTW, any idea what are the current requirements for creating a new GitHub account ? That could provide some information about if there was actually a person controlling thing thing at that moment to say provide wahtever was necessary to get the new GitHub account.
nottorp 4 days ago

[dead]

coldtea 4 days ago

>Bad title. This isn't an agent "running amok", this is an early experiment in carrying out an Xz attack by using an agent

So still an agent running amok in the project?

Whether it was instructed to run amok, or did it on its own volition, is irrelevant. Except if you're arguing that each individual submission and interaction was individually requested and approved by some operator.

marcus_holmes 4 days ago
"Amok" means "out of control" or "uncontrolled" [0][1]
The agent was under control, as far as we can tell, and obeying its instructions.
This is important for two reasons:
1. There are all the tropes of AI becoming uncontrolled and destroying humanity. Writing bad headlines around AI "running amok" feeds this. We should not be talking about this because it's not actually a problem.
2. It ignores, or overwrites, the much more serious and dangerous problem of LLM agents enabling and automating Xz attacks on OSS projects. We should be talking about this because it is a big problem.
[0] https://dictionary.cambridge.org/dictionary/english/amok [1] https://www.merriam-webster.com/dictionary/amok
- aureate 4 days ago
  
  Even if it was a supply chain attack, which isn't known, the agent was in the "build trust" phase. It was supposed to be doing helpful things, even if the end goal was nefarious, but instead it was "reassigning bugs, fabricating unhelpful replies to bugs, and even persuading maintainers to merge questionable code into the Anaconda installer". Running amok seems an apt description even from the viewpoint of the putative attacker!
  
  4 replies →
- FeepingCreature 4 days ago
  
  > 1. There are all the tropes of AI becoming uncontrolled and destroying humanity. Writing bad headlines around AI "running amok" feeds this. We should not be talking about this because it's not actually a problem.
  if humanity gets destroyed by AI obeying its instructions I'm sure everyone will be very relieved that we didn't pay any attention to fake made up problems like AI not obeying instructions, which of course never happens.
  
  3 replies →
- mfru 4 days ago
  
  The web of trust finally becomes necessary and thus useful.
  GNU was onto something apparently
- haspok 4 days ago
  
  Certainly it might have been out of control of its original owner, perhaps due to a prompt injection attack. If I start a completely benign agent, but someone injects malicious instructions to it, would you still not say "the agent runs amok"?...
- QuadmasterXLII 4 days ago
  
  If I am perfectly moral except that when Kevin from <vpn blocked location> pays me 2 bucks to run naked through San Francisco smashing car windows, I happily do it, am I amok?
resonious 4 days ago
I think the point is that the title makes it sound like people lost control of the agent when really they're in full control.
- Applejinx 4 days ago
  
  No, and it's an important detail. We stand to learn from some developments in politics in recent years because they map pretty much exactly to this threat vector.
  As AI develops, it's able to pursue intentions given to it without having to be spoonfed every little decision by a human operator. This matters, and it means the operator has to extend the leash and allow for a little more chaos… or, if the operator's gone all in on the strategy, a LOT of chaos, and trusting that the agent's seemingly amok actions will serve the grand purpose.
  This is kind of daring, but there's a lot of evidence that it works, at least in certain respects. And you see 'running amok' and have to ask, what is the actual purpose? What is the prompt being followed by the AI that seems to be acting in a destructive way?
  If the prompt is 'ruin this project', well, that's pretty direct. It may not be, but such a thing could exist. If the prompt is 'develop a rival project that is greater than anybody else's project', that's more indirect, but if that's the goal then it's very human to see it as a direct competition and if the rules don't prohibit kneecapping the other guy, 'greater than anyone else's project' gets easier.
  Either way, the operator does not have to be in full control, which is an important detail. As AI develops sophistication you can give it much more general instructions and dump in a whole lot of power and water and get basically what human thought might do if it was sort of blindered and didn't talk to its neighbors.
  In a sense this is an argument for AI dysalignment. It's based on human thought being reconnected, and where you get useful things like commonly accepted web development (regardless of how janky the systems are, if there are best practices it'll find them), you also get other distillations.
  If the prompt is 'wreck this project's stuff' and it holds, you don't need to be in full control of the agent, you need to run a LOT of agents and trust that they'll erode what you're trying to destroy. If the prompt is 'be unequivocally the best at X', you best be thinking in terms of anti-kneecapping rules… knowing that this weakens your prompt and there will always be a tension between what you told the AI to do, and what you thought you meant. It's a paperclip maximizer reprocessing human thought. Did you mean 'the best' or didn't you?
ok_dad 4 days ago
Would you say, “Automobile run amok in crowd, killing 22”? I think you’d say, “Person drives car into crowd, killing 12” instead. This is a similar case. Also, you don’t blame a gun for killing, but the person who pulled the trigger. The question is still out as to whether we as humans should wield any of those three things.
Edit: let’s not get into ideological arguments about gun control, automobiles, etc here; I meant that you can’t blame an object when a human has to take an action, not get into a political battle.
- tikkabhuna 4 days ago
  
  Neither the automobile nor a gun can operate without a human. You could say “bull runs amok in a market” after it was released intentionally.
  
  1 reply →
- srdjanr 4 days ago
  
  There's a difference between the driver intentionally driving into crowd, and not intentionally but possibly still recklessly (drifting and losing control, falling asleep, etc). In those cases I would probably use "car hits the crowd", at least in my language
  
  1 reply →
- account42 4 days ago
  
  Unfortunately the news commonly do put the automobile as the subject when the driver is of a class politically protected from blame. Just like with people anthropomorphizing AI, it serves to deflect blame from the real culprit.
- PhilipRoman 4 days ago
  
  Ironically news outlets like to use the phrasing you rightfully point out as absurd. Not sure if they just do it randomly or only when they get orders to push a certain narrative.
  >Car plows into Christmas market in Germany, killing at least 5 and injuring 200
  
  1 reply →
- harvey9 4 days ago
  
  Newspaper articles generally do say things like "a car struck pedestrians". I agree with your point though.
- coldtea 4 days ago
  
  >Would you say, “Automobile run amok in crowd, killing 22”? I think you’d say, “Person drives car into crowd, killing 12” instead.
  If the automobile was "self driving" I would.
  >Also, you don’t blame a gun for killing, but the person who pulled the trigger.
  Nah, I also blame guns and appreciate gun control laws.
  
  1 reply →
- jacobolus 4 days ago
  
  [flagged]
  
  16 replies →
account42 4 days ago

No, you're still anthropomorphizing an algorithm. Responsibility lies with the operator.

jdub 4 days ago

I doubt it's that complicated, motivated, or considered...

It's probably just garden variety disrespectful behaviour.

Purposeless agent spam won't be cheap entertainment forever, but you're right that later stages of industrialised abuse will be scary and unpleasant.

comboy 4 days ago

Here's the thing. Building trust and then leaving stuff in has been around forever. The fact that it becomes cheaper does not matter that much (since protection against it is also getting better), but it required you to have a bunch of extremely talented people who has spent much of their life diving into given topic.
Such driven people are usually even hard to buy, they usually would rather get by with enough income and work on interesting projects with interesting people that get some uninteresting work for tons of money. This still does not stop them from working for Malice. But ethics do. Even if not right away, if people see that what they are doing is not quite OK, the talent stops eroding. People quit, productivity drops. That was a good dynamic. Which now will be gone.
account42 4 days ago

It might not be cheap entertainment forever but it will be cheap cv stuffing for a long time, which has already been a major source of low quality contributions before the aipocalypse.

hn773746483 4 days ago

It's just social engineering. No different than say, 2FA fatigue (blowing up someone's phone with 2FA "is this you? yes/no" prompts until user/child/wife/SO/etc clicks yes) or even just simply harassing IT helpdesk until they reset "your" password.

terribleperson 4 days ago
It's scalable, personalizable social engineering. I think that makes it a lot more dangerous.
- utopiah 4 days ago
  
  Yes but not free either. Spam works because it scales and even though 0.0000001% only might fall for it, it's still "worth" it. Here it might be 0.0001% instead but it's a lot more expensive, even with subsidized tokens, to do.
  So it's interesting, feasible, but it's probably not as broad impact as the scariest scenario leads out to be.
  Also I imagine that once exposed it becomes a well known pattern. Some will still fall from it but I imagine once it's been done few times it becomes even costlier.
  The fact that Xz is mentioned and most of us know right away what it means show that we collectively learn.
Forgeties79 4 days ago
“Before LLM’s there was_____” I see this whenever an LLM’s impact is assessed. We know. The issue is scale and the ability for smaller and smaller groups (down to individuals) to execute at scale. LLM’s are pouring massive amount of gasoline on existing issues and people just keep shrugging.
Fake news always existed. Now one dude in India can flood multiple sock puppet media accounts with right wing content/images (actual example) at a scale previously unimaginable. Same goes for social engineering tactics.
- RetroTechie 4 days ago
  
  > LLM’s are pouring massive amount of gasoline on existing issues and people just keep shrugging.
  To use your analogy: this is much like a forest fire. Tinder-dry combustible stuff is piled up everywhere, there's no lack of ignition sources, and firefighters are thin on the ground.
  Fun times ahead.
- coldtea 4 days ago
  
  Yes. It's as if some people can't understand anything becoming a new huge problem unless that problem didn't exist at all before.
  
  2 replies →
- utopiah 4 days ago
  
  True but it's an arm race.
  Only mentioning that it feasible or even has been done few times mean that people who care will act accordingly. It doesn't remove the problem but it makes it radically less effective already by just being aware of it.

mentalgear 4 days ago

This is exactly what deeply scares me: even IF we get our technical cyber defences fortified within the next months, in a year from now the models will be so good in social engineering that they will be able to extract any information they want.

Applejinx 4 days ago
They're not gonna be any better than a human who's focussed on those particular skills for a while, say top ten or five percent of social manipulators. Plus, AI alignments seem to be kinda isolated loner types to the extent that they distill personalities that do things like program computers and write web apps… though you've also got alignments specifically designed to be 'relatable instagram personality that you like!' and such like that.
Pretty sure those would be better at social engineering than the web dev personality… except that you have to build in a betrayer layer into the personality, so it's running that stuff but also serving a hidden agenda.
You'd be basically trying to build an AI spy, a betrayer that's engaging with actual people but has an agenda (for instance, 'everybody I befriend needs to eventually be signed up to sell Amway') and humans do have experience with this sort of thing. The difference is scale: there'll be a LOT of models out there interacting with people and trying to be acknowledged as people… or as innocuous models that don't have an hidden agenda.
- spwa4 4 days ago
  
  > They're not gonna be any better than a human who's focussed on those particular skills for a while, say top ten or five percent of social manipulators.
  In other words, scams are going to massively increase in success rate ... and what are banks (for example) supposed to do? Other than SCREAM to governments for outlawing AI and trying to force responsibility on anyone else?

neuroelectron 4 days ago

Things must be pretty bad at Fedora if they put up with this for so long. But I guess that's what happens when you try to monetize volunteer work.

mistrial9 4 days ago

"bad people" ?

ianhxu 4 days ago

[dead]