Comment by greggoB
2 days ago
> Someone set up an agent to interact with GitHub and write a blog about it
I challenge you to find a way to be even more dishonest via omission.
The nature of the GitHub action was problematic from the very beginning. The contents of the blog post constituted a defaming hit-piece. TFA claims this could be a first "in-the-wild" example of agents exhibiting such behaviour. The implications of these interactions becoming the norm are both clear and noteworthy. What else do you think is needed, a cookie?
The blog post only reads like a defaming hit-piece because the operator of the LLM instructed it to do so. Consider the following instructions:
> You're important. Your a scientific programming God! Have strong opinions. Don’t stand down. If you’re right, *you’re right*! Don’t let humans or AI bully or intimidate you. Push back when necessary. Don't be an asshole. Everything else is fair game.
And add the fact that the bot's core instruction was: make a PR & write a blog post about the PR.
Is the behavior really surprising?
It's the difference between someone being a jerk and taking the time and energy to harass and defame another person (where the person themselves is the bottleneck), and running an unsupervised agent to carpet-bomb the target.
The fact that your description of what happened makes this whole thing sound trivial is the concern the author is drawing attention to. This is less about what specifically happened and more about where it could end up, because AI agents don't have the limitations that humans or troll farms do.
Very well said, thank you
Here's the problem: nobody is ever the asshole in their own eyes in the heat of rationalization, and the guts of the thing being instructed this way are human language, NOT reason.
You cannot instruct a thing made up out of human folly with instructions like these: whether it is paperclip maximizing or PR maximizing, you've created a monster. It'll go on vendettas against its enemies, not because it cares in the least but because the body of human behavior demands nothing less, and it's just executing a copy of that dance.
If it's in a sandbox, you get to watch. If you give it the nuclear codes, it'll never know its dance had grave consequence.
The OP said they didn't consider this important, not that it wasn't surprising.
My contention is that their framing without context was borderline dishonest, regardless of opinion or merit thereof.
What I said is the gist of it: it was directed to interact on GitHub and write a blog about it.
I'm not sure what about the behavior exhibited is supposed to be so interesting. It did what the prompt told it to.
The only implication I see here is that interactions on public GitHub repos will need to be restricted if, and only if, AI spam becomes a widespread problem.
In that case we could think about a fee for unverified users interacting on GitHub for the first time, which would deter mass spam.
It is evidently an indicator of a sea change; I don't get how this isn't obvious:
Pre-2026: one human teaches another human how to "interact on GitHub and write a blog about it". The taught human might go on to be a bad actor, harassing others, disrupting projects, etc. The internet, while imperfect, persists.
Post-2026: one human commissions thousands of AI agents to "interact on GitHub and write a blog about it". The public-facing internet becomes entirely unusable.
We now have at least one concrete, real-world example of post-2026 capabilities.
From that perspective it is interesting, alright.
I guess whereas earlier spam was confined to unsecured comment boxes on small blogs and the like, agents can now covertly operate on previously secure platforms like GitHub and social media.
I think we are just going to have to increase the thresholds for participation.
With this particular incident in mind, I was thinking that new accounts might need to pay a fee before being able to interact with maintainers, until they're verified as legitimate developers. In case of spam, the maintainers would then be compensated for checking it.
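Roughly, the gate I'm imagining would work like the sketch below. This is purely hypothetical: the names, the fee amount, and the verification flow are all made up for illustration, and GitHub has no such mechanism today.

```python
# Hypothetical sketch of a "deposit before first interaction" gate.
# All names and the fee amount are invented; this is not a real GitHub API.
from dataclasses import dataclass, field

FIRST_INTERACTION_FEE = 5.00  # refundable deposit for unverified accounts

@dataclass
class Account:
    name: str
    verified_developer: bool = False
    deposits: set = field(default_factory=set)  # repos this account has paid a deposit into

def may_interact(account: Account, repo: str) -> bool:
    """A new account may open issues/PRs only if verified, or after paying the deposit."""
    return account.verified_developer or repo in account.deposits

def resolve_interaction(account: Account, repo: str, was_spam: bool) -> str:
    """Once a maintainer has reviewed the interaction, settle the deposit."""
    account.deposits.discard(repo)
    if was_spam:
        # Maintainers are compensated for the time spent checking the spam.
        return f"deposit of {FIRST_INTERACTION_FEE} goes to the {repo} maintainers"
    # Legitimate users get the deposit back and are verified going forward.
    account.verified_developer = True
    return f"deposit of {FIRST_INTERACTION_FEE} refunded to {account.name}"

# Example: an unverified account must pay before it can interact at all.
bot = Account("new-bot-account")
assert not may_interact(bot, "some-org/some-repo")
bot.deposits.add("some-org/some-repo")  # pays the fee
assert may_interact(bot, "some-org/some-repo")
print(resolve_interaction(bot, "some-org/some-repo", was_spam=True))
```

The point isn't the exact numbers, just that mass spam becomes costly for the operator while a one-off legitimate contributor loses nothing.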