Comment by er4hn
11 hours ago
I think the author makes some interesting points, but I'm not that worried about this. These tools feel symmetric for defenders to use as well. There's an easy-to-see path that involves running "LLM Red Teams" in CI before merging code or major releases. The fact that it's a somewhat time-expensive test (I'm deliberately ignoring dollar cost here) makes it feel similar to fuzzing in terms of where it would fit in a pipeline. New tools, new threats, new solutions.
> These tools feel symmetric for defenders to use as well.
I don't think so. From a pure mathematical standpoint, the defender needs better (or equal) results at avg@1 or maj@x, while the attacker needs only pass@x to succeed. That is, the red agent needs to work just once, while the blue agent needs to work every time. Current agents are much better (20-30%) at pass@x than at maj@x.
In real life that's why you sometimes see titles like "teenager hacks into multi-billion dollar company and installs crypto malware".
I do think you're right that we'll see an improved security posture from running red vs. blue agents "in a loop". But I also think red has a mathematical advantage here.
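To make the asymmetry concrete, here's a toy calculation (the numbers are purely hypothetical, assuming independent attempts): with a per-attempt success probability p, pass@k only needs one of k attempts to land, while maj@k needs a strict majority of k samples to be right.

```python
from math import comb

def pass_at_k(p: float, k: int) -> float:
    # Attacker-style metric: at least one of k independent attempts succeeds.
    return 1 - (1 - p) ** k

def maj_at_k(p: float, k: int) -> float:
    # Defender-style metric: a strict majority of k samples must be correct
    # (binomial tail probability).
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i)
               for i in range(k // 2 + 1, k + 1))

p, k = 0.3, 5  # hypothetical per-attempt success rate and sample count
print(f"pass@{k} = {pass_at_k(p, k):.3f}")  # 0.832
print(f"maj@{k}  = {maj_at_k(p, k):.3f}")   # 0.163
```

With the same underlying model quality, the "needs to work once" metric sits far above the "needs to work reliably" metric, which is the gap the comment is pointing at.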
>> These tools feel symmetric for defenders to use as well.
> I don't think so. From a pure mathematical standpoint, you'd need better (or equal) results at avg@1 or maj@x, while the attacker needs just pass@x to succeed.
Executing remote code is a choice, not some sort of force of nature.
Timesharing systems are inherently unsafe, and way too much effort is put into reclaiming the stone from Sisyphus.
SaaS and complex centralized software need to go, and that is way overdue.
Awesome! What’s your strategy for migration of the entire world’s infrastructure to whatever you’re thinking about?
That's not how complex systems work though? You say that these tools feel "symmetric" for defenders to use, but having both sides use the same tools immediately puts the defenders at a disadvantage in the "asymmetric warfare" context.
The defensive side needs everything to go right, all the time. The offensive side only needs something to go wrong once.
I'm not sure that's the fully right mental model to use. They're not searching randomly with unbounded compute, nor selecting from arbitrary strategies in this example. They are both using LLMs, and likely the same ones, so they will likely uncover overlapping possible solutions. Avoiding that depends on exploring more of the tail of distributions that are highly correlated, possibly identical.
It's a subtle difference from what you said: it's not that everything has to go right in sequence for the defensive side. Defenders just have to hope they invested enough in searching that the offensive side has a significantly lowered chance of finding solutions they did not. Both the attackers and defenders are attacking a target program and sampling the same distribution of attacks; it's just that the defender is also iterating on patching any found exploits until their budget is exhausted.
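That framing can be sketched as a minimal Monte Carlo simulation (the distribution shape and budgets below are made up for illustration): both sides sample candidate exploits from the same skewed distribution, the defender patches whatever it finds, and the attacker wins if it draws anything outside the patched set.

```python
import random

def attacker_wins(n_exploits, defender_budget, attacker_budget, weights, rng):
    # Both sides sample from the same skewed distribution of candidate exploits.
    found_by_defender = set(
        rng.choices(range(n_exploits), weights=weights, k=defender_budget))
    found_by_attacker = set(
        rng.choices(range(n_exploits), weights=weights, k=attacker_budget))
    # Attacker wins if any exploit it found was not already patched.
    return bool(found_by_attacker - found_by_defender)

rng = random.Random(0)
n = 50
weights = [1 / (i + 1) for i in range(n)]  # heavy head, long tail
trials = 2000
wins = sum(
    attacker_wins(n, defender_budget=200, attacker_budget=50,
                  weights=weights, rng=rng)
    for _ in range(trials))
print(f"attacker win rate: {wins / trials:.2f}")
```

Playing with the budgets shows the dynamic in the comment: the defender's odds depend on having searched the shared tail more thoroughly than the attacker, not on winning every step of a sequence.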
That really depends on the class of offender. If it's a single group with some agenda, then that's just everyone spending a lot of resources on capabilities that no permanent actor in the game actually wants to escalate, only to show they have the tools and skills.
It's probably more worrying once you get script kiddies on steroids, who can spawn up all over with the same mindset as even the dumbest significant geopolitical actor out there.
Yes, and these tools are already being used defensively, e.g. in Google's Big Sleep:
https://projectzero.google/2024/10/from-naptime-to-big-sleep...
List of vulnerabilities found so far:
https://issuetracker.google.com/savedsearches/7155917
> I think the author makes some interesting points, but I'm not that worried about this.
Given the large amount of unmaintained or outdated software out there, I think being worried is the right approach.
The only guaranteed winners are the LLM companies, who get to sell tokens to both sides.
I mean, you're leaving out large nation-state entities.
An LLM Red Team is going to be too expensive for most people; an actual infosec company will need to write the prompts, vet them, etc. But you don't need any of that to find exploits if you're just a human sitting at a console trying things. The hackers still have the massive advantages of 1) time, 2) cost (it will cost them less than it costs the defenders/Red-Team-as-a-SaaS), and 3) only needing to get lucky once.
Defenders have the added complexity of operating within business constraints like CAB/change control and uptime requirements. Threat actors don't, so they can move quickly and operate at scale.
Not symmetric at all.
There are countless bugs to find.
If the offender runs these tools, then any bug they find becomes a cyberweapon.
If the defender runs these tools, they will not thwart the offender unless they find and fix all of the bugs.
Any vs all is not symmetric
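A rough back-of-the-envelope version of that any-vs-all asymmetry (probabilities are illustrative and bugs are assumed independent): the attacker only needs one bug the defender failed to find and fix.

```python
def p_compromise(n_bugs: int, p_attacker_finds: float,
                 p_defender_fixes: float) -> float:
    # The system is compromised if the attacker finds at least one bug
    # that the defender did not find and fix first.
    p_bug_exposed = p_attacker_finds * (1 - p_defender_fixes)
    return 1 - (1 - p_bug_exposed) ** n_bugs

# Even a defender who finds and fixes 90% of bugs loses often when
# bugs are plentiful and the attacker finds each one only 10% of the time:
print(f"{p_compromise(100, 0.10, 0.90):.3f}")  # 0.634
```

The "all" side has to drive the per-bug exposure probability toward zero across every bug, while the "any" side benefits from sheer bug count.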
LLMs effectively move us from A to B:
A) 1 cyber security employee, 1 determined attacker
B) 100 cyber security employees, 100 determined attackers
Which is better for the defender?
How do bug bounties change the calculus? Assuming rational white hats who will report every bug which costs fewer LLM tokens than the bounty, on expectation.
They don’t.
For the calculus to change, anyone running an LLM to find bugs would have to be able to find all of the bugs that anyone else running an LLM could ever find.
That’s not going to happen.
This, plus the fact that software and hardware have been getting structurally more secure over time. New changes like language safety features, Memory Integrity Enforcement, etc. will significantly raise the bar on the difficulty of finding exploits.
> These tools feel symmetric for defenders to use as well.
Why? The attackers can run the defending software as well. As such, they can test millions of test cases, and if one breaks through the defenses they can take it live.
Right, that's the same situation as fuzz testing today, which is why I compared it. I feel like you're gesturing towards "attackers only need to get lucky once, defenders need to do a good job every time", but a lot of the time, when you apply techniques like fuzz testing, it doesn't take much effort to get good coverage. I suspect a similar situation will play out with LLM-assisted attack generation. For higher-value targets based on OSS, there are projects like Google Big Sleep to bring enhanced resources.
Defenders have threat modeling on their side. With access to source code, design docs, configs, infra, and actual requirements, plus the ability to redesign / choose the architecture and dependencies for the job, etc., there's a lot that actually gives the defending side an advantage.
I'm quite optimistic about AI ultimately making systems more secure and well protected, shifting the overall balance towards the defenders.