Comment by godelski

1 day ago

I don't use VSCode or Copilot, so I'm hoping someone can answer these questions for me

  - chmod: does Copilot run as the user? Who's file permissions does it respect?
    - Can Copilot get root access?
  - Can autoApprove be enabled via the standard interface? Making it possible to batch approve code changes along with this setting change?[0]
  - Can it read settings from multiple files? (e.g. `.vscode/settings.json` and `../.vscode/settings.json`)
  - How is the context being read? Is this in memory? File? Both? 
    - What happens when you edit the context? Are those changes seen in some log?

Honestly, I can't see how this problem becomes realistically solvable without hitting AGI (or pretty damn close to it). Fundamentally we have to be able to trust the thing that is writing the code and making the edits. We generally trust people because we pay, provide job security, and create a mutually beneficial system where malicious behavior is disincentivized. But a LLM doesn't really have the concept of maliciousness. Sure, we can pressure it to act certain ways but that also limits the capabilities of those tools. Can't get it to act "maliciously"? Then how is it going to properly do security testing? Now we got multiple versions of Copilot? Great, just get them to work together and you're back to where we were.

So I think the author is completely right that this gets much harrier when we let the LLMs do more and get multi-agent systems. What's the acceptable risk level? What are we willing to pay for that? It's easy to say "I'm just working on some dumb app" but honestly if it is popular enough why would this not be a target to create trojans? It's feasible for malicious people to sneak in malicious code, even when everyone is reviewing and acting diligently, but we place strong incentive structures around that to prevent this from happening. But I'm unconvinced we can do that with LLMs. And if we're being honest, it seems like letting LLMs do more erodes the incentive structure for the humans, so just makes it possible to be fighting two fronts...

So is it worth the cost? What are our limits?

[0] I'm thinking you turn it on, deploy your attack, turn it off, and the user then sees approval like they were expecting. Maybe a little longer or extra text but are they really watching the stream of text across the screen and watching every line? Seems easy to sneak in. I'm sure this can advance to be done silently or encoded in a way to make it look normal. Just have it take a temporary personality.

3 comments

godelski

sigmoid10 1 day ago

>I can't see how this problem becomes realistically solvable without hitting AGI

How would AGI solve this? The most common definition of AGI is "as good as average humans on all human tasks" - but in case of ITsec, that's a very low bar. We'd simply see prompt injections get more and more similar to social engineering as we approach AGI. Even if you replace "average" with "the best" it would still fall short, because human thought is not perfect. You'd really need some sort of closely aligned ASI that transcends human thought altogether. And I'm not sure if those properties aren't mutually exclusive.

godelski 1 day ago
That's a pretty recent definition, one developed out of marketing since it removes the need to further refine and allows it to be naïvely measured.
So I'll refine: sentient. I'll refine more: the ability to interpret the underlying intent of ill-defined goals, the ability to self generate goals, refine, reiterate, resolve and hold conflicting goals and context together, possess a theory of mind, possess triadic awareness. And I'm certain my definition is incomplete.
What I mean by AGI is the older definition: the general intelligence possessed by humans and other intelligent creatures. In context I mean much closer to a human than a cat.
- sigmoid10 6 hours ago
  
  >That's a pretty recent definition
  It's actually one of the oldest definitions. I recommend you look up the works of H. A. Simon. This idea is quite ancient to people who are working AI research.
  Anyhow, your more vague definition is still pretty much in line with my assumptions above in terms of the applicability to this issue. I.e. an AGI by your standard also will not bring a solution to this.