Comment by nomilk

19 hours ago

Pre-AI, having access to code (e.g. if it leaked, or was simply open source) could allow hackers to discover exploits more easily. I wonder if that threat is now much more severe in the age of AI. Thankfully GitHub have probably run their code through many AI security tools themselves, so any vulnerabilities would have already been found and patched. Hopefully.

As a developer or security researcher, you're able to download and run GitHub Enterprise Server. I'm not sure having access to the full source code makes a meaningful difference for most of GitHub's surface area, given it's largely Ruby.

  • LLMs can't really parse compiled code to find exploits, though they may manage code in scripting languages (Python, JS, etc.) even if minified. So I don't quite agree with you: having access to the source could definitely help find exploits even in pre-LLM days.

    • Also, the GitHub Enterprise code is "obfuscated", but it uses a trivially reversible method just meant to be a minor roadblock. After you get past that you get the full Ruby source code, no minification or anything.

      For a while the key was literally:

      > This obfuscation is intended to discourage GitHub Enterprise customers from making modifications to the VM. We know this 'encryption' is easily broken.

    • Pretty much everyone disagrees with you, especially when you add in decompiler tools to the LLM.

    • How to say you haven't tried LLMs since 2023 without saying it. That's quite literally one of the things they excel at.
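
To illustrate why a "trivially reversible" obfuscation like the one described above is only a minor roadblock, here is a minimal sketch. It assumes a repeating-key XOR, which the thread does not confirm as the actual scheme; the function name `xor_with_key` and the sample Ruby line are hypothetical. The key text is the one quoted above.

```python
from itertools import cycle

# Key text as quoted in the thread. Using it as a repeating XOR key
# is an assumption made for illustration only.
KEY = (
    b"This obfuscation is intended to discourage GitHub Enterprise "
    b"customers from making modifications to the VM. We know this "
    b"'encryption' is easily broken."
)


def xor_with_key(data: bytes, key: bytes = KEY) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Hypothetical stand-in for a Ruby source file.
source = b"puts 'hello from a hypothetical Ruby file'\n"

obfuscated = xor_with_key(source)     # unreadable at a glance
recovered = xor_with_key(obfuscated)  # the same call undoes it
```

XOR with a fixed key is its own inverse, so anyone who can read the key out of the shipped product can recover the original text with the same function that hid it.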

I just had a disturbing thought. What if the LLM providers start blocklisting certain codebases?

“I’m sorry Dave, I can’t do that. This codebase has been identified as proprietary.”

  • Oh! That looks like a nail to me... why not check all file hashes against a CP database? That database could be maintained by, well, some government agency, but don't worry about it, it's gonna be super secure and there's no abuse potential or something at all!

> I wonder if that threat is now much more severe in the age of AI.

It is. I've been using Codex to analyse repositories en masse for a project I'm working on now[0]. Codex, Claude (my usual weapon of choice), etc., make pretty short work of looking for all kinds of problems and antipatterns in large codebases.

[0] Before any wags chime in, no, I'm not the one who hacked Nx and exported 4000 internal GitHub repos. I'm talking about a legitimate client project for a reputable company!