Anyone can access deleted and private repository data on GitHub

2 years ago (trufflesecurity.com)

I reported this on their HackerOne many years ago (2018 it seems) and they said it was working as intended. Conclusion: don't use private forks. Copy the repository instead.

Here is their full response from back then:

> Thanks for the submission! We have reviewed your report and validated your findings. After internally assessing the finding we have determined it is a known low risk issue. We may make this functionality more strict in the future, but don't have anything to announce now. As a result, this is not eligible for reward under the Bug Bounty program.

> GitHub stores the parent repository along with forks in a "repository network". It is a known behavior that objects from one network member are readable via other network members. Blobs and commits are stored together, while refs are stored separately for each fork. This shared storage model is what allows for pull requests between members of the same network. When a repository's visibility changes (Eg. public->private) we remove it from the network to prevent private commits/blobs from being readable via another network member.
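The shared-network behavior described in that response can be illustrated with nothing more than a commit URL. A minimal sketch (owner, repo, and SHA below are hypothetical; no request is actually made):

```python
# Hypothetical names: "acme/widgets" is the public upstream; the SHA
# below was committed only in a (deleted or private) fork of it.
upstream = "https://github.com/acme/widgets"
fork_sha = "07f01e8337c1073d2c45bb12d688170fcd44c637"

# Because every member of a fork network shares one object store,
# the upstream's commit-view URL can serve the fork's commit even
# though it was never pushed to the upstream's own branches:
url = f"{upstream}/commit/{fork_sha}"
print(url)
```

GitHub's `/<owner>/<repo>/commit/<sha>` route resolves any SHA in the network, which is exactly what the "objects from one network member are readable via other network members" sentence means in practice.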

  • There seems to be no such thing as a "private fork" on GitHub in 2024 [1]:

    > A fork is a new repository that shares code and visibility settings with the upstream repository. All forks of public repositories are public. You cannot change the visibility of a fork.

    [1] https://docs.github.com/en/pull-requests/collaborating-with-...

  • Honest question. Submitting these types of bugs only to get a "we have determined it is a known low risk issue..." makes it seem like they really don't want to pay for someone else's time and dedication in making their product safer. If they knew about this, was it disclosed anywhere? If not, I don't see them playing a fair game. What's the motivation to do this if in the end they have the final decision on whether to award you or not? To me it looks similar to the way Google Play and the Apple App Store get to decide whether or not an app can be uploaded/distributed through them.

    Edit: I brought this up because to me it is absolutely miserable for a big company to just say: "Thanks, but we were aware of this".

    • Not defending GH here (their position is indefensible imo) but, as the article notes, they document these behaviors clearly and publicly:

      https://docs.github.com/en/pull-requests/collaborating-with-...

      I don't think they're being underhanded exactly... they're just making a terrible decision. Quoting from the article:

      > The average user views the separation of private and public repositories as a security boundary, and understandably believes that any data located in a private repository cannot be accessed by public users. Unfortunately, as we documented above, that is not always true. What's more, the act of deletion implies the destruction of data. As we saw above, deleting a repository or fork does not mean your commit data is actually deleted.

      38 replies →

    • No large company running a bug bounty cares one iota about stiffing you on a bounty payment. The teams running these programs are internally incentivized to maximize payouts; the payouts are evidence that the system is working. If you're denied a payment --- for a large company, at least --- there's something else going on.

      The thing to keep in mind is that large-scale bug bounty programs make their own incentive weather. People game the hell out of them. If you ack and fix sev:info bugs, people submit lots more sev:info bugs, and now your security program has been reoriented around the dumbest bugs --- the opposite of what you want a bounty program to do.

      5 replies →

    • As the article pointed out, GitHub already publicly documented this vulnerability.

      My employer doesn't pay out for known security issues, especially if we have mitigating controls.

      A lot of people spam us with vulnerability reports from security tools we already use. At least half of them turn out to be false positives we are already aware of. In my opinion, running a bug bounty program at all is a net negative for us. We aren't large enough to get the attention of anyone competent.

      13 replies →

    • Security disclosures are like giving someone an unsolicited gift. The receiver is obligated to return the favor.

      But if you buy someone non-refundable tickets to a concert they already have tickets for, you aren't owed compensation.

      3 replies →

    • Disagree. This is obviously a deliberate design choice with obvious implications. Expecting a bounty for reporting this is unreasonable. These kinds of beg bounties are exactly what give security "researchers" a bad name.

      The security implications are also minor. The only real problem is with making a fork of a private repo public: that should only make what exists in that fork public, not any other objects. Something that was already public staying public even when you delete it from your repo is not a security issue at all. Keys that have ever been pushed to a public repo should be revoked no matter what, with or without this GitHub feature.

      2 replies →

    • For moral reasons, historically I never wrote POCs or threatened disclosure.

      For companies like Microsoft, whose security culture a CSRB review found 'inadequate', the risk of disclosure with a POC is about the only tool we have to enforce their side of the Shared Responsibility Model.

      Even the largest IT spender in the world, the US government, has moved from the carrot toward the stick model. If they have to do it, so do we.

      Unfortunately, since a 'bad practices' list published by us doesn't invoke the risk of EULA-busting gross negligence claims, responsible disclosure is one of the few tools we have.

    • It's not just GitHub and it's not just because they don't want to pay bug hunters. In my career, I have escalated multiple bugs to my employer(s) in which the response was 'working as intended'. And they wouldn't have to pay me another cent if they acknowledged the issue.

      In my experience, there were two reasons for this behavior:

      1. They don't want to spend dev cycles on something that isn't directly tied to revenue (e.g. security).

      2. Developers don't have the same mindset as someone whose whole job is security, so they think something is fine when it really isn't.

    • I didn't find anything mentioning it online at the time. But there wasn't much time and dedication involved either, to be fair. I discovered it completely by accident when I combined a commit hash from my local client with the wrong repository URL and it ended up working.

    • The issue had been reported at least twice and was clearly documented. GitHub knew about this and had known for years. Their replies to the two notifications were even very similar.

      GitHub clearly knew. Would you prefer that a vendor lie?

    • companies vary wildly in their honesty and cooperation with bug bounties and develop reputations as a result. if they have a shit reputation, people stop doing free work for them and instead focus on more honest companies

      2 replies →

  • I reported a different security issue to github, and they responded the same (although they ultimately ended up fixing it when I told them I was going to blog about the "intended behavior").

  • What does "private fork" mean in this context? I created a fork of a project by cloning it to my own machine and set origin to an empty private repository on GitHub. I manually merge upstream changes on my machine.

    Is my repository accessible?

    • It's not. The feature here works because a network of forks known to GitHub has unified storage; that's what makes things like PRs work transparently and keep working if you delete the fork (kinda: it closes the PR, but the contents don't change).

    • No, that would be the "copy the repository" approach. Private fork is when you do it through their UI.

      As far as I know, it is not accessible.

    • then it's fine

      the issue is that the `fork` mechanism of GitHub is not semantically like a `git clone`

      it's more like creating one larger git repo in which all forks, whether private or not, are contained, and which doesn't properly implement access management (at least points 2&3 wouldn't be an issue if it did)

      there are also some implications from point 1: forks do in some way interfere with gc-ing orphaned commits (e.g. the non-synced commits in the deleted repo in point 1); at the least that should be a bug IMHO, one which also costs them storage

      (also, to be clear, for me 2&3 are security vulnerabilities no matter whether they are classified as intended behavior)

    • Because you never pushed to the fork, it's not aware of your repo; you're OK.

      What I don't know is what happens if in 3 months you DO set your remote origin to that fork to, for instance, pull upstream patches into your private repo. You're still not pushing, only pulling, so I would THINK they'd still never get your changes, but I don't know if git does some sort of log sync when you do a pull as well.

      Maybe that would wind up making the commit hash available.

  • Imho there is an issue with the word "delete". Apparently, for anyone hosting someone else's (private and/or sensitive and/or valuable) data, "delete" means hiding it from view but keeping it around "just in case", or "because we can", or "what are you gonna do about it"?

    I 'love' it when I see the words "hide", "archive", "remove", and other newspeak used to avoid the word "delete", since 'they' never actually delete (plus there are 1-2-5-10-forever years' worth of backups where your 'deleted' info can be retrieved relatively easily).

  • It would not even be that hard to fix: private forks should simply be copied automatically on first write. You might lose your little link to the original repo, but that's not as bad as unintentionally exposing all your future content.

  • Same, September 2018 for me.

    > After some internal discussion, we have determined this is a known low risk issue. We may make this functionality more strict in the future, but don't have anything to announce now. As a result, this is not eligible for reward under the Bug Bounty program. Below is a reference to our instructions for users to remove sensitive data from a repository. https://help.github.com/articles/removing-sensitive-data-fro...

  • > Conclusion: don't use private forks. Copy the repository instead.

    My conclusion would be: don’t use GitHub.

  • To be fair, in the true git sense, if a "fork" is really just a branch, deleting the original completely would also mean deleting every branch (fork) completely

    obviously not a fan of this policy though

    • But a fork is really not a branch. It's a copy of a repo with one remote pointing at the original on GitHub, but that doesn't need to happen.

Users should never be expected to know these gotchas for a feature called "private", documented or not. It's disappointing to see GitHub calling it a feature instead of a bug; to me it just shows a complete lack of care about security. Privacy features should _always_ have a strict, safe default.

In the meantime I'll be calling "private" repos "unlisted", seems more appropriate

  • Yep, I see GitHub as "public only" hosting, and if I want to host something private, I will choose another vendor.

    • For the benefit of anybody thinking "with GitLab I'm safe from this": if you're saying (and perhaps you're not) that some other git hosting service

      - gives you control over gc-ing their hosted remote?

      - does not, to your knowledge, have a third-party public reflog, an events API, or brute-forceable short hashes?

      then especially the second of those seems a fragile assumption, because this is "just" the way git works (I'm not saying the consequences aren't easy to mentally gloss over). Even if GitLab lacks those things currently (though I think it does, for example, support short hashes), it's easy to imagine them showing up somehow retroactively.

      If you're just agreeing with the grandparent post that github's naming ("private") is misleading or that the fork feature encourages this mistake: agreed.

      Curious to know if any git hosting service does support gc-ing under user control.

    • > if I want to host something private, I will choose another vendor.

      Or, you know, self-host, preferably on-prem.

      Basic git hosting only needs a sshd running on the server. If you want collaborative features with a web UI then there are solutions for that available too.

  • > I'll be calling "private" repos "unlisted"

    That might be a bit too strict. I'd still expect my private repos (no forks involved) to be private, unless we discover another footnote in GH's docs in a few years ¯\_(ツ)_/¯

    But I'll forget about using forks except for publicly contributing to public repos.

    > Users should never be expected to know these gotchas for a feature called "private".

    Yes, the principle of least astonishment[0] should apply to security as well.

    [0] https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...

  • Specifically about the feature called "private", the only gotcha seems to be that when the upstream transitions from private to public, it may unexpectedly take more data public than desired, right? The other discussed gotchas were all about deleting public data not actually being deleted or inaccessible.

  • I see your point; on the other hand, the standard procedure for that in the GitHub UI is to create a repo and then select another as a template.

    That doesn't fork, but does what you would expect: a fully private repo.

  • > It's disappointing to see GitHub calling it a feature instead of a bug

    git is "distributed" version control software, after all. It means a peer can't control everything.

  • Disagree. If you're using a service, understand how it works.

    Not everything needs to be designed for idiots and lazy people. It's OK for some tools and services, especially those aimed at technical people and engineers, to require reading to use properly, and to be surprising or unintuitive at first glance.

    • There's got to be a word for these kinds of ridiculous arguments which use personal responsibility as a cudgel against a systematic fix.

      I agree generally that interfaces have been dumbing down too far, but "private is actually not private and it's on you for not knowing that, idiot B)" is a weird place to be planting that flag.

      2 replies →

The biggest gotcha here is probably that if you start off with a private repo and a private fork, making the repo public also makes the fork "public".

GitHub may very well say that this is working as intended, but if it truly is then you should be forced to make both the repo and fork public at the same time.

Essentially: "Making repo R public will make the following forks public as well: 'My Fork', 'Super secret fork', 'Fork that I deleted because it contained the password to my neighbour's wifi :P'."

OK. I'm not sure if the last one would actually be public, but I wouldn't be surprised if that was "Working as intended(TM)" - GitHub SecOps

  • Any time you make a private repo public, it's best to just copy that code into a new public repo and leave the private repo private. Otherwise you have to audit every previous commit and every commit on every fork of your private code.

  • I agree. The other cases may be mildly surprising, but ultimately fall firmly into the category of "once public on the internet, always public." Deleting a repo or fork or commit doesn't revoke an access key that was accidentally committed, and an access key being public for even a microsecond should be assumed to have been scraped and usable by a malicious actor.

    • If you have a private repo, you would assume that nothing in it becomes public unless you do something very explicit.

      The issue here is that if you have a private repo and a private fork of that repo, and you make the repo public while keeping the fork private, you are not explicitly told that your fork is effectively public, whether you want it to be or not.

  • it's mitigated a bit by the fact that you have to know the SHA, and that's quite unique. it's apparently unique enough for Google Photos to do "private" sharing without logins

    • You only need the short SHA, which can be as few as 4 characters. Brute force ends up being very easy with only ~65k possibilities.
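      That arithmetic is easy to check; a short sketch enumerating the full 4-character space (no GitHub requests shown, just the candidate list):

```python
from itertools import product

HEX = "0123456789abcdef"

def short_sha_candidates(length=4):
    """Yield every possible abbreviated commit hash of the given length."""
    for combo in product(HEX, repeat=length):
        yield "".join(combo)

candidates = list(short_sha_candidates(4))
print(len(candidates))  # 65536, i.e. 16**4 prefixes to try per repository
```

      Each candidate would still have to be tried against a commit URL, so rate limiting is the only practical obstacle.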

Surprised at the comments minimizing this.

I've used github for a long time, would not have expected these results, and was unnerved by them.

I'd recommend reading the article yourself. It does a good job explaining the vulnerabilities.

  • For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

    I can sympathize with someone who gets bit by it, as it might not have occurred to them, but it’s part of the model.

    The third strikes me as counter-intuitive and hard to reason about.

    P.S. If you publish your keys or access tokens for well known services to GitHub and you are prominent enough, they will be found and exploited in minutes. The idea that deleting the repository is a security measure is not really worth taking seriously.

      > For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

      this isn't quite right

      content-addressable storage is just a means of access; it does

      - not imply content cannot be deleted

      - not imply content cannot be access-managed

      you could apply this to a git repo itself (like making some branches private and some not) but, more importantly, forks are not git ops, they are higher-level GitHub ops, and could very well have appropriate measures in place to make sure this cannot happen

      e.g. if GitHub had implemented forks like a `git clone`, _none of these vulnerabilities would have been a thing_

      similarly, implementing different access rights for different subsets of a fork network (or even the same git repo) technically isn't a problem either (not trivial, but quite doable)

      and I mean, commits made to private repositories being publicly readable is always a security vulnerability, no matter how much GitHub claims it's intended

      2 replies →

    • I agree the 3rd is by far the worst of the offenders. But even the first two should have more visibility. For example, by notifying users during deletion of forked repos that data will still be available.

      The exact UX here is debatable, but I don't think security warnings buried in the docs is enough. They should be accounting for likely misunderstandings of the model.

      4 replies →

    • > For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

      No one can, with a straight face, say that they don’t restrict access because “this is just how the technology works”. Doesn’t matter if it is content addressable or an append-only FS or whatever else.

      Even for some technology where the data lives forever somewhere (it doesn’t according to Git; GitHub has a system which keeps non-transitively referenced commits from being garbage collected), the non-crazy thing is to put access policy logic behind the raw storage fetch.

      > git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

      No. That doesn't make sense. It only sounds vaguely plausible at first because content addressable storage often means a distributed system where hosting nodes are controlled by multiple parties. That's not the case here, we're only talking about one host.

      Imagine we were talking about a (hypothetical) NetFlix CDN where it's content addressed rather than by UUID. Would anyone say "they forgot to check auth tokens for Frozen for one day, therefore it makes sense that everyone can watch it for free forever"?

      2 replies →

  • > I've used github for a long time, would not have expected these results, and was unnerved by them.

    So you've used it heavily, but haven't read the docs or thought about how forks work, and are now surprised. This seems like a learning opportunity: read the docs for stuff you use heavily, and the man pages and info pages for tools you rely on.

    None of this seemed surprising to me, perhaps because I've made PRs, seen that PRs from deleted repositories are still visible, and generally have this mental model of "a repository fork is part of a network of forks, which is a shared collection of git objects".

    • Congratulations, you developed the right intuition.

      However in UX/DX the question isn't whether users can develop the right intuition based on how they interact with software over time and reading through the documentation but how to shorten the time and effort necessary for that, ideally so that a single glance is enough.

      Do you think reading all the documentation for every feature of every tool you use in your life is a good use of your time and something that should be expected of everyone? As someone developing software used by other people, I don't.

  • The mental gymnastics going on in this thread to justify this as a sane design is likely why software sucks more and more these days.

IMO, the real vulnerability here is the way the Github Events archive exposes the SHA1 hashes of the vulnerable repositories. It would be easy to trawl the entire network to access these deleted/private repositories, but only because they have a list of them.

Similar (but less concerning) is the ability to use short SHA1 hashes. You'd have to either be targeting a particular repository (for example, one for which a malicious actor can expect users to follow the tutorial and commit API keys or other private data) or be targeting a particular individual with a public repository who you suspect might have linked private repositories. It's not free to guess something like "07f01e", but not hard either.

If these links still worked exactly the same, but (1) you had to guess 07f01e8337c1073d2c45bb12d688170fcd44c637 and (2) there was no events API with which to look up that value, this would be much, much less impactful.
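For scale, the arithmetic behind (1) is worth spelling out (plain Python, nothing GitHub-specific):

```python
# A 6-hex-character prefix like "07f01e" is a feasible, if noisy,
# online search; the full 40-character SHA-1 is not guessable at all.
six_char_space = 16 ** 6    # 16,777,216 candidates
full_sha_space = 16 ** 40   # roughly 1.46e48 candidates

print(six_char_space)             # 16777216
print(full_sha_space > 10 ** 48)  # True
```

Millions of guesses against a web endpoint is tedious but tractable; ~10^48 is not, which is why the short-hash and events-API affordances, not the object storage itself, do the damage here.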

This is not new. Many people have noticed this before, e.g. https://hikari.noyu.me/blog/2020-05-05-github-private-repos-...

  • No, but I think attention should still be raised to it in the hopes they will fix it. The squeaky wheel gets the grease.

    https://xkcd.com/1053

    • First step would be to have them acknowledge that a documented behavior, one which was part of their original design 16 years ago, is something that needs to be fixed.

      As someone who has used git and GitHub extensively over that time, none of what the author documented was a surprise to me.

      However, I also remember when people were trained to do a "Save As" when preparing a final Word document or Powerpoint for sharing with a third party. That certainly bit enough business users that Microsoft eventually changed the default behavior.

      3 replies →

    • I love this xkcd.

      We all need to embrace: Nobody has ever been impressed that you already knew something. When people share a discovery with you, it’s not about you. It’s about them and their joy of discovery. They want to share that joy with you.

      1 reply →

Hubber here (same username on github.com). We in GitHub's OSPO have been working on an open source GitHub App to address the use case where organizations want to keep a private mirror of a public upstream repo so they can review code, remove IP/secrets/keys that get committed, and squash history before any of those changes are made public. We're getting a beta release out this week, in fact. Check it out; I'm curious what y'all think about the approach:

https://github.com/github-community-projects/private-mirrors

  • Looks like a promising tool and workflow to mitigate the risks we are discussing here. If you haven't already done so, it might help the discussion here if you could highlight how this app deals with the issues outlined. Is the intent of the mirror repo creation that it's more-or-less equivalent to `git clone --mirror`? I took a quick look at the code, and didn't see a direct correspondence with `git clone --mirror` when creating the mirror repository.

    • That's correct, it's doing a clone into an empty repo rather than using the fork API - code is here: https://github.com/github-community-projects/private-mirrors...

      As it pertains to the post, since that private mirror is disconnected, none of the concerns about accessing deleted data apply.

      The downside is that you don't get any of GitHub's performance and UI affordances from the fork network. But for the use case of private iterations on work headed for a public upstream, that's a trade-off that seems worth making.
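      The difference is observable locally. A rough sketch (local throwaway repos standing in for GitHub-hosted ones; requires `git` on PATH) showing that a clone gets its own object store, unlike a network fork:

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command and return its trimmed stdout."""
    return subprocess.run(("git",) + args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

tmp = tempfile.mkdtemp()
upstream = os.path.join(tmp, "upstream")
os.mkdir(upstream)
git("init", "-q", cwd=upstream)
git("-c", "user.email=a@example.com", "-c", "user.name=a",
    "commit", "-q", "--allow-empty", "-m", "initial", cwd=upstream)

# A bare clone copies the objects into an independent store; nothing
# later written to either side is reachable through the other.
mirror = os.path.join(tmp, "mirror.git")
git("clone", "-q", "--bare", upstream, mirror, cwd=tmp)
sha = git("rev-parse", "HEAD", cwd=mirror)
print(len(sha))  # 40
```

      A fork made through GitHub's API skips that copy and shares the network's objects instead, which is where the performance and UI affordances, and the data-exposure gotchas, both come from.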

In fact, there is a process to request complete removal of data, but it involves sending an email that will be reviewed by github staff: https://docs.github.com/en/site-policy/content-removal-polic...

On the other hand, once an API key or password has been published somewhere, you should rotate it anyway.

  • I was wondering how they can otherwise comply with legislation. Makes sense there is a way to do this, e.g. in case of valid GDPR, DMCA, etc. requests.

    • GitHub's own DMCA reporting repo has warez in it from deleted PRs that you can still access with the original link. It's been that way for years.

Can this be used to host illegal content? I.e.: fork a popular repo, commit a pirated book to the fork, delete the fork, use the original repo to access the pirated book?

What would github do after receiving a DMCA request in that case?

  • One can safely assume they will find a way to follow the law rather than mumble about how technically this is working as intended.

    • > One can safely assume

      With something as nuanced as this, I wouldn't safely assume that every process, especially one from a compliance (non-technical) department, accounts for it.

  • I've seen bots make that kind of PR spam a few times. They'll make a PR that adds a random HTML or markdown file containing gambling spam or the like, and then presumably post links to github.com/$yourorg/$yourrepo/blob/$sha/thatfile. I can't link an example because all the ones I know about were nuked by GH Support.

  • That looks like the kind of loophole that could get GH to do something about this.

      they have the ability to do essentially a `git gc` and drop unreachable commits

  • It can be used to make it look like another project posted the content (though there is a warning: "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.").

    You can't host anything this way that you can't already host in your own repository, and GitHub does have a way to remove content that will make it inaccessible, whether in your repository or through another.

  • >Can this be used to host illegal content?

    It already is. Even in GitHub's own org repos. Any time you make a PR, the /tree/ link to it stays valid forever, even if the repo author removes it.

There's quite a long list of "open core" companies whose model is: start from a private repository (i.e. the company is in stealth), make a private fork that will include for-profit code with enterprise features, then make the original repository public so that the core becomes open source.

That GitHub is telling these companies (which, bear in mind, are paying customers of GitHub) "yeah, we don't care that your private proprietary code can be hacked off GitHub by anybody" is incredibly disturbing. Is there really not enough pressure from paying customers to fix this? Is Microsoft just too big to care?

In response to the end of the article, "it's important to note that some of these issues exist on other version control system products": I actually have experience helping someone with a BitBucket issue involving PII data that you can't rotate.

Once we eliminated the references in the tree and all forks (they were all private, thankfully), we reached out to BitBucket support, and they were able to garbage collect those commits and purge them to the point where, even knowing the git hashes, they were not locatable directly.

> The implication here is that any code committed to a public repository may be accessible forever

That's exactly how you should treat anything made available to the public (and there's no need for the subsequent qualifier that appears in the article—"as long as there is at least one fork of that repository").

  • Sometimes I wonder if all the security features GitHub slathers on top of `git` lull people into a false sense of security, when fundamentally they're working in a fully distributed version control system with no centralized authority. If your key is leaked, the solution is to invalidate the key, not just synthetically alter your version of history to pretend it never happened.

    • Unless you specifically know and understand the ramifications of this GitHub idiosyncrasy, you have no way to tell that your key was possibly leaked. GitHub never informs you that someone accessed a commit created in your private fork.

      1 reply →

Most of this report is just noise. GitHub repos are public. Public stuff can be shared. Public stuff shared previously and then deleted is "still available", but it was shared previously and not really subject to security analysis.

The one thing they seem to be able to show is that commits in private branches show up in the parent repository if you know the SHAs. And that seems like a real vulnerability. But AFAICT it also requires that you know the commit IDs, which is not something you can get via brute forcing the API. You'd have to combine this with a secondary hole (like the ability to generate a git log, or exploiting a tool that lists its commit via ID in its own metadata, etc...).

Not nothing, but not "anyone can access private data on GitHub" as advertised.

  • > it also requires that you know the commit IDs, which is not something you can get via brute forcing the API

    Well, GitHub accepts abbreviations down to as short as four hex digits... as long as there's no collision with another commit, that's certainly feasible. Even if there is a collision, once you have the first four characters you can just do a breadth-first search.

  • There's a whole section here about how to brute force the hashes. You don't even need the full hash... just a shortened version using the first few chars.

    • I'm dubious. Searching for globally unique commit IDs is still at least a million+ request operation. That's easy enough in a cryptographic sense, but the attack in question requires banging a web UI, which is 100% for sure going to hit some abuse detector. I really don't think you can do this in practice, and the article certainly doesn't demonstrate it.

      1 reply →

I maintain a pretty popular template for SaaS websites. Every few weeks someone would send a PR with all their private fork data, then quickly try to delete it.

Making it a "template" repo mostly fixed the issue. That creates a copy instead of a fork. However it still happens from time to time.

I think the first two points are a result of private data (commit/fork/issue) being able to refer to public data without making the reference public.

Say a private commit depends on a public commit C. Suppose that in the public repo, the branch containing C gets deleted and C is no longer reachable from any ref. From the public repo's point of view, C can be garbage-collected, but GitHub must keep it alive, otherwise the deletion would break the private commit.

It would be "a spooky action at a distance" from the private repo's POV. Since the data was at a time public, the private repo could have just backed up everything. In fact, if that's the case, everyone should always backup everything. GitHub retaining the commit achieves the same effect.

The public repo's owner can't prevent this breakage even if they want to, because there's no way to know the existence of this dependency.

The security issue discussed in the post is a different scenario, where the public repo's owner wants to break the dependency (making the commit no longer accessible). That would put too much of a risk for anyone to depend on any public code.

My mental model is that all commits ever submitted to GitHub will live forever and if it's public at one time, then it will always be publicly accessible via its commit hash.
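
That mental model matches how git itself works: an object's ID is a hash of its content, so a known hash pins one exact snapshot forever. A minimal sketch of git's object hashing (this is standard, documented git behavior, nothing GitHub-specific):

```python
import hashlib

def git_object_id(obj_type: str, payload: bytes) -> str:
    """Compute a git object ID the way git does: SHA-1 over
    '<type> <size>\\0<payload>'. Because the ID is derived purely from
    content, whoever holds the hash can name that exact content."""
    header = f"{obj_type} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()

# Same result as `echo hello | git hash-object --stdin`
print(git_object_id("blob", b"hello\n"))
# ce013625030ba8dba906f756967f9e9ca394464a
```

This is why "publicly accessible via its commit hash" is a meaningful access model: the hash is both the address and the proof that you know exactly what you're asking for.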

Really the only semi-interesting part of this is "if you make a private repo public, data from other private forks might be discoverable", but even that seems pretty minor, and the best practice for taking private repos public is to copy the data into a new repo anyway.

  • Is that a best practice in hindsight, or because it was known to some, that this issue exists, or for what other reason do you consider it a best practice? Git history?

    • When making a private repo public, there's a high chance that there was stuff in the private repo that isn't necessarily ok to make public. It's a lot easier to just create a new public repo containing all the data you want to make public than it is to reliably scrub a private repo of any data that shouldn't be there.

      More generally, you probably want to construct a new history for the public repo anyway, so you'll want a brand new repo to ensure none of the scrubbed history is accessible.

    • I worked in Professional Services at AWS for a little over three years. There was a fairly easy approval process to put our work out on the public AWS Samples (https://github.com/aws-samples) repository once we removed the private confidential part of the implementation.

      I always started a new repository without git history. I can’t imagine trying to audit every single commit.

  • You’ve completely missed the most dangerous thing mentioned, namely that private forks are not private.

    • > You’ve completely missed the most dangerous thing mentioned, namely that private forks are not private.

      What do you mean "missed"? They described the situation where data is leaked from a private fork, which is when you make the original repo public.

      There's no other time when data leaks. A public repo can't have ongoing private forks.

  • Even after a private repo is made public, it's common practice for new functionality to be worked on in private until it's ready.

Clearly a POLA violation (principle of least astonishment)

  • So it's using uncommon acronyms when you're only referencing the thing once.

    • I found the post you're replying to helpful (and it made me laugh): I've come across the abbreviation POLA many times, with its non-jokey meaning "principle of least authority". I've also come across "principle of least astonishment" (Larry Wall or some other Perl contributor maybe?) but I'd never noticed that it was (presumably?) a jokey reference to principle of least authority - I guess because I came across the joke first, back when I was barely a programmer, and I've never seen it abbreviated.

      But maybe it never was a reference to POLA proper - "principle of least privilege" is more widespread I think, outside of the object capability community. And maybe "least astonishment" came first!

Unrelated, but another interesting one is any non-admin contributors being able to add (and I believe update) secrets in a private repo for use in GH actions. It can’t be done via the UI, but can be done via the API or VSCode extension.

When I looked into it a while back, apparently it is intended behavior, which just seems odd.

>This is such an enormous attack vector for all organizations that use GitHub that we’re introducing a new term: Cross Fork Object Reference (CFOR)

Have we stopped naming vulnerabilities cute and fuzzy names and started inventing class names instead? Does this have a logo? Has this issue been identified anywhere else?

  • Introducing a new vulnerability... Git Forked™!

    chatgpt: Create a logo image of a fork impaling a small gnome named "code"

    • Much better name.

      It's very formally called Cross Fork Object Reference (CFOR). But commonly known as Git Forked! (Including the exclamation mark).

  • Best I could come up with after thinking for a moment is "AGHAST": "Astonishing GitHub Availability (of) Source Trees".

    But I'm still not entirely satisfied with the word choice.

Does any variant of this apply to DMCA’d repos in the repo network?

For example if the root repo is DMCA’d, or, if repo B forks repo A, then B adds some stuff that causes B to get DMCA’d. Can A still access B?

>Commit hashes can be brute forced through GitHub’s UI, particularly because the git protocol permits the use of short SHA-1 values when referencing a commit. A short SHA-1 value is the minimum number of characters required to avoid a collision with another commit hash, with an absolute minimum of 4. The keyspace of all 4 character SHA-1 values is 65,536 (16^4). Brute forcing all possible values can be achieved relatively easily.

>But what’s more interesting; GitHub exposes a public events API endpoint. You can also query for commit hashes in the events archive which is managed by a 3rd party, and saves all GitHub events for the past decade outside of GitHub, even after the repos get deleted.

Oof

ISTM there’s a straightforward mitigation or two available to GitHub:

1. If a URL would trigger the “[t]his commit does not belong to any branch of this repository, and may belong to a fork outside of the repository” banner and that URL uses a shortened commit hash, return 404 instead. Assuming no information leakage via timing, this would make semi-brute-force probing via short hashes much harder.

GitHub is clearly already doing the hard work for this.

2. A commit that was never public should not become public unless it is referenced in a public repository.

This would require storing more state.
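
That extra state could be modeled roughly like this. A toy sketch only: all class and method names are invented for illustration, and GitHub's actual storage model is not public.

```python
class CommitVisibility:
    """Model of mitigation 2: a commit is only publicly servable if it
    was pushed while its repo was public, or if a repo going public
    actually reaches it from a ref. Membership in the same repository
    network is deliberately NOT sufficient."""

    def __init__(self):
        # The "more state": a per-commit ever-public flag.
        self.ever_public = set()

    def record_push(self, commit, repo_is_public):
        if repo_is_public:
            self.ever_public.add(commit)

    def publish_repo(self, reachable_commits):
        # Making a repo public exposes only what its refs reach,
        # not sibling forks' private commits.
        self.ever_public.update(reachable_commits)

    def can_serve(self, commit):
        return commit in self.ever_public
```

Under this model, the article's third scenario (private-fork commits becoming readable when the upstream goes public) would no longer occur, because the fork's commits were never pushed publicly and are not reachable from the upstream's refs.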

How is this more of a vulnerability than the existence of sites like archive.org is? Isn't it just a fact of the Internet that once you make something public, you can't fully take it back later?

  • The third case in the article shows private forks being leaked publicly when the upstream goes public.

    The other two cases are indeed not worse than third-party archival, but they're still socially concerning. When you ask your own host to delete something you uploaded, you don't expect them to ignore you just because someone could have already archived it maybe. Making it harder to find can still be valuable; not all archives stay available forever, if any.

    • > When you ask your own host to delete something you uploaded, you don't expect them to ignore you just because someone could have already archived it maybe.

      I've had a service say that deleting the information fully can take eight months.

The title makes it seem more severe than it is. This only applies to GH forks of public repositories (or repositories that become public). Forks mirror the upstream repo's visibility.

I don’t use GitHub for anything serious, rather my own Gitea. However:

> Any commits made to your private fork after you make the “upstream” repository public are not viewable.

Does that mean a private repo that has never been or will be public isn’t accessible? That scenario wasn’t mentioned.

  • My understanding is that you are correct. If the repo and all of its forks stay private then the only people that would be able to view them are people who have permissions to access those repos.

Even better, you can actually commit to other forks if they create a pull request to you.

(There is a checkbox allowing that when you open a PR, which I bet almost no one has noticed.)

I reported that years ago and all they changed is that they extended the documentation about this "feature".

My main issue was that you cannot easily revoke this access, because the target repo can always reopen the PR and regain write access.

But they basically stated it "works as intended".

The only valid one is the last (3rd) one:

Accessing commits on a private fork when its upstream is made public

The other 2 are just common sense... push something to a public repo and it's public forever. Everyone knows once something's on the internet it's already too late to make it secret again.

Key takeaways for me:

1) Never store secrets in any repo ever! As soon as you discover that its happened, rotate the key/credential/secret asap!!

2) Enterprises that rely on forking so that devs can collab are fucked! Protecting IP by way of private repos is now essentially broken on GH!

3) what the actual fuck github!!??

Truffle is practically famous for clickbait like this. They have a YouTube channel full of it. Their behavior in the security industry steered us far away from them as a vendor.

  • This is not clickbait.

    It's well-explained and fairly presents the facts and GH's position. Based on the reaction here, it's clear many people are not aware of these footguns. If anything, the article is a public service.

    • Based on the comments, many have known since 2018. GitHub has made multiple statements about it.

      It's been written about multiple times, and now truffle is reposting old content with a name like IDOR to try to invent a new vuln class that doesn't exist.

      The title of the post is misleading, a specific set of repos leak data under specific circumstances - not every repo. The first two sentences of the post immediately downscope the claim made by the title.

      I'm guessing you didn't bother to check out their YouTube.

      This post is the only thing the OP has ever posted in 8 months, probably because it's truffle themselves. I stand by my statement, it's clickbait.

      1 reply →

So does that mean that forked repos don't do garbage collection of unreferenced commits?

If I force push and orphan a commit, I expect that will get garbage collected and be gone forever.

Or if I commit a file I shouldn't have and rewrite my repo history and push up a whole new history, is the old history still hanging out forever?

If true, then it seems that there is no way to delete any commits at all from any repo that has any forks?

  • > If true, then it seems that there is no way to delete any commits at all from any repo that has any forks?

    I do not believe the presence of forks matters. Or rather, your version is the initial fork.

    My impression is that garbage collection is an expensive and disruptive operation (for all forks) and so there's no button or API for it. Hence the recommendations to contact support if you accidentally commit an API key or the like (but really, you have already rotated that key, right?)
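
    What `git gc` considers dead can be sketched as a graph walk from the refs. A simplified model (real git also keeps reflog entries and applies a grace period before pruning):

    ```python
    def reachable_commits(refs, parents):
        """Walk the commit graph from every ref. Anything not visited is
        unreferenced: locally, `git gc` would eventually prune it, but per
        this thread GitHub keeps such objects alive in the repo network."""
        seen, stack = set(), list(refs)
        while stack:
            commit = stack.pop()
            if commit in seen:
                continue
            seen.add(commit)
            stack.extend(parents.get(commit, []))
        return seen
    ```

    A force push that orphans a commit just moves a ref so the walk no longer visits it; the commit object itself is untouched, which is why it can remain fetchable by hash.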

Doesn't this kind of make sense? We are not dealing with personal property. We are dealing with term licensed software.

Github is a software distributing network. Like the app store, or Steam. They grant you access to licensed content, which you self license, and then they facilitate access for you. Based on the honor system. But some things can just be assumed to be true for the sake of simplicity and liability.

For example, if I make a repo public and then take it private, the hashes that were obtained while it was open are still accessible. If I make a repo that's closed and open it, the whole thing is open.

If you fork a public repo and make private commits on it to a software distributor like Github, that is probably just going to end in a violation of the license. In this scenario, Github is saving you from yourself.

I found some obscure instances where user expectation doesn't match reality on GitHub before, and nobody there cares.

If anyone's wondering: Organizations that require SAML are included in your organizations even when you don't have a SAML session when signing in elsewhere via OAuth. Unlike generalized per-organization app authorizations, where GitHub can actually hide organization membership. Only way to find out if a user has a SAML session is for the consuming app to request the membership with your token, and interpret 403 as "no SAML session". As far as I know only Tailscale implemented this. This really sucks for apps like SonarCloud where someone can now view work code from their so cleanly separated personal and professional use GitHub account.

On the positive side this takes care of all those companies forking open source software and not contributing back

To fork private, I always just make a new repo and push to it. Looks like that behaves correctly here.

  • Agreed. If anything, github should remove the option to change a repo from private to public or vice versa. Force creation of a new repo with the correct settings.

I actually think this is a good thing and should simply be made more clear. The reason is the following from the article:

> I submitted a P1 vulnerability to a major tech company showing they accidentally committed a private key ... They immediately deleted the repository,

That is a ridiculous response to a compromised key. The repository should not have been "deleted", the key should have been revoked.

Imagine if you lost a bag with 100 keys to your house. Upon realising you desperately try to search for the bag only to find it's been opened and the keys spread around. You comb through the grass and forests nearby collecting keys and hoping you find them all.

Or you just change the locks and forget about it.

If you upload something, anything, to a computer system you do not own you need to consider it no longer secret. It's as simple as that. Don't like it? Don't do it.

I detest things like delete buttons in messaging apps and, even worse, email recall in Outhouse-style email apps. They just give people a false sense of security. I've been accidentally sent someone's password several times on Teams. Yeah you deleted the message, but my memory is very good and, trust me, I still know your password.

If there's a security problem here it's in people believing you can delete stuff from someone else's system, or that that systems make it look like you can. The solution is the same though: education. Don't blame GitHub. Don't force them to "fix" this. That will only make it worse because there are still a million other places people will upload stuff and also won't actually delete stuff.

I reported a similar and, in my opinion, even more damaging issue (https://hackerone.com/reports/2240374) and they also dismissed it as by design.

Turns out I found out you could even invite external collaborators into your fork and totally bypass enforced SSO.

Even if you block forking into your main repo, the existing forks remains active and still can pull from upstream.

It feels like if you need proper security, you have to go with enterprise

This behaviour is also important for ergonomic submodules. The .gitmodules file lists the upstream repo as the origin. So, if you're modifying an upstream project in a submodule and push changes to a fork, it's important that the SHA that git tracks is still reachable through the upstream link.

Ultimately I don't think it's feasible to break this behaviour and the most we can hope for is a big red warning when something counterintuitive happens.

I cannot access the commit https://github.com/trufflesecurity/trufflehog/commit/7bc0b shown in one of the pictures in the article (right before "Where do you get these hash values") despite this repo is even public.

What gives?

People should realize that once you upload something, it will be out there, forever. I assume this happens to everything.

Trusting some company will actually delete your stuff is kind of naive in my opinion.

The example of people forking and putting an API key in the repo, I would never let my people do this. Once you push, it will be "out there".

There is a reason that anyone who cares about forks being private forever (even if you delete them) should never use or trust a third party. I never use GitHub. I run my own git server and everyone else should too, in my opinion. GitHub has always been a huge security problem.

But that's just me...

I wonder if copyleft projects can use this to find license violations and force the altered code into the open.

A serious security issue indeed, if someone knows the hash.

How I manage this is that every time I want to open-source a previously private feature, I take the changeset diff and apply that to the files in the public repository. Same features, but plausibly different hash.

So... this is only an issue with forking, right? And, forking is not the same thing as branching... right? I'm just trying to make sure I understand this since I do branching all the time, but have never forked anything.

So the moment something is published on the Internet publicly, there's a chance it will be saved and you will not be able to get it deleted.

That, unfortunately, sounds like the result of publishing something on the Internet. Not GitHub's fault.

People are so preoccupied with putting the code on GitHub. It’s like it doesn’t exist before it’s on GitHub.

If you’re not gonna share it then it hardly matters. Use a backup drive.

Git is distributed. You don’t have to put your dotfiles on GitHub. Local is enough.

  • Your laptop breaks in a way that your disk cannot be recovered. Now what? How often are you backing up your disk? Probably much easier to type "git commit" and "git push"

Should GitHub be liable for any damages caused by this issue, like some think CrowdStrike should be for what happened last week?

Morally it seems even worse: CrowdStrike did it by accident, while GitHub has known about this for years.

The few times I made private copy public I made a brand new git repo, copied the working copy over, and published that as public. I'd never include past private git history when making something public.

Great website design that loads fine without scripts but then runs something that requires features found only in newer browsers and then deletes the entire content when that fails. Why?

Commits made to private repos being public (points 2 & 3) is never a minor security vulnerability, IMHO.

It doesn't matter if it's behaving as intended or how forks work.

Also, point 1 implies that GitHub likely doesn't properly GC their git repos, which could have all kinds of problematic implications beyond point 1 w.r.t. purging accidentally leaked secrets or PII...

All in all it just shows GitHub might not take privacy and security seriously... which is kinda hilarious given that the customers using private repos tend to be the paying customers.

  • You’re right that they don’t let commits get GC'd. They jump through hoops in order to keep commits that are not transitively referenced from being garbage collected. Just assume that every commit is kept around for “auditing”.

    One GitHub employee even contributed a configuration to Git which allows you to do the same thing: run a program or feed a file which tells the GC what nodes to not traverse.

Question: by "deleted" do you just mean a commit that deletes it? Because if you remove the commit itself from the repo, it should disappear.

>forks copyleft project >tries to make it private, violating license >fork made public

Sounds like a win for foss

I always acted as if there were no such thing as private data on github. Maybe even the internet as a whole.

A “delete” means it should be gone forever from the service it was removed from.

“Private” means it should only be available to specific involved parties only.

If you implement any other behavior to these concepts you are implementing anti patterns.

We need to be precise and consistent in the wording of the functions we are providing in order to ensure we easily can understand what is going on, without having to interpret documentation to be able to fully understand what is going on.

So if I read the article correctly if I never fork or otherwise contribute from my private repo I’m good?

This is why for private and business projects, we don't use GitHub, we use Amazon CodeCommit.

  • The article states that this “vulnerability” might exist in other scm systems as well

  • Because of literally this issue? I'm not sure if you're doing a generic "I don't like github" or know for a fact that CodeCommit doesn't have issues like this.

    This seems like a terrible security vector but I'm not sure migrating thousands of repos out of github vs. training engineers to keep public and private repos completely separated makes sense and you haven't explained why you use CodeCommit.

    Unless it is this reason, which like I said, seems a bit heavy handed, but I rarely move private repos to public.

    I kind of assumed this was a distributed Git problem, not Github, but I don't know.

This walks like a dark pattern and quacks like a dark pattern. People's entire livelihoods are at stake and they don't care. Most likely because plausible deniability and obscure TOS rights of how and when the code is used is more valuable to them than the reputation hit. It is hard to imagine this is very hard to fix.

  • > People's entire livelihoods are at stake

    No they aren't.

    • Sure they are. If somebody has a proprietary product that they happened to organize as a fork of an open source base at some point, it is exposed. The git organization aside, that is a very common business model.

      1 reply →

Wow, that's crazy. I tried a 6 digit hash and got a 404, then I tried another 6 digit hash and got "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

Insane

I learned about it years ago when I accidentally pushed secrets to the repo. When after rebasing and force pushing to the branch I was still able to access that commit, we decided to stop using GitHub.

  • Hopefully you have since learned to read the documentation of the tools you use, or at least enough of it to understand the basic data model you are working with. Rebasing won't even (immediately) remove the commits from your local repo. And force pushing isn't some magic operation either.

    Further, even if you had managed to delete the secrets from the repo, you have to assume that others already copied them, and rotate your keys anyway.

    • Yes, the credentials were invalidated promptly, before trying to remove them from GitHub. That said, we were using different version control system and GitHub was new to us. This was many years ago.

All your private photos on Google Drive have publicly accessible URLs too. Most people don't know all their private photos are exposed to the world.

I won't be surprised if "right to be forgotten"/GDPR abusers will spam github and force them to act on it, eventually.

----

This is clearly documented and can be explained even to non-technical managers.

From my POV calling that vulnerability is trying to build a hype.

I think that having quote from here on visibility changing settings page would be even more clear: https://docs.github.com/en/pull-requests/collaborating-with-...

They have the yellow banner to detect when you likely access a hash like this. Why do they allow those commit hashes to be accessed through the short commit hashes?

Commit hashes are essentially capabilities; you should be able to access any data that you have a capability for. But allowing access via a 16-bit prefix is just idiotic, and equivalent to accepting just the first two bytes of a 160-bit cryptographic hash...
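
The arithmetic behind that comparison: a 4-hex-character prefix carries only 16 bits of the hash, a trivially enumerable space, while a full SHA-1 commit hash carries 160 bits.

```python
PREFIX_HEX_CHARS = 4
prefix_bits = PREFIX_HEX_CHARS * 4      # each hex char encodes 4 bits -> 16 bits
prefix_space = 16 ** PREFIX_HEX_CHARS   # 65,536 candidates: scriptable
full_sha1_bits = 40 * 4                 # 40 hex chars = 160 bits

print(prefix_space)          # 65536
print(2 ** full_sha1_bits)   # ~1.46e48: infeasible to enumerate
```

The capability property only holds if the full hash (or a long enough abbreviation) is required; at 4 characters the "capability" degrades into a guessable index.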

> 1) Fork the repo. 2) Hard-code an API key into an example file. 3) <Do Work> 4) Delete the fork.

... yeah if <Do Work> is push your keys to GitHub.

Come on, this is not surprising.

"Private repositories" were never private as I said before. [0]

[0] https://news.ycombinator.com/item?id=23057769

  • Your argument from before is just that the user is not in full control.

    Well, duh. That's not a reason to avoid every "private" feature in every product on the planet.

    A failure in the system is still surprising. I could equally say "all software has bugs, so it's not surprising if your self-hosted solution leaks data". But that would be too dismissive, as you are being.

  • >Come on, this is not surprising.

    Very cool that it is not surprising to you.

    But to others (some are even in this thread!) it is both new and surprising. They unfortunately missed your 4 year old comment, but at least they get to learn it now.

Data that you place with an entity that is a large organization with many commercial and government ties - must be assumed to be accessible to some of those parties.

And if that entity has a complex system of storage and retrieval of data by and for many users, that changes frequently, without public scrutiny - it should be assumed that data breaches are likely to occur.

So I don't see it as very problematic that GitHub's private repositories, or deleted repositories, are only kind-sorta-sometimes private and deleted.

And it's silly that the article refers to one creating an "internal version" of a repository - on GitHub....

Still, interesting to know about the network-of-repositories concept.

This isn't a bug IMO.

If you know the hash of some data, then you either already have the data yourself, or you learned the hash from someone who had the data.

If you already have the data, there is no vulnerability - since you cannot learn anything you don't already have.

If you got the hash from someone, you could likewise have gotten the data from them.

People do need to be aware that 'some random hex string' in fact is the irrevocable key to all the data behind that hash - but that's kinda inherent to gits design. Just like I don't tell everyone here on HN my login password - the password itself isn't sensitive, but both of us know it accesses other things that are.

If github itself was leaking the hash of deleted data, or my plaintext password, then that would be a vulnerability.

  • >If you know the hash of some data, then you either already have the data yourself, or you learned the hash from someone who had the data.

    From the article, you do not need to have the data nor learn the hash from someone who had the data.

    >Commit hashes can be brute forced through GitHub’s UI, particularly because the git protocol permits the use of short SHA-1 values when referencing a commit. A short SHA-1 value is the minimum number of characters required to avoid a collision with another commit hash, with an absolute minimum of 4. The keyspace of all 4 character SHA-1 values is 65,536

  • > If you know the hash of some data, then you either already have the data yourself, or you learned the hash from someone who had the data.

    Don’t think so - the article mentions you can use the short prefix on GitHub, so you have a search space of 65536.

  • > If you know the hash of some data, then you either already have the data yourself, or you learned the hash from someone who had the data.

    You need to read to the end of the article where they show the brute-force way of getting the hashes.

  • That's counterintuitive, though - often, the whole point of a hash is that it's one-way.