← Back to context

Comment by jgrahamc

8 years ago

We identified 3,438 unique domains. I'm not sure if those were all sent to Tavis because we were only sending him things that we wanted purged.

3438 domains which someone could have queried, but potentially data from any site which had "recently" passed through Cloudflare would be exposed in response, right? Purging those results helps with search engines, but a hypothetical malicious secret crawler would still potentially have any data from any site.

What anomalies would be apparent in your logs if someone malicious had discovered this flaw and used it to generate a large corpus of leaked HTTP content?

  • That's also what I'm interested in. There's a lot of talk about the sites that had the features enabled that allowed the data to escape, but it's the sites that were co-existing with those that were in danger.

    In terms of the caching, knowing the broken sites tells you where to look in the caches after the fact, but do you have any idea of who's data was leaked? Presumably 2 consecutive requests to the same malformed page could/would leak different data.

    • > Presumably 2 consecutive requests to the same malformed page could/would leak different data.

      Wouldn't the second request be served from the CDN cache? Since for Cloudfare that particular page is a valid cached page, it would send you that same page on the second request.

      3 replies →

  • it seems to me you'd have to know at a minimum:

    1. every tag pattern that triggers the bug(s)

    2. which broken pages with that pattern were requested at an abnormally high frequency or had an unusually short TTL (or some other useful heuristic)

    3. on which servers, and at what time, in order to tell

    4. who's data lived on the same servers at the same time as those broken pages

    to even begin to estimate the scope of the leak. and that doesn't even help you find who planted the bad seeds.

Here's a question your blog post doesn't answer but should, right now:

Exactly which search engines and cache providers did you work with to scrub leaked data?

  • Also, have you worked with any search engine to notify affected customers.

    ex: Right now there is in an easily found google cached page with OAuth tokens for very popular fitness wearable's android API endpoints

Are you guys planning to release the list so we can all change our passwords on affected services? Or are you planning on letting those services handle the communication?

  • That list contains domains where the bug was triggered. The information exposed through the bug though can be from any domain that uses Cloudflare.

    So: all services that have one or more domains served through Cloudflare may be affected.

    The consensus seem to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets). But the data still was saved all over the world in web caches. So the bad guys are now probably after those. Though I don't know how much 'useful' data they would be able to extract, and what the risks for an average internet user are.

    • > The consensus seem to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets).

      This is literally as bad as it gets, anyone trying to palliate the solution has something to sell you. You'd have to be an idiot to think that $organization (public, private, or shadow) doesn't have automated systems to check for something as stupid simple as this by querying resources at random intervals and searching for artifacts.

      Someone found it. Probably more than one someone. Denial won't help.

    • Myself and 4 other people I know all happened to get their reddit accounts temporarily locked due to a "possible compromise" in the past week or so, which has never happened to any of us before. Anyone else?

      6 replies →

  • I've compiled a list of 7,385,121 domains that use Cloudflare here: https://github.com/pirate/sites-using-cloudflare

    • This list is misguided. It's just a dump of sites using Cloudflare's DNS, a hugely popular and (mostly) free service. The vulnerability only affected customers using Cloudflare's paid SSL proxy (CDN) service. The latter is a much smaller subset. Even then, only a subset of the SSL proxy users, those with certain options enabled that caused traffic to go through a vulnerable parser, were really impacted. I'm not sure a list as broad as this is helpful.

      5 replies →

  • If I'm understanding correctly, that list would include not only the 3,438 domains with content that triggered the bug, but every Cloudflare customer between 2016-09-22 and 2017-02-18.

What I find remarkable is that the owners of those sites weren't ever aware of this issue. If customers were receiving random chunks of raw nginx memory embedded in pages on my site, I'd probably have heard about it from someone sooner, surely?

I guess there is a long tail of pages on the internet whose primary purpose is to be crawled by google and serve as search landing pages - but again, if I had a bug in the HTML in one of my SEO pages that caused googlebot to see it as full of nonsense, I'd see that in my analytics because a page full of uninitialized nginx memory is not going to be an effective pagerank booster.