Comment by jgrahamc

8 years ago

> You really want to see Cloudflare spend more time discussing how they've quantified the leak here.

What would you like to see? The SAFE_CHAR logging allowed us to get data on the rate, which is how I got the %-of-requests figure.

> How many different sites? Your team sent a list to Tavis's team. How many entries were on the list?

  • We identified 3,438 unique domains. I'm not sure whether all of those were sent to Tavis, because we were only sending him things that we wanted purged.

    • 3,438 domains that someone could have queried, but potentially data from any site that had "recently" passed through Cloudflare would be exposed in the response, right? Purging those results helps with search engines, but a hypothetical malicious secret crawler could still hold data from any site.

    • What anomalies would be apparent in your logs if someone malicious had discovered this flaw and used it to generate a large corpus of leaked HTTP content?

    • Here's a question your blog post doesn't answer but should, right now:

      Exactly which search engines and cache providers did you work with to scrub leaked data?

    • Are you guys planning to release the list so we can all change our passwords on affected services? Or are you planning on letting those services handle the communication?

  • What I find remarkable is that the owners of those sites were never aware of this issue. If customers were receiving random chunks of raw nginx memory embedded in pages on my site, surely I'd have heard about it from someone sooner?

    I guess there is a long tail of pages on the internet whose primary purpose is to be crawled by Google and serve as search landing pages. But again, if I had a bug in the HTML of one of my SEO pages that caused Googlebot to see it as full of nonsense, I'd see that in my analytics, because a page full of uninitialized nginx memory is not an effective PageRank booster.

Perhaps as a follow-up to this bug, you could write a temporary rule to log the domain of any HTTP response containing malformed HTML that would have triggered the memory leak. That way you can patch the bug immediately, and then observe future traffic to find the domains that were most likely affected while the bug was live.
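To make the suggestion concrete: a minimal sketch of such a logging rule, assuming (as Cloudflare's write-up described) that the trigger was an HTML buffer ending inside an unterminated tag, e.g. `<script type=` with no closing `>`. The function names and the `log` list here are hypothetical, purely for illustration:

```python
import re

# Hypothetical detector for the malformed-HTML trigger: an opening '<'
# followed by a tag name, with no closing '>' before the end of the buffer.
UNTERMINATED_TAG = re.compile(r'<[a-zA-Z][^>]*$')

def would_have_triggered(body: str) -> bool:
    """Return True if the HTML body ends mid-tag (unterminated attribute)."""
    return bool(UNTERMINATED_TAG.search(body))

def log_if_suspect(domain: str, body: str, log: list) -> None:
    # The parser bug itself is patched first; this rule only records which
    # domains serve pages that would previously have leaked memory.
    if would_have_triggered(body):
        log.append(domain)
```

For example, a page ending in `<script type=` would be logged, while well-formed HTML would not. A real deployment would run this inside the edge proxy rather than as a standalone function, but the matching logic would be similar.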

Or is the problem that one domain can trigger the memory leak, and another (unpredictable) domain is the "victim" that has its data dumped from memory?

  • I believe that's the real issue. Any data from any Cloudflare site may have been leaked. Those domains let Google etc. know which pages in their caches may contain leaked info; unfortunately, the info itself could be from any request that has travelled through Cloudflare's servers.