Comment by sparkling

8 years ago

I don't get it. How is this info leaked? From the blog posts, it seems that "only" the HTTP Headers are being leaked and somehow being crawled by Google? But since when does Google store HTTP request info? Can someone explain?

Headers (among other sensitive stuff) were being leaked inside document bodies.

  • So just to clarify: some bug makes Cloudflare leak HTTP headers into the HTML being served, and those HTML pages containing sensitive info got cached by Google (and others)?

    • Yes. Think of it this way.

      You have a function that strips all colons from your input. For some reason, in certain cases, the code misbehaves: when it replaces a colon with an empty string, it accidentally substitutes other data sitting in memory instead. Now the colons in your input have been replaced with data the code should never have touched, so whoever sent you an input gets back that input plus extra data they shouldn't be able to see. (A rough code sketch of this class of bug appears further down the thread.)

      And Google in this case caches those output strings.

    • Yeah.

      "We leaked information from Customer A to Customer B by accident" is the first order problem.

      But the existence of web caches means that all that private information of customer A is potentially fucking everywhere now.

      How do you even clean this up? How do you even start?

    • They leak uninitialized memory contents into the HTML being served; that memory could (and did) contain data from any other traffic that passed through their hands.

      So a request sent to Cloudflare customer A's site could return data from Cloudflare customer B, including data that B thought was only being served via https to authenticated users of B.

    • Not just headers, basically random memory dumps that could contain anything that Cloudflare saw (which is almost everything). Passwords, certificates, you name it.

    • Essentially. Any headers from any site routed through Cloudflare could get injected into the body of a second site's page if that second site was using the obfuscation feature. Those "mis-stuffed" pages could then be (and were) cached by, among other things, crawlers like those operated by Google and Bing.

      Apparently 7xx sites had this enabled, but the leak affected 4000-ish other sites that happened to be on the same infrastructure.

    • Near as I can tell, the HTTP headers from one site are being included in the HTML of other sites...

Cloudflare handles SSL for a lot of sites. It decrypts everything and passes it along.

For certain other sites with malformed HTML, a bug caused it to grab random data (headers and bodies) from memory and include it in the body of the response HTML. (An HTML-rewriting product that Cloudflare offered was broken, and it ran on the same servers.)

This stuff got sent to people's browsers and also to web indexers like Google or Bing.

Google lets you search for stuff and will also show you its cached copy of the page it scraped, making it easy to find this data.

Edit: Also you may be seeing more headers in examples because headers are easier to search for.
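
To make the mechanism above (and the colon-stripping analogy earlier in the thread) concrete, here is a deliberately simplified, hypothetical C sketch, not Cloudflare's actual parser: a rewriter whose only end-of-buffer test is an equality check. A malformed tag leaves the cursor past the end of the page, the check never fires again, and a neighbouring request's memory gets copied into the output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the class of bug described above (NOT
 * Cloudflare's actual parser).  The rewriter walks a page with a cursor
 * and its only stop condition is an *equality* test against the end
 * pointer.  A malformed, unterminated tag leaves the cursor just past
 * the end, so the test never fires again and whatever lives next to the
 * page in memory gets copied into the rewritten output. */
static void rewrite_page(const char *page, const char *pe,
                         char *out, size_t out_cap) {
    const char *p = page;
    size_t o = 0;

    while (p != pe && o + 1 < out_cap) {       /* BUG: should be p < pe */
        if (*p == '<') {
            /* Find the end of the tag within this page. */
            const char *close = memchr(p, '>', (size_t)(pe - p));
            /* Unterminated tag: this "recovery" leaves the cursor one
             * byte past the end of the page, which the equality check
             * above will never notice. */
            p = close ? close + 1 : pe + 1;
        } else {
            out[o++] = *p++;                   /* copy a content byte */
        }
    }
    out[o] = '\0';
}

int main(void) {
    /* Inside a busy proxy process, the page being rewritten sits next
     * to memory holding other customers' traffic.  Two adjacent buffers
     * stand in for that here; the out-of-bounds reads are the point of
     * the demo and stay inside this struct. */
    struct {
        char page[24];
        char other_request[96];
        char slack[64];
    } memory = {
        "hello <b class=",   /* malformed: the tag never closes */
        " Cookie: session=TOPSECRET; part of another customer's request"
    };

    char out[96];
    rewrite_page(memory.page, memory.page + sizeof memory.page,
                 out, sizeof out);

    /* Prints "hello " followed by the neighbouring request's data. */
    printf("%s\n", out);
    return 0;
}
```

Compile and run it (`cc sketch.c && ./a.out`) and it prints "hello " followed by the fake Cookie line: once the cursor has jumped past the end of the page, nothing stops the copy loop except the size of the output buffer.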

HTTP headers were being included in the HTTP response bodies of other, random websites. Those websites were then crawled and cached.

Requesting a page with a specific combination of broken tags, when done through Cloudflare, will cause neighboring memory to be dumped into the response. OP suspects this is due to a bounds-checking bug on a read or copy. One can imagine this being potentially kilobytes of data in one go.

Since anyone can put a broken page behind Cloudflare, all you need to do is request your own broken page through Cloudflare and start collecting the random "secure" data that comes back.
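
Just to illustrate how low the bar was, here is a minimal, hypothetical C sketch (using libcurl) of what "start collecting" amounts to: fetch your own broken page in a loop and save each response for inspection. The URL and file names are invented, and the underlying bug has long since been fixed, so this only illustrates the thread's point.

```c
#include <stdio.h>
#include <curl/curl.h>

/* Write-callback: append each chunk of the response body to a file so
 * the responses can be inspected later (e.g. grepped for "Cookie:" or
 * "Authorization:" strings that have no business being in a page body). */
static size_t save_chunk(char *data, size_t size, size_t nmemb, void *userdata) {
    return fwrite(data, size, nmemb, (FILE *)userdata) * size;
}

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);

    /* Fetch the same (made-up) broken page repeatedly; while the bug was
     * live, each response could carry a different slice of whatever
     * happened to be in the proxy's memory at that moment. */
    for (int i = 0; i < 100; i++) {
        char filename[64];
        snprintf(filename, sizeof filename, "response-%03d.bin", i);
        FILE *f = fopen(filename, "wb");
        if (!f)
            break;

        CURL *curl = curl_easy_init();
        if (curl) {
            /* Made-up URL: your own page, containing the tag combination
             * that triggers the bug, served through Cloudflare. */
            curl_easy_setopt(curl, CURLOPT_URL,
                             "https://example.com/broken-page.html");
            curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, save_chunk);
            curl_easy_setopt(curl, CURLOPT_WRITEDATA, f);
            curl_easy_perform(curl);
            curl_easy_cleanup(curl);
        }
        fclose(f);
    }

    curl_global_cleanup();
    return 0;
}
```

(Build with `cc collect.c -lcurl`.) Each response can carry a different slice of memory, which is why the clean-up question raised earlier is so hard: the leaked data isn't in one place, it's in every cache that ever stored one of those responses.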