Comment by kalmi10

8 years ago

Headers (among other sensitive stuff) were being leaked inside document bodies.

So just to clarify: some bug makes Cloudflare leak HTTP headers into the HTML being served, and those HTML pages containing sensitive info got cached by Google (and others)?

  • Yes. Think of it this way.

    You have a function that strips all colons from your input. For some reason - in certain cases - your code misbehaves, and instead of replacing each colon with an empty string you accidentally replace it with other data you have in memory. So now all the colons in your input have been replaced with data that you shouldn't have touched, and whoever sent you that input gets it back plus more data they shouldn't be able to see.

    And Google in this case caches those output strings.

    • @homero (since I can't nest a reply any further), it's not the contents of the crawler's request that get randomly injected into the page that the crawler requests, but rather the contents of other requests to the same Cloudflare server.

      Imagine I'm having a chat on some website X, which uses Cloudflare. Cloudflare acts as a man in the middle, meaning my request, and the response, likely pass through its memory at some point to allow me to communicate with X.

      Later, a Google bot comes along and requests a page from site Y. Because of this bug, random bits of memory that were left around on the Cloudflare server get inserted into the response to the bot's request. Those bits of memory could be from anything that's gone through that server in the past, including my conversations on website X. The bot then assumes that the content that Cloudflare spits out for website Y is an accurate representation of website Y's contents, and it caches those contents. In this way, my data from website X ends up in Google's cached version of website Y.
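
To make the sequence above concrete, here is a toy sketch in Python. It is not Cloudflare's code: `ToyProxy`, its methods, the `overrun` parameter, the cookie string and the `y.example` URL are all made up for illustration. The only thing it models is the general idea from the comments above: a scratch buffer is reused between requests without being cleared, and a buggy read runs past the end of the current page.

```python
class ToyProxy:
    """Hypothetical stand-in for a proxy worker that reuses one scratch buffer."""

    def __init__(self, size: int = 64) -> None:
        # Reused for every request and never cleared in between.
        self.buf = bytearray(size)

    def handle(self, payload: bytes) -> None:
        """Copy a request or response into the start of the scratch buffer."""
        self.buf[:len(payload)] = payload

    def render_buggy(self, payload: bytes, overrun: int) -> bytes:
        """Emit the payload but read `overrun` bytes past its end (the bug)."""
        self.handle(payload)
        return bytes(self.buf[:len(payload) + overrun])

    def render_fixed(self, payload: bytes) -> bytes:
        """Emit exactly the payload and nothing else."""
        self.handle(payload)
        return bytes(self.buf[:len(payload)])


proxy = ToyProxy()

# 1. Traffic for site X passes through the proxy's memory.
secret = b"Cookie: session=abc123; user=alice"
proxy.handle(secret)

# 2. A crawler requests a shorter page from site Y; the buggy read
#    returns Y's page followed by leftover bytes from site X.
page_y = b"<p>Hello from Y</p>"
leaked = proxy.render_buggy(page_y, overrun=15)

# 3. The crawler stores whatever it was served as "site Y".
crawler_cache = {"https://y.example/": leaked}

print(leaked)
```

In this sketch the cached copy of Y's page ends with a fragment of X's cookie header, while `render_fixed` returns only Y's page. The real bug involved a parser reading past the end of a buffer in native code; the Python slice here only imitates the effect.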

  • Yeah.

    "We leaked information from Customer A to Customer B by accident" is the first order problem.

    But the existence of web caches means that all that private information of customer A is potentially fucking everywhere now.

    How do you even clean this up? How do you even start?

  • They leak uninitialized memory contents into the HTML being served; that memory could (and did) contain data from any other traffic that passed through their hands.

    So a request sent to Cloudflare customer A's site could return data from Cloudflare customer B, including data that B thought was only being served via https to authenticated users of B.

  • Not just headers: basically random memory dumps that could contain anything that Cloudflare saw (which is almost everything). Passwords, certificates, you name it.

  • Essentially. Any headers from any site routing through Cloudflare could get injected into the body of a second site's page if that second site was using the obfuscation feature. Those "mis-stuffed" pages could be (and were) cached by, among other things, crawlers like those operated by Google and Bing.

    Apparently 7xx sites had this enabled, but that affected 4000ish other sites that happened to be on the same infrastructure.

  • Near as I can tell, the HTTP Headers from one site are being included in HTML of other sites...