Comment by jgrahamc

8 years ago

We identified 3,438 unique domains. I'm not sure if those were all sent to Tavis because we were only sending him things that we wanted purged.

rdl 8 years ago

3438 domains which someone could have queried, but potentially data from any site which had "recently" passed through Cloudflare would be exposed in response, right? Purging those results helps with search engines, but a hypothetical malicious secret crawler would still potentially have any data from any site.

dsp1234 8 years ago
It doesn't have to be a secret crawler, just one that wasn't contacted by Cloudflare (I didn't see any non-US search providers mentioned).
- kevcampb 8 years ago
  
  In other words, Baidu are currently sitting on a treasure trove of keys and passwords.
  
- foodstances 8 years ago
  
  I wonder if archive.org or archive.is have anything cached...
  
taviso 8 years ago
correct
- Trundle 8 years ago
  
  Have you asked them for an eta on your shirt?
  
- rdl 8 years ago
  
  fuck :(
  

tptacek 8 years ago

What anomalies would be apparent in your logs if someone malicious had discovered this flaw and used it to generate a large corpus of leaked HTTP content?

aidos 8 years ago
That's also what I'm interested in. There's a lot of talk about the sites that had the features enabled that allowed the data to escape, but it's the sites that were co-existing with those that were in danger.
In terms of the caching, knowing the broken sites tells you where to look in the caches after the fact, but do you have any idea of whose data was leaked? Presumably 2 consecutive requests to the same malformed page could/would leak different data.
- vmarsy 8 years ago
  
  > Presumably 2 consecutive requests to the same malformed page could/would leak different data.
  Wouldn't the second request be served from the CDN cache? Since for Cloudflare that particular page is a valid cached page, it would send you that same page on the second request.
  
beachstartup 8 years ago

it seems to me you'd have to know at a minimum:
1. every tag pattern that triggers the bug(s)
2. which broken pages with that pattern were requested at an abnormally high frequency or had an unusually short TTL (or some other useful heuristic)
3. on which servers, and at what time, in order to tell
4. whose data lived on the same servers at the same time as those broken pages
to even begin to estimate the scope of the leak. and that doesn't even help you find who planted the bad seeds.
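
A rough sketch of steps 2 to 4 above, assuming you already had per-request logs with a URL, server, time bucket, and domain. Every name here (`Request`, `suspicious_pages`, `co_resident_domains`, the `threshold` value) is hypothetical and says nothing about what Cloudflare's logs actually contain:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """One hypothetical log record; real edge logs may look nothing like this."""
    url: str
    server: str
    minute: int  # timestamp bucketed to the minute
    domain: str


def suspicious_pages(requests, trigger_urls, threshold):
    """Step 2: bug-triggering pages requested more than `threshold` times."""
    counts = Counter(r.url for r in requests if r.url in trigger_urls)
    return {url for url, n in counts.items() if n > threshold}


def co_resident_domains(requests, pages):
    """Steps 3 and 4: domains served by the same server in the same minute
    as one of the suspicious pages."""
    hot_slots = {(r.server, r.minute) for r in requests if r.url in pages}
    exposed = defaultdict(set)
    for r in requests:
        if (r.server, r.minute) in hot_slots and r.url not in pages:
            exposed[r.domain].add((r.server, r.minute))
    return dict(exposed)
```

A fixed request-count threshold is only a stand-in for "abnormally high frequency"; a real analysis would need a per-page baseline, and step 1 (knowing every triggering tag pattern) is taken as given via `trigger_urls`.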

tptacek 8 years ago

Here's a question your blog post doesn't answer but should, right now:

Exactly which search engines and cache providers did you work with to scrub leaked data?

dsp1234 8 years ago

Also, have you worked with any search engine to notify affected customers?
ex: Right now there is an easily found Google cached page with OAuth tokens for a very popular fitness wearable's Android API endpoints

wilde 8 years ago

Are you guys planning to release the list so we can all change our passwords on affected services? Or are you planning on letting those services handle the communication?

pepve 8 years ago
That list contains domains where the bug was triggered. The information exposed through the bug though can be from any domain that uses Cloudflare.
So: all services that have one or more domains served through Cloudflare may be affected.
The consensus seems to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets). But the data was still saved all over the world in web caches, so the bad guys are now probably after those. Though I don't know how much 'useful' data they would be able to extract, and what the risks for an average internet user are.
- ComputerGuru 8 years ago
  
  > The consensus seem to be that no one discovered this before now, and no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets).
  This is literally as bad as it gets; anyone trying to palliate the situation has something to sell you. You'd have to be an idiot to think that $organization (public, private, or shadow) doesn't have automated systems to check for something as stupid simple as this by querying resources at random intervals and searching for artifacts.
  Someone found it. Probably more than one someone. Denial won't help.
- wilde 8 years ago
  
  Ah, gotcha. Thanks for explaining!
- jimmaswell 8 years ago
  
  Four other people I know and I all had our Reddit accounts temporarily locked due to a "possible compromise" in the past week or so, which has never happened to any of us before. Anyone else?
  
nikisweeting 8 years ago
I've compiled a list of 7,385,121 domains that use Cloudflare here: https://github.com/pirate/sites-using-cloudflare
- dbmnt 8 years ago
  
  This list is misguided. It's just a dump of sites using Cloudflare's DNS, a hugely popular and (mostly) free service. The vulnerability only affected customers using Cloudflare's paid SSL proxy (CDN) service. The latter is a much smaller subset. Even then, only a subset of the SSL proxy users, those with certain options enabled that caused traffic to go through a vulnerable parser, were really impacted. I'm not sure a list as broad as this is helpful.
  
- nikisweeting 8 years ago
  
  (whoops forgot to remove dupes, it's only 4,287,625) https://github.com/pirate/sites-using-cloudflare/raw/master/...
tlrobinson 8 years ago
If I'm understanding correctly, that list would include not only the 3,438 domains with content that triggered the bug, but every Cloudflare customer between 2016-09-22 and 2017-02-18.
- Xorlev 8 years ago
  
  Can we trust it was only those domains?
  
- dbmnt 8 years ago
  
  No. Only Cloudflare customers using a subset of features of the SSL proxy service are impacted.
  Cloudflare has a lot of customers who only use the free DNS service, for example.
  

jameshart 8 years ago

What I find remarkable is that the owners of those sites weren't ever aware of this issue. If customers were receiving random chunks of raw nginx memory embedded in pages on my site, I'd probably have heard about it from someone sooner, surely?

I guess there is a long tail of pages on the internet whose primary purpose is to be crawled by google and serve as search landing pages - but again, if I had a bug in the HTML in one of my SEO pages that caused googlebot to see it as full of nonsense, I'd see that in my analytics because a page full of uninitialized nginx memory is not going to be an effective pagerank booster.