Comment by tptacek

8 years ago

How many different sites? Your team sent a list to Tavis's team. How many entries were on the list?

We identified 3,438 unique domains. I'm not sure if those were all sent to Tavis because we were only sending him things that we wanted purged.

  • 3,438 domains which someone could have queried, but potentially data from any site which had "recently" passed through Cloudflare would be exposed in the responses, right? Purging those results helps with search engines, but a hypothetical malicious secret crawler could still have collected data from any site.

  • What anomalies would be apparent in your logs if someone malicious had discovered this flaw and used it to generate a large corpus of leaked HTTP content?

    • That's also what I'm interested in. There's a lot of talk about the sites that had the features enabled that allowed the data to escape, but it's the sites that were co-existing with those that were in danger.

      In terms of the caching, knowing the broken sites tells you where to look in the caches after the fact, but do you have any idea whose data was leaked? Presumably 2 consecutive requests to the same malformed page could/would leak different data.

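      Something like this minimal sketch would show it (illustrative only: the URL is a placeholder standing in for one of the real trigger pages, and the bug is long since patched):

          # Hypothetical check: fetch the same malformed page twice and
          # compare. Identical requests returning different bytes would
          # mean each response carried whatever sat in the proxy's memory.
          import urllib.request

          def fetch(url):
              with urllib.request.urlopen(url) as resp:
                  return resp.read()

          url = "https://example.com/broken-page"  # placeholder trigger page
          a, b = fetch(url), fetch(url)
          if a != b:
              print("responses differ: %d vs %d bytes" % (len(a), len(b)))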

    • it seems to me you'd have to know at a minimum:

      1. every tag pattern that triggers the bug(s)

      2. which broken pages with that pattern were requested at an abnormally high frequency or had an unusually short TTL (or some other useful heuristic)

      3. on which servers, and at what time, in order to tell

      4. whose data lived on the same servers at the same time as those broken pages

      to even begin to estimate the scope of the leak (a rough sketch of step 2 is below). and that doesn't even help you find who planted the bad seeds.
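
      for step 2, something like the sketch below run against an access log would be a starting point. the log format, the 10x-mean cutoff, and the file path are all assumptions, and you'd still need step 1's tag-pattern check to narrow the hits to actual trigger pages:

          # Rough sketch of step 2: flag URLs requested at an abnormally
          # high rate. Assumes a combined-format access log; the cutoff
          # is arbitrary.
          import collections, re, sys

          LINE = re.compile(r'"(?:GET|POST) (\S+)')  # assumed log format
          counts = collections.Counter()

          with open(sys.argv[1]) as log:
              for line in log:
                  m = LINE.search(line)
                  if m:
                      counts[m.group(1)] += 1

          mean = sum(counts.values()) / max(len(counts), 1)
          for url, n in counts.most_common(20):
              if n > 10 * mean:
                  print(n, url)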

  • Here's a question your blog post doesn't answer but should, right now:

    Exactly which search engines and cache providers did you work with to scrub leaked data?

    • Also, have you worked with any search engines to notify affected customers?

      ex: Right now there is an easily found Google-cached page with OAuth tokens for a very popular fitness wearable's Android API endpoints.
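
      Spotting that kind of thing in a saved copy of a cached page doesn't take much; an illustrative sketch, with token patterns that are rough guesses rather than any vendor's actual format:

          # Illustrative only: grep a saved cached page for strings shaped
          # like bearer/OAuth tokens. The patterns are guesses.
          import re, sys

          PATTERNS = [
              re.compile(rb"[Bb]earer\s+[A-Za-z0-9._~+/-]{20,}"),
              re.compile(rb"access_token=[A-Za-z0-9._~-]{20,}"),
          ]

          data = open(sys.argv[1], "rb").read()
          for pat in PATTERNS:
              for m in pat.finditer(data):
                  print(m.group(0)[:60])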

  • Are you guys planning to release the list so we can all change our passwords on affected services? Or are you planning on letting those services handle the communication?

    • That list contains the domains where the bug was triggered. The information exposed through the bug, though, can be from any domain that uses Cloudflare.

      So: all services that have one or more domains served through Cloudflare may be affected.

      The consensus seems to be that no one discovered this before now, and that no bad guys have been scraping this leak for valuable data (passwords, OAuth tokens, PII, other secrets). But the data was still saved in web caches all over the world, so the bad guys are probably after those now. Though I don't know how much 'useful' data they would be able to extract, or what the risks are for an average internet user.

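      Checking whether a given domain is fronted by Cloudflare is at least easy; a minimal sketch (this only shows the current state, not whether the domain was proxied during the 2016-09-22 to 2017-02-18 window):

          # Cloudflare-proxied responses carry a CF-RAY header and usually
          # "Server: cloudflare". Current state only.
          import urllib.request

          def behind_cloudflare(domain):
              req = urllib.request.Request("https://" + domain, method="HEAD")
              with urllib.request.urlopen(req) as resp:
                  headers = {k.lower(): v.lower() for k, v in resp.getheaders()}
              return "cf-ray" in headers or headers.get("server") == "cloudflare"

          print(behind_cloudflare("example.com"))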

    • If I'm understanding correctly, that list would include not only the 3,438 domains with content that triggered the bug, but every Cloudflare customer between 2016-09-22 and 2017-02-18.


  • What I find remarkable is that the owners of those sites weren't ever aware of this issue. If customers were receiving random chunks of raw nginx memory embedded in pages on my site, I'd probably have heard about it from someone sooner, surely?

    I guess there is a long tail of pages on the internet whose primary purpose is to be crawled by Google and serve as search landing pages - but again, if I had a bug in the HTML in one of my SEO pages that caused Googlebot to see it as full of nonsense, I'd see that in my analytics, because a page full of uninitialized nginx memory is not going to be an effective PageRank booster.