Comment by espadrine

8 years ago

It is far from over, too! Google Cache still has loads of sensitive information, a link away!

Look at this, click on the downward arrow, "Cached": https://www.google.com/search?q="CF-Host-Origin-IP:"+"author...

(And then, in Google Cache, "view source", search for "authorization".)

(Various combinations of HTTP headers to search for yield more results.)

> The infosec team worked to identify URIs in search engine caches that had leaked memory and get them purged. With the help of Google, Yahoo, Bing and others, we found 770 unique URIs that had been cached and which contained leaked memory. Those 770 unique URIs covered 161 unique domains. The leaked memory has been purged with the help of the search engines.

So I tried it too, and there's still data cached there.

Am I misunderstanding something - that above statement must be wrong, surely?

They can't have found everything even in the big search engines if it's still showing up in Google's cache, let alone the infinity other caches around the place.

EDIT: In case the Cloudflare team sees this: I see leaked credentials for these domains:

android-cdn-api.fitbit.com

iphone-cdn-client.fitbit.com

api-v2launch.trakt.tv

  • I'm also seeing a ton from cn-dc1.uber.com with oauth, cookies and even geolocation info. https://webcache.googleusercontent.com/search?q=cache:VlVylT...

  • Could someone enlighten me on why malloc and free don't automatically zero memory by default?

    Someone pointed me to MALLOC_PERTURB_ and I've just run a few test programs with it set - including a stage1 GCC compile, which granted may not be the best test - and it really doesn't dent performance by much. (edit: noticeably, at all, in fact)

    People who prefer extreme performance over prudent security should be the ones forced to mess about with extra settings, anyway.
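
    For the curious, here is a rough sketch of what MALLOC_PERTURB_ does (glibc-specific; the file name and the byte value 165 are just illustrative). With it set to a nonzero value N, glibc fills freshly malloc()ed memory with the complement of N and fills freed memory with N, so stale heap contents don't survive reallocation:

        /* perturb.c - illustrate glibc's MALLOC_PERTURB_ (sketch only).
           Build: gcc perturb.c -o perturb
           Run:   ./perturb        vs.   MALLOC_PERTURB_=165 ./perturb  */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        int main(void) {
            char *a = malloc(32);
            strcpy(a, "secret session token");
            free(a);                 /* scrubbed by glibc if MALLOC_PERTURB_ is set */

            char *b = malloc(32);    /* very likely reuses the same chunk */
            for (int i = 0; i < 32; i++)
                printf("%02x ", (unsigned char)b[i]);   /* old secret vs. perturb bytes */
            printf("\n");

            free(b);
            return 0;
        }

    Run it once normally and once with MALLOC_PERTURB_=165 in the environment; in the second case the reused chunk comes back as a repeated perturb byte instead of whatever was freed there.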

    • Some old IBM environments initialized fresh allocations to 0xDEADBEEF, which had the advantage that the result you got from using such memory would (usually) be obviously incorrect. The fact that it was done decades ago is pretty good evidence that it's not about the actual initialization cost: these things cost a lot more back then.

      What changed is the paged memory model: modern systems don't actually tie an address to a page of physical RAM until the first time you try to use it (or something else on that page). Initializing the memory on malloc() would "waste" memory in some cases, where the allocation spans multiple pages and you don't end up using the whole thing. Some software assumes this, and would use quite a bit of extra RAM if malloc() automatically wiped memory. It would also tend to chew through your CPU cache, which mattered less in the past because any nontrivial operation already did that.

      I personally don't think this is a good enough reason, but it is a little more than just a minor performance issue.

      That all being said, while it would likely have helped slightly in this case, it would not solve the problem: active allocations would still be revealed.
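
      A quick way to see the demand-paging point is to watch resident memory around a large allocation. This is a Linux-specific sketch (it reads /proc/self/statm; the 256 MiB figure is arbitrary): the malloc() itself barely moves RSS, while touching every page, which is what an unconditional zero-on-malloc would have to do, commits all of them:

          /* rss_demo.c - why zero-on-malloc would "waste" memory (Linux sketch). */
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>

          /* Resident set size in KiB, from /proc/self/statm. */
          static long resident_kb(void) {
              long total = 0, resident = 0;
              FILE *f = fopen("/proc/self/statm", "r");
              if (!f || fscanf(f, "%ld %ld", &total, &resident) != 2) {
                  if (f) fclose(f);
                  return -1;
              }
              fclose(f);
              return resident * sysconf(_SC_PAGESIZE) / 1024;
          }

          int main(void) {
              size_t n = 256u * 1024 * 1024;            /* 256 MiB of address space */
              printf("before malloc: %ld KiB resident\n", resident_kb());

              char *p = malloc(n);
              if (!p) return 1;
              printf("after  malloc: %ld KiB resident\n", resident_kb()); /* ~unchanged */

              memset(p, 0, n);                          /* what zero-on-malloc forces */
              printf("after  memset: %ld KiB resident\n", resident_kb()); /* +~256 MiB */

              free(p);
              return 0;
          }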

      14 replies →

    • Zeroing on malloc and/or free would not have prevented this type of error, since the information disclosure was due to an overflow into an adjacent allocated buffer.

      However, zeroing on free is generally a useful defense-in-depth measure because it can minimize the risk of some types of information disclosure vulnerabilities. If you use grsecurity, this feature is provided by grsecurity's PAX_MEMORY_SANITIZE [0].

      [0]: https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity...
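
      In userland the same idea can be approximated with a free wrapper. A minimal sketch, assuming glibc (explicit_bzero() needs glibc >= 2.25, and malloc_usable_size() is a non-standard extension; secure_free is just a name used here for illustration):

          #include <string.h>     /* explicit_bzero (glibc >= 2.25 / BSD) */
          #include <stdlib.h>
          #include <malloc.h>     /* malloc_usable_size (glibc extension) */

          /* Scrub a heap buffer before returning it to the allocator, so freed
             memory never keeps old secrets around. explicit_bzero() is used
             because a plain memset() right before free() may be optimized away. */
          static void secure_free(void *p) {
              if (p == NULL)
                  return;
              explicit_bzero(p, malloc_usable_size(p));
              free(p);
          }

          /* usage: char *token = strdup("..."); ... ; secure_free(token); */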

    • Zeroing on alloc/free probably wouldn't have helped much with this bug. Data in live allocations would still be leaked.
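
      Roughly the failure mode, as a contrived sketch (the buffer names and the bogus length are made up, and reading past a buffer is undefined behaviour, so this is illustration only, not the actual parser bug): the code over-reads its own buffer and ships whatever live allocation happens to sit next to it, zeroed-on-free or not:

          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>

          int main(void) {
              char *page   = malloc(32);   /* buffer being parsed */
              char *secret = malloc(32);   /* someone else's live request data */
              strcpy(page,   "<p>harmless html</p>");
              strcpy(secret, "authorization: Bearer abc123");

              /* Buggy "parser": trusts a length that is larger than the buffer. */
              size_t claimed_len = 96;                 /* should have been <= 32 */
              fwrite(page, 1, claimed_len, stdout);    /* may emit the neighbour too */
              printf("\n");

              free(page);
              free(secret);
              return 0;
          }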

    • > Could someone enlighten me on why malloc and free don't automatically zero memory by default?

      The computational cost of doing so, I suspect.

      5 replies →

    • Are these results hardware independent? Maybe it makes a difference on older machines, or different architectures.

    • I imagine clearing memory on free is more relevant than MALLOC_PERTURB_?

  • > that above statement must be wrong, surely?

    Either they believe it's right, which means they're not competent enough to really assess the scope of the leak; or they don't believe it, but they went "fuck it, that's the best we can do".

    In either case, it doesn't really inspire trust in their service.

    • you missed one possibility: that they're deliberately attempting to downplay the severity to make themselves look less incompetent

  • jgrahamc: can you list which public caches you worked with to attempt to address this? It does not inspire confidence when even Google is still showing obvious results.

    • Google, Microsoft Bing, Yahoo, DDG, Baidu, Yandex, and more. The caches other than Google were quick to clear and we've not been able to find active data on them any longer. We have a team that is continuing to search these and other potential caches online and our support team has been briefed to forward any reports immediately to this team.

      I agree it's troubling that Google is taking so long. We were working with them to coordinate disclosure after their caches were cleared. While I am thankful to the Project Zero team for informing us of the issue quickly, I'm troubled that they went ahead with disclosure before Google's crawl team could complete the refresh of their own cache. We have continued to escalate this within Google to get the crawl team to prioritize the clearing of their caches, as that is the highest-priority remaining remediation step.

      91 replies →

https://webcache.googleusercontent.com/search?q=cache:lw4K9G...

    Internal Upstream Server Certificate
    ...
    /C=US/ST=California/L=San Francisco/O=Cloudflare Inc./OU=Cloudflare Services - nginx-cache/CN=Internal Upstream Server Certificate

That really doesn't look good.

Lol, Google just purged that search.

EDIT: but there's still plenty of fish: http://webcache.googleusercontent.com/search?q=cache:lw4K9G2...

This will take weeks to clean, and that's just for Google.

EDIT2: found other oauth tokens, lots of fitbit calls... And this just by searching for typical CF internal headers on Google and Bing. There is no way to know what else is out there. What a mess.

  • Ouch, you really see everything:

    > authorization: OAuth oauth_consumer_key ...

    What a shit show. I'm sorry, but at this point there must be consequences for incompetence. Some might argue "but nobody could have done anything" ...

    I'm sorry, CF has the money to ditch C entirely and rewrite everything from the ground up in a safer language; I don't care what it is, Go, Rust, whatever.

    At that point, people using C directly are playing with fire. C isn't a language for highly distributed applications; it will only distribute memory leaks ... With all the wealth there is in the whole Silicon Valley, trillions of dollars, there is absolutely zero effort to come up with an acceptable solution? All these startups can't come together and say, "Ok, we're going to design or choose a real safe language and stick to that"? Where does all that money go, then? Because this bug is going to cost A LOT OF MONEY to A LOT OF PEOPLE.

    • These guys were probably saved by using OAuth - there is a consumer secret (which the "_key" is just an identifier for) and an access token secret, both of which are not sent over the wire. Just a signature based on them. (The timestamp and nonce prevent replay attacks.)

      OAuth2 "simplified" things and just sends the secret over the wire, trusting SSL to keep things safe.
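
      To make the contrast concrete, a rough sketch (not a real OAuth 1.0a implementation; the base string and signing key are simplified stand-ins, and it assumes OpenSSL's libcrypto for the HMAC):

          /* oauth_contrast.c - Build: gcc oauth_contrast.c -lcrypto */
          #include <stdio.h>
          #include <string.h>
          #include <openssl/evp.h>
          #include <openssl/hmac.h>

          int main(void) {
              /* OAuth1-style: the secrets never leave the client; only an
                 HMAC-SHA1 signature over the request goes on the wire. */
              const char *signing_key = "consumer_secret&token_secret";
              const char *base_string =
                  "GET&https%3A%2F%2Fapi.example.com%2Fv1%2Fuser&oauth_nonce%3D...";
              unsigned char sig[EVP_MAX_MD_SIZE];
              unsigned int sig_len = 0;
              HMAC(EVP_sha1(), signing_key, (int)strlen(signing_key),
                   (const unsigned char *)base_string, strlen(base_string),
                   sig, &sig_len);
              printf("OAuth1 on the wire: oauth_signature=<%u-byte HMAC>\n", sig_len);

              /* OAuth2-style: the bearer token *is* the secret, sent verbatim,
                 so a leaked header is a leaked credential. */
              printf("OAuth2 on the wire: Authorization: Bearer <secret token>\n");
              return 0;
          }

      A captured OAuth1 signature is tied to a specific nonce and timestamp, so it can't simply be replayed, whereas a captured bearer token keeps working until it's revoked.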

      1 reply →

  • Good. They're trying to clean up all the private data leaked everywhere. I'm tempted to say "why couldn't they figure out this Google dork themselves", but they've probably been slammed for the past 7 days cleaning up a bunch of stuff anyway.

  • > This will take weeks to clean, and that's just for Google.

    Couldn't Google just purge all cached documents which match any Cloudflare header? This will probably purge a lot of false positives, but it's just cached data, so would that loss really matter? My guess is that this approach should not take more than a few hours on Google's infrastructure.

    Of course, this leaves the problem of all the other non-Google caches out there.

  • OAuth1 doesn't send the secrets with the requests, just a key to identify the secret and a signature made with the secret.

    OAuth2 does send the secret, typically in an "Authorization: Bearer ..." header.

    The uber stuff that somebody else linked to looks like a home-grown auth scheme and it appears that "x-uber-token" is a secret, but hard to know for sure.

  • So while people are having fun here with search queries, how many scripts are already up and running in the wild, scraping every caching service they can think of in creative ways for useful data...

    This is an ongoing disaster, wasn't this disclosed too soon?

The "well-known chat service" mentioned by Tavis appears to be Discord, for the record.

edit: Uber also seems to be affected.

>It is a snapshot of the page as it appeared on Feb 21, 2017 20:20:45 GMT

So the issue wasn't fully fixed on Feb 19, or Google's cache date isn't accurate?

It seems like the reasonable thing for Google to do is to clear their entire cache. The whole thing. This is the one thing that they could do to be certain that they aren't caching any of this.

  • What about Bing, Baidu, Yandex, The Internet Archive, and Common Crawl? What about caches that are surely maintained by the NSA, ФСБ, and 3PLA?

    • Of course. Google dumping their cache puts only a small dent into the problem, but I feel that it's their responsibility to the innocent site operators caught in the middle of this.

      2 replies →

  • CF should be thankful Google is doing any of this; clearing their entire cache would cost Google money to re-index the web from scratch.

  • That might be a bit too extreme. But they should do something quickly to try to find all of these.

    • I would say Cloudflare should hire them to try to find these. It's really not on Google, IMO (unless caching has some implications regarding storing sensitive data).

Wow, I just tried this, the first result with a google cache copy has a bunch of the kind of data described. Although there was only one result with a cache.

  • PII, OAuth data, etc.

    • I've so far seen an OAuth key for Fitbit (via their Android app) and API keys for Trakt (though apparently that service doesn't use them?).

      I don't know, this just seems catastrophic.

The first couple I looked at were requests to Uber and Fitbit...

  • One of my Uber rides two weeks ago went completely nuts. Both my app and my driver's app screwed up at the same time; I was never picked up, and then seconds later the app claimed I had reached my destination.

    You have to wonder whether something like this is implicated.