Comment by PuffinBlue

8 years ago

> The infosec team worked to identify URIs in search engine caches that had leaked memory and get them purged. With the help of Google, Yahoo, Bing and others, we found 770 unique URIs that had been cached and which contained leaked memory. Those 770 unique URIs covered 161 unique domains. The leaked memory has been purged with the help of the search engines.

So I tried it too, and there's still data cached there.

Am I misunderstanding something - that above statement must be wrong, surely?

They can't have found everything even in the big search engines if it's still showing up in Google's cache, let alone the infinity of other caches around the place.

EDIT: If the Cloudflare team sees this: I see leaked credentials for these domains:

android-cdn-api.fitbit.com

iphone-cdn-client.fitbit.com

api-v2launch.trakt.tv

I'm also seeing a ton from cn-dc1.uber.com with oauth, cookies and even geolocation info. https://webcache.googleusercontent.com/search?q=cache:VlVylT...

  • That's terrifying.

    Thanks to Uber now requiring location services on Always instead of just when hailing a car, my and others' personal location history even outside of Uber usage could have been compromised. Sweet.

    • To be fair, you were kind of a fool if you actually let Uber have your location at all times. As soon as they announced that I blocked Uber from my location. I only allow it when I take an Uber (which is almost never now).

      3 replies →

Could someone enlighten me on why malloc and free don't automatically zero memory by default?

Someone pointed me to MALLOC_PERTURB_ and I've just run a few test programs with it set - including a stage1 GCC compile, which, granted, may not be the best test - and it really doesn't dent performance by much. (edit: noticeably, at all, in fact)
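
For anyone who wants to reproduce this, here's a minimal sketch of the experiment (glibc-specific; the exact fill bytes depend on the value you set):

    /* Minimal demo of glibc's MALLOC_PERTURB_ (a sketch, glibc only).
     * Build: cc perturb.c -o perturb
     * Run:   MALLOC_PERTURB_=42 ./perturb
     * With the variable set, glibc fills fresh allocations (except calloc)
     * with the bitwise complement of the value and fills memory with the
     * value itself on free, so stale heap contents are clobbered.
     */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        unsigned char *p = malloc(16);
        if (p == NULL)
            return 1;
        /* With MALLOC_PERTURB_=42 these print d5 d5 d5 (~42 == 0xd5),
           not whatever happened to be on the heap. */
        printf("%02x %02x %02x\n", p[0], p[1], p[2]);
        free(p);
        return 0;
    }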

People who prefer extreme performance over prudent security should be the ones forced to mess about with extra settings, anyway.

  • Some old IBM environments initialized fresh allocations to 0xDEADBEEF, which had the advantage that the result you got from using such memory would (usually) be obviously incorrect. The fact that it was done decades ago is pretty good evidence that it's not about the actual initialization cost: these things cost a lot more back then.

    What changed is the paged memory model: modern systems don't actually tie an address to a page of physical RAM until the first time you try to use it (or something else on that page). Initializing the memory on malloc() would "waste" memory in some cases, where the allocation spans multiple pages and you don't end up using the whole thing. Some software assumes this, and would use quite a bit of extra RAM if malloc() automatically wiped memory. It would also tend to chew through your CPU cache, which mattered less in the past because any nontrivial operation already did that. (A rough demonstration of the demand-paging point follows below.)

    I personally don't think this is a good enough reason, but it is a little more than just a minor performance issue.

    That all being said, while it would likely have helped slightly in this case, it would not solve the problem: active allocations would still be revealed.
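
    A rough Linux-only illustration of the demand-paging point above: a large allocation costs almost no resident memory until it is written, which is exactly what an eager memset in malloc() would give up. (Whether a given size is mmap-backed is an allocator implementation detail.)

        /* Sketch: resident memory before and after touching a large
         * allocation (Linux-only; reads VmRSS from /proc/self/status).
         * Build: cc rss.c -o rss && ./rss
         */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        static long rss_kb(void) {
            FILE *f = fopen("/proc/self/status", "r");
            char line[256];
            long kb = -1;
            while (f && fgets(line, sizeof line, f))
                if (sscanf(line, "VmRSS: %ld kB", &kb) == 1)
                    break;
            if (f)
                fclose(f);
            return kb;
        }

        int main(void) {
            size_t len = 256UL * 1024 * 1024;   /* 256 MiB */
            unsigned char *p = malloc(len);     /* large, likely mmap-backed */
            if (p == NULL)
                return 1;
            printf("RSS after malloc: %ld kB\n", rss_kb());
            memset(p, 0, len);                  /* fault in every page */
            printf("RSS after memset: %ld kB\n", rss_kb());
            free(p);
            return 0;
        }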

    • > Some old IBM environments initialized fresh allocations to 0xDEADBEEF, which had the advantage that the result you got from using such memory would (usually) be obviously incorrect.

      On BSDs, malloc.conf can still be configured to do that: on OpenBSD, junking (which fills allocations with 0xdb and deallocations with 0xdf) is enabled by default for small allocations; "J" will enable it for all allocations. On FreeBSD, "J" will initialise all allocations with 0xa5 and deallocations with 0x5a.

    • > What changed is the paged memory model: modern systems don't actually tie an address to a page of physical RAM until the first time you try to use it (or something else on that page). Initializing the memory on malloc() would "waste" memory in some cases, where the allocation spans multiple pages and you don't end up using the whole thing. Some software assumes this, and would use quite a bit of extra RAM if malloc() automatically wiped memory. It would also tend to chew through your CPU cache, which mattered less in the past because any nontrivial operation already did that.

      Maybe an alternative approach is to simply mark the pages, in the Page Table Entries of the MMU, to be lazily zeroed out when attached. They wouldn't be zeroed out at the time of the malloc() call, but only when they are attached to a physical memory location (the first time you use them).

      3 replies →

    • It doesn't need to affect your CPU cache, because x64 processors have non-temporal writes (streaming stores) that bypass the cache.

      The stuff about eagerly allocating pages is spot on though.

      There is calloc which allocates and zeroes memory, but people don't use it as often as they should.
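
      A sketch of what a cache-bypassing zero fill could look like with SSE2 streaming stores (x86-64 only; zero_nontemporal is an illustrative name, not a real allocator hook, and the buffer is assumed to be 16-byte aligned with a size that's a multiple of 16):

          #include <emmintrin.h>   /* SSE2 intrinsics */
          #include <stddef.h>
          #include <stdlib.h>

          /* Zero a buffer with non-temporal stores so the writes do not
             displace useful lines in the CPU cache. */
          static void zero_nontemporal(void *buf, size_t n) {
              __m128i zero = _mm_setzero_si128();
              unsigned char *p = buf;
              for (size_t i = 0; i + 16 <= n; i += 16)
                  _mm_stream_si128((__m128i *)(p + i), zero);
              _mm_sfence();   /* order the streaming stores */
          }

          int main(void) {
              size_t n = 1 << 20;
              void *p = aligned_alloc(16, n);   /* C11 aligned allocation */
              if (p == NULL)
                  return 1;
              zero_nontemporal(p, n);
              free(p);
              return 0;
          }

      For large allocations calloc can often skip the memset entirely, because the kernel hands it pages that are already zero.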

      Parsers don't usually need to hold onto what they're parsing for a very long time, so unless they were running this in parallel on a machine with 4k cores, I'd imagine it would be much more likely that a buffer overrun hits the middle of an already-freed allocation rather than going into an active one.

      In terms of "wasting" memory, perhaps the kernel could detect that you are writing 0s to a COW 0 page and still not actually tie the page to physical RAM. (If you're overwriting non-0 data, well it's already in a physical page.)

      I don't quite follow the details of the CPU cache issue and why that is more-than-minor.

      I do think in this day and age we should be re-visiting this question seriously in our C standard libraries. If the performance issues are actually major problems for specific systems, the old behaviour could be kept, but after benchmarking to show that it really is a performance problem.

      5 replies →

    • An invariant you get from most kernels is that all new memory pages are zeroed when mapped into processes (normally through mmap or sbrk), so you only have the paging problem when initializing with a value other than zero.
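
      A small sketch of that invariant (POSIX mmap; the assert loop just checks that anonymous pages arrive zero-filled):

          #define _DEFAULT_SOURCE
          #include <assert.h>
          #include <stddef.h>
          #include <sys/mman.h>

          int main(void) {
              size_t len = 1 << 20;
              unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (p == MAP_FAILED)
                  return 1;
              for (size_t i = 0; i < len; i++)
                  assert(p[i] == 0);   /* fresh pages from the kernel are zero */
              munmap(p, len);
              return 0;
          }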

  • Zeroing on malloc and/or free would not have prevented this type of error, since the information disclosure was due to an overflow into an adjacent allocated buffer.

    However, zeroing on free is generally a useful defense-in-depth measure because it can minimize the risk of some types of information disclosure vulnerabilities. If you use grsecurity, this feature is provided by grsecurity's PAX_MEMORY_SANITIZE [0].

    [0]: https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity...
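
    Application code can get a cruder version of the same effect by scrubbing buffers before freeing them. A minimal sketch (secure_free is a hypothetical helper; explicit_bzero is glibc >= 2.25 / BSD and is used because a plain memset before free may be optimized away):

        #define _DEFAULT_SOURCE
        #include <stdlib.h>
        #include <string.h>

        /* Hypothetical helper: scrub a buffer's contents before returning
           it to the allocator. The caller has to track the size. */
        static void secure_free(void *p, size_t n) {
            if (p != NULL) {
                explicit_bzero(p, n);
                free(p);
            }
        }

        int main(void) {
            char *secret = malloc(32);
            if (secret == NULL)
                return 1;
            strcpy(secret, "hunter2");
            secure_free(secret, 32);   /* contents are gone before free() */
            return 0;
        }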

  • Zeroing on alloc/free probably wouldn't have helped much with this bug. Data in live allocations would still be leaked.

  • Are these results hardware independent? Maybe it makes a difference on older machines, or different architectures.

  • I imagine clearing memory on free is more relevant than MALLOC_PERTURB_?

> that above statement must be wrong, surely?

Either they believe it's right, which means they're not competent enough to really assess the scope of the leak; or they don't believe it, but they went "fuck it, that's the best we can do".

In either case, it doesn't really inspire trust in their service.

  • you missed one possibility: that they're deliberately attempting to downplay the severity to make themselves look less incompetent

jgrahamc: can you list which public caches you worked with to attempt to address this? It does not inspire confidence when even Google is still showing obvious results.

  • Google, Microsoft Bing, Yahoo, DDG, Baidu, Yandex, and more. The caches other than Google were quick to clear and we've not been able to find active data on them any longer. We have a team that is continuing to search these and other potential caches online and our support team has been briefed to forward any reports immediately to this team.

    I agree it's troubling that Google is taking so long. We were working with them to coordinate disclosure after their caches were cleared. While I am thankful to the Project Zero team for informing us of the issue quickly, I'm troubled that they went ahead with disclosure before Google's crawl team could complete the refresh of their own cache. We have continued to escalate this within Google to get the crawl team to prioritize the clearing of their caches, as that is the highest-priority remaining remediation step.

    • I find it troubling that the CEO of Cloudflare would attempt to deflect their culpability for a bug this serious onto Google for not cleaning up Cloudflare's mess fast enough.

      I don't use CF, and after seeing behavior like this, I don't think I will.

      3 replies →

    • This comment greatly lowers my respect for Cloudflare.

      Bugs happen to us all; how you deal with this is what counts, and wilful, blatant lying in a transparent attempt to deflect blame from where it belongs (Cloudflare) onto the team that saved your bacon?

      I've recommended Cloudflare in the past, and I was planning, with some reservations, to continue to do so even after disclosure of this issue. But seeing this comment? I don't see how I can continue.

      (For the sake of maximum clarity, I take issue 1) with the attempt to suggest that the main issue is clearing caches, not the leak itself. It doesn't matter how fast you close the barn door after the horse is gone and the barn has burned down. And 2) with the blatantly false claim that non-Google caches have been cleared, or were faster to clear than Google's. Cloudflare should know, better than anyone, the massive scope of this leak, and the fact that NO search engine's cache has been or could be cleared of it. If you find yourself in a situation so bad that you feel you need to misdirect attention to someone else, and it turns out no one else is actually doing anything so you have to lie about that... maybe you should just shut up and stop digging?)

      1 reply →

    • > I agree it's troubling that Google is taking so long.

      Google has absolutely no obligation to clean up after your mess.

      You should be grateful for any help they and other search engines give you.

      4 replies →

    • I despise the way you've dealt with this issue with as much dishonesty as you thought you could get away with.

      I will be migrating away from your service first thing Monday. I will not use your services again and will ensure that my clients and colleagues are informed of your horrific business practices now and in the future.

    • For those who haven't been following along, this is the CEO of CloudFlare lying in a way that misrepresents a major problem CloudFlare created. Additionally, they are trying to blame parts of this problem on those who told them about the problem they created.

    • >I'm troubled that they went ahead with disclosure before Google crawl team could complete the refresh of their own cache.

      It sounded like they (CF) were under a lot of pressure to disclose ASAP from Project Zero and their 7-day requirement...

      3 replies →

    • >> We have continued to escalate this within Google to get the crawl team to prioritize the clearing of their caches as that is the highest priority remaining remediation step.

      If you take the same attitude with their team as you do in this comment, I'm pretty sure they will be thrilled to set aside all their regular work and help you clean up an enormous mess created by a bug in your service.

    • Oh wow, taking a shit on Google after they helped you by reporting a critical flaw in your infrastructure.

      I'm no longer using CF for my own projects, but you've just cemented my decision that none of my clients will either.