Comment by jcalvinowens

10 hours ago

> By the time you joined and benchmarked these systems, the continuous rolling deployment had taken over

Nope, I started in 2014.

> I don't recall ever talking to you on the matter.

I recall. You refused to believe the benchmark results and made me repeat the test, then stopped replying after I did :)

The patches were written in 2011 and published in 2012. They did what they were supposed to at the time.

For the peanut gallery: this is a manifestation of an internal eng culture at fb that I wasn't particularly fond of. Celebrating that "I killed X" and partying about it.

You didn't reply to the main point: did you benchmark a server that had been running for several days at a time? Reasonable people can disagree about whether this is a good deployment strategy. I tend to believe there are many places that want to deploy servers and run them for days if not months.

  • For the peanut gallery more: I worked with both of these guys at Meta on this.

    The "servers are only on for a few hours" thing was basically never true, so I have no idea where that claim is coming from. The web performance test alone took more than a few hours to run, and we had way more aggressive soaks for other workloads.

    My recollection was that "write zeroes" just became a cheaper operation between '12 and '14.

    A fun fact to distract from the awkwardness: a lot of the kernel work done in the early days was exceedingly scrappy. The port mapping stuff for memcached UDP before SO_REUSEPORT for example. FB binaries couldn't even run on vanilla linux a lot of the time. Over the next several years we put a TON of effort in getting as close to mainline as possible and now Meta is one of the biggest drivers of Linux development.

    • [ Edit: "servers" in this context meant the HHVM server processes, not the physical server which of course had a longer uptime ]

      People got promoted for continuous deployment

      https://engineering.fb.com/2017/08/31/web/rapid-release-at-m...

      I think it's fair to say the hardware changed, the deployment strategy changed and the patches were no longer relevant, so we stopped applying them.

      When I showed up, there were 100+ patches on top of a 2009 kernel tree. I reduced that to about 10 critical patches, rebased them on a six-month cadence over 2-3 years, and upstreamed a few.

      I didn't go around saying those old patches were bad ideas and that I got rid of them. How you say it matters.

    • It's not just that zeroing got cheaper, but also we're doing a lot less of it, because jemalloc got much better.

      If the allocator returns a page to the kernel and then immediately asks for it back, it isn't doing its job well: the main purpose of the allocator is to cache allocations from the kernel. Those patches predate decay and the background purging thread, changes that significantly improved how jemalloc holds on to memory it might need soon. The zeroing-out patches, by contrast, optimize for the pathological behavior.

      Also, the kernel has since exposed better ways to optimize memory reclamation, like MADV_FREE, which is a "lazy reclaim": the page stays mapped into the process until the kernel actually needs it, so if we touch it again before that happens, the whole unmapping/mapping cycle is avoided. That saves not only the zeroing cost but also the TLB shootdown and other overhead, without changing any security boundary. jemalloc can take advantage of this by enabling "muzzy decay".

      However, the drawback is that system-level memory accounting becomes even more fuzzy.

      (hi Alex!)
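
      To make the lazy-reclaim idea above concrete, here is a minimal sketch in C (Linux, kernel 4.5+ for MADV_FREE; the page size and access pattern are illustrative, not anything from jemalloc itself):

      ```c
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>

      int main(void) {
          size_t len = 4096;
          /* Map an anonymous page and dirty it so there is something to reclaim. */
          unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (p == MAP_FAILED) { perror("mmap"); return 1; }
          memset(p, 0xAB, len);

          /* Lazy reclaim: mark the page as freeable. It stays mapped; the kernel
           * may take it back under memory pressure and only then zeroes it. */
          if (madvise(p, len, MADV_FREE) != 0) { perror("madvise"); return 1; }

          /* Reusing the page before reclaim cancels the free: just write again.
           * No unmap/remap, no TLB shootdown, no re-zeroing on this fast path. */
          p[0] = 1;

          munmap(p, len);
          puts("ok");
          return 0;
      }
      ```

      In jemalloc terms, pages in this state are "muzzy"; the `muzzy_decay_ms` option controls how long they linger before a full purge. The accounting fuzziness mentioned above follows directly: an MADV_FREE'd page still counts as resident until the kernel actually reclaims it.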

This is why I love Hacker News. I learn so much from moments like these.

  • Like "never work at Meta unless you can out-toxic your coworkers".

      Yea, I knew Meta was toxic, but publicly beefing over something from more than a decade ago is a whole other matter. I can’t even remember what I was working on 10 years ago, and even if I could, I wouldn’t be dragging people down this much later.

    • Inside Meta, engineers are one of the kindest group of people.

      This thread would've been way more fun with a couple of middle managers and product managers in the mix ;-)

    • Funny, I was thinking what a relief it was to see people making their arguments frankly like on the HN of 10+ years ago.

    • Like "Hey, I wonder if Conway's Law works both ways. Huh. Wow. It looks like that is indeed the case."