Comment by redman25

8 months ago

That's one of the main reasons I'm leery of them. Such a big f-up is difficult to forget. It shows that they have a "move fast and break things" culture, which feels wrong for a company that is responsible for critical infrastructure.

In response to this incident, Cloudflare has made big engineering changes, including a major effort to move away from C as much as possible.

The offending parsers were rewritten in Rust (https://github.com/cloudflare/lol-html), as were the WAF, image optimization, and a few others. Nginx is being replaced with a custom cache server.

New implementations either use the Workers platform or are written in Rust or Golang.

  • Memory safety doesn't fix fundamental design flaws.

    • This is an empty tautology. You have no insight into the actual design, so I presume your fundamental design flaw is the CDN existing.

I interviewed there once and they asked me what I would do if a service broke after a deployment. I said the first step was to revert to the last known good version and then investigate. Color me surprised when that was not the answer they expected.

  • Cloudflare's internal release tool suggests a revert when monitoring detects failures during deployment, so this question doesn't describe Cloudflare's practices. There must have been something more to it, or it was a misunderstanding.

  • That's strange. What was the "correct" answer?

    • If I ever interview at Cloudflare and get this question I might answer with "call the sales team and have them fix it by selling someone an enterprise subscription paid upfront by the decade" just to see if the interviewers read Hacker News :P