Comment by jgrahamc
6 years ago
This report is written by me, the CTO of Cloudflare. I say "I" throughout because organizational failings are my responsibility. If I'd said "we" I imagine you'd be criticizing me for NOT taking responsibility.
If you read the report, you'd see I do not blame the engineer responsible at all. Not once. I made that perfectly clear.
I wonder if you are able to talk a bit about the development of the Lua-based WAF. I imagine the possible unbounded performance of feeding requests into PCRE must have occurred to you or others at the time - or at least, long before this outage.
I don't mean this as some sort of lame 'lol shoulda known better' dunk - stories about technical organizations' decision-making and tradeoff-handling are just more interesting than the details of how regexes typed in a control panel grow up to become Jira tickets.
I did a talk about this years ago: https://www.youtube.com/watch?v=nlt4XKhucS4
It sounds like one of the primary factors was compatibility with existing (or customer-provided) mod_security rules, if I've understood 1.75x-speed hyper-you right.