← Back to context

Comment by karmakaze

6 years ago

TL;DR

Root cause was a bad regex generating excessive backtracking using all CPU on nodes.

The meta-cause is the process workflow:

> But, by design, the WAF doesn’t use this process because of the need to respond rapidly to threats.

The above is in reference to how WAF deployment doesn't use the graduated DOG(fooding)/(guinea)PIG/canary flow.

> We responded quickly to correct the situation and are correcting the process deficiencies that allowed the outage to occur [...]

Live and learn. Not all WAF deployments are emergency rollouts.