Comment by lkoolma
6 years ago
Might be late, but has anyone at Cloudflare tried to switch away from regexes to something more efficient and powerful? Tools like re2c can compile hundreds of regexes into a single optimized state machine (which, as far as I remember, involves no backtracking). With a bit of optimization, that should easily handle tens of millions of transactions per second per core, as long as the complete state machine fits in the CPU's L3 cache (or a lower level).
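A minimal sketch of the "many patterns, one state machine" idea, in Python. It merges several patterns into one deterministic transition table by subset construction, so matching does exactly one table lookup per input character. This is not re2c's actual algorithm or output (re2c emits C code); the toy syntax (literal characters, `.`, postfix `*`), full-match semantics, the fixed `alphabet` parameter and the names `compile_union` and `run` are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Token = Tuple[str, bool]  # (literal char or ".", followed by "*"?)
NfaState = Tuple[int, int]  # (pattern index, position in that pattern)


def parse(pattern: str) -> List[Token]:
    """Toy syntax only: literal characters, "." (any char) and postfix "*"."""
    tokens: List[Token] = []
    for ch in pattern:
        if ch == "*":
            if not tokens or tokens[-1][1]:
                raise ValueError(f"dangling '*' in {pattern!r}")
            tokens[-1] = (tokens[-1][0], True)
        else:
            tokens.append((ch, False))
    return tokens


def closure(progs: List[List[Token]], states: Set[NfaState]) -> FrozenSet[NfaState]:
    """Add states reachable without consuming input: a starred token may be skipped."""
    out: Set[NfaState] = set()
    stack = list(states)
    while stack:
        pid, pos = stack.pop()
        if (pid, pos) in out:
            continue
        out.add((pid, pos))
        toks = progs[pid]
        if pos < len(toks) and toks[pos][1]:
            stack.append((pid, pos + 1))
    return frozenset(out)


def compile_union(patterns: List[str], alphabet: str):
    """Subset construction over the union of all patterns -> one DFA table."""
    progs = [parse(p) for p in patterns]
    start = closure(progs, {(i, 0) for i in range(len(progs))})
    states = [start]
    index = {start: 0}
    table: List[Dict[str, int]] = []
    i = 0
    while i < len(states):  # the list grows as new DFA states are discovered
        row: Dict[str, int] = {}
        for ch in alphabet:
            nxt: Set[NfaState] = set()
            for pid, pos in states[i]:
                toks = progs[pid]
                if pos < len(toks) and toks[pos][0] in (".", ch):
                    # A starred token stays put; a plain token advances.
                    nxt.add((pid, pos if toks[pos][1] else pos + 1))
            target = closure(progs, nxt)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            row[ch] = index[target]
        table.append(row)
        i += 1
    # For each DFA state, the indices of the patterns that have fully matched.
    matches = [sorted(pid for pid, pos in s if pos == len(progs[pid])) for s in states]
    return table, matches


def run(table: List[Dict[str, int]], matches: List[List[int]], text: str) -> List[int]:
    """One table lookup per input character; the input is never re-read."""
    state = 0
    for ch in text:
        state = table[state][ch]
    return matches[state]
```

For example, with `table, matches = compile_union(["ab*c", "a.*", "ca*"], "abc")`, the call `run(table, matches, "abbbc")` returns `[0, 1]`: the first two patterns match, in a single pass. All the expensive work happens in `compile_union`; `len(table)` is the number that has to fit in cache. Characters outside `alphabet` raise `KeyError` in this sketch.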
There is also Ragel [0], but in this context I think deploying regexes as strings is safer than generating code and deploying that code (unless Ragel could generate WebAssembly).
[0]: http://www.colm.net/open-source/ragel/
Ragel has the advantage that CPU blowups happen at compile time rather than at run time. Other risks aside, they would have avoided this problem had they used Ragel or something similar to precompile their patterns into deterministic machines.
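For a sense of scale, here is the textbook worst case (an illustration, not one of Cloudflare's actual rules). The pattern `.*a.{n}`, i.e. "the (n+1)-th character from the end is `a`", needs only O(n) NFA states, but any DFA for it has to remember which of the last n+1 characters were `a`. Assuming a byte alphabet and 4-byte state ids:

```latex
|Q_{\mathrm{DFA}}| = 2^{n+1}
% n = 20: 2^{21} = 2\,097\,152 states
2^{21}\ \text{states} \times 256\ \tfrac{\text{transitions}}{\text{state}} \times 4\ \tfrac{\mathrm{B}}{\text{transition}}
  = 2^{21+8+2}\ \mathrm{B} = 2^{31}\ \mathrm{B} = 2\ \mathrm{GiB}
```

With ahead-of-time compilation that cost is paid, and can be caught, on the build machine; the matcher still does one lookup per input byte. A table that size would be far too big for the cache budget mentioned at the top of the thread, so a build pipeline like this would want a hard limit on the number of states.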
Sorry, I hadn't seen the parent you were responding to, so I was really making the same point you already made. Thanks.
The article says they're going to switch to either RE2 or Rust's regex crate, both of which guarantee linear-time matching by using automata (including a lazily built DFA, i.e. a state machine) instead of unbounded backtracking.
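But you do bring up a good point: RE2 and Rust's regex both compile the regex in the same process that executes it. Compiling the regex as part of your build process and then pushing the compiled form could have advantages; for example, a pattern whose compiled form is too large would fail the build instead of a production server.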
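A minimal sketch of the runtime half of that idea, in Python. It assumes a build step has already compiled the patterns into a transition table and serialized it as JSON; the artifact layout (`alphabet`, `table`, `accepting`), the `max_states` limit and the hand-written table for `ab*c` are all hypothetical. The edge process only validates and walks a data table: it ships no regex compiler and runs no generated code, which also sidesteps the concern about deploying generated code raised earlier in the thread.

```python
import json
from typing import Any, Dict

# Hypothetical build-step output: a DFA for the pattern "ab*c" over {a, b, c}.
# State 0 = start, 1 = inside "b*", 2 = accept, 3 = dead.
BUILD_OUTPUT = json.dumps({
    "alphabet": "abc",
    "table": [
        {"a": 1, "b": 3, "c": 3},
        {"a": 3, "b": 1, "c": 2},
        {"a": 3, "b": 3, "c": 3},
        {"a": 3, "b": 3, "c": 3},
    ],
    "accepting": [2],
})


def load_artifact(blob: str, max_states: int = 10_000) -> Dict[str, Any]:
    """Parse and validate a precompiled DFA before putting it into service."""
    art = json.loads(blob)
    table = art["table"]
    n = len(table)
    if not 0 < n <= max_states:
        raise ValueError("state count out of bounds")
    alphabet = set(art["alphabet"])
    for row in table:
        if set(row) != alphabet:
            raise ValueError("row must cover exactly the declared alphabet")
        for target in row.values():
            if not isinstance(target, int) or not 0 <= target < n:
                raise ValueError("transition target out of range")
    accepting = set(art["accepting"])
    if not all(isinstance(s, int) and 0 <= s < n for s in accepting):
        raise ValueError("accepting state out of range")
    return {"table": table, "accepting": accepting}


def matches(dfa: Dict[str, Any], text: str) -> bool:
    """Runtime side: no regex compiler, just one table lookup per character."""
    state = 0
    for ch in text:
        state = dfa["table"][state][ch]
    return state in dfa["accepting"]
```

With `dfa = load_artifact(BUILD_OUTPUT)`, `matches(dfa, "abbbc")` is `True` and `matches(dfa, "abca")` is `False`. Validation is linear in the size of the table, so a malformed or oversized artifact is rejected when it is loaded rather than while handling requests.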