Comment by ieviev
15 hours ago
Yes, that's exactly what we did to be competitive in the benchmarks.
There are a lot of simple cases where you don't really need a regex engine at all.
Integrating SearchValues as a multi-string prefix search is a bit harder: it doesn't expose which branch matched, so we would be taking unnecessary steps.
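To illustrate the extra step, here is a rough Python sketch (not .NET or RE# code). `index_of_any` and `which_branch` are hypothetical helpers: the first stands in for any search that reports only a position, the second is the re-probe the caller then has to pay for.

```python
def index_of_any(haystack: str, needles: list[str], start: int = 0) -> int:
    """Position-only search: earliest index where any needle starts, or -1.

    Hypothetical stand-in for a vectorized multi-string search that does not
    report which needle it found.
    """
    best = -1
    for needle in needles:
        i = haystack.find(needle, start)
        if i != -1 and (best == -1 or i < best):
            best = i
    return best


def which_branch(haystack: str, needles: list[str], pos: int) -> list[int]:
    """The extra work: re-probe every needle at pos to learn which ones match."""
    return [k for k, needle in enumerate(needles) if haystack.startswith(needle, pos)]


haystack = "xxfoobarxx"
needles = ["bar", "foo", "foobar"]

pos = index_of_any(haystack, needles)
branches = which_branch(haystack, needles, pos)
print(pos, branches)
```

If the search returned the matching branch directly, the second call would not be needed and the engine could jump straight to the state for that branch.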
Also, the .NET implementation of Hyperscan's Teddy algorithm only goes left to right. If it went right to left, it would make RE# much faster for these cases.
So there's still room for significant improvement, and plenty still to do.
One part of this is SIMD algorithms to better compete with Hyperscan/Rust; another is the decades of optimizations that backtracking engines have accumulated for short anchored matches used for validation.
There's analysis to do for specific patterns so we can opt for specialized algorithms, e.g. for fixed-length patterns we can skip the left-to-right pass entirely, since we already know the match start and the match length.
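A minimal Python sketch of that fixed-length analysis, assuming a toy pattern tree made of tuples. The node kinds (`lit`, `class`, `cat`, `alt`, `star`) and both function names are made up for illustration and are not RE#'s internal representation.

```python
from typing import Optional


def fixed_length(node) -> Optional[int]:
    """Return the match length if every match has the same length, else None."""
    kind = node[0]
    if kind == "lit":        # literal string
        return len(node[1])
    if kind == "class":      # one character from a set, e.g. \d
        return 1
    if kind == "cat":        # concatenation: lengths add up
        total = 0
        for child in node[1]:
            n = fixed_length(child)
            if n is None:
                return None
            total += n
        return total
    if kind == "alt":        # alternation: fixed only if all branches agree
        lengths = {fixed_length(child) for child in node[1]}
        return lengths.pop() if len(lengths) == 1 else None
    return None              # "star" and anything else is variable length


def spans_from_starts(starts, length):
    """With a fixed length, each match end is just start + length."""
    return [(s, s + length) for s in starts]


digit = ("class", "0123456789")
# Toy tree for \d{4}-\d{2}-\d{2}
date = ("cat", [digit] * 4 + [("lit", "-")] + [digit] * 2 + [("lit", "-")] + [digit] * 2)
variable = ("cat", [("lit", "a"), ("star", ("lit", "b"))])

date_len = fixed_length(date)
spans = spans_from_starts([5, 30], date_len)  # starts as if found by an earlier pass
print(date_len, fixed_length(variable), spans)
```

The start positions here are hard-coded; in a real engine they would come from the pass that locates match starts, and only the end-finding pass is replaced by the addition.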
There are lots of opportunistic things like this that we haven't done. Also, there are no statistical optimizations in the engine right now: most engines will immediately start looking for a 'z' if there is one in the pattern, since it is rare.
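The 'z' heuristic can be sketched roughly like this in Python. The frequency ranking in `COMMON_TO_RARE` is an assumed, approximate ordering for English text, and the function names are invented for the example; real engines use their own tables and vectorized scans.

```python
# Assumed ranking, most common first; characters not listed count as rarest.
COMMON_TO_RARE = " etaoinshrdlcumwfgypbvkjxqz"


def rarity(ch: str) -> int:
    """Higher value means the character is expected to be rarer."""
    i = COMMON_TO_RARE.find(ch.lower())
    return i if i != -1 else len(COMMON_TO_RARE)


def rarest_offset(literal: str) -> int:
    """Offset of the rarest character in the literal (first one on ties)."""
    return max(range(len(literal)), key=lambda i: rarity(literal[i]))


def find_literal(haystack: str, literal: str) -> int:
    """Scan for the rarest character first, then verify the whole literal."""
    if not literal:
        return 0
    off = rarest_offset(literal)
    probe = literal[off]
    i = haystack.find(probe, off)   # a match cannot place the probe before off
    while i != -1:
        start = i - off
        if haystack.startswith(literal, start):
            return start
        i = haystack.find(probe, i + 1)
    return -1


text = "the quick brown fox is lazy today"
hit = find_literal(text, "lazy")
print(rarest_offset("lazy"), hit)
```

Searching for 'z' instead of 'l' means the scan stops far less often on ordinary text, so the expensive verification step runs on fewer candidates.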