Comment by feldrim
13 hours ago
Would SearchValues<char> help there for a fallback to a SIMD-optimized simple string literal search rather than the happy path?
> Would SearchValues<char> help there for a fallback to a SIMD-optimized simple string literal search rather than the happy path?
Yes, that's exactly what we did to be competitive in the benchmarks.
There are a lot of simple cases where you don't really need a regex engine at all.
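To make the "no regex engine needed" case concrete, here is a minimal Python sketch of a literal fast path. It is not the RE# or .NET code. The metacharacter set and the `find_all` name are assumptions for illustration. A real engine would check its parsed pattern, not the raw string.

```python
import re
from typing import List

# Assumption: a conservative set of characters that are special in Python's `re` syntax.
REGEX_METACHARS = set(".^$*+?{}[]\\|()")


def find_all(pattern: str, text: str) -> List[int]:
    """Return start offsets of all non-overlapping matches of `pattern` in `text`."""
    if pattern and not any(c in REGEX_METACHARS for c in pattern):
        # Fast path: the pattern is a plain literal, so substring search is enough.
        starts = []
        pos = text.find(pattern)
        while pos != -1:
            starts.append(pos)
            # Resume after this match so results are non-overlapping, like re.finditer.
            pos = text.find(pattern, pos + len(pattern))
        return starts
    # Slow path: hand the pattern to the general-purpose regex engine.
    return [m.start() for m in re.finditer(pattern, text)]
```

In .NET the fast path would presumably be a SearchValues- or IndexOf-based search instead of `str.find`. The structure is the same: detect the literal case first and skip the engine.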
Integrating SearchValues as a multi-string prefix search is a bit harder, since it doesn't expose which branch matched, so we would be taking unnecessary steps.
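Here is a plain-Python sketch of what we would want back from a multi-string search: the match position plus the alternative that matched. It calls `str.find` once per literal, so its performance is nothing like SearchValues or Teddy. `find_first_of` is a hypothetical name. With only a position, the engine presumably has to re-test the branches at that offset to learn which one hit.

```python
from typing import List, Optional, Tuple


def find_first_of(haystack: str, literals: List[str], start: int = 0) -> Optional[Tuple[int, int]]:
    """Return (position, branch_index) of the leftmost literal occurrence, or None.

    Ties at the same position go to the literal listed first.
    """
    best = None
    for idx, lit in enumerate(literals):
        pos = haystack.find(lit, start)
        # Keep the earliest position; strict `<` preserves the first-listed literal on ties.
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, idx)
    return best
```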
Also, the .NET implementation of Hyperscan's Teddy algorithm only searches left to right. If it could also search right to left, RE# would be much faster for these cases.
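For illustration, here is a naive right-to-left multi-literal scan in Python. It reports candidates in decreasing start order, which is the order a right-to-left pass consumes them. It is a sketch only: no SIMD, not Teddy. `scan_right_to_left` is a hypothetical name.

```python
from typing import Iterator, List, Tuple


def scan_right_to_left(haystack: str, literals: List[str]) -> Iterator[Tuple[int, int]]:
    """Yield (start, branch_index) for every literal occurrence, rightmost start first.

    Naive: O(len(haystack) * total literal length). Overlapping occurrences are all reported.
    """
    for start in range(len(haystack) - 1, -1, -1):
        for idx, lit in enumerate(literals):
            # Skip empty literals; report every literal that begins at this offset.
            if lit and haystack.startswith(lit, start):
                yield start, idx
```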
So, there's still room for significant improvement.
There is plenty still to do.
One part of this is SIMD algorithms to better compete with Hyperscan/Rust; another is the decades of optimizations that backtracking engines have accumulated for short, anchored matches, as used in input validation.
There's analysis to do on specific patterns so we can opt for specialized algorithms. For example, for fixed-length patterns we skip the left-to-right pass entirely, since once we know the match start, the match end is just the start plus the fixed match length.
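A toy version of that fixed-length analysis, as a Python sketch over a hypothetical mini-AST (these node kinds are not RE#'s internal representation). `fixed_length` returns the common match length, or None when matches can differ in length. `spans_from_starts` shows the payoff: given match starts from the right-to-left pass, the ends follow by arithmetic.

```python
from typing import Optional

# Hypothetical toy regex AST (an illustration, not RE#'s data structures):
#   ("lit", "abc")            literal string
#   ("class", "0123456789")   one character from a set
#   ("cat", [node, ...])      concatenation
#   ("alt", [node, ...])      alternation
#   ("star", node)            zero or more repetitions


def fixed_length(node) -> Optional[int]:
    """Return the match length if every match of `node` has the same length, else None."""
    kind = node[0]
    if kind == "lit":
        return len(node[1])
    if kind == "class":
        return 1
    if kind == "cat":
        total = 0
        for child in node[1]:
            n = fixed_length(child)
            if n is None:
                return None
            total += n
        return total
    if kind == "alt":
        lengths = {fixed_length(child) for child in node[1]}
        # Fixed only if every branch has the same known length.
        if len(lengths) == 1 and None not in lengths:
            return lengths.pop()
        return None
    if kind == "star":
        # Treated as variable length (conservative: ignores bodies that only match "").
        return None
    raise ValueError(f"unknown node kind: {kind!r}")


def spans_from_starts(starts, length):
    """With a fixed match length, each end is start + length; no left-to-right pass needed."""
    return [(s, s + length) for s in starts]
```

For example, a pattern like `ID-[0-9][0-9]` becomes `("cat", [("lit", "ID-"), ("class", "0123456789"), ("class", "0123456789")])`, whose fixed length is 5.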
There are lots of opportunistic things like this that we haven't done yet. There are also no statistical optimizations in the engine right now: most engines will immediately start looking for a 'z' if there is one in the pattern, since it is a rare character.
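Here is a rough Python sketch of that "look for the rare character first" idea, similar in spirit to the rare-byte prefilter in Rust's memchr crate. The `FREQUENCY` scores are illustrative, not measured data, and `rare_char_find` is a hypothetical name. The search scans only for the needle's rarest character and verifies the full literal around each hit.

```python
# Illustrative frequency scores (higher = more common); NOT measured data.
# Characters missing from the table are treated as rare (score 0).
FREQUENCY = {" ": 100, "e": 90, "t": 80, "a": 75, "o": 70, "i": 65, "n": 65,
             "s": 60, "r": 55, "h": 50, "l": 40, "d": 35, "u": 25, "c": 25,
             "f": 22, "g": 20, "y": 20, "q": 2, "x": 2, "j": 1, "z": 1}


def rare_char_find(haystack: str, needle: str, start: int = 0) -> int:
    """Same result as haystack.find(needle, start), found via the needle's rarest character."""
    if not needle:
        return start if start <= len(haystack) else -1
    # Offset of the rarest character inside the needle (first one wins on ties).
    rare_off = min(range(len(needle)), key=lambda i: FREQUENCY.get(needle[i], 0))
    rare = needle[rare_off]
    pos = haystack.find(rare, start + rare_off)
    while pos != -1:
        cand = pos - rare_off  # where the needle would have to begin
        if haystack.startswith(needle, cand):
            return cand
        pos = haystack.find(rare, pos + 1)
    return -1
```

For "fuzzy" this scans for 'z' and only verifies at the few places a 'z' occurs. Scanning for 'e' or a space would produce far more candidates to verify.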