Comment by feldrim

13 hours ago

Would SearchValues<char> help there, as a fallback to a SIMD-optimized search for simple string literals instead of the happy path?

ieviev 13 hours ago

Yes, that's exactly what we did to be competitive in the benchmarks.

There are a lot of simple cases where you don't really need a regex engine at all.

Integrating SearchValues as a multi-string prefix search is a bit harder, since it doesn't expose which branch matched, so we would be taking unnecessary steps.

Also, the .NET implementation of Hyperscan's Teddy algorithm only goes left to right. If it went right to left, it would make RE# much faster for these cases.
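
To illustrate the "unnecessary steps" point, here is a minimal Python sketch (not the .NET API, and not RE#'s code). `find_any` is a hypothetical stand-in for a multi-string search that reports only a position; `which_matched` is the extra work a caller then has to do to learn which alternative was actually found.

```python
def find_any(haystack: str, needles: list[str], start: int = 0) -> int:
    """Position-only search: earliest index where any needle starts, or -1."""
    hits = [i for n in needles if (i := haystack.find(n, start)) != -1]
    return min(hits, default=-1)


def which_matched(haystack: str, pos: int, needles: list[str]) -> list[str]:
    """Extra step: re-check every needle at pos to recover the matching branch."""
    return [n for n in needles if haystack.startswith(n, pos)]


text = "see ftp://x or https://y"
branches = ["http", "ftp"]
pos = find_any(text, branches)              # the search only tells us *where*
matched = which_matched(text, pos, branches)  # a second pass tells us *which*
```

A search that returned the index of the matching literal together with the position would make the second function unnecessary.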

feldrim 5 hours ago
So, there's still room for significant improvement.
- ieviev 4 hours ago
  
  There is plenty still to do.
  One part of this is SIMD algorithms to better compete with Hyperscan/Rust, another is the decades of optimizations that backtracking engines have for short anchored matches for validation.
  There's analysis to do for specific patterns so we can opt for specialized algorithms, e.g. for fixed-length patterns we skip the left-to-right pass entirely, since we already know the match start and the match length.
  There are lots of opportunistic things like this that we haven't done. Also, there are no statistical optimizations in the engine right now; most engines will immediately start looking for a 'z' if there is one in the pattern, since it is rare.
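
For readers unfamiliar with the two ideas in this reply, a rough Python sketch follows. It is an illustration under stated assumptions, not how RE# or any particular engine implements them: the letter ranking `COMMON_TO_RARE` is an approximate English frequency order chosen for the example, and both function names are hypothetical.

```python
# Approximate English letter ranking, most common first (an assumption).
COMMON_TO_RARE = "etaoinshrdlcumwfgypbvkjxqz"


def rarity(ch: str) -> int:
    """Higher means rarer. Characters outside the table get -1, i.e. they
    are treated as common because this toy table knows nothing about them."""
    return COMMON_TO_RARE.find(ch.lower())


def find_rare_first(haystack: str, literal: str, start: int = 0) -> int:
    """Find `literal` by scanning for its rarest character, then verifying
    the whole literal around each candidate position."""
    if not literal:
        return start if start <= len(haystack) else -1
    # Offset of the rarest character inside the literal (first one on ties).
    offset = max(range(len(literal)), key=lambda i: rarity(literal[i]))
    rare = literal[offset]
    pos = haystack.find(rare, start + offset)
    while pos != -1:
        candidate = pos - offset
        if haystack.startswith(literal, candidate):
            return candidate
        pos = haystack.find(rare, pos + 1)
    return -1


def span_for_fixed_length(match_start: int, pattern_length: int) -> tuple[int, int]:
    """For a fixed-length pattern, the end follows from the start, so no
    second pass is needed to locate it."""
    return (match_start, match_start + pattern_length)
```

With the literal `"pizza"`, the scan looks for `'z'` rather than `'p'`, so it skips over the `"pie"` in `"a pie and a pizza"` without ever stopping to verify there.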