Comment by ww520

4 days ago

This is very cool. An extremely fast lexical tokenizer is the foundation of a fast compiler. Zig has good integration and support for SIMD operations, which is perfect for this kind of thing. It's definitely doable. I did a proof of concept a while back using SIMD to operate on 32-byte chunks to parse identifiers.

https://github.com/williamw520/misc_zig/blob/main/identifier...

When I run a profiler on a compiler I wrote (which parses at somewhere between 500K and 1M lines per second without a separate lexer), parsing barely shows up. I'd be very surprised if the Zig compiler is spending more than 5% of its time tokenizing.

I assume there is some other use case that is motivating this work.

  • I imagine it would be quite useful for building a responsive language server, where parsing is a more significant portion of the work

    • No, the problem for a language server is incremental performance, not batch performance, although there are plenty of bad implementations out there that just reparse the entire buffer on each edit (and forgo the error-recovery benefits an incremental parser would give you).