This is very cool. An extremely fast lexical tokenizer is the basis of a fast compiler. Zig has good integration and support for SIMD operations, which is perfect for this kind of thing. It's definitely doable. I did a proof of concept a while back that uses SIMD to operate on 32-byte chunks when parsing identifiers.
https://github.com/williamw520/misc_zig/blob/main/identifier...
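For anyone curious about the shape of the trick (a minimal sketch in the same spirit, assuming a recent Zig toolchain; the names and the test are mine, not the linked PoC verbatim): compare all 32 bytes against the identifier character ranges at once, collapse the boolean vectors into a bitmask, and the identifier's length falls out of a single count-trailing-ones.

    const std = @import("std");

    // Classify each byte of a 32-byte chunk as identifier / non-identifier
    // ([A-Za-z0-9_]) in parallel, returning one bit per byte.
    fn identMask(chunk: @Vector(32, u8)) u32 {
        // Fold letters to lowercase so one range check covers A-Z and a-z.
        const folded = chunk | @as(@Vector(32, u8), @splat(0x20));
        const alpha = @as(u32, @bitCast(folded >= @as(@Vector(32, u8), @splat('a')))) &
            @as(u32, @bitCast(folded <= @as(@Vector(32, u8), @splat('z'))));
        const digit = @as(u32, @bitCast(chunk >= @as(@Vector(32, u8), @splat('0')))) &
            @as(u32, @bitCast(chunk <= @as(@Vector(32, u8), @splat('9'))));
        const under = @as(u32, @bitCast(chunk == @as(@Vector(32, u8), @splat('_'))));
        return alpha | digit | under;
    }

    test "identifier length without a per-byte loop" {
        var buf: [32]u8 = undefined;
        @memset(&buf, ' ');
        @memcpy(buf[0..9], "hello_123");
        // Trailing ones in the mask = length of the identifier at offset 0.
        try std.testing.expect(@ctz(~identMask(buf)) == 9);
    }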
When I run a profiler on a compiler I wrote (which parses at somewhere between 500K and 1M lines per second without a separate lexer), parsing barely shows up. I'd be very surprised if the Zig compiler is spending more than 5% of its time tokenizing.
I assume there is some other use case that is motivating this work.
I imagine it would be quite useful for building a responsive language server, where parsing is a more significant portion of the work.
The talks that Niles gave at the Utah Zig meetups (linked in the repo) were great; I just wish the AV setup had been a little smoother. It seemed like some really neat visualizations that Niles prepared flopped. Either way, I recommend them. They inspired me to read a lot more machine code these days.
Very interesting project!
I wonder if there's a way to make this set of techniques less brittle and more applicable to any language. I guess you're looking at a new backend or some enhancements to one of the parser generator tools.
I have applied a subset of these techniques in a C++ tokenizer that parses a language syntactically similar to Swift: no inline assembly, no intrinsics, no SWAR, but reduced branching, cache optimization, and SIMD parsing with explicit vectorization.
I get:
- ~4 MLOC/sec/core on a laptop
- ~8-9 MLOC/sec/core on a modern AMD server-grade CPU with AVX-512.
So yes, it is definitively possible.
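Not their C++ code, obviously, but to make "reduce branching + explicit vectorization" concrete, here is a minimal sketch of the idea in Zig (the language at hand): skip whitespace a 32-byte chunk at a time, so the hot loop takes one well-predicted branch per chunk instead of one per byte. The function name and whitespace set are my choices.

    const std = @import("std");

    // Advance past space/tab/newline/CR, 32 bytes per iteration.
    fn skipWhitespace(src: []const u8, start: usize) usize {
        var i = start;
        while (i + 32 <= src.len) : (i += 32) {
            const chunk: @Vector(32, u8) = src[i..][0..32].*;
            // One bit per byte: set if the byte is whitespace.
            const ws = @as(u32, @bitCast(chunk == @as(@Vector(32, u8), @splat(' ')))) |
                @as(u32, @bitCast(chunk == @as(@Vector(32, u8), @splat('\t')))) |
                @as(u32, @bitCast(chunk == @as(@Vector(32, u8), @splat('\n')))) |
                @as(u32, @bitCast(chunk == @as(@Vector(32, u8), @splat('\r'))));
            const non_ws = ~ws;
            // First non-whitespace byte in this chunk, if any.
            if (non_ws != 0) return i + @ctz(non_ws);
        }
        // Scalar tail for the last partial chunk.
        while (i < src.len and std.ascii.isWhitespace(src[i])) i += 1;
        return i;
    }

The same shape (wide compare, bitmask, count trailing zeros) generalizes to scanning identifiers, numbers, and string literals, which is where most tokenizer time goes.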
It would be very cool if, once finished, these techniques were applied to user-schedulable languages: https://www.hytradboi.com/2025/7d2e91c8-aced-415d-b993-f6f85....
I guess they are too tailored to the actual memory layout and memory-access latency of the architecture, but I would like to be shown that I am wrong and that it is feasible.
This really moves Zig forward.