← Back to context

Comment by hansvm

1 day ago

Old habits :)

If I had to steel-man the idea, I'm pretty sure the integer-based solution has better codegen with many kinds of sparse, comptime-known masks. I think you're right though, StaticBitSet looks better.

For your specific case, even a simple `[9][9]u16` might perform better (where you make use of nine bits in each u16). For each entry, the nine mask bits would be in the same bit positions, so the compiler won't have to do a bunch of shifts to extract/align the bits. CPUs love consistency. I doubt it's worth the additional codegen complexity to save 70 bytes in your data model.

  • My initial implementation used [9][9]u9 (which desugars to [9][9]u16 with some zero bits) and was a fair bit slower. If I had to guess, it's because the shift/extract/align you're describing isn't actually a part of the core solving algorithm, and when you have box constraints, knights-move constraints, etc, you're usually not doing anything which fits in a single u16.