Comment by dzaima
2 years ago
Unpacking bools is cheap - to move any bit into a flag is just a single 'test' instruction, which is as good as it gets if you have multiple bools (other than passing each in a separate flag, which is quite undesirable).
Doing the packing in a tree fashion to reduce latency is trivial, and store→load latency isn't free either, depending on the microarchitecture (and at the counts where log2(n) latency becomes significant you'll be at the IPC limit anyway). Packing vs. storing should end up at roughly the same instruction count too: a store vs. an 'or', and exactly the same amount of moving between flags and GPRs.
Reaching 64 bools might be a bit crazy, but 4-8 seems reasonably attainable from each of many arguments being an Option<T>, where the packing would reduce needed register/stack slot count by ~2.
Where possible it would of course make sense to pass values in separate registers instead of in one, but when the alternative is spilling to stack, packing is still worthy of consideration.
> Reaching 64 bools might be a bit crazy, but 4-8 seems reasonably attainable from each of many arguments being an Option<T>, where the packing would reduce needed register/stack slot count by ~2
I don't have a strong sense of how much more common owned `Option` types are than references, but it's worth noting that if `T` is a reference, `Option<T>` will just use a pointer and treat the null value as `None` under the hood to avoid needing any tag. There are probably other types where this is done as well (maybe `NonZero` integer types?)
Rust has a thing called the Guaranteed Niche Optimisation: if you make a sum type with exactly one variant that carries no data, plus exactly one variant wrapping a type which has a niche (a bit pattern not used by any valid representation of that type), then Rust promises your sum type is the same size as the type with the niche in it.
That is, if you made your own Maybe type which works like Option, it's also guaranteed to get this optimisation. And the optimisation works for any type the compiler knows has a niche, not just the obvious things like references, small enumerations, and the NonZero types, but also e.g. OwnedFd, the type for a Unix file descriptor: Unix file descriptors can never be -1, so the bit pattern for -1 logically serves as a niche for this type.
I really like this feature, and I want to use it more. There's good news and bad news. The good news is that although the Guaranteed Niche Optimisation is the only such guarantee, in practice the Rust compiler will do much more with a niche.
The bad news is that we're not allowed to make new types with their own niches (other than enumerations, which automatically get an appropriately sized niche) in stable Rust today. In fact, the ability to mark a niche is not only permanently unstable (and thus usable in practice only from the Rust stdlib), it's a compiler-internal feature: they're pretty much telling you not to touch it, and it can't and won't get stabilized in this form.
But we do have a good number of useful niches in the standard library: all references, the NonNull pointers (if you use raw pointers for something), the NonZero types, the booleans, small C-style enumerations, OwnedFd. That's quite a lot of possibilities.
The main thing I want, and the reason I tried to make more movement happen (though I have done very little for about a year), is BalancedIx: a suite of types like NonZero, but missing the most negative value of the signed integers. You very rarely need -128 in an 8-bit signed integer, and it's kind of a nuisance, so BalancedI8 would be the same size; it loses -128, and in exchange Option<BalancedI8> is the same size and now abs does what you expected. Two for the price of one!
Yeah, `NonZero*`, but also a type like `#[repr(u8)] enum Foo { X }`. According to `assert_eq!(std::mem::size_of::<Option<Foo>>(), std::mem::size_of::<Foo>())`, you'd need an enum which fully saturates the repr, e.g. `#[repr(u8)] enum Bar { X0, ..., X255 }` (pseudocode), before niche optimization fails to kick in.
Oh, good to know!