
Comment by t0b1

2 years ago

The bin packing will probably make it slower though, especially in the bool case, since it creates dependency chains. For bools on x64, I don't think there's a better way than first getting each one into a register, shifting it, and then ORing it into the result. The simple way creates a dependency chain of length 64 (which should also incur a ~64-cycle penalty), but you might be able to get that down to 6 (more like 12, realistically) cycles. Then again, where do these 64 bools come from? There aren't that many registers, so you'll have to reload them from the stack. Maybe the Rust ABI already packs bools in structs this tightly, so it's work that has to be done anyway, but I don't know much about it.
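As a sketch of the "simple way" described above (function name and setup are illustrative, not from any real ABI code): every iteration ORs into the same accumulator, so each step depends on the previous one and the chain is as long as the bool count.

```rust
// Naive serial packing: acc |= bit is a read-modify-write on the same
// register every iteration, so the shift+OR chain has depth 64.
fn pack_naive(bools: &[bool; 64]) -> u64 {
    let mut acc = 0u64;
    for (i, &b) in bools.iter().enumerate() {
        // get the bool into a register, shift it into place, OR it in
        acc |= (b as u64) << i;
    }
    acc
}

fn main() {
    let mut bs = [false; 64];
    bs[0] = true;
    bs[63] = true;
    assert_eq!(pack_naive(&bs), 1 | (1u64 << 63));
}
```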

And then the caller has to unpack everything again. It might be easier to just teach the compiler to spill values into the result space on the stack (in cases where the IR doesn't already store the result after the computation), which would likely also perform better.

Unpacking bools is cheap: moving any bit into a flag is a single `test` instruction, which is as good as it gets when you have multiple bools (other than passing each in a separate flag, which is quite undesirable).

Doing the packing in a tree fashion to reduce latency is trivial, and store→load latency isn't free either, depending on the microarchitecture (and at the counts where log2(n) latency becomes significant you'll be at the IPC limit anyway). Packing vs. storing should end up at roughly the same instruction count too: a store vs. an `or`, and exactly the same amount of moving between flags and GPRs.
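A hedged sketch of the tree-shaped packing (names are made up for illustration): partial results are combined pairwise each round, so the OR depth is log2(64) = 6 instead of 64, matching the cycle estimate above.

```rust
// Tree-shaped packing: each round ORs independent pairs, so the rounds
// can overlap in the pipeline and the dependency depth is log2(n).
fn pack_tree(bools: &[bool; 64]) -> u64 {
    // one partial result per bool, already shifted into its bit position
    let mut parts: Vec<u64> = bools
        .iter()
        .enumerate()
        .map(|(i, &b)| (b as u64) << i)
        .collect();
    // 6 rounds of pairwise ORs for 64 inputs
    while parts.len() > 1 {
        parts = parts
            .chunks(2)
            .map(|pair| pair.iter().fold(0u64, |a, &x| a | x))
            .collect();
    }
    parts[0]
}

fn main() {
    let mut bs = [false; 64];
    bs[5] = true;
    bs[40] = true;
    assert_eq!(pack_tree(&bs), (1u64 << 5) | (1u64 << 40));
}
```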

Reaching 64 bools might be a bit crazy, but 4-8 seems reasonably attainable when several of many arguments are an Option<T>, where the packing would reduce the needed register/stack slot count by a factor of ~2.

Where possible it would of course make sense to pass values in separate registers instead of in one, but when the alternative is spilling to the stack, packing is still worth considering.

  • > Reaching 64 bools might be a bit crazy, but 4-8 seems reasonably attainable from each of many arguments being an Option<T>, where the packing would reduce needed register/stack slot count by ~2

    I don't have a strong sense of how much more common owned `Option` types are than references, but it's worth noting that if `T` is a reference, `Option<T>` will just use a pointer and treat the null value as `None` under the hood, avoiding the need for any tag. There are probably other types where this is done as well (maybe the `NonZero` integer types?)
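    This is easy to check with `size_of` (a minimal demonstration; the `NonZero` case does indeed get the same treatment):

    ```rust
    use std::mem::size_of;
    use std::num::NonZeroU32;

    fn main() {
        // Option<&T> reuses the null pointer for None: no extra tag.
        assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
        // NonZero integers do the same, with 0 as the spare bit pattern.
        assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<NonZeroU32>());
        // A plain u32 has no spare bit pattern, so Option needs a tag.
        assert!(size_of::<Option<u32>>() > size_of::<u32>());
    }
    ```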

    • Rust has a thing called the Guaranteed Niche Optimisation: if you make a sum type with exactly one variant that is empty, plus exactly one variant holding a type with a niche (a bit pattern which isn't used by any valid representation of that type), then Rust promises your type is the same size as the type with the niche in it.

      That is, if you made your own Maybe type which works like Option, it's also guaranteed to get this optimisation, and the optimisation works for any type which the compiler knows has a niche, not just obvious things like references, small enumerations, or the NonZero types, but also e.g. OwnedFd, the type for a Unix file descriptor: Unix file descriptors cannot be -1, so the bit pattern for -1 logically serves as a niche for this type.
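      For instance, a hand-rolled `Maybe` (name made up, but it matches the shape the guarantee requires: one empty variant, one variant with a niched payload) gets the same layout:

      ```rust
      use std::mem::size_of;

      // One empty variant plus one variant wrapping a type with a niche:
      // the Guaranteed Niche Optimisation applies to this type too.
      #[allow(dead_code)]
      enum Maybe<T> {
          Nothing,
          Just(T),
      }

      fn main() {
          // &u8 has the null niche, so Maybe<&u8> is pointer-sized.
          assert_eq!(size_of::<Maybe<&u8>>(), size_of::<&u8>());
          // u64 has no niche, so Maybe<u64> needs a separate tag.
          assert!(size_of::<Maybe<u64>>() > size_of::<u64>());
      }
      ```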

      I really like this feature, and I want to use it more. There's good news and bad news. The good news is that, although the Guaranteed Niche Optimisation is the only such guarantee, in practice the Rust compiler will do much more with a niche.

      The bad news is that we're not allowed to make new types with their own niches (other than enumerations, which automatically get an appropriately sized niche) in stable Rust today. In fact, the ability to mark a niche is not only permanently unstable (and thus usable in practice only from the Rust stdlib), it's a compiler-internal feature: they're pretty much telling you not to touch it, and it can't and won't get stabilized in this form.

      But we do have a good number of useful niches in the standard library: all references, the NonNull pointers (if you use pointers for something), the NonZero types, the booleans, small C-style enumerations, OwnedFd... that's quite a lot of possibilities.

      The main thing I want, and the reason I tried to make more movement happen (but I have done very little for about a year), is BalancedIx: a suite of types like NonZero, but missing the most negative value of each signed integer. You very rarely need -128 in an 8-bit signed integer, and it's kind of a nuisance, so BalancedI8 would be the same size; it loses -128, and in exchange Option<BalancedI8> is the same size and abs now does what you expected: two for the price of one!
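      The -128 nuisance is easy to demonstrate with the existing `i8` methods (BalancedI8 itself is hypothetical, so this only shows the problem it would solve):

      ```rust
      fn main() {
          // i8 is asymmetric: -128 exists but +128 doesn't, so abs
          // can't be total on the full range.
          assert_eq!(i8::MIN.checked_abs(), None);
          assert_eq!((-127i8).checked_abs(), Some(127));
          // wrapping_abs papers over it by returning -128 unchanged.
          assert_eq!(i8::MIN.wrapping_abs(), i8::MIN);
      }
      ```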

    • Yeah, `NonZero*`, but also a type like `#[repr(u8)] enum Foo { X }`: according to `assert_eq!(std::mem::size_of::<Option<Foo>>(), std::mem::size_of::<Foo>())`, you need an enum which fully saturates the repr, e.g. `#[repr(u8)] enum Bar { X0, ..., X255 }` (pseudocode), before the niche optimization fails to kick in.
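      A runnable sketch of that check: a one-variant `repr(u8)` enum leaves 255 unused bit patterns, any one of which can encode `None`.

      ```rust
      use std::mem::size_of;

      // Only bit pattern 0 is a valid Foo; patterns 1..=255 are niches.
      #[repr(u8)]
      #[allow(dead_code)]
      enum Foo {
          X,
      }

      fn main() {
          assert_eq!(size_of::<Foo>(), 1);
          // None fits into one of the unused patterns: no extra tag byte.
          assert_eq!(size_of::<Option<Foo>>(), size_of::<Foo>());
      }
      ```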
