Comment by mathisfun123

18 hours ago

That's literally only for 32b x 24b (I don't remember why we did that specifically for CDNA; I'll ask someone), but as you can see from V_MUL_HI_I32 and V_MUL_LO_U32, there is very much vector arithmetic hardware (never mind that we're not talking about the VALU but a conventional scalar ALU).
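For readers unfamiliar with the hi/lo split, here is a rough Python model of what a pair of instructions like these computes: a full 32b x 32b product is 64 bits wide, so one instruction returns the low 32 bits and another the high 32 bits. The function names `mul_lo_u32` and `mul_hi_i32` are illustrative stand-ins, not real APIs, and this is a sketch of the arithmetic only, not of the hardware.

```python
MASK32 = 0xFFFFFFFF  # keeps a Python int within 32 bits


def to_signed32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul_lo_u32(a: int, b: int) -> int:
    """Low 32 bits of the product (hypothetical model of a 'mul lo' op)."""
    return ((a & MASK32) * (b & MASK32)) & MASK32


def mul_hi_i32(a: int, b: int) -> int:
    """High 32 bits of the signed 64-bit product, returned as a bit pattern."""
    product = to_signed32(a) * to_signed32(b)
    # Python's >> on negative ints is an arithmetic shift, which is what we want.
    return (product >> 32) & MASK32


if __name__ == "__main__":
    a, b = 0x40000000, 4          # 2**30 * 4 == 2**32, which overflows 32 bits
    print(hex(mul_lo_u32(a, b)))  # low half of the product
    print(hex(mul_hi_i32(a, b)))  # high half of the product
```

The low half is the same whether the operands are treated as signed or unsigned; only the high half depends on signedness, which is why the two halves can carry different type suffixes.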

imtringued 11 hours ago

I think he has a point, but I am still not 100% convinced by the arguments relating to casting.

There is a difference between a u24 data type stored inside a u32 and a u24 data type stored inside a u24, and that is what's so frustrating here. A u24 is an alignment nightmare, so it will basically never exist as "u24 in u24", only ever as "u24 in u32".

For casting to make sense, the alignment must be compatible, and it's not clear how arbitrary-bit-width data types can be useful both for describing bit fields in packets, where padding is inherently undesirable, and for performing integer arithmetic with an FPU, where padding is an acceptable cost for alignment. These appear to be mutually exclusive use cases.
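To make the two layouts concrete, here is a minimal Python sketch contrasting them. The helper names (`pack_u24_le`, `widen_to_u32_le`, `element_offsets`) are made up for illustration, and little-endian byte order is an assumption; the point is only that the packed form puts most elements at offsets that are not 4-byte aligned, while the padded form spends one byte per element to keep every element aligned.

```python
import struct

U24_MAX = (1 << 24) - 1


def pack_u24_le(value: int) -> bytes:
    """'u24 in u24': exactly 3 bytes, no padding (packet-field style)."""
    if not 0 <= value <= U24_MAX:
        raise ValueError("value does not fit in 24 bits")
    return value.to_bytes(3, "little")


def widen_to_u32_le(raw: bytes) -> bytes:
    """'u24 in u32': same value plus one zero padding byte (register style)."""
    if len(raw) != 3:
        raise ValueError("expected exactly 3 bytes")
    return raw + b"\x00"


def element_offsets(count: int, stride: int) -> list[int]:
    """Byte offset of each element in an array with the given stride."""
    return [i * stride for i in range(count)]


if __name__ == "__main__":
    packed = pack_u24_le(0x123456)
    padded = widen_to_u32_le(packed)

    # The padded form can be read directly as an ordinary 32-bit integer.
    (as_u32,) = struct.unpack("<I", padded)
    print(hex(as_u32))

    # Packed arrays use a 3-byte stride, so most elements are misaligned.
    print([off % 4 == 0 for off in element_offsets(4, 3)])
    # Padded arrays use a 4-byte stride, so every element is aligned.
    print([off % 4 == 0 for off in element_offsets(4, 4)])
```

Converting between the two is not a reinterpretation of the same bytes but a copy that inserts or strips padding, which is the sense in which a "cast" between them is not free.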