
Comment by retrac

3 months ago

Getting a bit off topic, but I feel like this task is something that ought to have special language support.

It's a kind of serialization/deserialization, or what I think Python and some others call "pickling". Same task. Turn these raw bit patterns into typed values.

Ada probably comes closest of the major languages to pulling it off. It separates the abstract, programmer's view of a data type from the low-level representation of that type.

Specify a bunch of records like:

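    for Instruction use record
       --  Bit ranges are written First_Bit .. Last_Bit (low to high).
       --  Assumes Low_Order_First bit numbering (bit 0 is least significant).
       Condition at 0 range 28 .. 31;
       ImmFlag   at 0 range 25 .. 25;  --  I bit; bits 26 .. 27 are fixed at 00
       Opcode    at 0 range 21 .. 24;
       CondFlag  at 0 range 20 .. 20;  --  S bit
       Rn        at 0 range 16 .. 19;
       Rd        at 0 range 12 .. 15;
       Operand   at 0 range  0 .. 11;
    end record;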

Then aim a pointer at your instructions and read them as records/structs.

It should work particularly cleanly with a nice RISC encoding like ARM. I'm not actually sure it would work in Ada, though. The record representation clause syntax might not be general enough.
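For comparison, here is roughly what that record does when written out by hand, as a minimal Python sketch. The names `DataProcessing`, `bits` and `decode_data_processing` are made up for illustration, and the I bit is read from bit 25, which is where the ARM data-processing encoding puts it. It only handles this one instruction format.

```python
from typing import NamedTuple


class DataProcessing(NamedTuple):
    """Fields of a 32-bit ARM data-processing instruction (hypothetical type)."""
    condition: int  # bits 31..28
    imm_flag: int   # bit 25 (I bit)
    opcode: int     # bits 24..21
    cond_flag: int  # bit 20 (S bit)
    rn: int         # bits 19..16
    rd: int         # bits 15..12
    operand: int    # bits 11..0


def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_data_processing(word: int) -> DataProcessing:
    """Split a raw 32-bit word into named fields, one shift-and-mask per field."""
    return DataProcessing(
        condition=bits(word, 31, 28),
        imm_flag=bits(word, 25, 25),
        opcode=bits(word, 24, 21),
        cond_flag=bits(word, 20, 20),
        rn=bits(word, 19, 16),
        rd=bits(word, 15, 12),
        operand=bits(word, 11, 0),
    )


# 0xE0810002 is ADD r0, r1, r2 (condition AL, opcode 0b0100).
insn = decode_data_processing(0xE0810002)
```

The point of language support would be to generate all of those `bits(...)` calls from the declarative layout instead of writing them by hand.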

If you think Arm is a "nice RISC encoding" then I think you've mostly been looking at the older integer bits of it :-) As you get into FP and SIMD there are just a lot more useful operations that need to fit into the strictly limited encoding space, and new features that need to be tucked into previously unused corners of the space, and it all gets noticeably less regular (e.g. "these two bits encode the operand size which is 0b00/0b01/0b10 for 8/16/32 bits, but 64 bit operands aren't supported and 0b11 means it's part of an entirely different set of instructions").

That sort of approach works for some very simple instruction encodings, but doesn't really handle:

1) instructions which "bend" the format, like ARM instructions such as STMIA or B which combine multiple fields to make a larger immediate value or mask.

2) recognizing instructions which use special values in fields (like ARM condition = 1111) to represent a special instruction.

3) instruction encodings with split fields, like the split immediate in RISC-V S-type instructions.

4) instruction encodings which have too many instruction-specific quirks to fit into any reasonable schema, like 68000.
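To make points 1 to 3 concrete, here is a rough Python sketch of the extra logic a fixed field layout can't express on its own. The helper names are hypothetical, and it only covers the RISC-V S-type immediate, the ARM `B` offset, and the ARM condition = 1111 check.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def riscv_s_imm(word: int) -> int:
    """Split field: imm[11:5] lives in bits 31..25, imm[4:0] in bits 11..7."""
    imm = (bits(word, 31, 25) << 5) | bits(word, 11, 7)
    return sign_extend(imm, 12)


def arm_branch_offset(word: int) -> int:
    """Bent format: B uses bits 23..0 as one signed word offset, so it
    covers the bits that hold Rn, Rd and Operand in data-processing forms."""
    return sign_extend(bits(word, 23, 0), 24) << 2


def arm_is_special_condition(word: int) -> bool:
    """Special value: condition 1111 selects a separate group of
    instructions rather than acting as an ordinary condition code."""
    return bits(word, 31, 28) == 0b1111


# sw x2, -4(x1) encodes as 0xFE20AE23; the two halves of the immediate
# only make sense once they are glued back together and sign-extended.
offset = riscv_s_imm(0xFE20AE23)
```

None of this is hard to write, but each case is a function of several fields (or a predicate on one), not a plain field, which is where a pure record overlay runs out.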