Comment by peterfirefly

1 day ago

An x86 disassembler is not that hard, as long as you stick to a single mode and ignore the SIMD alphabet soup.

You have a short loop that scans through the prefixes, checks for a REX prefix (if you handle 64-bit mode), and reads the opcode (1-3 bytes). After that you read the ModR/M byte if there is one (use a table lookup), the SIB byte if there is one (table), the displacement if there is one (table), and the immediate if there is one (table).
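A minimal Python sketch of that loop for 64-bit mode follows. It assumes a tiny, hypothetical opcode table (`OPCODES`) that covers only a handful of opcodes; the names `decode`, `Decoded` and `MODRM_TABLE` are illustrative, not from any library. It shows the order of the steps and how a 256-entry ModR/M table drives the SIB and displacement sizes. VEX/EVEX, the `B8+r` imm64 form and the `A0`-`A3` moffs forms are not handled and would need extra immediate kinds in a real table.

```python
from dataclasses import dataclass
from typing import Optional

# Legacy prefixes: LOCK, REPNE/REP, segment overrides, operand-size, address-size.
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}

# Hypothetical, tiny opcode table keyed by the 1-3 opcode bytes.
# Value: (has_modrm, imm_kind); imm_kind is 0 (none), 1 (imm8) or "z"
# (imm16 with a 0x66 prefix and no REX.W, otherwise imm32).
OPCODES = {
    (0x01,): (True, 0),        # ADD r/m, r
    (0x81,): (True, "z"),      # group 1: op r/m, imm16/32
    (0x83,): (True, 1),        # group 1: op r/m, imm8
    (0x89,): (True, 0),        # MOV r/m, r
    (0x8B,): (True, 0),        # MOV r, r/m
    (0xC3,): (False, 0),       # RET
    (0x0F, 0x05): (False, 0),  # SYSCALL
    (0x0F, 0xAF): (True, 0),   # IMUL r, r/m
}

def _modrm_entry(modrm: int) -> tuple:
    """(has_sib, disp_size) for one ModR/M byte with 32/64-bit addressing."""
    mod, rm = modrm >> 6, modrm & 7
    has_sib = mod != 3 and rm == 4
    # mod=00, rm=101 means disp32 (RIP-relative in 64-bit mode).
    disp_size = {0: 4 if rm == 5 else 0, 1: 1, 2: 4, 3: 0}[mod]
    return has_sib, disp_size

# 256-entry lookup table, built once.
MODRM_TABLE = [_modrm_entry(b) for b in range(256)]

@dataclass
class Decoded:
    length: int
    prefixes: list
    rex: int
    opcode: tuple
    modrm: Optional[int] = None
    sib: Optional[int] = None
    disp: Optional[int] = None
    imm: Optional[int] = None

def decode(code: bytes, pos: int = 0) -> Decoded:
    i = pos
    prefixes = []
    while code[i] in LEGACY_PREFIXES:      # short prefix-scanning loop
        prefixes.append(code[i])
        i += 1
    rex = 0
    if 0x40 <= code[i] <= 0x4F:            # REX must directly precede the opcode
        rex = code[i]
        i += 1
    op = (code[i],)                        # 1-3 opcode bytes
    i += 1
    if op[0] == 0x0F:
        op += (code[i],)
        i += 1
        if op[1] in (0x38, 0x3A):
            op += (code[i],)
            i += 1
    has_modrm, imm_kind = OPCODES[op]      # KeyError: not in this tiny table
    d = Decoded(0, prefixes, rex, op)
    disp_size = 0
    if has_modrm:
        d.modrm = code[i]
        i += 1
        has_sib, disp_size = MODRM_TABLE[d.modrm]
        if has_sib:
            d.sib = code[i]
            i += 1
            # mod=00 with SIB base=101: disp32 and no base register.
            if d.modrm >> 6 == 0 and d.sib & 7 == 5:
                disp_size = 4
    if disp_size:
        d.disp = int.from_bytes(code[i:i + disp_size], "little", signed=True)
        i += disp_size
    if imm_kind == "z":
        imm_size = 2 if 0x66 in prefixes and not rex & 0x08 else 4
    else:
        imm_size = imm_kind
    if imm_size:
        d.imm = int.from_bytes(code[i:i + imm_size], "little", signed=True)
        i += imm_size
    d.length = i - pos
    return d
```

For example, `decode(bytes([0x8B, 0x44, 0x24, 0x08]))` (`mov eax, [rsp+8]`) finds a SIB byte because rm = 100, then a one-byte displacement of 8 because mod = 01, for a total length of 4.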

It's probably easiest to use an "expanded/normalized" opcode internally, so that the 1-3 opcode bytes, the 3 extra bits from some ModR/M bytes, and the prefix info (for certain SIMD instructions) all map to a single 16-bit opcode (likely somewhere between a couple hundred and a thousand opcodes in total).
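One possible packing, as a sketch: the bit layout below, the `normalize` helper and the `MNEMONICS` table are all hypothetical choices, not a standard encoding. It assumes the caller has already decided which prefix (if any) acts as a mandatory SIMD prefix, and lists only a few of the group opcodes whose ModR/M.reg field selects the operation.

```python
# Hypothetical 16-bit normalized-opcode layout (an assumption, not a standard):
#   bits 15-14: opcode map       (0 = one-byte, 1 = 0F, 2 = 0F 38, 3 = 0F 3A)
#   bits 13-12: mandatory prefix (0 = none, 1 = 66, 2 = F3, 3 = F2)
#   bits 11-8 : ModR/M.reg + 1 for group opcodes, 0 otherwise
#   bits  7-0 : final opcode byte
MAPS = {(): 0, (0x0F,): 1, (0x0F, 0x38): 2, (0x0F, 0x3A): 3}
MANDATORY = {None: 0, 0x66: 1, 0xF3: 2, 0xF2: 3}

# (map, opcode byte) pairs whose ModR/M.reg field picks the operation (subset).
GROUP_OPCODES = {(0, 0x80), (0, 0x81), (0, 0x83), (0, 0xF6), (0, 0xF7),
                 (0, 0xFE), (0, 0xFF)}

def normalize(opcode, modrm=None, mandatory_prefix=None) -> int:
    m = MAPS[tuple(opcode[:-1])]
    last = opcode[-1]
    ext = 0
    if (m, last) in GROUP_OPCODES:
        ext = ((modrm >> 3) & 7) + 1   # fold the 3 extra bits into the key
    return (m << 14) | (MANDATORY[mandatory_prefix] << 12) | (ext << 8) | last

# A slice of the "normalized opcode -> mnemonic" table.
MNEMONICS = {
    0x0001: "add",    # 01 /r
    0x0089: "mov",    # 89 /r
    0x0183: "add",    # 83 /0 ib
    0x0683: "sub",    # 83 /5 ib
    0x4058: "addps",  # 0F 58
    0x5058: "addpd",  # 66 0F 58
    0x6058: "addss",  # F3 0F 58
    0x7058: "addsd",  # F2 0F 58
}
```

With this layout `83 C0` and `83 E8` become two distinct keys (`add` and `sub`), and the four `0F 58` variants are told apart purely by their prefix bits, so a single flat table lookup is enough.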

You have a table that maps those to mnemonics and operand info. You have some tables that map 0-7 (or 0-15 with REX) to AL/CL/DL/BL/AH/CH/DH/BH, AX/CX/DX/BX/SP/BP/SI/DI, ES/CS/SS/DS/FS/GS, and various system registers.
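A sketch of those register tables, in the hardware numbering order. The helper `reg_name` and its parameters are hypothetical. The fourth bit comes from REX.R (for ModR/M.reg), REX.B (for ModR/M.rm, SIB.base or a register in the opcode) or REX.X (for SIB.index), and the presence of any REX prefix turns 8-bit registers 4-7 from AH/CH/DH/BH into SPL/BPL/SIL/DIL.

```python
_LOW = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

REG8_LEGACY = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG8_REX = (["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]
            + [f"r{n}b" for n in range(8, 16)])
REG16 = _LOW + [f"r{n}w" for n in range(8, 16)]
REG32 = ["e" + r for r in _LOW] + [f"r{n}d" for n in range(8, 16)]
REG64 = ["r" + r for r in _LOW] + [f"r{n}" for n in range(8, 16)]
SEGMENT = ["es", "cs", "ss", "ds", "fs", "gs"]

def reg_name(num3: int, size: int, rex_bit: int = 0, has_rex: bool = False) -> str:
    """Map a 3-bit register field (plus an optional REX bit) to a name."""
    num = (rex_bit << 3) | (num3 & 7)
    if size == 8:
        table = REG8_REX if has_rex else REG8_LEGACY
    else:
        table = {16: REG16, 32: REG32, 64: REG64}[size]
    return table[num]
```

For example, in `48 89 D8` the ModR/M byte `D8` has reg = 3 and rm = 0, so `reg_name(3, 64)` gives `rbx` and `reg_name(0, 64)` gives `rax`, i.e. `mov rax, rbx`.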

Not that much code all in all, just several tables. You can squeeze and bit-pack them to your heart's content if you want.
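As one illustration of bit packing, here is a sketch that stores up to three 4-bit operand kinds plus a 2-bit operand count in a single 16-bit word per opcode. The operand-kind codes and the `pack_operands`/`unpack_operands` helpers are hypothetical; `array("H")` is the standard-library unsigned-short array, typically 2 bytes per entry.

```python
from array import array

# Hypothetical operand-kind codes (4 bits each; 0 is unused).
REG, RM, IMM8, IMMZ, ACC = range(1, 6)

def pack_operands(*kinds) -> int:
    """2-bit operand count in bits 1-0, then one 4-bit kind per operand."""
    assert len(kinds) <= 3
    word = len(kinds)
    for n, k in enumerate(kinds):
        word |= (k & 0xF) << (2 + 4 * n)
    return word

def unpack_operands(word: int) -> tuple:
    count = word & 3
    return tuple((word >> (2 + 4 * n)) & 0xF for n in range(count))

# One 16-bit entry per one-byte opcode; unlisted opcodes have no operands.
OPERANDS = array("H", [0] * 256)
OPERANDS[0x01] = pack_operands(RM, REG)    # ADD r/m, r
OPERANDS[0x03] = pack_operands(REG, RM)    # ADD r, r/m
OPERANDS[0x05] = pack_operands(ACC, IMMZ)  # ADD eAX, imm16/32
OPERANDS[0x83] = pack_operands(RM, IMM8)   # group 1: op r/m, imm8
```

The packed word tops out at 14 bits, so it fits the 16-bit slot; the same idea extends to flags such as "has ModR/M" or "immediate size" if spare bits are needed.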

Once you have that, a simple assembler isn't so hard.
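A sketch of the reverse direction, assuming 64-bit mode and only register-to-register `op r/m, r` forms; `assemble`, `OPCODE_RM_R` and `REGS` are hypothetical names. x86 often has two encodings for the same register-register instruction (for example `03 /r` for `add r, r/m`), and an assembler simply picks one.

```python
# Register-to-register "op r/m, r" opcodes (32/64-bit and 16-bit operand sizes).
OPCODE_RM_R = {"add": 0x01, "or": 0x09, "and": 0x21, "sub": 0x29,
               "xor": 0x31, "cmp": 0x39, "mov": 0x89}

# name -> (register number 0-15, operand size in bits)
_BASE = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REGS = {}
for n, b in enumerate(_BASE):
    REGS[b] = (n, 16)
    REGS["e" + b] = (n, 32)
    REGS["r" + b] = (n, 64)
for n in range(8, 16):
    REGS[f"r{n}w"] = (n, 16)
    REGS[f"r{n}d"] = (n, 32)
    REGS[f"r{n}"] = (n, 64)

def assemble(mnemonic: str, dst: str, src: str) -> bytes:
    d, dsize = REGS[dst]
    s, ssize = REGS[src]
    if dsize != ssize:
        raise ValueError("operand size mismatch")
    out = bytearray()
    if dsize == 16:
        out.append(0x66)                          # operand-size prefix
    rex = 0x40 | (0x08 if dsize == 64 else 0) | ((s >> 3) << 2) | (d >> 3)
    if rex != 0x40:
        out.append(rex)                           # REX.W / REX.R / REX.B
    out.append(OPCODE_RM_R[mnemonic])
    out.append(0xC0 | ((s & 7) << 3) | (d & 7))   # mod=11: dst in rm, src in reg
    return bytes(out)
```

For example, `assemble("mov", "rax", "rbx")` yields `48 89 D8`, the same bytes a decoder would turn back into `mov rax, rbx`.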