← Back to context

Comment by ethin

3 months ago

Did you or anyone else figure out how to work with the Aarch64/ARM32 open-source JSON schemas? I could never figure them out and I feel like if I wanted to work with them I'd have to manually write Pydantic models for them or something, because Quicktype chokes on them. Mainly because the schemas are recursive. If there's one thing I like about RISC-V, it's that their riscv-opcodes instruction dictionaries are trivial to work with (although it's tricky for me to auto-generate "this operand is signed, this one isn't" logic since the repo currently doesn't convey that information).

I haven't looked at them, partly because I'm sceptical about how much use they are. In my experience the hard part of emulating new instructions is the semantics, the "what does this instruction actually do?" bit. The line in the decode file that specifies the 1s and 0s and fields is trivial and takes no time. Plus, the way the architecture splits things up often doesn't match the way that makes the emulation simpler: for instance "same operation, but comes in signed and unsigned flavours" is often encoded with a sign bit but appears in the architecture as two separate instructions, one with an S in the name and one with a U. It's often simpler to have one decode line which covers both and passes signedness as an argument, where something auto-generated from the architectual data would give you two distinct functions.

For cases where everything you need is really in the datafiles (e.g. a simple disassembler) or where you're providing a user facing API that you want to have match the architecture documentation closely, the tradeoffs are different.

Also for QEMU I tend to value "works the same regardless of target architecture" over "we can do a clever thing for this one case but all the others will be different".

So, it is possible to use the machine consumable forms of the ISA, but realistically you'll spend way longer fighting it than finding other strategies.

Years ago, I ended up creating my own aarch64 definition, which I use to generate disassemblers, interpreters, and recompilers (dynamic and static) automatically: https://github.com/daeken/SharpRetro/blob/main/Aarch64Genera...

It doesn't have perfect support, but it has served as an incredibly useful resource. I've since generalized it to work for other architectures, and that same repo has definitions for MIPS (specifically the PSX CPU), DMG, and the groundwork for an x86 core. The goal is to be able to define these once, then generate any future targets automatically.