Comment by TheCondor

21 days ago

The assembler seems like nearly the easiest part. Slurp arch manuals and knock it out, it’s fixed and complete.

11 comments

TheCondor

I am surprised by the number of comments that say the assembler is trivial - it is admittedly perhaps simpler than some other parts of the compiler chain, but it’s not trivial.

What you are doing is kinda serialising a self-referential graph structure of machine code entries that reference each others addresses, but you don’t know the addresses because the (x86) instructions are variable-length, so you can’t know them until you generate the machine code, chicken-and-egg problem.

Personally I find writing parsers much much simpler than writing assemblers.

nicebyte 20 days ago
assembler is far from trivial at least for x86 where there are many possible encodings for a given instruction. emitting the most optimal encoding that does the correct thing depends on surrounding context, and you'd have to do multiple passes over the input.
- jmalicki 19 days ago
  
  What is a single example where the optimal encoding depends on context? (I am assuming you're just doing an assembler where registers have already been chosen, vs. a compiler that can choose sse vs. scalar and do register allocation etc.)?
  
  3 replies →
jmalicki 17 days ago
All you have to do is record a table of fixup locations you can fill in in a second pass once the labels are resolved.
- ndesaulniers 17 days ago
  
  In practice, one of the difficulties in getting _clang_ to assemble the Linux kernel (as opposed to GNU `as` aka GAS), was having clang implement support for "fragments" in more places.
  https://eli.thegreenplace.net/2013/01/03/assembler-relaxatio...
  There were a few cases IIRC around usage of the `.` operator which means something to the effect of "the current point in the program." It can be used in complex expressions, and sometimes resolving those requires multiple passes. So supporting GAS compatible syntax in more than just the basic cases forces the architecture of your assembler to be multi-pass.
- jakewins 16 days ago
  
  I mean, no, it's more than that.
  You also need to choose optimal instruction encoding, and you need to understand how relocs work - which things can you resolve now vs which require you to encode info for the linker to fill in once the program is launched, etc etc.
  Not sure why I'm on this little micro-rant about this; I'm sure Claude could write a workable assembler. I'm more like.. I've written one assembler and many, many parsers, and the parsers where way simpler, yet this thread is littered with people that seem to think assemblers are just lookup tables from ascii to machine code with a loop slapped on top of them.

shakna 21 days ago

Huh. A second person mentioning the assembler. Don't think I ever referred to one...?