Comment by userbinator

3 years ago

I saw the repeating 'A' at the end of the base64 text and thought "it's not even 512 bytes; it's smaller!"

That said, the title is just a little clickbaity --- it's a C-subset compiler, and more accurately a JIT interpreter. There also appears to be no attempt at operator precedence. Nonetheless, it's still an impressive technical achievement and shows the value of questioning common assumptions.

Finally, I feel tempted to offer a small size optimisation:

    sub ax,2

is 3 bytes whereas

    dec ax
    dec ax

is 2 bytes.
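
The byte counts above can be checked by writing the encodings out by hand. This is a rough sketch, with Python used only as a convenient way to count bytes. The encodings are hand-assembled for 16-bit mode and are not taken from the SectorC source, and the constant names are made up for illustration.

```python
# Hand-assembled 16-bit x86 encodings (assumes 16-bit operand size, e.g. real mode).
SUB_AX_2 = bytes([0x83, 0xE8, 0x02])  # sub ax,2 -> opcode 83 /5 with a sign-extended imm8
DEC_AX = bytes([0x48])                # dec ax   -> one-byte opcode 48+reg (ax is register 0)
DEC_AX_TWICE = DEC_AX * 2             # dec ax / dec ax

print(len(SUB_AX_2), len(DEC_AX_TWICE))  # prints "3 2"
```

One caveat: `dec` leaves the carry flag untouched while `sub` updates it, so the swap is only safe where the following code does not depend on CF.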

You may be able to use single-byte xchg instructions with ax instead of movs. The other thing that helps code density a lot in 16-bit code is taking advantage of the addressing modes and LEA to do 3-operand add-immediates where possible.
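
To make the two suggestions concrete, here is a small sketch comparing encodings, again hand-assembled for 16-bit mode. The register choices and the constant 4 are arbitrary examples rather than anything from SectorC, and the names are hypothetical.

```python
# Hand-assembled 16-bit x86 encodings; registers and the constant 4 are arbitrary examples.
MOV_BX_AX = bytes([0x89, 0xC3])   # mov bx,ax  -> opcode + ModRM, 2 bytes
XCHG_AX_BX = bytes([0x93])        # xchg ax,bx -> one-byte 90+reg form; swaps, so ax changes too

# ax = bx + si + 4, done with separate instructions ...
MOV_ADD_ADD = (
    bytes([0x89, 0xD8])           # mov ax,bx
    + bytes([0x01, 0xF0])         # add ax,si
    + bytes([0x83, 0xC0, 0x04])   # add ax,4
)
# ... and with one LEA using the [bx+si+disp8] addressing mode.
LEA_AX = bytes([0x8D, 0x40, 0x04])  # lea ax,[bx+si+4]

print(len(MOV_BX_AX), len(XCHG_AX_BX))  # prints "2 1"
print(len(MOV_ADD_ADD), len(LEA_AX))    # prints "7 3"
```

The xchg trick only works when the old value of ax is no longer needed or is wanted in the other register, and the LEA trick is limited to the 16-bit addressing combinations (bx or bp as base, si or di as index). LEA also leaves the flags alone, which the add sequence does not.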

Good tip! Yeah, there’s ~20 bytes unused at the end. I kept finding ways to squeeze out a few more and had to tell myself to stop and just publish it already. You could take this further if you really wanted. But it’s already sufficiently absurd.

C has a bit of a history with interpreters and reduced implementations (not to devalue SectorC, which is absolutely cool). I'm thinking of the Small C Interpreter for CP/M (http://www.cpm.z80.de/small_c.html), an interpreted version of Small-C (https://en.wikipedia.org/wiki/Small-C).

  • The book that had the Cain/Hendrix "Small C" compiler, runtime library, assembler and tools was a fantastic resource that taught me C and compiler construction at the same time.

    In general, reading (lots of) source code is a good way to learn how to do things in a new language, i.e. to move from the lexical and syntactic level to the level of idioms for problem-solving. On reflection, I find it strange that, when programming is taught, larger pieces of existing well-written source code are never discussed, explained or critiqued.