If this implementation had existed in the 1980s, the C standard would have a rule that different tokens hashing to the same 16-bit value invoke undefined behavior, and optimizing compilers in the 2000s would simply optimize such tokens away to a no-op. ;)
"you don't have -wTokenHashCollision enabled! it's your own foolish ignorance that triggered UB; the spec is perfectly clear!"
Too real! LMAO
I may be the author.. enjoy! It was an absolute blast making this!
This is very nice. I'm currently writing a minimalist C compiler, although my goal isn't fitting in a boot sector; it's targeted at 8-bit systems with a lot more room than that.
This is a great demonstration of how simple the bare bones of C are, which I think is one reason I and many others find it so appealing despite how Spartan it is. C really evolved from B which was a demake of Fortran, if Ken Thompson is to be trusted.
Would it shrink, and by how much, if if, while, and for were replaced by a simple goto routine? (After all, in assembly there is only jmp and no other fancy jump instruction, I assume.)
And PS, it's "chose your own adventure". :-) I love minimalism.
It only does if & while, not for. A goto in a single-pass compiler would need separate handling for forward vs backward jumps, which involves keeping track of data per label name (in a form where you can tell when it's not yet set; whereas if/while data is held for free on the recursion stack). And you'd still need to handle at least `if ( expr ) goto foo;` to do any conditionals at all.
What fancy jumps are present in assembly depends on the CPU architecture. But there are always conditional jumps, like JNZ that jumps if the Zero flag isn't set.
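To make the forward-vs-backward point concrete, here is a rough sketch of what per-label bookkeeping could look like in a single-pass code generator. Everything in it is hypothetical and not taken from SectorC: the names `emit_goto`, `define_label`, `labels_reset`, the fixed-size tables, and the small integer label ids (a real version might index by token hash instead). The only outside fact it relies on is that `0xE9` is the x86 near `jmp` with a 16-bit displacement relative to the next instruction.

```c
#include <stdint.h>

#define MAX_LABELS 16
#define MAX_FIXUPS 8
#define UNSET 0xFFFFu              /* marker: label address not known yet */

static uint8_t  code[256];         /* output buffer (hypothetical size) */
static uint16_t pc;                /* next free offset in code[] */

struct label {
    uint16_t addr;                 /* UNSET until the label is defined */
    uint16_t fixups[MAX_FIXUPS];   /* operand slots waiting for addr */
    int      nfix;
};
static struct label labels[MAX_LABELS];

void labels_reset(void) {
    pc = 0;
    for (int i = 0; i < MAX_LABELS; i++) {
        labels[i].addr = UNSET;
        labels[i].nfix = 0;
    }
}

void emit_byte(uint8_t b) { code[pc++] = b; }

/* Store a 16-bit little-endian value at a given offset. */
static void put16(uint16_t at, uint16_t v) {
    code[at]     = (uint8_t)(v & 0xFF);
    code[at + 1] = (uint8_t)(v >> 8);
}

/* Emit "jmp label" as E9 + rel16. */
void emit_goto(int id) {
    emit_byte(0xE9);
    uint16_t slot = pc;            /* where the displacement goes */
    pc += 2;                       /* pc now points past the jmp */
    if (labels[id].addr != UNSET) {
        /* Backward jump: target already known, fill in immediately. */
        put16(slot, (uint16_t)(labels[id].addr - pc));
    } else if (labels[id].nfix < MAX_FIXUPS) {
        /* Forward jump: remember the slot and patch it later. */
        labels[id].fixups[labels[id].nfix++] = slot;
    }
}

/* Called when "label:" is seen; resolves any pending forward jumps. */
void define_label(int id) {
    labels[id].addr = pc;
    for (int i = 0; i < labels[id].nfix; i++) {
        uint16_t slot = labels[id].fixups[i];
        put16(slot, (uint16_t)(pc - (slot + 2)));
    }
    labels[id].nfix = 0;
}
```

The fixup table and the "not yet set" marker are the extra per-name state mentioned above. With if/while, the equivalent patch address simply lives in a local variable of the recursive parse function, so it costs nothing beyond the call stack.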
It's "choose your own adventure"
An interesting use case - for the compiler as-is or for the essential idea of barely-C - might be in bootstrapping chains, i.e. starting from tiny platform-specific binaries whose disassembly one could verify, and gradually building more complex tools, interpreters, and compilers, so that eventually you get to something like a version of GCC and can then build an entire OS distribution.
Examples:
https://github.com/cosinusoidally/mishmashvm/
and https://github.com/cosinusoidally/tcc_bootstrap_alt/
Related: the stage0/stage1 series of hex-to-c compiler bootstrapping tools https://github.com/oriansj/stage0?tab=readme-ov-file and OTCC https://bellard.org/otcc/
https://bootstrapping.miraheze.org/wiki/Main_Page
(Why does the referenced short story remind me of "There Is No Antimemetics Division"?)
It would be interesting to understand what non-toy programs can be coded in this subset of C. For example, could tcc be rewritten in this dialect?
You may enjoy https://github.com/ludocode/onramp
Oh, it looks like my X86-16 boot sector C compiler that I made recently [1]. Writing boot sector games has a nostalgic magic to it, when programming was actually fun and showed off your skills. It's a shame that the AI era has terribly devalued these projects.
[1] https://github.com/Mati365/ts-c-compiler
> when programming was actually fun and showed off your skills
Oh no. Now more people are able to do what I do. I'm not special anymore.
Seems like this is facetious but to me, “I’m not special” is a pretty valid thing to be sad about.
There seems to be a good amount of interest for a boot sector compiler!!
If you're running on Linux, adjust the qemu call to use alsa rather than coreaudio.
I opened a pull request for this on GitHub. If the author is happy enough with my verbose shell scripting style :-) it might get included.
Beautiful, but make sure to quickly add 2023 to the title.
Discussed at the time: https://news.ycombinator.com/item?id=36064971
Thanks! Macroexpanded:
SectorC: A C Compiler in 512 bytes - https://news.ycombinator.com/item?id=36064971 - May 2023 (80 comments)
Compare that to the C compiler in 100,000 lines written by Claude in two weeks for $20,000 (I think it was posted on HN just yesterday).
It's a fun comparison, but with the notable difference that that one can compile the Linux kernel and generate code for multiple different architectures, while this one can only compile a small proportion of valid C. It's a great project, but it's not so much a C compiler, as a compiler for a subset of C that allows all programs this compiler can compile to also be compiled by an actual C compiler, but not vice versa.
But can it compile "Hello, World" example from its own README.md?
https://github.com/anthropics/claudes-c-compiler/issues/1
Well I'm pretty sure the author can make a compliant C compiler in a few more sectors.
The way hashing is used for tokens and for making a pseudo symbol table is such an elegant idea.
I think the same. Really nice project and good trick with hashing tokens.
PS. There are 21 bytes left (21 * 0x00 - from 0x01e0 to 0x01fd). Maybe something can be packed there ;)
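For anyone wondering what "hashing tokens as a pseudo symbol table" can look like, here is a small sketch. The atoi-style hash below is an assumption chosen for illustration; the thread doesn't spell out the exact function. The names `tok_hash`, `set_var`, `get_var` and `self_check` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit token hash: accumulate like atoi(), wrapping at 16 bits.
 * For a token made only of digits this yields its numeric value (mod 65536). */
uint16_t tok_hash(const char *s) {
    uint16_t h = 0;
    while (*s) {
        h = (uint16_t)(h * 10 + (*s - '0'));
        s++;
    }
    return h;
}

/* The "symbol table": one 16-bit slot per possible hash value.
 * No names are stored, so there is nothing to look up or compare. */
static int16_t vars[65536];

void    set_var(const char *name, int16_t v) { vars[tok_hash(name)] = v; }
int16_t get_var(const char *name)            { return vars[tok_hash(name)]; }

/* Small self-check, including the collision case joked about upthread. */
void self_check(void) {
    set_var("count", 42);
    assert(get_var("count") == 42);

    /* "ab" and "bX" both hash to 540 under this function... */
    assert(tok_hash("ab") == tok_hash("bX"));

    /* ...so they silently alias the same variable. */
    set_var("ab", 7);
    assert(get_var("bX") == 7);
}
```

The trade-off is that no identifier table, string storage, or comparison loop is needed, at the cost of distinct names occasionally sharing a slot. In a 512-byte budget that seems to be the whole point.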
This is so cool!
Fun fact, Tiny C Compiler was derived from such a C compiler submitted to the International Obfuscated C Code Contest.
https://www.ioccc.org/2001/bellard/index.html
Further fun fact: that submission was called OTCC. I reverse engineered it and that provided inspiration for SectorC.
https://xorvoid.com/otcc_deobfuscated.html https://github.com/xorvoid/otcc_deobfuscated
Meh, I did an entire awk interpreter in two lines:
Nice, now you can dd it to your boot sector and ... Wait, it is 2026, there are 1000 ways of booting and memory mapping on so-called unified ARM architecture @,@
C-subset, to be precise; but microcomputer C compilers were in the tens of KB range, for one that can actually compile real C.
> I wrote a fairly straight-forward and minimalist lexer and it took >150 lines of C code
was it supposed to be "<150"?
They're saying the naive implementation was more than 150 lines of C code (300-450 bytes), i.e. too big.
Why is it called a C Compiler if it's a subset of C?
Reminds me of Allegro SizeHack where we made games in 10KB - but we were using C and Allegro library!
https://www.oocities.org/trentgamblin/sizehack/entries.html#...
Nice. Very K&R-ish. Not a bad thing.
Lacking support for structs, I think this is too minimalistic to be called "a C compiler".
you bootstrap it into a library you can include optionally, duh