Comment by fjfaase

19 days ago

The TCC (0.9.26) source is kind of hard to compile, as I have discovered over the past year while developing a minimal C compiler to compile the TCC sources [1]. For that reason, I have concluded that TCC is its own test set. It uses the constant 0x80000000, which is an edge case if you want to print it as a signed integer using only 32-bit operators. There is a switch statement with a post-increment operator in the switch expression. There are also switch statements with fall-throughs and with goto statements in the cases. It uses the ## operator where the result is the name of a macro. Just to name a few.
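For readers who want to see what these constructs look like, here is a small C sketch of each one. It is an illustration only, not code taken from TCC: `format_i32`, `classify`, and the `LIMIT` macros are made-up names, and the 32-bit printing routine is just one possible way to handle the 0x80000000 case.

```c
#include <stdint.h>

/* Format a signed 32-bit value using only 32-bit operations.
   Negating INT32_MIN overflows in signed arithmetic, so the magnitude
   is computed as an unsigned value, where wraparound is well defined.
   `out` must have room for at least 12 bytes. */
static void format_i32(int32_t v, char *out)
{
    char digits[10];
    int n = 0;
    uint32_t mag = (uint32_t)v;

    if (v < 0) {
        *out++ = '-';
        mag = 0u - mag;            /* 0x80000000 maps to itself */
    }
    do {
        digits[n++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0u);
    while (n > 0)
        *out++ = digits[--n];
    *out = '\0';
}

/* A switch whose expression has a side effect (post-increment), with a
   fall-through and a goto between cases. The expression must be
   evaluated exactly once. */
static int classify(const char **p)
{
    int score = 0;

    switch (*(*p)++) {
    case 'a':
        score += 1;
        /* fall through */
    case 'b':
        score += 10;
        break;
    case 'c':
        goto other;
    default:
    other:
        score = -1;
        break;
    }
    return score;
}

/* Token pasting where the pasted result is itself a macro name:
   LIMIT(SMALL) becomes LIMIT_SMALL, which is then rescanned and
   expanded to 8. */
#define LIMIT_SMALL 8
#define LIMIT_LARGE 64
#define LIMIT(kind) LIMIT_##kind
```

Calling `classify` repeatedly on a pointer to "abc" consumes one character per call, which is an easy way to check that a compiler does not evaluate the switch expression more than once.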

[1] https://github.com/FransFaase/MES-replacement

You have simply made one tiny step that the guys who used AI and $25,000 to write a C compiler in Rust could not make:

You are using the compiler to compile itself.

"TCC is its own test set." Absolutely brilliant.

  • Back in the 90s, gcc did a three-stage build to isolate the result from weaknesses in the vendor's native compiler (so, vendor builds gcc0, gcc0 builds gcc1, gcc1 builds gcc2, and you compare gcc2 to gcc1 to look for problems). It was popularly considered a "self test suite" until someone did some actual profiling and concluded that gcc only needed about 20% of gcc to compile itself :-)

To be honest, these all seem like pretty basic features.

Goto is easier to implement than an if statement. Postincrement behaves no differently in a switch statement than elsewhere.

  • Yes, you are right that a post-increment in a switch statement behaves no differently than elsewhere. The goal I had set was to implement a small, easy-to-read C compiler. For that reason I tried to implement it as a single-pass compiler that would generate code on the fly. The target was a small stack-based language, which did support variable scoping and gotos, but not a switch. My first attempt was to implement the switch statement with chained if-statements, where the switch expression would be evaluated over and over again. This only works if the switch expression has no side effects and the 'default' case always comes at the end. But that did not work, so I had to come up with another solution, one that would evaluate the switch expression only once. I decided to store the value on the stack and duplicate it whenever it is needed for a comparison. But that requires the value to be popped once a case is selected. A goto jumping from one case to another should land after the location where the value is popped, otherwise it would corrupt the stack. I fear that this solution does not work correctly when a case occurs within a for, while, or do loop, or within an if-statement. Cases may occur anywhere in the enclosed code. This is sometimes used to emulate co-routines or generator functions.
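As an illustration of case labels appearing inside nested statements, here is a minimal C sketch of the two classic patterns: an unrolled copy loop in the style of Duff's device, and a resumable function that emulates a generator. The names `copy_bytes`, `struct gen`, and `next_value` are hypothetical and do not come from TCC or from the compiler discussed above.

```c
#include <stddef.h>

/* Copy `count` bytes with a loop unrolled four times. The case labels
   sit inside the do-while body, so the switch jumps into the middle of
   the loop to handle the remainder first. */
static void copy_bytes(char *dst, const char *src, size_t count)
{
    size_t rounds;

    if (count == 0)
        return;
    rounds = (count + 3) / 4;
    switch (count % 4) {
    case 0: do { *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--rounds > 0);
    }
}

/* State for a resumable counter. */
struct gen {
    int state;   /* 0 = not started, 1 = suspended in loop, 2 = done */
    int i;       /* loop variable, kept across calls */
};

/* Yields 0, 1, 2 on successive calls, then -1 forever. `case 1` lives
   inside the for body, so a later call resumes right after the return
   of the previous call. */
static int next_value(struct gen *g)
{
    switch (g->state) {
    case 0:
        for (g->i = 0; g->i < 3; g->i++) {
            g->state = 1;
            return g->i;
    case 1:;
        }
    }
    g->state = 2;
    return -1;
}
```

Both functions require the compiler to treat a case label as an ordinary jump target inside whatever statement encloses it, which is where a scheme that keeps the switch value on the evaluation stack can run into trouble.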

    Did you know that C has a keyword that is only a keyword in some places and that there is a function that can have two or three parameters?

    • Maybe the strategy from TCC itself is useful here: emit the case bodies as one giant block first, then jump back with cascaded ifs at the end. https://godbolt.org/z/TdE11jjxb
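The layout described here can be sketched by hand in plain C with gotos. This is an assumption-laden illustration of the idea (bodies first, dispatch at the end), not TCC's actual output; `with_switch`, `lowered`, and the label names are made up for the example.

```c
/* Reference version using a real switch. */
static int with_switch(int *x)
{
    int r = 0;

    switch ((*x)++) {
    case 1:
        r += 1;
        /* fall through */
    case 2:
        r += 2;
        break;
    default:
        r = -1;
        break;
    }
    return r;
}

/* The same logic lowered by hand. The selector is evaluated exactly
   once, control jumps over the case bodies to a dispatch block at the
   end, and cascaded ifs jump back to the matching body. */
static int lowered(int *x)
{
    int r = 0;
    int sel = (*x)++;          /* side effect happens once */

    goto dispatch;
case_1:
    r += 1;                    /* falls through into case_2 */
case_2:
    r += 2;
    goto done;                 /* break */
case_default:
    r = -1;
    goto done;                 /* break */
dispatch:
    if (sel == 1) goto case_1;
    if (sel == 2) goto case_2;
    goto case_default;
done:
    return r;
}
```

Because the comparisons are emitted after all bodies have been seen, a single-pass compiler already knows every case value and where 'default' is by the time it writes the dispatch block, so 'default' no longer has to come last.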

      > Did you know that C has a keyword that is only a keyword in some places and that there is a function that can have two or three parameters?

      What are those? Please tell!
