Comment by geir_isene
1 day ago
Less memory footprint. No reliance on libs. Pure first-person control. No wasted CPU cycles. That is the target here for me. And if you read the post, the asm set is only for the desktop itself. The tools I use are in Rust. The result: the laptop now runs at 5-6 W (down from ~9 W) [XPS14, latest hw] on Ubuntu 26.04, giving me around 3.5h extra battery life.
My guess is you're likely to waste more cycles on development time, and on suboptimal algorithms because the implementation is harder, than you would waste on Rust-related bloat.
Still a cool project, thanks for sharing.
I have wondered about having LLMs output machine code directly and skipping the compiler/assembler altogether. Then you'd just commit your spec/prompt and run it through the LLM to get your binary.
I know this comment will get ignored by the true believers, and likely pasted directly into Claude by the author in order to "further improve" the code, but here are some small excerpts from the terminal emulator (glass.asm, 19360 lines, 555 KiB):
Okay, this is setup code that only runs once at startup, but that would be a reason to optimize it for size and/or readability! REPE CMPSB exists; it may not be the fastest, but it is certainly the most compact and idiomatic way to compare strings. Or write a subroutine to do it!
This pattern is used everywhere for copying or comparing strings; this was just one example of it.
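To make the "write a subroutine" point concrete, here is a minimal sketch in Rust (the language the author says his tools are in). It is not code from glass.asm; `bytes_equal`, `has_prefix` and the sample strings are made-up names for illustration. The idea is just that every call site shares one small comparison routine instead of repeating the compare inline.

```rust
/// Hypothetical helper: one shared routine instead of a hand-unrolled
/// byte-by-byte comparison at every call site.
fn bytes_equal(a: &[u8], b: &[u8]) -> bool {
    // Different lengths can never match; checking first also keeps zip()
    // from silently ignoring the tail of the longer slice.
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x == y)
}

/// Hypothetical helper: true when `input` begins with `prefix`.
fn has_prefix(input: &[u8], prefix: &[u8]) -> bool {
    input.len() >= prefix.len() && bytes_equal(&input[..prefix.len()], prefix)
}
```

A call site then shrinks to something like `has_prefix(value, b"xterm")`, which is roughly what a single REPE CMPSB or one `call` would buy you in the assembly version.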
There's a state variable that's used to keep track of whether the input is text to be displayed or part of a control sequence. It's a full 64 bits, probably not because we need 18 quintillion states? Here's how it is evaluated:
In total, there are 7 compares + conditional jumps, one after another. Compilers would generate a jump table for this, and a better option in assembly might be to make vt_state a pointer to the label we want to go to. Branch predictors nowadays can handle indirect jumps, and may actually have more trouble with such tightly clustered conditionals as seen in this code.
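As a rough illustration of the "make the state a pointer to the code" idea, here is a sketch in Rust rather than assembly. It is heavily simplified and not taken from glass.asm: the `Parser` struct, the three handlers and the `printed`/`params` buffers are all assumptions for the example, and a real VT parser has more states. The state variable holds the handler itself, so dispatch is one indirect call instead of a chain of compares.

```rust
/// A handler consumes one input byte and may switch the parser state.
type Handler = fn(&mut Parser, u8);

/// Hypothetical, heavily simplified terminal parser.
struct Parser {
    state: Handler,   // the "state variable" is the handler to run next
    printed: Vec<u8>, // bytes that would be drawn on screen
    params: Vec<u8>,  // raw parameter bytes of the current CSI sequence
}

impl Parser {
    fn new() -> Self {
        Parser { state: ground, printed: Vec::new(), params: Vec::new() }
    }

    fn feed(&mut self, input: &[u8]) {
        for &byte in input {
            // One indirect call per byte, no compare-and-branch chain.
            let handler = self.state;
            handler(self, byte);
        }
    }
}

/// Plain text: print everything except ESC, which starts a sequence.
fn ground(p: &mut Parser, byte: u8) {
    if byte == 0x1b {
        p.state = escape;
    } else {
        p.printed.push(byte);
    }
}

/// After ESC: only "ESC [" (CSI) is handled in this sketch.
fn escape(p: &mut Parser, byte: u8) {
    if byte == b'[' {
        p.params.clear();
        p.state = csi;
    } else {
        p.state = ground;
    }
}

/// Inside a CSI sequence: a final byte in 0x40..=0x7E ends it.
fn csi(p: &mut Parser, byte: u8) {
    if (0x40..=0x7e).contains(&byte) {
        p.state = ground;
    } else {
        p.params.push(byte);
    }
}
```

Feeding it `b"A\x1b[31mB"` leaves `AB` in `printed` and `31` in `params`. In assembly the equivalent would be storing a label address in vt_state and doing a single `jmp [vt_state]`.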
This code is on the "slow" path; there's a faster one for 7-bit ASCII outside of control sequences, with a lengthy comment by Claude at the top on how it optimized this. Even this one starts with a bunch of conditionals though:
These could likely all be condensed into a single test or indirect jump via the state variable, by introducing just a few more states for UTF-8 decoding and wrap. Following this, here's a "useless use of TEST" (the subtraction already set the flags):
This again shows the compulsive use of 64-bit registers and variables for values that should never be this big. 64 bits is not the "natural" data size on x86-64 at all: the default operand size is 32 bits, and most instructions need an extra REX prefix byte to operate on 64-bit values.
I freely confess that I'm a "Luddite", and was explicitly looking for bad (and obviously so) code, but this took me just a few minutes of scrolling through the nearly 20K lines in this file, so it should be somewhat representative of the whole.
> Less memory footprint. No reliance on libs.
Rust can do that. You can run a heavily stripped-down Rust (no_std) that was made for embedded devices, specifically because those devices don't have room for a runtime.
I'm sure I can. The original challenge was more along the lines of "I wonder if CC can do this now?"
And it apparently can. And very well.
One advantage seems to be that the complete asm file fits easily into CC's context window.
> The original challenge was more in line of "I wonder if CC can do this now?"
well, I can respect that for sure
+3.5h extra battery life is a real measurable result! well done.