← Back to context

Comment by rep_lodsb

3 hours ago

I know this comment will get ignored by the true believers, and likely pasted directly into Claude by the author in order to "further improve" the code, but here's some small excerpts from the terminal emulator (glass.asm, 19360 lines, 555 KiB):

    cmp dword [rax], 'XAUT'
    jne .rxa_next
    cmp dword [rax+4], 'HORI'
    jne .rxa_next
    cmp word [rax+8], 'TY'
    jne .rxa_next
    cmp byte [rax+10], '='
    jne .rxa_next
    ; Found XAUTHORITY=path

Okay, this is setup code that only runs once at startup - but that would be a reason to optimize it for size and/or readability! REPE CMPSB exists, and may not be the fastest, but certainly the most compact and idiomatic way to compare strings. Or write a subroutine to do it!

This pattern is used everywhere for copying or comparing strings, this was just one example of it.

There's a state variable that's used to keep track of whether the input is text to be displayed or part of a control sequence. It's a full 64 bits, probably not because we need 18 quintillion states? Here's how it is evaluated:

    ; Dispatch based on state
    mov rcx, [vt_state]
    cmp rcx, VT_ESC
    je .vtp_esc
    cmp rcx, VT_CSI
    je .vtp_csi
    cmp rcx, VT_CSI_PARAM
    ...

In total, there are 7 compares + conditional jumps, one after another. Compilers would generate a jump table for this, and a better option in assembly might be to make vt_state a pointer to the label we want to go to. Branch predictors nowadays can handle indirect jumps, and may actually have more trouble with such tightly clustered conditionals as seen in this code.

This code is on the "slow" path, there's a faster one for 7-bit ASCII outside of control sequences, with a lengthy comment by Claude at the top on how it optimized this. Even this one starts with a bunch of conditionals though:

    cmp qword [vt_state], 0            ; VT_NORMAL == 0
    jne .vtp_loop_slow
    cmp dword [utf8_remaining], 0
    jne .vtp_loop_slow
    cmp byte [pending_wrap], 0
    jne .vtp_loop_slow

These could likely all be condensed into a single test or indirect jump via the state variable, by introducing just a few more states for UTF-8 decoding and wrap. Following this, here's a "useless use of TEST" (the subtraction already set the flags):

    mov rbx, [grid_cols]
    sub rbx, [cursor_col]              ; rbx = cells left on this row
    test rbx, rbx
    jle .vtp_loop_slow                 ; no room (or already past)

This also again shows the compulsive use of 64-bit registers and variables for values that should never be this big. It's not the "natural" data size on x86-64 at all, every such instruction requires an extra prefix byte.

I freely confess that I'm a "Luddite", and was explicitly looking for bad (and obviously so) code, but this took me just a few minutes of scrolling through the nearly 20K lines in this file, so it should be somewhat representative of the whole.