torstenvl 2 months ago

I don't think Clang is being "aggressive" on ARM; it's just that every AArch64 target supports FMA. You'll get similar results, using vfmadd213ss, on x86-64 with -march=haswell (Haswell launched in 2013, so it's probably a safe baseline).

    float fma(float x) {
        return 3.0f * x + 1.0f;
    }

Clang armv8 21.1.0:

    fma(float):
        sub     sp, sp, #16
        str     s0, [sp, #12]
        ldr     s1, [sp, #12]
        fmov    s2, #1.00000000
        fmov    s0, #3.00000000
        fmadd   s0, s0, s1, s2
        add     sp, sp, #16
        ret

Clang x86-64 21.1.0:

    .LCPI0_0:
        .long   0x3f800000
    .LCPI0_1:
        .long   0x40400000
    fma(float):
        push    rbp
        mov     rbp, rsp
        vmovss  dword ptr [rbp - 4], xmm0
        vmovss  xmm1, dword ptr [rbp - 4]
        vmovss  xmm2, dword ptr [rip + .LCPI0_0]
        vmovss  xmm0, dword ptr [rip + .LCPI0_1]
        vfmadd213ss     xmm0, xmm1, xmm2
        pop     rbp
        ret
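
Whether the fused instruction matters numerically is easy to check: fmadd and vfmadd213ss round once, while a separate multiply and add round twice. The sketch below is a minimal illustration and is not taken from either compiler's output. The names mul_add_unfused and mul_add_single_rounding are made up for this example, and the second function only imitates a fused operation by computing in double, which is exact for the inputs discussed below but is not a general substitute for fmaf from <math.h>.

```c
/* Hypothetical helpers for illustration; not taken from the thread. */

/* Multiply, round to float, then add and round again (two roundings).
   The volatile store keeps the compiler from contracting this into an FMA. */
float mul_add_unfused(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

/* Imitates a fused multiply-add. The product of two floats is exact in
   double; the double addition can still round in general, but it is exact
   for the inputs described in the note that follows. */
float mul_add_single_rounding(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 0x1.001p0f (that is, 1 + 2^-12) and c = -0x1.002p0f, the exact product is 1 + 2^-11 + 2^-24, which rounds to 1 + 2^-11 in float. mul_add_unfused therefore returns 0.0f, while mul_add_single_rounding returns 0x1p-24f, the rounding error that the unfused version discards.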


AlotOfReading 2 months ago

The point is that there are multiple, meaningfully different implementations for the same line, not that either is wrong. Sometimes compilers will even produce both implementations and call one or the other based on runtime checks, as this ICC example does:

https://godbolt.org/z/KnErdebM5

torstenvl 2 months ago

I don't understand the argument you're trying to make. You seem to be arguing that C isn't a high-level assembler because some compilers generate different machine code for the same source code. But (a) that doesn't contradict the claim in any way; and (b) this happens in assembly all the time too: some synthetic instructions generate different machine code depending on circumstances.
- AlotOfReading 2 months ago
  
  I'm saying that "C is not a high-level assembler" because it doesn't have any of the characteristics that make a good assembler. Your original post made a distinction between C as practically implemented vs the ISO standard, so I chose the example as a practical case of something an assembler should never do: change the precision and rounding of arithmetic expressions.

  Now let's say you're working on a game with deterministic lockstep. How do you guarantee precision and rounding with an assembler? Well, you just write the instructions or pseudoinstructions that do what you want. Worst case, you write a thin macro to generate the right instructions based on something else that you also control. In C or C++, you either abuse the compiler or rely on a library to do that for you ([0], [1]).

  This is the raison d'être of modern assemblers: precise control over the instruction stream. C doesn't give you that, and it makes a lot of things difficult (e.g. constant-time cryptography). It's also not fundamental to language design. There's a long history of Lisp assemblers that do give you this kind of precise control; it's just not a guarantee provided by any modern C implementation unless you use the assembly escape hatches. The only portable guarantees you can rely on are those in the standard, hence the original link.

  Low-level control over the instruction stream is ultimately a spectrum. On one end you can write entirely in hex, then you have simple and macro assemblers. At the far end you have the high-level languages. Somewhere in the middle is C and however you want to categorize FASM.

  [0] https://github.com/sixitbb/sixit-dmath
  [1] https://github.com/J-Montgomery/rfloat