Comment by lacedeconstruct

1 hour ago

Yes, but >> 8 is so much faster.

You don’t divide a float by 256 by shifting it right eight bits; that would yield complete garbage. You subtract 8 from the exponent, then check if you got an underflow.

Only in micro-benchmarks.

In real workloads, today's CPUs are usually limited by memory bandwidth rather than by arithmetic.

  • What are you talking about? In a hot loop in my software renderer, this is like 10x faster:

        // color4_t result = {
        //     .r = (src.r * src.a + dst.r * inv_alpha) * INV_255,
        //     .g = (src.g * src.a + dst.g * inv_alpha) * INV_255,
        //     .b = (src.b * src.a + dst.b * inv_alpha) * INV_255,
        //     .a = src.a + (dst.a * inv_alpha) * INV_255
        // };

        // >> 8 divides by 256 instead of 255: slightly off, but much faster
        color4_t result = {
            .r = (src.r * src.a + dst.r * inv_alpha) >> 8,
            .g = (src.g * src.a + dst.g * inv_alpha) >> 8,
            .b = (src.b * src.a + dst.b * inv_alpha) >> 8,
            .a = src.a + ((dst.a * inv_alpha) >> 8)
        };