
Comment by dudu24

2 hours ago

If you have a ruler and it goes to 12 inches, you should normalize by the length L and not by 13, the number of points on the ruler.

yes but >> 8 is so much faster

  • You don’t divide a float by 256 by shifting it right eight bits; that would yield complete garbage. You subtract 8 from the exponent, then check if you got an underflow.

  • Only in micro-benchmarks.

    For real usage, today's CPUs are limited by memory bandwidth.

    • What are you talking about? In a hot loop in my software renderer this is like 10x faster:

          // color4_t result = {
          //     .r = (src.r * src.a + dst.r * inv_alpha) * INV_255,
          //     .g = (src.g * src.a + dst.g * inv_alpha) * INV_255,
          //     .b = (src.b * src.a + dst.b * inv_alpha) * INV_255,
          //     .a = src.a + (dst.a * inv_alpha) * INV_255
          // };
      
          // 1/256 but much faster
          color4_t result = {
              .r = (src.r * src.a + dst.r * inv_alpha) >> 8,
              .g = (src.g * src.a + dst.g * inv_alpha) >> 8,
              .b = (src.b * src.a + dst.b * inv_alpha) >> 8,
              .a = src.a + ((dst.a * inv_alpha) >> 8)
          };
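For what it's worth, the integer blend above does not have to choose between speed and normalizing by 255. A minimal sketch, assuming 8-bit channels, an 8-bit alpha, and `inv_alpha = 255 - alpha` (the thread does not show those definitions, so `div255_round` and `blend_channel` are hypothetical names): the well-known add-and-shift trick gives a correctly rounded division by 255 for every product up to 255 * 255, with no divide instruction.

```c
#include <stdint.h>

/* Rounded x / 255 for 0 <= x <= 255 * 255, using only adds and shifts. */
static inline uint32_t div255_round(uint32_t x)
{
    uint32_t t = x + 128u;          /* bias so the result rounds to nearest */
    return (t + (t >> 8)) >> 8;     /* t/256 + t/65536 approximates t/255 */
}

/* Blend one 8-bit channel of src over dst.
   Assumption: alpha is 0..255 and the weights sum to 255. */
static inline uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    uint32_t inv_alpha = 255u - alpha;
    /* The sum is at most 255 * 255, inside div255_round's valid range. */
    return (uint8_t)div255_round(src * alpha + dst * inv_alpha);
}
```

With this, white over white stays white: `blend_channel(255, 255, 255)` is 255, whereas `(255 * 255) >> 8` is 254, which is the off-by-one the ruler comment is pointing at.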

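On the float reply earlier in the thread, the "subtract 8 from the exponent" idea can be sketched as follows. This assumes `float` is IEEE 754 binary32, which the C standard does not guarantee, and `div256_exponent` and `shift_raw_bits` are hypothetical names used only for illustration. Anything that is not a normal number with room left in its exponent falls back to an ordinary division.

```c
#include <stdint.h>
#include <string.h>

/* Divide by 256 by editing the exponent field (assumes IEEE 754 binary32). */
static float div256_exponent(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);             /* well-defined type pun */
    uint32_t exponent = (bits >> 23) & 0xFFu;   /* biased 8-bit exponent */

    /* Zero, subnormals, infinities, NaNs, and values whose result would
       underflow out of the normal range take the ordinary path. */
    if (exponent <= 8u || exponent == 0xFFu)
        return x / 256.0f;

    bits -= 8u << 23;                           /* exponent -= 8 */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* For contrast: shifting the raw bit pattern right by 8, which smears the
   sign and exponent into the mantissa and does not divide by 256. */
static float shift_raw_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits >>= 8;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

In practice a compiler already turns `x * (1.0f / 256.0f)` into a single multiply, so the bit-level version is mostly useful for seeing why a plain shift gives garbage on floats.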