Comment by stingraycharles

2 days ago

It can, because of how CPUs work with registers and hot code paths.

Normalizing everything first and then comparing the normalized versions isn’t as fast.

It also enables “stopping early” once a match has been found or ruled out, so you may not actually have to convert everything.

Running more code per unit of data does not make the code hotter or reduce the register pressure, quite the opposite...

  • You’re misunderstanding: you just convert to 32 bits once and reuse that same register all the time.

    You’re running the exact same code, but you’re more efficient in the sense of “I immediately use the data for comparison after converting it”, which means it’s likely either in a register or L1 cache already.
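
To make the two approaches concrete, here is a rough sketch in Python. It only illustrates the control flow and the early exit, not register or cache behavior, which you would have to measure in a compiled language. The names `normalize`, `equal_two_pass`, and `equal_fused` are made up for this example, and `str.casefold` is just a stand-in for whatever conversion is actually being discussed.

```python
from itertools import zip_longest


def normalize(ch, counter):
    """Hypothetical per-character conversion; counter[0] tracks how often it runs."""
    counter[0] += 1
    return ch.casefold()


def equal_two_pass(a, b, counter):
    # Convert both inputs completely, then compare the converted copies.
    na = [normalize(c, counter) for c in a]
    nb = [normalize(c, counter) for c in b]
    return na == nb


def equal_fused(a, b, counter):
    # Convert one pair at a time and compare it immediately,
    # stopping at the first mismatch.
    for ca, cb in zip_longest(a, b):
        if ca is None or cb is None:
            return False  # inputs have different lengths
        if normalize(ca, counter) != normalize(cb, counter):
            return False
    return True


if __name__ == "__main__":
    two_pass_calls, fused_calls = [0], [0]
    print(equal_two_pass("Xbcdefgh", "ybcdefgh", two_pass_calls), two_pass_calls[0])
    print(equal_fused("Xbcdefgh", "ybcdefgh", fused_calls), fused_calls[0])
```

For these two 8-character inputs, which differ in the first character, the two-pass version calls `normalize` 16 times and the fused version calls it twice. When the inputs turn out to be equal, both versions do the same number of conversions, so the saving from stopping early depends on how soon a mismatch usually shows up.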