Comment by Neywiny

6 hours ago

Hmmm, disagree on your chain there. Plenty of algorithms that are easy in hardware are hard in software. For example, in hardware (including FPGAs), bit movement/shuffling is borderline trivial if it's constant, while in software you have to shift, mask, and OR over and over. In hardware you literally just switch which wire is connected to what on the next stage. Same for weird bit widths. Hardware doesn't care (too much) if you're operating on 9-bit quantities or 33 or 65. Software isn't that granular, and often you'll double your storage and waste a bunch of it.
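To make the software side concrete, here is a rough sketch of what a constant bit shuffle and tight 9-bit packing look like in code. The function names and the `REVERSE8` permutation are made up for illustration, and Python is used only for readability; the point is the shift/mask/OR work per bit that hardware would replace with plain wiring.

```python
# Hypothetical constant permutation: result bit i comes from source bit REVERSE8[i].
REVERSE8 = [7, 6, 5, 4, 3, 2, 1, 0]

def shuffle_bits(x: int, perm: list[int]) -> int:
    """Move bit perm[i] of x to bit i of the result."""
    out = 0
    for dst, src in enumerate(perm):
        # One shift, one mask, one shift, one OR per bit moved.
        out |= ((x >> src) & 1) << dst
    return out

def pack9(values: list[int]) -> int:
    """Pack 9-bit values back to back into one integer, first value in the low bits."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0x1FF) << (9 * i)
    return word

def unpack9(word: int, count: int) -> list[int]:
    """Recover `count` 9-bit values from a word produced by pack9."""
    return [(word >> (9 * i)) & 0x1FF for i in range(count)]
```

Packed this way, three 9-bit values take 27 bits. Storing each in its own 16-bit slot, which is what software usually does, takes 48 bits instead.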

I think they certainly go hand in hand, in that algorithms that have gotten relatively easier for software have also gotten easier for hardware, and vice versa, but they are good at different things.

I'm not claiming that software will be more efficient. I'm claiming that things that make it easy to go fast in hardware make it easy to go fast in software.

Bit masking/shifting is certainly more expensive in software, but it's also about the cheapest software operation. In most cases it's a single-cycle transform. In better cases, it's something that can be done with some type of SIMD instruction. And in the best cases, it's a repeated operation which can be distributed across the array of GPU vector processors.
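As a small illustration of the "same mask and shift applied to many values at once" idea, here is a sketch that swaps the nibbles of eight bytes packed into one 64-bit word. This is SIMD-within-a-register done with ordinary integer operations, not an actual SIMD instruction, and the names are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
LOW_NIBBLES = 0x0F0F0F0F0F0F0F0F  # low 4 bits of each of the 8 bytes

def swap_nibbles_swar(word: int) -> int:
    """Swap the high and low nibble of every byte in a 64-bit word."""
    low = word & LOW_NIBBLES          # low nibbles of all 8 bytes
    high = (word >> 4) & LOW_NIBBLES  # high nibbles, moved down into the low position
    # Low nibbles move up, high nibbles fill the bottom; truncate to 64 bits.
    return ((low << 4) | high) & MASK64
```

All eight bytes are handled by the same handful of operations, with no loop over lanes. A real SIMD unit or GPU applies the same pattern across much wider data.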

What kills both hardware and software performance is data dependency and conditional logic. That's the sort of thing that was limited in the AV1 stream.
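One common way to sidestep conditional logic is to turn a branch into mask arithmetic. A minimal sketch, with a hypothetical function name; in Python this will not run any faster, it only shows the shape of the transformation that compilers and hardware designers lean on.

```python
def select_branchless(cond: bool, a: int, b: int) -> int:
    """Return a if cond is true, else b, without an if/else."""
    mask = -int(cond)  # all ones (-1) when cond is true, 0 otherwise
    # With mask = -1 this is b ^ (a ^ b) = a; with mask = 0 it is b ^ 0 = b.
    return b ^ ((a ^ b) & mask)
```

Both inputs are always computed and the choice becomes plain data flow, which maps cleanly onto a hardware mux or a SIMD blend. It does nothing for true data dependencies, where each step needs the result of the previous one.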