Comment by tom_

13 days ago

Interesting, thanks. I'd missed that particular detail, possibly because I used to do this stupid shit on the Atari ST, where instruction timings were effectively rounded up to a whole number of nops (4 cycles each), so 6 cycles wasn't really a thing. Address register operations are always longs, and clearly the sign extension imposes some overhead. Given that pretty much every other long operation is slower, I imagine this is a case of getting lucky with the timing of the 16-bit internal operations.

ADDQ and ADDX are better instructions to look at, as are any with a Dn,Dn addressing mode. The long and word cases are the same number of instruction bytes, but the long case is still slower.

(Register-to-register moves are the same regardless of width, so presumably it has a 32-bit path for this. That's nice. But not as nice as it would be if it had a 32-bit path for everything. Which it really looks like it doesn't. This CPU has registers, but that can't save it.)
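
For anyone who wants the numbers side by side, here's a rough sketch in Python. The cycle counts are the register-direct figures as I remember them from Motorola's 68000 timing tables, so treat them as assumptions and check them against the manual. `CYCLES` and `st_cycles` are names made up for this example, and the round-up-to-a-nop rule is only a rule of thumb for the ST, not an exact model of its bus.

```python
# Base 68000 cycle counts for register-direct forms.
# ASSUMPTION: figures recalled from Motorola's timing tables; verify before relying on them.
CYCLES = {
    "ADD.W  D0,D1": 4,
    "ADD.L  D0,D1": 8,
    "ADDQ.W #1,D0": 4,
    "ADDQ.L #1,D0": 8,
    "ADDX.W D0,D1": 4,
    "ADDX.L D0,D1": 8,
    "MOVE.W D0,D1": 4,
    "MOVE.L D0,D1": 4,
}

NOP_CYCLES = 4  # a nop takes 4 cycles on the 68000


def st_cycles(cycles: int) -> int:
    """Round a cycle count up to a whole number of nops (ST rule of thumb)."""
    return -(-cycles // NOP_CYCLES) * NOP_CYCLES


if __name__ == "__main__":
    # Print each instruction with its base count and its ST-rounded count.
    for insn, base in CYCLES.items():
        print(f"{insn:<14} base={base:>2}  st={st_cycles(base):>2}")
```

With that rounding, `st_cycles(6)` comes out as 8, which is why a 6-cycle figure never showed up on the ST. The table also shows the point about widths: the long ADD, ADDQ and ADDX forms cost twice the word forms, while MOVE costs the same either way.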