Comment by pslam

13 years ago

CPUs of this era normally took multiple cycles for every instruction, but I never expected that in the Z-80 at least one of those cycles was there because the ALU was only 4 bits wide. Love the detailed analysis - and this is just the tip of the iceberg on that site.

One thing I'm missing from this article is an approximate gate count. Obviously going 4 bit was motivated by gate and area saving, but halving the ALU size isn't going to halve the gate count or area, because it still needs the same width bus and extra latches for the partial answer. Or was it critical path? What kind of saving was it from an 8 bit ALU?

I don't have a gate count (yet). You're right that the 4-bit ALU doesn't save a lot of space overall. The Z-80 designer talks a bit about the 4-bit ALU [1] but doesn't really explain the motivation. My guess was he was able to use two cycles for the ALU without increasing the overall cycle count because memory cycles were the bottleneck. If you can cut the ALU in half "for free", why not? Hopefully as I continue analyzing the chip this will become clearer.

[1] See page 10 in http://archive.computerhistory.org/resources/access/text/Ora...

Note: if you're interested in Z-80 architecture, you seriously should read that link.
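To make the "two cycles for the ALU" idea concrete, here is a minimal Python sketch of an 8-bit add done as two passes through a 4-bit adder. The function names (`alu4`, `add8_two_pass`) are made up for illustration, and this models only the arithmetic, not the actual Z-80 circuitry or timing.

```python
def alu4(a, b, carry_in):
    """Hypothetical 4-bit adder: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def add8_two_pass(a, b):
    """Add two 8-bit values by running the 4-bit adder twice."""
    # Pass 1: low nibbles. The partial result has to be held somewhere
    # (the "extra latches" mentioned above) while pass 2 runs.
    low, half_carry = alu4(a, b, 0)
    # Pass 2: high nibbles, fed with the carry out of pass 1.
    high, carry = alu4(a >> 4, b >> 4, half_carry)
    return (high << 4) | low, half_carry, carry


result, half_carry, carry = add8_two_pass(0x3C, 0x47)
print(hex(result), half_carry, carry)
```

The carry out of the first pass is exactly the half carry, so that flag comes for free with this arrangement.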

  • One of the reasons I was told was that the circuit extended to 16 bits easily (and was later used in the Z8000 as I recall) and doing decimal (BCD) math was easier. DAA (decimal adjust accumulator) was driven by the half carry flag. In '85 Intel wrote a Z80 emulator in 8086 machine code to try to land a design win for some Japanese game console, and the decimal arithmetic stuff[1] was a PITA (and as it turned out not used a lot in games :-)

    [1] The 8080 also had these decimal arithmetic hacks but it didn't have an alternate set of registers to pull from.
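For anyone who hasn't met DAA, here is a rough Python sketch of how the half carry flag drives the decimal adjustment after an addition of two packed-BCD bytes. It follows the commonly documented adjust-after-add rule only; the Z80's DAA also handles subtraction, which is left out here, and the function names are invented for this example.

```python
def add8(a, b):
    """Plain binary 8-bit add; returns (result, half_carry, carry)."""
    half_carry = ((a & 0x0F) + (b & 0x0F)) > 0x0F
    total = a + b
    return total & 0xFF, half_carry, total > 0xFF


def daa_after_add(a, half_carry, carry):
    """Decimal-adjust the binary sum of two packed-BCD bytes (addition only)."""
    adjust = 0
    # Low digit overflowed past 9, or carried out of bit 3: add 6.
    if half_carry or (a & 0x0F) > 9:
        adjust |= 0x06
    # High digit overflowed past 9, or carried out of bit 7: add 0x60.
    if carry or a > 0x99:
        adjust |= 0x60
        carry = True
    return (a + adjust) & 0xFF, carry


def bcd_add(a, b):
    """Add two packed-BCD bytes: binary add, then decimal adjust."""
    result, half_carry, carry = add8(a, b)
    return daa_after_add(result, half_carry, carry)


result, carry = bcd_add(0x19, 0x28)  # decimal 19 + 28
print(hex(result), carry)
```

Without the half carry flag, a sum like 0x19 + 0x28 = 0x41 would look like valid BCD and the low-digit correction would be missed.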

    • Thanks for the interesting information. I'm skeptical that the Z-80 designers were planning ahead for 16 bits, though. Simpler BCD math is a possibility - I'll look into this as I examine the Z-80 more. The 6502 wins, though, for crazy but efficient decimal arithmetic - it has a complex patented circuit that detects decimal carry in parallel with the addition/subtraction, and another circuit to add the correction factor to the result without going through the ALU again. So you don't need a separate DAA instruction or additional cycles for decimal correction.

      General question: what things about the Z-80 would you guys like me to write about? Any particular features of the chip? Register-level architecture, gates, or the silicon? Analyzing what instructions do cycle by cycle? Gate counts by category? Comparison with other microprocessors?

    • Whether to provide BCD optimisation always seemed to be a tricky engineering decision; virtually nobody used the 6502 BCD instructions in the amateur home microcomputer environment I was familiar with in the 80s, but it was clearly considered to be important to the CPU manufacturers. Were there BCD benchmarks back then? Was it considered a killer feature to make financial software easier to write? Did Rockwell ever capitalise on that patent?

I think it started life as a 4 bit processor to compete with the 4004 and went out the door as 8 after the 8080 so they just muxed what was already there.

I don't know that, I just think that. :-)

Indeed .. in all the years I spent programming the Z80, I never for a minute suspected it had this ALU architecture. It's not even mentioned in Rodnay Zaks's great work.

I remember the Z80 felt distinctively more sluggish than the 6502 (I had an Apple II with a Z-80 Softcard in it so it could run CP/M). Now I know why.

  • No you don't; this was a clever optimization, not a performance-degrading hack. It was possible to save half the ALU transistors "for free" so the designer did. The "for free" bit is important. The Z80 ran a superset of the 8085 instruction set at equal or greater speed, but the 8085 had an 8 bit ALU.

    • The minimum time for an instruction to execute on the 6502 is 2 clock cycles. The maximum is 7, IIRC. On the Z80, the minimum is 4, with the maximum being about 30.

      This has, of course, little to do with the width of the ALU.