Comment by kens
1 day ago
The byte order doesn't make much difference. The more important difference compared to a typical RISC chip is that the 8086 supports unaligned memory access. So there's some complicated bus circuitry to perform two memory accesses and shuffle the bytes if necessary.
To understand why the 8086 uses little-endian, you need to go back to the Datapoint 2200, a 1970 desktop computer / smart terminal built from TTL chips (since this was pre-microprocessor). RAM was too expensive at the time, so the Datapoint 2200 used Intel shift-register memory chips along with a 1-bit serial ALU. To add numbers one bit at a time, you need to start with the lowest bit to handle carries, so little-endian is the practical ordering.
Datapoint talked to Intel and Texas Instruments about replacing the board full of TTL chips with a single-chip processor. Texas Instruments created the TMX1795 processor and Intel slightly later created the 8008 processor. Datapoint rejected both chips and continued using TTL. Texas Instruments tried to sell the TMX1795 to Ford as an engine controller, but they were unsuccessful and the TMX1795 disappeared. Intel, however, marketed the 8008 chip as a general-purpose processor, creating the microprocessor as a product (along with the unrelated 4-bit 4004). Since the 8008 was essentially a clone of the Datapoint 2200 processor, it was little-endian. Intel improved the 8008 with the 8080 and 8085, then made the 16-bit 8086, which led to the modern x86 line. For backward compatibility, Intel kept the little-endian order (along with other influences of the Datapoint 2200). The point of this history is that x86 is little-endian because the Datapoint 2200 was a serial processor, not because little-endian makes sense. (Big-endian is the obvious ordering. Among other things, it is compatible with punch cards where everything is typed left-to-right in the normal way.)
Big-endian matches the way we commonly write numbers, but if you have to deal with multiple word widths or greater than word-width math I find little-endian much more straightforward because LE has the invariant that bit value = 2^bit_index and byte value = 2^(8byte_index).
E.g. a 1 in bit 7 on a LE system always represnts 2^7 for 8/16/32/64/ whatever bit word widths.
This is emphatically not true in BE systems and as evidence I offer that IBM (natively BE), MIPS natively BE) and ARM (natively LE but with a BE mode) all have different mappings of bit and byte indices/lanes in larger word widths* while all LE systems assign the bit/byte lanes the same way.
Using the bit 7 example
- IBM 8-bit: bit 7 is in byte 0 and equal to 2^0
- IBM 16-bit: bit 7 is in byte o and equal to 2^8
- IBM 32-bit: bit 7 is in byte 0 and equal to 2^25
‐ MIPS 16-bit: bit 7 is in byte 1 and equal to 2^7
- MIPS 32-bit: bit 7 is in byte 3 and is equal to 2^7
- ARM 32-bit BE: bit 7 is in byte 0 and is equal to 2^31
Vs. every single LE system, regardless of word width
- bit N is in byte (N//8) and is equal to 2^N
(And of course none of these match how ethernet orders bits/bytes, but that's a different topic)
Not that it typically matters in a practical sense* (Unless you're writing to a register for a device)...
However I've always viewed Little Endian as 'bit 0' being on the left most / lowest part of the string of bits, but Big Endian 'bit 0' is all the way to the right / highest address of bits (but smallest order of power).
If encoding or decoding an analog value it makes sense to begin with the biggest bit first - but that mostly matters in a serial / output sense, not for machine word transfers which are (at least in that era were) parallel (today, of course, we have multiple high speed serial links between most chips, sometimes in parallel for wide paths).
Aside from the reduced complexity of aligned only access, forcing the bus to a machine word naturally also aligns / packs fractions of that word on RISC systems, which tended to be the big endian systems.
From that logical perspective it might even make sense to think of the RAM not in units of bytes but rather in units of whole machine words, which might be partly accessed by a fractional value.