Comment by fajitaforce5
6 hours ago
I was an Intel CPU architect when Transmeta started making claims. We were baffled by those claims. We were pushing the limit of our pipelines to get incremental gains, and they were claiming to beat a dedicated arch on the fly! None of their claims made sense to ANYONE with a shred of CPU arch experience. I think your summary has rose-colored lenses, or reflects the layman’s perspective.
I think this is a classic hill-climbing dilemma. If you start in the same place, and one org has worked very hard and spent a lot of money optimizing the system, they will probably come out on top. But if you start in a different place, reimagining the problem from first principles, you may or may not find yourself with a taller hill to climb. Decisions made very early on in your hill-climbing process lock you into a path, and then the people tasked with optimizing the system later can't fight the organizational inertia to backtrack and pick a different path. But a new startup can.
It's worth noting that Google actually did succeed with a wildly different architecture a couple years later. They figured "Well, if CPU performance is hitting a wall - why use just one CPU? Why not put together thousands of commodity CPUs that individually are not that powerful, and then use software to distribute workloads across those CPUs?" And the obvious objection to that is "If we did that, it won't be compatible with all the products out there that depend upon x86 binary compatibility", and Google's response was the ultimate in hubris: "Well we'll just build new products then, ones that are bigger and better than the whole industry." Miraculously it worked, and made a multi-trillion-dollar company (multiple multi-trillion-dollar companies, if you now consider how AWS, Facebook, TSMC, and NVidia revenue depends upon the cloud).
Transmeta's mistake was that they didn't re-examine enough assumptions. They assumed they were building a CPU rather than an industry. If they'd backed up even farther they would've found that there actually was fertile territory there.
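To make the "use software to distribute workloads" idea concrete, here is a minimal, hypothetical scatter/gather sketch in Python: split a job into shards, fan them out to a pool of workers, and merge the partial results. It uses local processes as stand-in "machines"; none of this is Google's actual infrastructure, just the general pattern.

    from multiprocessing import Pool

    def count_words(shard):
        # Work done independently on one shard (here: counting words in some lines).
        return sum(len(line.split()) for line in shard)

    def run_job(lines, workers=4):
        # Partition the input into one shard per worker, scatter, then reduce.
        shards = [lines[i::workers] for i in range(workers)]
        with Pool(workers) as pool:
            partial = pool.map(count_words, shards)   # scatter
        return sum(partial)                           # gather / reduce

    if __name__ == "__main__":
        corpus = ["the quick brown fox", "jumps over", "the lazy dog"] * 1000
        print(run_job(corpus))                        # -> 9000

The appeal of the pattern is that capacity comes from adding more cheap workers rather than from a faster single machine, with the coordination handled entirely in software.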
> It's worth noting that Google actually did succeed with a wildly different architecture a couple years later. They figured "Well, if CPU performance is hitting a wall - why use just one CPU? Why not put together thousands of commodity CPUs that individually are not that powerful, and then use software to distribute workloads across those CPUs?" [...] Miraculously it worked, and made a multi-trillion-dollar company (multiple multi-trillion-dollar companies, if you now consider how AWS, Facebook, TSMC, and NVidia revenue depends upon the cloud).
Except "the cloud" at that point was specifically just a large number of normal desktop-architecture machines. Specifically not a new ISA or machine type, running entirely normal OS and libraries. At no point did Google or Amazon or Microsoft make people port/rewrite all of their software for cloud deployment.
At the point that Google's "bunch of cheap computers" was new, CPU performance was still rapidly improving. The competition was traditional "big iron" or mainframe systems, and the novelty was in achieving high reliability through distribution rather than by building on fault-tolerant hardware. By the time the rate of CPU performance improvement was slowing in the mid-2000s, large clusters of smaller machines were omnipresent in supercomputing and HPC applications.
The real "new architecture(s)" of this century are GPUs, but much of the development and success of them is the result of many iterations and a lot of convergent evolution.
> At the point that Google's "bunch of cheap computers" was new
It wasn't even new; people just don't know the history. Inktomi and HotBot were based on a fleet of commodity PC servers with low reliability, whereas other large web properties of the time were buying big iron like the Sun E10K. And of course Beowulf clusters were a thing.
And as far as I know, Google's early ethos didn't come from some far-sighted strategy, but from the practical reality of Page and Brin building the first versions of their search engine on borrowed/scavenged hardware as grad students and then continuing that trajectory.
That’s revisionist. Transmeta set out to build a software-like CPU core. That will always lose to dedicated hardware.
> Well we'll just build new products then, ones that are bigger and better than the whole industry.
With blackjack, and hookers!
I recall that one of the biggest concerns around that time was that OOOE techniques would not continue scaling in width or depth, and that other techniques would be needed. This turned out to be true, but it was not some fringe idea -- the entire industry turned on this. Intel designed the narrow and less "brainy" Pentium 4 and hoped to achieve performance with frequency, and with HP it designed the in-order Itanium lines. AMD had its speed-demon K9. IBM did the in-order POWER6, which got performance from high frequency and runahead speculative execution. Nvidia did something similar to Transmeta too, quite a while later, IIRC.
All failures. Everybody went back to more conventional out-of-order designs and found ways to keep scaling those.
I'm sure there were some people at all these companies who were always OOOE proponents and disagreed with these other approaches, but I think your summary has poop-colored lenses :) It's a little uncharitable to say their ideas were nonsense. The reality is that this was a very uncertain and exploratory time, and many people with large shreds of CPU arch experience all did wildly different things, and many went down the wrong roads (with hindsight).
Even the people on comp.arch at the time were baffled. No one believed it.
The discussions on comp.arch from that era are a gold mine. There were lead architects from the P4 team, from the Alpha team, Linus himself during his Transmeta days... all talking very frankly about the concerns of computer architecture at the time.
The Itanium felt like Intel making the same bet - move the speculation and analysis logic into the compiler and off the CPU. But where it differed is that it tried to leave some of the internal implementation details of that machinery exposed so the compiler could target it directly, in a way that Transmeta never did.
I wonder how long before we try it again.
Wasn't Intel trying to do something similar with Itanium, i.e. use software to translate code into VLIW instructions to exploit many parallel execution units? Only they wanted the C++ compiler to do it ahead of time rather than a dynamic recompiler? At least some people at Intel thought that was a good idea.
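For a concrete feel of the "dynamic recompiler" half of that comparison, here is a toy Python sketch of the translate-and-cache loop such systems use: translate each guest basic block the first time it executes, cache the host-code result, and reuse it on later visits. The guest "ISA" here is an invented three-opcode bytecode, not x86 or any real VLIW target, and this only illustrates the general idea, not how Transmeta's Code Morphing Software actually worked.

    # Guest program: (opcode, operand) pairs. A "basic block" ends at JUMP or HALT.
    GUEST = [
        ("LOAD", 10),   # acc = 10
        ("ADD", 5),     # acc += 5
        ("JUMP", 4),    # continue at index 4
        ("ADD", 999),   # never reached
        ("ADD", 1),     # acc += 1
        ("HALT", None),
    ]

    translation_cache = {}  # block start pc -> "compiled" host function

    def translate_block(pc):
        # "Compile" one guest basic block into a host function (here, a closure).
        ops = []
        while True:
            op, arg = GUEST[pc]
            ops.append((op, arg))
            pc += 1
            if op in ("JUMP", "HALT"):
                break

        def run(acc):
            for op, arg in ops:
                if op == "LOAD":
                    acc = arg
                elif op == "ADD":
                    acc += arg
                elif op == "JUMP":
                    return acc, arg     # next block to execute
            return acc, None            # HALT: nothing left to run

        return run

    def execute(start_pc=0):
        acc, pc = 0, start_pc
        while pc is not None:
            if pc not in translation_cache:          # translate on first use only
                translation_cache[pc] = translate_block(pc)
            acc, pc = translation_cache[pc](acc)     # run the cached translation
        return acc

    print(execute())  # -> 16

The point of the cache is that hot code is translated once and then runs at "compiled" speed, whereas a static compiler (as in the Itanium model) must do all of its scheduling before the program ever runs.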
I wonder if the x86 teams at Intel were similarly baffled by that.
Itanium wasn’t really focusing on running x86 code. Intel wanted native Itanium software, and x86 execution was a bonus.
It was risky.
From my perspective it was more exciting to the programming systems and compiler community than to the computer architecture community.
What are you doing now? Retired on a farm?