Comment by mpweiher
10 years ago
> Is it your position that all of OS X could run on Squeak, or gcc -O0, with performance comparable to what it does now?
> Because that seems to be the logical conclusion of your position.
No, that's not my position, though it is likely true. In fact, in a sense this has already happened: the default optimization switch on OS X is -Os, i.e. optimize for size, not for speed. It turns out this is usually also OK for raw speed, but it wouldn't matter much if it weren't: memory is more important. With Squeak bytecode being so much more compact, it would probably be an overall win (think cache pressure for code that's executed once or twice).
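You can see the size half of that trade-off yourself. A minimal sketch, assuming cc and the standard size tool on OS X (the file name and function are made up for illustration):

    /* sum.c -- a trivial loop that -O2/-O3 will happily unroll and
     * vectorize, while -Os keeps it compact. Compare:
     *
     *   cc -O2 -c sum.c && size sum.o
     *   cc -Os -c sum.c && size sum.o
     *
     * Exact numbers vary by compiler and version, but the -Os text
     * segment is typically noticeably smaller. Multiplied across an
     * entire OS, that is real instruction-cache pressure saved. */
    long sum(const int *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)
            s += a[i];
        return s;
    }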
But let’s go with your assumption that we don’t set optimization flags. Much worse regressions happen all the time, and people don't notice or don't care. Think virtualization, running code in browsers, the overhead of Dalvik, the introduction of atomic properties in ObjC, or ARC, or Swift. How many people noticed the locale switch a couple of years ago that made grep ~100x slower on OS X? How much of the web runs on PHP and Ruby or similarly comically slow tech? We’re spending Moore’s dividend [1] in lots and lots of places; having optimizations turned off in our compilers would be a small fraction of the total.
However, the point is not that you can remove optimizations and everything would simply be the same. The point is that optimizing compilers do more than is needed for most code, and not enough for the rest.
This is not really (or mostly) about what optimizers can or cannot do; it is about the nature of code and the performance requirements of that code. On today’s machines, the performance of most code isn’t noticeable. Some code is highly noticeable, and usually that is in (soft) real-time situations. What’s interesting about real time is that speed does not matter. Not missing your deadline does. These are subtly but fundamentally different goals.
I learned this when I was working in pre-press in the ’90s and saw newspapers switching to pre-rendered bitmap workflows, where the pages (or page fragments) are rasterized at high resolution (1270 dpi is typical for newsprint) early on, and all further process steps work with that. This seemed utterly insane to me, because it is the most inefficient way of working possible: little 100KB PostScript files turned into 50-100MB TIFFs. However, it absolutely makes sense for a newspaper: once a file is rendered, further processing is completely predictable. PostScript (or PDF), on the other hand, is orders of magnitude more efficient in not just the best but also the average case. However, the worst case is essentially unbounded. For a daily newspaper, having the pages earlier than needed gains you nothing, but having them later than needed costs you everything, so once you have sized your workflow to handle the (significantly larger) fixed load, you are golden.
The same goes for most desktop and mobile apps. If you’re under 100ms response time and under 16.66ms animation time, you’re good; being faster does not help. Again, predictability is more important than performance: it is infinitely better to hit 15ms 100% of the time than 1ms 90% of the time and 50ms the other 10%, even though the latter is more than 2x faster on average.
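To spell that arithmetic out (a toy calculation, not a measurement; the 16.66ms frame budget is from above):

    #include <stdio.h>

    int main(void) {
        double budget = 16.66;                 /* per-frame budget in ms */

        /* Steady: 15 ms, every single frame. */
        double steady_avg  = 15.0;
        double steady_miss = 0.0;              /* 15 < 16.66: never late */

        /* Bursty: 1 ms 90% of the time, 50 ms 10% of the time. */
        double bursty_avg  = 0.9 * 1.0 + 0.1 * 50.0;   /* = 5.9 ms */
        double bursty_miss = 0.1;              /* 50 > 16.66: late 10% of the time */

        printf("frame budget: %.2f ms\n", budget);
        printf("steady: avg %4.1f ms, misses %2.0f%% of frames\n",
               steady_avg, steady_miss * 100.0);
        printf("bursty: avg %4.1f ms, misses %2.0f%% of frames\n",
               bursty_avg, bursty_miss * 100.0);
        return 0;
    }

The bursty profile averages 5.9ms, over 2.5x faster, and still drops every tenth frame; the steady one never misses. That is the sense in which the goals differ.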
Optimizing compilers help with speed, but they make it harder to predict which code will perform well, and how well. You essentially have to hold in your head a model of the machine code and memory-access patterns you want on your target platform, plus a model of the optimizations your compiler can and will perform, and then write source code that goads the compiler into generating the machine code you want. Phew! Though admittedly not as tough as doing the same with JITs (I remember the “optimizing for Java HotSpot” session at WWDC back when that was a thing. Much, much easier to just write the damn assembly code yourself!)
DannyBee pointed out a major reason for this: the compiler is bound by the semantics of the language and, maybe more importantly, of the code as written. Many times those constraints are much tighter than necessary: the optimizer can’t do a thing because it is not allowed to touch the semantics, and the programmer doesn’t know why the compiler isn’t generating the code it should. Maybe the two should talk?
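A stock C illustration of that gap, as a sketch (restrict is standard C99; the example is mine, not from the thread):

    /* As written, the compiler must assume dst and src might overlap,
     * so it either can't vectorize this loop or has to guard it with
     * runtime overlap checks. */
    void scale_add(float *dst, const float *src, float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] += k * src[i];
    }

    /* The programmer may know the buffers never alias, but only saying
     * so (restrict) loosens the semantic constraint enough for the
     * optimizer to vectorize freely. */
    void scale_add2(float *restrict dst, const float *restrict src,
                    float k, int n) {
        for (int i = 0; i < n; i++)
            dst[i] += k * src[i];
    }

Same loop, same machine; the only difference is how much the source code is allowed to promise.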
There are exceptions to this, for example Google-scale operations, where shaving 1% off some computation (though undetectable by users) means you can run about a million fewer servers and save the energy budget of a small country. But again, I would say that’s an exceptional case.
[1] Spending Moore’s Dividend, http://cacm.acm.org/magazines/2009/5/24648-spending-moores-d...