
Comment by mpweiher

10 years ago

Maybe you should have continued to the end, because as far as I can tell, you are almost completely missing the point.

You are saying "look at all the wonderful things optimizing compilers can do". And he is saying "that's cool, but it doesn't really matter".

Now I am pretty sure he would concede immediately that there are some examples where it matters, but my experience matches what he is saying very closely.

A little bit on my background: I was hired in 2003 by the BBC for the express purpose of making one of their systems faster, and succeeded at around 100x-1000x with simpler code[1]; I was then hired by Apple to work on their Mac OS X performance team, with similar successes in Spotlight, Mail indexing, PDF, etc.

I also worked with/on Squeak. Squeak runs a bytecode interpreter with a bunch of primitives, and the interpreter is perfectly adequate for the vast majority of the system. Heck, the central page imposition routine in my BookLightning PDF page imposition program[2] is written in Objective-Smalltalk[3], or more precisely runs on an interpreter for the language that's probably the slowest computer language implementation currently in existence. Yes, it's slower than Ruby, by a wide margin. And yet BookLightning runs rings around similar programs that are based on Apple's highly optimized Quartz PDF engine. And by "rings" I mean order(s) of magnitude.

Why is BookLightning so fast? Simple: it knows that it doesn't have to unpack the individual PDF pages to impose them, and is built on top of a PDF engine that allows it to do that[4]. The benefit of an optimizing compiler in this scenario? Zilch.

At the BBC, the key insight was to move from a 20+ machine, distributed, SQL-database-backed system to a single-jar event logger working in memory with a filesystem-based log[5]. How would a compiler make that transformation? And after the transformation, we probably could have run interpreted bytecode and still been fast enough, though the JIT probably helped us a little bit by letting us not worry about the performance of the few algorithmic parts.

As to changing a bubblesort to a mergesort, you hand-wave around that, but I think the point is that this is not a safe transformation, because the author may have specifically chosen bubblesort because he likes its characteristics for the data he knows will be encountered (or cares more about that case).
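For example (my sketch, not necessarily what the author had in mind), an early-exit bubble sort is in-place, stable, and makes a single O(n) pass over already-sorted input, properties a compiler cannot know you are relying on when it considers swapping in a different algorithm:

```c
#include <stdbool.h>
#include <stddef.h>

/* Early-exit bubble sort: in-place, stable, and a single O(n) pass
 * when the input is already (or almost) sorted. Replacing it with
 * mergesort trades those properties for better worst-case behavior,
 * which may not be what the author wanted. */
static void bubble_sort(int *a, size_t n)
{
    if (n < 2)
        return;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (size_t i = 0; i + 1 < n; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
                swapped = true;
            }
        }
        n--; /* the largest remaining element has bubbled to the end */
    }
}
```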

When I worked at Apple, I saw the same pattern: low-level optimizations of the type you discuss generally gained a few percent here and there, and we were happy to get them in the context of machine- and system-wide optimizations. However, the application-specific optimizations that became possible when you consider the whole stack, and the opportunities for crossing or collapsing layer boundaries, were worth factors of x or orders of magnitude.

And here is where the "optimizing compiler" mindset actually starts to get downright harmful for performance: in order for the compiler to be allowed to do a better job, you typically need to nail down the semantics much tighter, giving less leeway for application programmers. So you make the automatic %-optimizations that don't really matter easier by reducing or eliminating opportunities for the order-of-magnitude improvements.

And yes, I totally agree that optimizations should be user-visible and controllable, rather than automagically applied by the compiler. Again, the opportunities are much bigger for the code that matters, and the stuff that is applied automatically doesn't matter for most of the code it is applied to.

[1] http://link.springer.com/chapter/10.1007%2F978-1-4614-9299-3...

[2] http://www.metaobject.com/Products/

[3] http://objective.st

[4] http://www.metaobject.com/Technology/

[5] http://martinfowler.com/bliki/EventPoster.html

Is it your position that all of OS X could run on Squeak, or gcc -O0, with performance comparable to what it does now?

Because that seems to be the logical conclusion of your position.

Yes, it may be that low-level optimizations are fighting for single-digit percentage improvements. And of course re-architecting big systems at a high level can yield much bigger gains. But you have to remember that the baseline is an entire system that's already using an optimizing compiler.

If you want to argue that optimizing compilers are dead, you'd have to show that you can remove optimizing compilers from your toolchain, and have nobody notice the difference. That's a pretty hard argument to make.

  • This is where my mind keeps coming back to. I keep hearing "it doesn't matter", and yet nearly every compiler I come into contact with is an optimizing compiler. Heck, even with some pretty straightforward "performant" code, the difference between optimizations on and off can be significant in a lot of these.

    C#, C++, C, Go, V8

    Just about everything... I can only imagine what would happen if every process on all the systems I deal with were recompiled with optimizations off :| (A sketch of the kind of loop I mean is below.)
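    To make that concrete, here is a minimal sketch (a toy dot product, nothing from the article) of the kind of loop where -O0 vs -O2 is easily measurable: the optimizer can keep the accumulator in a register, unroll, and vectorize, while unoptimized code goes through memory on every iteration.

```c
#include <stddef.h>

/* A tight numeric kernel: with optimizations on, the compiler keeps
 * 'sum' in a register and typically unrolls/vectorizes the loop; at
 * -O0 every iteration loads and stores through the stack. */
double dot(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```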

  • > Is it your position that all of OS X could run on Squeak, or gcc -O0, with performance comparable to what it does now?

    > Because that seems to be the logical conclusion of your position.

    No, that's not my position, though it is likely true. In fact, in a sense this has already happened: the default optimization switch on OS X is -Os, i.e. optimize for size, not for speed. It turns out that this is usually also OK for raw speed, but it would not matter (much) if it weren't. Memory is more important. With Squeak bytecode being so much more compact, it would probably be an overall win (think cache pressure for code executed once or twice).

    But let's go with your assumption that we don't set optimization flags. Much worse regressions happen all the time and people don't notice or don't care. Think virtualization, running code in browsers, the overhead of Dalvik, the introduction of atomic properties in ObjC, or ARC, or Swift. How many people noticed the locale switch a couple of years ago that caused grep to become ~100x slower on OS X? How much of the web runs on PHP and Ruby or similarly comically slow tech? We're spending Moore's dividend[1] in lots and lots of places; having optimizations turned off in our compilers would be a small fraction of the total.

    However, the point is not that you can remove optimizations and everything would simply be the same. The point is that optimizing compilers do more than is needed for most code, and not enough for the rest.

    This is not really (or mostly) about what optimizers can or cannot do, it is about the nature of code and the performance requirements of that code. On today’s machines, most of the code isn’t noticeable. Some of the code is highly noticeable, and usually that is in (soft) real time situations. What’s interesting about real time is that speed does not matter. Not missing your deadline does. These are subtly but fundamentally different goals.

    I learned this when I was working in pre-press (in the '90s) and saw newspapers switching to pre-rendered bitmap workflows, where the pages (or page fragments) are rasterized at high dpi (1270 dpi is typical for newsprint) early on, and you then work with that in all further process steps. This seemed utterly insane to me, because it is the most inefficient way of working possible: little 100KB PostScript files turned into 50-100MB TIFFs. However, it absolutely makes sense for a newspaper: once a file is rendered, further processing is completely predictable. PostScript (or PDF), on the other hand, is orders of magnitude more efficient in not just the best but also the average case. However, the worst case is essentially unbounded. For a (daily) newspaper, having the pages earlier than needed is worth nothing, but not having them later than needed is everything, so once you have sized your workflow to handle the (significantly larger) load, you are golden.

    The same goes for most desktop and mobile apps. If you’re under 100ms response time and under 16.66ms animation time, you’re good. Being faster does not help. Again, predictability is more important than performance: it is infinitely better to hit 15ms 100% of the time than 1ms 90% and 50ms 10% of the time, even though the latter is more than 2x faster.
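    To put numbers on that: the jittery case averages 0.9 × 1 ms + 0.1 × 50 ms = 5.9 ms against a flat 15 ms, so it really is about 2.5x faster on average, but the 10% of frames that take 50 ms blow right through the 16.66 ms animation budget, and those are the ones people notice.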

    Optimizing compilers help with speed, but they make predicting which code will perform well (and how well) more difficult. You essentially have to keep in your head a model of the machine code and access patterns you are targeting on your platform, along with a model of the optimizations your compiler can and does perform, and then write code that will goad the compiler into generating the machine code you want. Phew! Though admittedly not as tough as doing the same with JITs (I remember the "optimizing for Java HotSpot" session at WWDC back when that was a thing. Much, much easier to just write the damn assembly code yourself!)
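    A tiny illustration of that kind of goading (my own sketch, not something from the talk): in C, unless you promise the compiler that two pointers don't alias, it has to prove non-overlap, emit a runtime check, or give up on vectorizing the loop. You have to know that about both the language and your particular compiler to get the machine code you want.

```c
#include <stddef.h>

/* Without 'restrict', the compiler must assume out and in may overlap,
 * so it either proves otherwise, emits a runtime overlap check, or
 * skips vectorization. The programmer knows the buffers never overlap,
 * but has to say so in exactly the form the optimizer understands. */
void scale(double *restrict out, const double *restrict in,
           double k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = k * in[i];
}
```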

    DannyBee pointed out a major reason for this: the compiler is bound by the semantics of the language, and maybe more importantly the code as written. Many times, those constraints are much tighter than necessary. The optimizer can’t do a thing because it is not allowed to touch the semantics. The programmer doesn’t know why the compiler isn’t generating the code it should. Maybe the two should talk?

    There are exceptions to this, for example Google-scale operations, where shaving 1% off some computation (though undetectable by users) means you can run about a million fewer servers and save the energy budget of a small country. But again, I would say that that's an exceptional case.

    [1] Spending Moore’s Dividend, http://cacm.acm.org/magazines/2009/5/24648-spending-moores-d...

"Maybe you should have continued to the end, because as far as I can tell, you are almost completely missing the point. " I if i can't find the point of a presentation in 50 pages, ...

  • The file is huge because it seems the animations were converted to multiple PDF pages.

    So page count is not a useful metric for how far along the presentation is or how much information it contains.