Comment by kragen

11 years ago

Quoting my comment on this paper from 8 months ago on https://lobste.rs/s/pc1v4g/coq_the_world_s_best_macro_assemb..., where it is full of links to the things I'm referring to:

This is a really interesting and thought-provoking idea. In writing httpdito, and in particular in putting a sort of tiny peephole optimizer into it as gas macros, it occurred to me that (some) Forth systems are awfully close to being macro assemblers, and that really the only reason you’d use a higher-level language rather than a macro assembler is that error handling in macro assemblers, and in Forth, is terrible verging on nonexistent. Dynamically-typed languages give you high assurance that your program won’t crash outright (writing the Ur-Scheme compiler very rarely required me to debug generated machine code), while strongly-statically-typed languages give you similarly weak assurances, but via compile-time checks; yet it seems clear that the programmer also needs to be able to verify that their program is free of bugs like memory leaks, infinite loops, SQL injection, XSS, and CSRF, or for that matter “billion-laughs”-like denial-of-service vulnerabilities, let alone more prosaic problems like the Flexcoin bankruptcy due to using a non-transactional data store.

Against this background, we find that a huge fraction of our day-to-day software (X11, Firefox, Emacs, Apache, the kernel, and notoriously OpenSSL) is written in languages like C and C++ that lack even these rudimentary memory-safety properties, let alone safety against the trickier bogeymen mentioned above. The benefit of C++ is that it allows us to factor considerations like XSS and loops into finite, checkable modules; the benefit of C is that we can tell pretty much what the machine is going to do. But, as compiler optimizations exploit increasingly recondite properties of the programming language definition, we find ourselves having to program as if the compiler were our ex-wife’s or ex-husband’s divorce lawyer, lest it introduce security bugs into our kernels, as happened with FreeBSD a couple of years back with a function erroneously annotated as noreturn, and as is happening now with bounds checks depending on signed overflow behavior.

We could characterize the first of these two approaches as “performing on the trapeze with a net”: successful execution is pleasant, if unexpected, but we expect that even a failure will not crash our program with a segfault — bankrupt Flexcoin, perhaps, and permit CSRF and XSS, but at least we don’t have to debug a core dump! The second approach involves performing without a net, and so the tricks attempted are necessarily less ambitious; we rely on compiler warnings and errors, careful program inspection, and testing to uncover any defects, with results such as Heartbleed, the Toyota accelerator failure, the THERAC-25 fatalities, the destruction of Ariane 5, and the Mars Pathfinder priority-inversion lockups.

So onto this dismal scene sweep Kennedy, Benton, Jensen, and Dagand, with a novel approach: rather than merely factoring our loops and HTML-escaping into algorithm templates and type conversion operators, thus giving us the opportunity to get them right once and for all (but no way to tell if we have done so), we can instead program in a language rich enough to express our correctness constraints, our compiler optimizations (§4.1), and the proofs of correctness of those optimizations. Instead of merely packaging up algorithms (which we believe to be correct after careful inspection) into libraries, we can package up correctness criteria and proof tactics into libraries.

This is a really interesting and novel alternative to the with-a-net and without-a-net approaches described earlier, and it goes far beyond previous efforts to prove programs correct: rather than attempting to prove your program correct before you compile it, or looking for bugs in the object code emitted, it attempts, Forth-like, to replace the ecosystem of compilers and interpreters with a set of libraries for your proof assistant — libraries which could eventually allow you to program at as high a level as you wish, but grounded in machine code and machine-checkable proofs. Will it turn out to be practical? I don’t know.

(End quote from lobste.rs comment.)

Also, WHAT IN THE LIVING FUCK IS WRONG WITH YOU PEOPLE THAT THE ONLY THING YOU HAVE TO SAY ABOUT THIS PAPER IS ABOUT WHETHER YOU LIKE COCK. HAVE YOU EVEN READ THE FUCKING PAPER. JESUS FUCKING CHRIST WITH A SHOTGUN IN A SOMBRERO, PEOPLE. FUCK YOU, YOU DROOLING MONSTERS. MAKE YOUR COCK JOKES BUT SAY SOMETHING SUBSTANTIVE.

In some ways, I feel like all we tend to do with our higher-level abstractions is make macro assemblers out of them again. Tiny amount of algorithmic content, then massive amount of business logic. As models for business logic grow more complex, they change from code to data, and then into code again (this time as a separate language). Provisions from the old environment are lost in the new environment, requiring a second set of abstractions. Repeat...

I would like to point out that things like SQL injection are pretty easy to avoid by just typing (in the type-theory sense) things out properly in a DSL, rather than splicing raw strings straight into queries. You don't need crazy dependent types to employ the "make sure strings are escaped" strategies.

Infinite loops, on the other hand...

  • I'm pretty sure you can build a total language and outlaw infinite loops even without dependent types. It also turns out that you can basically do whatever you want in a total language with a good runtime so long as it does coinduction well.

I just think the only motivation people have really taken up for total languages is their proof-theoretic properties, which drive you quickly to dependent types.

it occurred to me that (some) Forth systems are awfully close to being macro assemblers, and that really the only reason you’d use a higher-level language rather than a macro assembler is that error handling in macro assemblers, and in Forth, is terrible verging on nonexistent.

Another reason to use a high-level language is that macro assembly code and Forth programs aren't very portable across machines. I guess you could model a virtual machine in the same way the authors model x86, target the virtual machine instead, then have a verified translation from VM operations to x86 instructions, ARM instructions, etc. I wonder if you could get similar performance to a native compiler with this approach.

  • Forth programs are fairly portable across machines — I would guess at least as portable as C — although often not across Forth implementations. That's why Open Firmware drivers are written in Forth.

    Macro assembly code can be portable across machines; that's how e.g. SNOBOL4 was ported to 50 different machines starting in the late 1960s, including even the IBM PC; see https://news.ycombinator.com/item?id=3108946 (a thread here that I posted in, before HN was apparently taken over by drooling cock-obsessed chimpanzees who can't take the time to read EVEN THE ABSTRACT of linked papers, by which I do not mean you, panic) for more details.

    In effect, the coqasm approach moves portability, just like safety, from the language into a library. Normal macro assemblers (and, for that matter, C) make it difficult to tell if your application code has avoided processor dependencies. It seems like Coq might actually make it possible to prove that you've done that.