Advanced Compilers: Self-Guided Online Course

6 years ago (cs.cornell.edu)

I’ve worked on multiple compilers (optimizations expert) at MSFT on VS and CUDA and gave developed a DSL and worked on Database Compilers. I can’t hire compiler people with right skills. We’re building an IDE and those parsers are written differently, and we use Scala packrat parser combinators. These courses teach very little judgement or industry relevant stuff. When do you use packrat parser vs LALR vs LL? Good luck hiring an engineer with advanced compiler courses knowing any of this. I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.

  • > I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.

    I don't teach compiler courses, but I'm an academic, and I do keep in touch with industry people to find out what's important for my data analysis course. A couple of big problems with your proposal:

    1. Time variation in "what industry wants". This year one thing's hot, the next year it's something else. A bare minimum requirement for any course is that a student taking it in their second year should learn something that's still relevant three years later when they're in the first year of their job.

    2. Cross sectional variation. There's no such thing as "what industry wants". Every place wants something different, with only a small subset common to 80% of employers.

    • My two cents - what’s useful for employers is giving students the confidence to jump into the code and figure it out. By this measure, what’s relevant for a compilers course isn’t the material itself. It’s that it’s structured to force the students to figure certain things for themselves.

      There are college hires who need to be spoon fed, and college hires who just need mentorship and direction. Group two are invaluable.

      7 replies →

    • ibains made a specific point about language processing in IDEs being different to that of (traditional) compilers. Presumably there exists a state-of-the-art there just as there's a state-of-the-art for conventional compilers, but academia doesn't give it as much attention.

      21 replies →

    • There is no such variance and new fashions in compilers every year. There are hardware compilers and tools including language servers primarily. Move to cloud requires source to source compilers for data snd transforms.

      1 reply →

  • Hiring for exact skills is what you do when you need the skills ASAP and the actual supply of engineers is there. The time you waste looking for someone with the exact knowledge you need could have been spent getting a good compiler engineer and just teaching them what you know. And magically that good engineer will get better, it's pretty amazing stuff.

    I've was told multiple times, University/School teaches you how to learn. You get your actual knowledge and skills from the job. What I learned in 3 years of university was nothing compared to what I learned in 6 months on the job.

    • At first, it sounds like "win/win" if universities trained their students to use the same tools companies want. Companies would save money for training, and graduates would be ready for their jobs.

      But in longer term, it is actually "win/lose", because if three years later the fashionable tools change, the companies that optimize for saving money on training would simply fire their existing employees and hire new graduates.

      For the students, it is better to be the kind of person who understands the deep principles and can learn new tools as needed.

      And the companies have a choice to either offer job trainings, or offer higher salaries to people currently working in other companies who already have the needed skills. (Or whine about market shortage and import H1B servants.)

    • I will second this. I would much rather hire an engineer who has a related background and wants to learn over an engineer with the exact skillset but does not want to learn. It may take a few months, but the former engineer will pick up what you want them to know (if you mentor them correctly), and past that inversion point, they are immensely more valuable.

  • You are thinking about it the wrong way. In this niche, you will almost never find people with the exact skills you are looking for. Better to find people with tangentially related skills and experience and let them grow into the role. Academia is never going to be comprehensive enough to prepare students for every niche that has some demand. Rather, they should just focus on teaching a core that will help them learn what they need later on demand.

    > When do you use packrat parser vs LALR vs LL?

    If you need ok performance and good error recovery (especially if you are doing stuff in an IDE!), none of the above: recursive descent is not that hard to implement, and is incredibly flexible.

  • Do you think it is the responsibility of academia to teach "industry relevant stuff"? I agree that courses generally don't teach judgement well, but I do think that it is the workplace/industry's responsbility to train their engineers and not expect fresh grads to be fully equipped to work on something that specific.

    • > Do you think it is the responsibility of academia to teach "industry relevant stuff"?

      I don't think that was what ibains was going for, though i don't fault you for seeing this in that comment. (Especially that i don't think this course suffers from that problem and generally i think things are rapidly improving on this front.)

      That problem to me and i think to him too is that quite a lot of things that such courses tend to teach as well thought out, well working solutions and approaches just don't work very well and frequently i find comments on what's a 'good taste' solution and what isn't that are completely my understanding of the problem space which is why.

      E.g. in parsing (as that's the topic he mentioned): First, lots of people just spend way too much time on it. And they focus on parts that are of zero use to beginners (like explaining all several grammar families) and then use obtuse parser generators that save no work and sometimes use them in bizarre way (like picking a lalr parser generator then hand editing the output to support a not-quite-lalr language). Meanwhile a recursive descent parser is easy to write, fast and gives pretty good error messages with _very_ little work. You do need to know enough about grammars to be able to write one down and know if it describes an ambiguous language etc so this should be taught, but you don't need to understand the language zoo well.

      17 replies →

    • > Do you think it is the responsibility of academia to teach "industry relevant stuff"?

      In college I learned a lot of math and how to do things like calculate stress and bending. I learned nothing about material finishes, material selection, fastener selection, tolerances, drafting, etc. But the latter was easy to pick up on the job, while learning mathematical analysis on the job just doesn't happen.

      1 reply →

    • I definitely think there ought to be courses on such things available. There should be an educational route that is between pure academia which focusses on esoterica that is only relevant to research and vocational courses that don't include much innthe way of theory at all. The idea that industry ought to pay for it presumably comes from a background where universities are private an expensive. But I'd like to live in a world where these kind of courses were government subsidised and available for free or cheap.

      2 replies →

    • > Do you think it is the responsibility of academia to teach "industry relevant stuff"?

      I mean, there are colleges that cater specifically to industry. Digipen is a good example of this: they have very close relationships with game studios and shoehorn their curriculum into the requirements the state imposes on colleges.

      These sorts of colleges are not particularly popular, however, so I think most prospective students understand the value of a more broad and timeless approach to software education.

  • I understand the problem.

    I teach compilers at a big university. And I would love to hire graduates with compiler skills for my startup, but find it difficult.

    There are several structural problems that conspire to keep students from acquiring knowledge in low-level programming domain like compilation: a course needs to fit with the rest of the curriculum, and with student interest. I cannot get students interested in lexing and parsing, because they hear all day how exciting deep learning and big data are. Lexing and parsing are deemed solved in the 1960s, a sentiment I disagree with. In addition, classical theoretical computer science (e.g. automata and languages) is rarely taught with enough depths in many universities, so students lack the technical background. You can cover this in a compilers course (I do) but it takes time, and the students see it as an ordeal. Compilers as a subject is not helped by the most widely recommended book being the Dragon Book which is dated and way too long to be useful for beginners. Compare the Dragon Book with the material available as introduction to machine learning ... Many of the existing textbooks are also not covering modern compilation themes, in particular JITs, and compilation for multi-core and GPUs. I'd say there is currently no good undergraduate textbook on compilers. I could probably, with a lot of work, cobble something reasonable together from my lecture notes, but I don't believe in books, I'd like to do a good MOOC, but building suitable software infrastructure requires more work than I am currently willing to put in.

    My department tried hard to remove the compilers course, as "no longer relevant", and "too hard". I threatened to resign if it was not kept as a mandatory course. For now this worked.

    • I never took a compilers class during university study and it is my greatest regret. I've since been self learning and it hasn't been easy.

      Thanks for advocating for the importance of compilers. This lack of knowledge has been a limiter for me many times. I'd really like to fill a skill gap and be able to write a DSL or a little language to solve a problem.

      1 reply →

    • I never understood this. I took the compilers class at CMU, and I loved it. I went on to take a follow-up class implementing a compiler with higher-order types. Meanwhile, most of my classmates avoided compilers altogether.

      I've worked in the industry since undergrad, and I've implemented several interpreters for DSLs. Sometimes I was asked to make it, but other times I had the flexibility to dream up a DSL on my own, and it ended up becoming indispensable to users. I've always loved building them, but to this day, parsing is still the most boring part to me.

      I recently went through Bob Nystrom's Crafting Interpreters, and it was great fun. I implemented an interpreter in Rust. Material about compiling for GPUs would be interesting. But for me, personally, I really wish there were something updated for how IDEs use languages nowadays, in particular, walking through a good way to implement the language server protocol.

      The calculator language that every compiler book starts with should be LSP-first.

      A MOOC would be great, but I'd also be happy with a series of blog posts.

      Does anyone know of good material bridging the gap between traditional compiler architecture and the LSP?

    • > Compilers as a subject is not helped by the most widely recommended book being the Dragon Book

      Is it? Even I, an outsider [to the modern university courses], see that the baseline quite shifted to the TigerBook (aka Appel - Modern Compiler Implementation in [your choice]).

    • Is it at all possible that traditionally compilation is taught back to front? I am just a hobbyist, my CS degree did not include anything about compilation other than the theory of types of Grammar. I am the first to put my hand up and say I do not really know what I am talking about.

      For me, compilers are interesting because of a. optimisations; at my level simple logic that is obviously advantageous to a CFG. b. interaction with the dependencies and operation of low level hardware; how does it REALLY work.

      In the hobbyist pursuit of an understanding I have become belligerent to parsing. Parsing is interesting. But also not really. I would have benefited substantially from an abstract introduction via graphs, analysis and lowering. Both in terms of interest and fundamental understanding.

    • Thanks. I'm actually very interested in lexing and parsing, because that's probably 99% of the stuff a layman-programmer gets to do with compiler theory in job. I mean not everyone gets chance to write the backend part, but parsing skill is almost used universally.

      2 replies →

    • Your HN profile is empty, so there seems to be no good way to check out your institution's program or to contact you.

  • No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.

    As this is a PhD course, I'd expect the goal of it to be preparing students for research in the field, not writing parsers for some industrial company.

    Also, there are definitely courses that teach you how to write a parser. But we usually don't call them advanced.

    • > No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.

      Not if you're writing an IDE. Writing a batch mode parser that takes an entire correct source file and produces an AST is easy. Writing a parser that instaneously handle producing an updating AST while the user is typing in the middle of the program while still allowing semantic analysis to occur and without vomiting out a pile of extraneous errors or giving up entirely is a lot harder.

      5 replies →

  • I saw this great interview recently with Anders Hejlsberg at MSFT on how modern compiler construction differs from what is taught in traditional University courses. Is this what you're alluding to?

    After watching that interview, it's strange to read a comment like yours. While the architecture is completely different, it doesn't frankly seem like that big a leap to go from "senior engineer with X years working on thing that's tangentially related to compilers" to being able to be productive in a reasonable amount of time working with the new architecture. What am I missing?

    https://youtu.be/wSdV1M7n4gQ

    • I will say that I asked Shriram Krishnamurti about why books on interpreters and compilers tend to avoid the question of parsing like the plague and found his response a little unsatisfactory.

      All these celebrated texts like SICP, EOPL, and Lisp in Small Pieces avoid the question of parsing altogether by focusing on s-expression based languages.

      His response was effectively that parsing is over-emphasized in other books, and it's truly orthogonal to other PL activities. Which I can understand, but in practice when I need a parser in my day job I usually don't have much say in the structure of the input "language" (since I'm usually parsing something coming from another source that I have little control over). And it would seem if you have an instructor in a compilers course with this point of view, what other class would give you an opportunity to learn parsing in a formal setting?

      3 replies →

    • It's pretty hard to pick up compiler skills because very little of it is written down. It takes a lot of time (a few years) working with code bases and papers to absorb it.

      3 replies →

    • Yes, more validation - compilers as taught in universities need improvement.

      So, a startup of 10 people should hire and train people for months. Raise a $2M seed round, have 12-18 months to show ProductMarket fit and as a founder teach all engineers their jobs? How many engineers go into compilers after school? So university education is all they know. The industry is built on qualified engineers. NVIDIA ceo was ahead of curve on graphics card, google founders ahead in search, why are compilers ones behind the curve?

      1 reply →

    • Huh, he's talking a lot about how doing everything incrementally is infeasible; isn't this how Rust's compiler/tooling infrastructure works?

  • Parsing is frontend of a compiler. That the compiler course focuses on compiler backend does not mean it's not relevant to the industry. I personally find aliasing, flow analysis, and and optimization very relevant to what we needed to do.

    As for industry vs academia, wouldn't it be fair to say that professors should teach which ever is more advanced? I certainly don't believe that teaching Java generics, no matter how relevant that is, is relevant to a PhD program, which is supposed to push the state of the art of a subject.

  • > Good luck hiring an engineer with advanced compiler courses knowing any of this.

    Have you tried raising the level of compensation you offer? There are engineers elsewhere doing this at other companies you could presumably get if you put your hands in your pockets?

  • College is for imparting foundational knowledge. It isn't vocational training. You should find capable employees and train them for your needs.

  • > These courses teach very little judgement or industry relevant stuff.

    Hasn't that been the debate for, like, ever?

    Should the universities teach you how to learn or should they teach you how to hit the ground running at your first post-graduation job?

    I'd say anyone who took phd-level courses on compiler design shouldn't take too much training to become a valuable team member as they've proven they can learn what is important for the given task but maybe that's just my uninformed opinion.

    • I feel like "teach you to learn" vs "teach you useful skills"is a false dichotomy, and often just an excuse for poor teaching. They can and should do both. The reasom they don't is because they're so research focussed, which is fine as an option but a problem because there are no or vert few good options for high level academic courses that are more practically focussed.

  • > I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.

    Courses should be used to teach general concepts while industry should be used to show those how to apply those concepts. The choice between using a packrat parser, LALR, or LL would be easier to explain to someone who has been exposed to those concepts and understands the advantages and disadvantages of each approach compared to someone who has never heard of the terms.

    Best practices change with time, but knowing the base concepts and how to apply them will help people adapt. If they're only exposed to what's considered relevant at this time, then they won't have the necessary exposure to concepts that may pertain to future best practice.

  • > I can’t hire compiler people with right skills.

    You can hire bright, motivated and mature people and teach them the skills they need. Academia doesn't exist to provide profit-maximizing, ready inputs to business. They produce people who have the skills to (hopefully) be able to grow into any position.

  • I have some idea what I'm talking about, see my post https://news.ycombinator.com/item?id=25210523 and I don't kniow about why using a packrat over anything else. I pull something off the shelf, antlr for my latest project, then get on with it.

    The parser is the most trivial side of things, if you consider that a key skill it comes across as being an issue from the 1970s, not today. Writing a RD by hand is not a big deal anyway as someone else said.

    If your into optimising compilers you're focusing on a very odd area to criticise. I'd gave though more about dataflow using lattices or whatever, graph stuff etc.

    So parsing aside, what are the 'relevant' skills you feel are missing?

    (and what are 'Database Compilers'? Nearest I can think of that may fit are query optimisers)

  • As someone who is happy when he interviews a candidate who at least knows you don't start parsing with `string.split()` I gotta say, your bar is extremely high. Surely MSFT can afford some internal training?

  • > I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.

    I'm sorry if you've already answered, or if it comes up a lot, but could you toss up some buzzwords/more industry-standard things you're looking for? I was going to look into this course, and still probably will, but if you're saying it's not relevant I'd like to know what to be looking into instead?

  • Parser combinators allow elegant implementations of parsing but are ridiculously slow compared to some Knuth-optimized parsers.

    • As elegant as parser combinators are, it is not so easy to work out exactly what grammar they are implementing.

  • I am currently writing a parser using combinator. How can I reach you to learn what is the proper method to use on each case?

  • perhaps you should guest lecture at a university and have them record all the lessons? your kind of experience is rare.

  • I second this (although I work more on translation and modeling than optimization). IME, it's much easier to find otherwise competent, savvy researchers and spin them onto compiler work than it is to take someone who's gone through an "advanced" course.

  • Is that such a critical hiring criteria? That seems like something I could learn in a week.

    • > That seems like something I could learn in a week.

      So why not learn it in a week then apply for the job?

      I would suggest the reason is... because you cannot learn it in a week.

      3 replies →

  • > When do you use packrat parser vs LALR vs LL? Good luck hiring an engineer with advanced compiler courses knowing any of this

    Honestly, I have no idea why you would want any of those. Can you explain in a few sentences?

    I written a parser before and it was faster than everything I used that does the same thing or similar. And I haven't had any professional or teachers teach me parsing

    I once wrote a json parser in .NET. It was faster than literally everything I tried but I already know it can't touch simdjson and I haven't compared it to the new dotnet parser but I imagine they do it faster. I'm actually confused how people write such shitty parsers when I barely know what I'm doing. Like are they using a link list or revisiting characters multiple times to parse? (I think two visits is acceptable. One to find the end of a number and another pass to convert the number to value although I'm sure it's not hard to do one pass)

  • This is such an empty criticism. Universities are not job training facilities. You want people who have the exactly right skillset? Hire smart kids and train them, you know, the way it's been done for hundreds of years.

  • I agree in general. I have a PhD in comp eng and know a lot of academia type subjects, but only a small portion is especially useful in a practical setting. Also a lot of academic knowledge seems needlessly obtuse. It becomes a lot clearer when looking at the practical problems that spurred the academic fields. At times I get the aha as to why this abstract concept is addressing a meaningful question, and if that's how academic education oriented itself, around these basic meaningful questions it would help people learn much more effectively.

  • > When do you use packrat parser vs LALR vs LL?

    This is not something you can expects candidates to know.

Domain-specific compilers are a bigger field, where there's more opportunity to do something interesting

Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a domain-specific program into C

For example, NN-512 (https://NN-512.com) is a compiler that takes as input a simple text description of a convolutional neural net inference graph and produces as output a stand-alone C implementation of that graph

GNU Bison (https://www.gnu.org/software/bison/) is another example, where the output is a parser: Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (either in C, C++, or Java) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar

Halide (https://halide-lang.org/) is a language for fast, portable computation on images and tensors, and it originally generated human-readable C++ code as output (but not anymore, now it feeds directly into LLVM)

Can anyone provide other examples? (Code generators for network packet processing?)

  • > Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a program into C.

    Don't you still need many of the techniques taught in this class to build something like that? The material doesn't look inherently targeted at low-level languages or machine-code as a target.

    • One way to think of it is to split the "compiler" problem into three big pieces. 1) Front-end, 2) back-end, 3) tooling.

      Front end is lexing, parsing, building the AST (in whole or just keeping pieces of it around), semantic analysis. When that is done, you can say "Yup, valid program. It is possible to generate code." So, to your question, yes. Those techniques from the course are 100% in play.

      Back-end is turning the AST into something you can optimized and generate code from. (Sometimes there is a "middle-end" in there....) High-level optimizations, low level optimizations, memory allocation, register allocation.

      Tooling is everything that keeps your life from being miserable as a user -- starting with debug symbols and these days people expect library and package management, etc, etc.

      So if you are interested in exploring the Next Great Syntax or some semantic problems that can be re-written into C, then doing a C-front is a great way to do that. Let the C compiler handle all the really-really-really-hard back-end issues for you, and avoid all that distraction. But.... expect that getting debug symbols from your spiffy language into the ready-to-link object file is going to be, as they say, "non-trivial". So... great for experiments and a great learning exercise, but it's hard to do a production compiler that way.

      1 reply →

  • I have just spent the last 13 months significantly reworking a compiler that turns Excel-like formulas into JavaScript. The compiler is used by an app designer, and the formula language is meant to be easy to pick up for people used to spreadsheets. A significant difference compared to Excel is that our compiler features static typing, enabling helpful error messages to be produced at build-time.

    (Spreadsheets, on the other hand, tend to try very hard to make sense of types at runtime, using type coercion to turn values of the wrong type into something of the expected type they can work with. That produces surprising results and confuses users. =IF(2, 3, 4) will generally produce 3, because 2 -- which should be a logical/boolean value -- is interpreted as TRUE. We think it's a better idea to warn the user that the first parameter is wrong, and to do that as soon as possible.)

    By far the largest challenge with reworking the compiler has been the type system. I have expanded our type system from the four standard spreadsheet types to encompass more than 80, as well as added support for lambdas (for use with functions like SUMIF and FILTER), type inference, parameterized types and union types. The biggest addition is probably an imperative mode, that executes formulas with side-effects in response to events being fired. This is all in service of features we have wanted to add for years, that have been held back by our (previously) simplistic type system.

    I did take a compiler class at university, more than 15 years ago, but while that proved very useful when writing the formula parser (an operator-precedence parser, initially straight out of a textbook), type theory was not part of the class. I had to pick up bits and pieces by reading the Java Language Specification (JLS) and by experimenting.

    The Cornell class doesn't seem to feature type theory either, which I think is unfortunate. Now that the pendulum has swung, yet again, firmly in the direction of static typing, perhaps static typing will become a bigger part of CS compiler curriculums.

    • Your project sounds like fun. What/for whom is the project? What level of Excel formula compatibility are you shooting for, e.g. just in the neighborhood like Google Sheets, reasonably close as in Libre Office, or full compatibility with the latest version? If the latter you must have one gnarly test suite. Would love to hear more about it.

      4 replies →

  • AnyDSL (https://anydsl.github.io/) might be worth mentioning, it is a framework for creating domain specific languages using partial evaluation to give powerful abstractions to the user while still producing high-performance code for different target platforms. On the website, you can find that they already have DSLs for ray tracing and image stencil codes as well as works for bioinformatics.

  • Not open source but I'm working on a system that turns a DSL (embedded into Jupyter notebooks) into either c or c with opencl for a niche finance application. Not a huge project (<10k lines of code). Nature of the problem means that DSL can be very simple (it's aimed at non-programmers). This approach also means that could easily add other targets - e.g. cuda or wasm.

A fantastic submission, very much appreciated.

I find the presentation style of subtitles with surrounding context extremely helpful. I often lose the ability to listen when concentrating; the subtitle context provides a superior recovery mechanism to the usual, try to work out what was missed. I look forward to purchasing augmented reality glasses at some point in the future which add this advantage to in-person interactions.

  • Yes!

    It’s the complete opposite of, and significantly more user friendly than, how zoom recordings synchronize meeting subtitles with audio/video playback.

    In their case they only show the current line in a text message-like sidebar. As the content plays, it snaps the scroll position of the messages to current, hiding the rest. This breaks scrolling ahead or back while listening and you lose any context.

    Hopefully I’m just missing a setting - I try to minimize the time I spend in there. Is there a way I can take their damn horse blinders off?

  • I too appreciate the subtitles and I tend to have them on when I watch+listen to a DVD. You can even click on the sentence and the video jumps to that point. However ... a little proof-reading would help here. To get things started gets transcribed into Ticketing, sir.

I love this line...

“Real 6120 also has an end-of-semester course project—in the self-guided version, your end-of-semester assignment is to change the world through the magic of compilers”

I wish they’d be more courses on compilers that also serve as language servers. The current world is moving towards a phase where a language needs IDE tooling such as compilers, linters, formatters, REPL, language servers, debuggers and profilers.

Would love to see a high level course that covers how to build the tooling for a great developer experience.

  • As i mentioned in another thread, a uni course that has students work on an existing codebase instead of writing something from scratch might be even better - though harder to grade.

    Imagine if one could do a major feature contribution to clangd/pyls/rust-analyzer as part of their advanced compilers course. The student gets a better understanding of (open-source) software development practices, gains a deeper proficiency in the language (not just the APIs), the project brings new contributors and users (including the student herself) benefit from the new feature.

    Double bubble!

    • My uni, NC State, has a course (CSC 326) called Software Engineering that has students contribute to an existing project created by the staff in teams, with the best contributions being merged into master for the next semester students to work on.

      3 replies →

Anyone interested in following the course together during the holidays? To keep each other motivated etc.

  • If you have to stop "early" please get through the simple DCE and LVN passes.

    A solid basic-block level DCE & LVN optimization along with a judicious "compile time evaluator" (simple constant folding) will let you produce code at around 80% of the level of LLVM.

    Obviously, you're not going to get the magic that comes of LLVM's scalar analysis pass, but you're not going to be embarrassed with your performance, either.

  • I'm interested. I'm writing a compiler-interpreter now but my broad compiler-construction chops are spotty and going through this systematically might be useful.

  • I would love to do so! I'm a masters student who has been wishing there was an advanced compilers course in my program, there's only the undergrad level one, so this is perfect! Let me know how I can get in contact.

  • I’m interested. I may not have the most relevant background, though — I’m a data scientist with an econometrics education. The topic really does interest me, though.

  • I'm game. My background is in math and not CS, so I can't say I'm certain I have the prerequisite knowledge.

On the related note, what would you guys recommend as an introduction to compilers, say for beginners?

  • Crafting Interpreters has been great. It starts with an AST-walking interpreter (in Java), so you get to focus on parsing and the simplest form of interpretation, and then it moves to a bytecode interpreter (in C) so you get to learn about compiling to a more primitive target via a lower-level language (much of which should translate to a "real" compiler). Very approachable, has lots of side-context and an entertaining style. The language you interpret was custom-designed for the course, which helps a lot, and resembles a simplified JavaScript without all the messy parts.

  • Engineering a compiler is by far the best introductory book that is also modern.

    Advanced compiling for high performance is one of not many good accounts of scheduling for non-toy architectures.

    Muchnick's book is a good encyclopedia of optimisations although the code is unreadable and buggy.

  • Compilers books: I'd start with "Engineering a Compiler" by Keith Cooper and Linda Torczon. http://craftinginterpreters.com/ is also a pretty great, programming-oriented intro, which may be good to work through alongside. For more on the analysis & compiler optimization side, "SSA-based Compiler Design" (http://ssabook.gforge.inria.fr/latest/; GitHub Mirror: https://github.com/pfalcon/ssabook) is a good follow-up.

    Further readings: Book recommendations in https://github.com/MattPD/cpplinks/blob/master/compilers.md#... as well as program analysis resources (in particular lattice theory, type systems and programming languages theory, related notation): https://gist.github.com/MattPD/00573ee14bf85ccac6bed3c0678dd...

    Courses: I can recommend the following: https://github.com/MattPD/cpplinks/blob/master/compilers.md#...

    Particularly (in alphabetical order--I think these are all great, so including highlights of what I've liked about them):

    - IU P423/P523: Compilers (Programming Language Implementation) - Jeremy Siek, with the course book "Essentials of Compilation: An Incremental Approach" (pretty interesting approach, with programming language features developed incrementally having a fully working compiler at each step, cf. http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf; implementation language Racket),

    - KAIST CS420: Compiler Design - Jeehoon Kang (good modern treatment of SSA representation itself, including the use of block arguments, https://mlir.llvm.org/docs/Rationale/Rationale/#block-argume..., as well as SSA-based analysis and optimization; Rust as an implementation language),

    - UCSD CSE 131: Compiler Construction - Joseph Gibbs Politz, Ranjit Jhala (great lecturers, both Haskell and OCaml edition were interesting; fun extra: one of the Fall 2019 lectures (11/26) has an interesting discussion of the trade-offs between traditional OOP and FP compiler implementation),

    - UCSD CSE 231: Advanced Compiler Design - Sorin Lerner (after UCSD CSE 131: for more on analysis & optimization--data flow analysis, lattice theory, SSA, optimization; fun extra: the final Winter 2018 lecture highlighted one of my favorite papers, https://pldi15.sigplan.org/details/pldi2015-papers/31/Provab...),

    - UW CSE CSEP 501: Compilers - Hal Perkins (nice balanced introduction, including x86-64 assembly code generation, with the aforementioned "Engineering a Compiler" used as the course textbook).

    • Thank you so much for this comment. It's incredibly difficult to find modern compiler learning resources – when I ask, I often get disparaging comments about the dragon book and not much more.

      Please, consider putting this list on your blog, or somewhere more visible than a HN comment.

      1 reply →

  • I've been reading "Modern Compiler Design" currently, not too sure how beginner friendly it is though.

    Also, haven't made it too far into the book so can't really give a recommendation either way.

Even if you don't do the course, the first paper they list is a fascinating must-read:

Producing Wrong Data Without Doing Anything Obviously Wrong! Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. ASPLOS 2009.

https://dl.acm.org/doi/10.1145/2528521.1508275

I happened to be at that ASPLOS and I thought it was fascinating. Good to see it recognized as one of the most important papers in compilers.

I took the undergraduate version of this course while at Cornell. It was a wonderful course. I taught me a lot how computers work, and how to organize a large (for me at the time) software project.

One of the most painful parts of the the project was using Scala which had ridiculously long compile times which made iterating quite a pain.

My favorite memory was that for one project we had to submit a program written meant to stress test everyone's parser.

The course materials look good.

But what is a "PhD level course"? All I've seen is undergrad and graduate, and the difference between masters and PhD has nothing to do with classes.

  • Cornell student here that's taken a bunch of classes at all of the levels - in my experience, undergraduate classes/master's classes are kind of taught with the expectation that you're taking the course as a requirement and are much more evaluation heavy, and that you're likely taking a full course load of other courses as well. PhD level courses are generally assumed to be the only course that you're taking that semester and that you're taking it more because the material will be useful for your research rather than for the credential, so there's much less emphasis on evaluation; i.e. a smaller quantity homeworks that each much harder than an undergraduate version of the same class, and final evaluations are much closer to reading recent research papers and understanding them rather than a timed exam.

  • > the difference between masters and PhD has nothing to do with classes.

    huh ? what's your education system ? in europe it's generally 3 year bachelor, then 2 year masters, then 3 years phd so when you have classes during your phd hopefully they are different that the one you get during your masters

    • Depends on the field. I studied philosophy (Ph.D. program, but finished with an M.A.), and the M.A. is often just a degree given after the first few years of the Ph.D. There are a handful of required first year courses (logic, basic metaphysics, philosophy of science and ethics), but otherwise students take whatever courses are relevant to their interests. It helps that philosophy typically has fewer classes with explicit prerequisites. If you take an advanced course, you might be lost, but it doesn't feel the same as not being knowing a formula or how to derive something.

      Some courses are more basic, and some are more like collaborative research seminars. Advanced students gravitate towards the latter (and typically take fewer courses and/or audit more of them), but a first year with the right background can still take them. Conversely, an advanced student might take an introductory course in an area they lack background in to broaden their knowledge.

      There also are terminal M.A. programs, but they're less common, and the majority of students who do Ph.Ds at prestigious institutions don't do one.

    • I am in the US, and at my school masters and PhD students all take the same classes.

      It is not ideal because many of the masters students are coming from a non-CS background. Some of the graduate level CS courses I have taken were easier than 4th year level undergrad classes.

      3 replies →

    • I honestly have never figured out European education systems. Sorry!

      But this is Cornell, which I assume is like other research-oriented US universities. You have a 4 (or maybe 5) year undergraduate, leading to a Bachelors and then apply to graduate programs. The graduate schools may have specific Masters tracks, but not really any general Masters program. Instead, you spend the first year or two taking graduate classes until you either satisfy the assorted requirements or pass an oral qualification (They still have those, right?), after which you focus entirely on research, publications, and your dissertation defense. If you leave school before defending, you get the Masters as a sort of consolation prize---most US PhDs in my experience don't have Masters.

    • In the US, it's typically a 4 year undergrad, 2 year masters, and a ~5-7 year PhD depending on the program itself. From what I understand, while there are some classes you take as a PhD student, a majority of your work is doing research

      3 replies →

  • There is no formal difference between masters and PhD courses. I think he calls it a PhD level course because it is "research oriented" and requires work that perhaps a Masters student looking to complete credits might not be interested in.

    The evaluation criteria is also fairly subjective and rough, which might not work for a larger masters level course.

Well, I don't feel like cleaning the basement or finishing my yard work before the snow, so I might just do this course over the holiday instead.

They should renumber it CS 6502.

  • Funny you mention 6502, because as wonderful as its assembly is, its not that friendly for higher level languages. I tried writing a basic C like language for it with 16 pointer support (see LDA ($ZP),Y), and came to the haunting realization that function calls require pushing the de-reference of zero page pointers to the stack which change their de-referencing address space from 0-255 (eg. main) to 256-512 for subsequent stack frames.

    I then realized that at a speed of 1MHz, that the pointer de-referencing, the stack frame magic, and endless copies to just parse basic expressions just made 6502 a pointless target for a C like language.

    It was fun though.

This course is asynchronously delivered.

Question for students: how do you find synchronous courses with some discussions during class, vs asynchronous courses where discussions are limited to an online written forum or office hours?

  • Asynchronous discussions are durable as content (in the internet). They are also very useful for working students. But they need focus to retain them in memory.

  • Not a problem at all. For non-introductory courses, this works well.

    But needs to have at least some sort of synchronous at some point. In my one class, we use Notion for discussing so it has a live sense of discussion.

  • Listening to a Russian theoretical physicist mumble about how trivial various results are over a zoom call really makes my day, put it that way.i

I'm interested by the topic but I'm a junior backend/data eng, do you think it could be helpful for my profile to dig into the concepts of compilers other than for the culture itself?

What would be some cool use cases to design your own compiler?

Nice, although I'd have wanted a segment on compiler testing.

Oh, compiler construction :) I really enjoyed that lecture during my M.Sc.

"parallelization, just-in-time compilation, and garbage collection" sound fun, too. Especially JIT.