I’ve worked on multiple compilers (optimizations expert) at MSFT on VS and CUDA and gave developed a DSL and worked on Database Compilers.
I can’t hire compiler people with right skills.
We’re building an IDE and those parsers are written differently, and we use Scala packrat parser combinators.
These courses teach very little judgement or industry relevant stuff. When do you use packrat parser vs LALR vs LL? Good luck hiring an engineer with advanced compiler courses knowing any of this.
I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
I don't teach compiler courses, but I'm an academic, and I do keep in touch with industry people to find out what's important for my data analysis course. A couple of big problems with your proposal:
1. Time variation in "what industry wants". This year one thing's hot, the next year it's something else. A bare minimum requirement for any course is that a student taking it in their second year should learn something that's still relevant three years later when they're in the first year of their job.
2. Cross sectional variation. There's no such thing as "what industry wants". Every place wants something different, with only a small subset common to 80% of employers.
My two cents - what’s useful for employers is giving students the confidence to jump into the code and figure it out. By this measure, what’s relevant for a compilers course isn’t the material itself. It’s that it’s structured to force the students to figure certain things for themselves.
There are college hires who need to be spoon fed, and college hires who just need mentorship and direction. Group two are invaluable.
ibains made a specific point about language processing in IDEs being different to that of (traditional) compilers. Presumably there exists a state-of-the-art there just as there's a state-of-the-art for conventional compilers, but academia doesn't give it as much attention.
There is no such variance and new fashions in compilers every year. There are hardware compilers and tools including language servers primarily. Move to cloud requires source to source compilers for data snd transforms.
Hiring for exact skills is what you do when you need the skills ASAP and the actual supply of engineers is there. The time you waste looking for someone with the exact knowledge you need could have been spent getting a good compiler engineer and just teaching them what you know. And magically that good engineer will get better, it's pretty amazing stuff.
I've was told multiple times, University/School teaches you how to learn. You get your actual knowledge and skills from the job. What I learned in 3 years of university was nothing compared to what I learned in 6 months on the job.
At first, it sounds like "win/win" if universities trained their students to use the same tools companies want. Companies would save money for training, and graduates would be ready for their jobs.
But in longer term, it is actually "win/lose", because if three years later the fashionable tools change, the companies that optimize for saving money on training would simply fire their existing employees and hire new graduates.
For the students, it is better to be the kind of person who understands the deep principles and can learn new tools as needed.
And the companies have a choice to either offer job trainings, or offer higher salaries to people currently working in other companies who already have the needed skills. (Or whine about market shortage and import H1B servants.)
I will second this. I would much rather hire an engineer who has a related background and wants to learn over an engineer with the exact skillset but does not want to learn. It may take a few months, but the former engineer will pick up what you want them to know (if you mentor them correctly), and past that inversion point, they are immensely more valuable.
You are thinking about it the wrong way. In this niche, you will almost never find people with the exact skills you are looking for. Better to find people with tangentially related skills and experience and let them grow into the role. Academia is never going to be comprehensive enough to prepare students for every niche that has some demand. Rather, they should just focus on teaching a core that will help them learn what they need later on demand.
> When do you use packrat parser vs LALR vs LL?
If you need ok performance and good error recovery (especially if you are doing stuff in an IDE!), none of the above: recursive descent is not that hard to implement, and is incredibly flexible.
Do you think it is the responsibility of academia to teach "industry relevant stuff"? I agree that courses generally don't teach judgement well, but I do think that it is the workplace/industry's responsbility to train their engineers and not expect fresh grads to be fully equipped to work on something that specific.
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
I don't think that was what ibains was going for, though i don't fault you for seeing this in that comment. (Especially that i don't think this course suffers from that problem and generally i think things are rapidly improving on this front.)
That problem to me and i think to him too is that quite a lot of things that such courses tend to teach as well thought out, well working solutions and approaches just don't work very well and frequently i find comments on what's a 'good taste' solution and what isn't that are completely my understanding of the problem space which is why.
E.g. in parsing (as that's the topic he mentioned): First, lots of people just spend way too much time on it. And they focus on parts that are of zero use to beginners (like explaining all several grammar families) and then use obtuse parser generators that save no work and sometimes use them in bizarre way (like picking a lalr parser generator then hand editing the output to support a not-quite-lalr language). Meanwhile a recursive descent parser is easy to write, fast and gives pretty good error messages with _very_ little work. You do need to know enough about grammars to be able to write one down and know if it describes an ambiguous language etc so this should be taught, but you don't need to understand the language zoo well.
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
In college I learned a lot of math and how to do things like calculate stress and bending. I learned nothing about material finishes, material selection, fastener selection, tolerances, drafting, etc. But the latter was easy to pick up on the job, while learning mathematical analysis on the job just doesn't happen.
I definitely think there ought to be courses on such things available. There should be an educational route that is between pure academia which focusses on esoterica that is only relevant to research and vocational courses that don't include much innthe way of theory at all. The idea that industry ought to pay for it presumably comes from a background where universities are private an expensive. But I'd like to live in a world where these kind of courses were government subsidised and available for free or cheap.
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
I mean, there are colleges that cater specifically to industry. Digipen is a good example of this: they have very close relationships with game studios and shoehorn their curriculum into the requirements the state imposes on colleges.
These sorts of colleges are not particularly popular, however, so I think most prospective students understand the value of a more broad and timeless approach to software education.
I teach compilers at a big university.
And I would love to hire graduates with compiler skills for my
startup, but find it difficult.
There are several structural problems that conspire to keep
students from acquiring knowledge in low-level programming domain like
compilation: a course needs to fit with the rest of the curriculum,
and with student interest. I cannot get students interested in lexing
and parsing, because they hear all day how exciting deep learning and
big data are. Lexing and parsing are deemed solved in the 1960s, a
sentiment I disagree with. In addition, classical theoretical
computer science (e.g. automata and languages) is rarely taught with
enough depths in many universities, so students lack the technical
background. You can cover this in a compilers course (I do) but it
takes time, and the students see it as an ordeal. Compilers as a
subject is not helped by the most widely recommended book being the Dragon Book which is dated and way
too long to be useful for beginners. Compare the Dragon Book with
the material available as introduction to machine learning ... Many of the existing textbooks are also not covering modern compilation themes, in particular JITs, and compilation for multi-core and GPUs. I'd say there
is currently no good undergraduate textbook on compilers. I could probably, with a lot of work, cobble something reasonable together from my lecture notes, but I don't believe in books, I'd like to do a good MOOC, but building suitable software infrastructure requires more work than I am currently willing to put in.
My department tried hard to remove the compilers course, as "no
longer relevant", and "too hard". I threatened to resign if it was not
kept as a mandatory course. For now this worked.
I never took a compilers class during university study and it is my greatest regret. I've since been self learning and it hasn't been easy.
Thanks for advocating for the importance of compilers. This lack of knowledge has been a limiter for me many times. I'd really like to fill a skill gap and be able to write a DSL or a little language to solve a problem.
I never understood this. I took the compilers class at CMU, and I loved it. I went on to take a follow-up class implementing a compiler with higher-order types. Meanwhile, most of my classmates avoided compilers altogether.
I've worked in the industry since undergrad, and I've implemented several interpreters for DSLs. Sometimes I was asked to make it, but other times I had the flexibility to dream up a DSL on my own, and it ended up becoming indispensable to users. I've always loved building them, but to this day, parsing is still the most boring part to me.
I recently went through Bob Nystrom's Crafting Interpreters, and it was great fun. I implemented an interpreter in Rust. Material about compiling for GPUs would be interesting. But for me, personally, I really wish there were something updated for how IDEs use languages nowadays, in particular, walking through a good way to implement the language server protocol.
The calculator language that every compiler book starts with should be LSP-first.
A MOOC would be great, but I'd also be happy with a series of blog posts.
Does anyone know of good material bridging the gap between traditional compiler architecture and the LSP?
> Compilers as a subject is not helped by the most widely recommended book being the Dragon Book
Is it? Even I, an outsider [to the modern university courses], see that the baseline quite shifted to the TigerBook (aka Appel - Modern Compiler Implementation in [your choice]).
Is it at all possible that traditionally compilation is taught back to front? I am just a hobbyist, my CS degree did not include anything about compilation other than the theory of types of Grammar. I am the first to put my hand up and say I do not really know what I am talking about.
For me, compilers are interesting because of
a. optimisations; at my level simple logic that is obviously advantageous to a CFG.
b. interaction with the dependencies and operation of low level hardware; how does it REALLY work.
In the hobbyist pursuit of an understanding I have become belligerent to parsing. Parsing is interesting. But also not really. I would have benefited substantially from an abstract introduction via graphs, analysis and lowering. Both in terms of interest and fundamental understanding.
Thanks. I'm actually very interested in lexing and parsing, because that's probably 99% of the stuff a layman-programmer gets to do with compiler theory in job. I mean not everyone gets chance to write the backend part, but parsing skill is almost used universally.
No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.
As this is a PhD course, I'd expect the goal of it to be preparing students for research in the field, not writing parsers for some industrial company.
Also, there are definitely courses that teach you how to write a parser. But we usually don't call them advanced.
> No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.
Not if you're writing an IDE. Writing a batch mode parser that takes an entire correct source file and produces an AST is easy. Writing a parser that instaneously handle producing an updating AST while the user is typing in the middle of the program while still allowing semantic analysis to occur and without vomiting out a pile of extraneous errors or giving up entirely is a lot harder.
I saw this great interview recently with Anders Hejlsberg at MSFT on how modern compiler construction differs from what is taught in traditional University courses. Is this what you're alluding to?
After watching that interview, it's strange to read a comment like yours. While the architecture is completely different, it doesn't frankly seem like that big a leap to go from "senior engineer with X years working on thing that's tangentially related to compilers" to being able to be productive in a reasonable amount of time working with the new architecture. What am I missing?
I will say that I asked Shriram Krishnamurti about why books on interpreters and compilers tend to avoid the question of parsing like the plague and found his response a little unsatisfactory.
All these celebrated texts like SICP, EOPL, and Lisp in Small Pieces avoid the question of parsing altogether by focusing on s-expression based languages.
His response was effectively that parsing is over-emphasized in other books, and it's truly orthogonal to other PL activities. Which I can understand, but in practice when I need a parser in my day job I usually don't have much say in the structure of the input "language" (since I'm usually parsing something coming from another source that I have little control over). And it would seem if you have an instructor in a compilers course with this point of view, what other class would give you an opportunity to learn parsing in a formal setting?
It's pretty hard to pick up compiler skills because very little of it is written down. It takes a lot of time (a few years) working with code bases and papers to absorb it.
Yes, more validation - compilers as taught in universities need improvement.
So, a startup of 10 people should hire and train people for months. Raise a $2M seed round, have 12-18 months to show ProductMarket fit and as a founder teach all engineers their jobs? How many engineers go into compilers after school? So university education is all they know. The industry is built on qualified engineers. NVIDIA ceo was ahead of curve on graphics card, google founders ahead in search, why are compilers ones behind the curve?
Parsing is frontend of a compiler. That the compiler course focuses on compiler backend does not mean it's not relevant to the industry. I personally find aliasing, flow analysis, and and optimization very relevant to what we needed to do.
As for industry vs academia, wouldn't it be fair to say that professors should teach which ever is more advanced? I certainly don't believe that teaching Java generics, no matter how relevant that is, is relevant to a PhD program, which is supposed to push the state of the art of a subject.
> Good luck hiring an engineer with advanced compiler courses knowing any of this.
Have you tried raising the level of compensation you offer? There are engineers elsewhere doing this at other companies you could presumably get if you put your hands in your pockets?
> These courses teach very little judgement or industry relevant stuff.
Hasn't that been the debate for, like, ever?
Should the universities teach you how to learn or should they teach you how to hit the ground running at your first post-graduation job?
I'd say anyone who took phd-level courses on compiler design shouldn't take too much training to become a valuable team member as they've proven they can learn what is important for the given task but maybe that's just my uninformed opinion.
I feel like "teach you to learn" vs "teach you useful skills"is a false dichotomy, and often just an excuse for poor teaching. They can and should do both. The reasom they don't is because they're so research focussed, which is fine as an option but a problem because there are no or vert few good options for high level academic courses that are more practically focussed.
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
Courses should be used to teach general concepts while industry should be used to show those how to apply those concepts. The choice between using a packrat parser, LALR, or LL would be easier to explain to someone who has been exposed to those concepts and understands the advantages and disadvantages of each approach compared to someone who has never heard of the terms.
Best practices change with time, but knowing the base concepts and how to apply them will help people adapt. If they're only exposed to what's considered relevant at this time, then they won't have the necessary exposure to concepts that may pertain to future best practice.
You can hire bright, motivated and mature people and teach them the skills they need. Academia doesn't exist to provide profit-maximizing, ready inputs to business. They produce people who have the skills to (hopefully) be able to grow into any position.
I have some idea what I'm talking about, see my post https://news.ycombinator.com/item?id=25210523 and I don't kniow about why using a packrat over anything else. I pull something off the shelf, antlr for my latest project, then get on with it.
The parser is the most trivial side of things, if you consider that a key skill it comes across as being an issue from the 1970s, not today. Writing a RD by hand is not a big deal anyway as someone else said.
If your into optimising compilers you're focusing on a very odd area to criticise. I'd gave though more about dataflow using lattices or whatever, graph stuff etc.
So parsing aside, what are the 'relevant' skills you feel are missing?
(and what are 'Database Compilers'? Nearest I can think of that may fit are query optimisers)
As someone who is happy when he interviews a candidate who at least knows you don't start parsing with `string.split()` I gotta say, your bar is extremely high. Surely MSFT can afford some internal training?
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
I'm sorry if you've already answered, or if it comes up a lot, but could you toss up some buzzwords/more industry-standard things you're looking for? I was going to look into this course, and still probably will, but if you're saying it's not relevant I'd like to know what to be looking into instead?
I second this (although I work more on translation and modeling than optimization). IME, it's much easier to find otherwise competent, savvy researchers and spin them onto compiler work than it is to take someone who's gone through an "advanced" course.
> When do you use packrat parser vs LALR vs LL? Good luck hiring an engineer with advanced compiler courses knowing any of this
Honestly, I have no idea why you would want any of those. Can you explain in a few sentences?
I written a parser before and it was faster than everything I used that does the same thing or similar. And I haven't had any professional or teachers teach me parsing
I once wrote a json parser in .NET. It was faster than literally everything I tried but I already know it can't touch simdjson and I haven't compared it to the new dotnet parser but I imagine they do it faster. I'm actually confused how people write such shitty parsers when I barely know what I'm doing. Like are they using a link list or revisiting characters multiple times to parse? (I think two visits is acceptable. One to find the end of a number and another pass to convert the number to value although I'm sure it's not hard to do one pass)
This is such an empty criticism. Universities are not job training facilities. You want people who have the exactly right skillset? Hire smart kids and train them, you know, the way it's been done for hundreds of years.
I agree in general. I have a PhD in comp eng and know a lot of academia type subjects, but only a small portion is especially useful in a practical setting. Also a lot of academic knowledge seems needlessly obtuse. It becomes a lot clearer when looking at the practical problems that spurred the academic fields. At times I get the aha as to why this abstract concept is addressing a meaningful question, and if that's how academic education oriented itself, around these basic meaningful questions it would help people learn much more effectively.
Domain-specific compilers are a bigger field, where there's more opportunity to do something interesting
Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a domain-specific program into C
For example, NN-512 (https://NN-512.com) is a compiler that takes as input a simple text description of a convolutional neural net inference graph and produces as output a stand-alone C implementation of that graph
GNU Bison (https://www.gnu.org/software/bison/) is another example, where the output is a parser: Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (either in C, C++, or Java) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar
Halide (https://halide-lang.org/) is a language for fast, portable computation on images and tensors, and it originally generated human-readable C++ code as output (but not anymore, now it feeds directly into LLVM)
Can anyone provide other examples? (Code generators for network packet processing?)
> Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a program into C.
Don't you still need many of the techniques taught in this class to build something like that? The material doesn't look inherently targeted at low-level languages or machine-code as a target.
One way to think of it is to split the "compiler" problem into three big pieces. 1) Front-end, 2) back-end, 3) tooling.
Front end is lexing, parsing, building the AST (in whole or just keeping pieces of it around), semantic analysis. When that is done, you can say "Yup, valid program. It is possible to generate code." So, to your question, yes. Those techniques from the course are 100% in play.
Back-end is turning the AST into something you can optimized and generate code from. (Sometimes there is a "middle-end" in there....) High-level optimizations, low level optimizations, memory allocation, register allocation.
Tooling is everything that keeps your life from being miserable as a user -- starting with debug symbols and these days people expect library and package management, etc, etc.
So if you are interested in exploring the Next Great Syntax or some semantic problems that can be re-written into C, then doing a C-front is a great way to do that. Let the C compiler handle all the really-really-really-hard back-end issues for you, and avoid all that distraction. But.... expect that getting debug symbols from your spiffy language into the ready-to-link object file is going to be, as they say, "non-trivial". So... great for experiments and a great learning exercise, but it's hard to do a production compiler that way.
I have just spent the last 13 months significantly reworking a compiler that turns Excel-like formulas into JavaScript. The compiler is used by an app designer, and the formula language is meant to be easy to pick up for people used to spreadsheets. A significant difference compared to Excel is that our compiler features static typing, enabling helpful error messages to be produced at build-time.
(Spreadsheets, on the other hand, tend to try very hard to make sense of types at runtime, using type coercion to turn values of the wrong type into something of the expected type they can work with. That produces surprising results and confuses users. =IF(2, 3, 4) will generally produce 3, because 2 -- which should be a logical/boolean value -- is interpreted as TRUE. We think it's a better idea to warn the user that the first parameter is wrong, and to do that as soon as possible.)
By far the largest challenge with reworking the compiler has been the type system. I have expanded our type system from the four standard spreadsheet types to encompass more than 80, as well as added support for lambdas (for use with functions like SUMIF and FILTER), type inference, parameterized types and union types. The biggest addition is probably an imperative mode, that executes formulas with side-effects in response to events being fired. This is all in service of features we have wanted to add for years, that have been held back by our (previously) simplistic type system.
I did take a compiler class at university, more than 15 years ago, but while that proved very useful when writing the formula parser (an operator-precedence parser, initially straight out of a textbook), type theory was not part of the class. I had to pick up bits and pieces by reading the Java Language Specification (JLS) and by experimenting.
The Cornell class doesn't seem to feature type theory either, which I think is unfortunate. Now that the pendulum has swung, yet again, firmly in the direction of static typing, perhaps static typing will become a bigger part of CS compiler curriculums.
Your project sounds like fun. What/for whom is the project? What level of Excel formula compatibility are you shooting for, e.g. just in the neighborhood like Google Sheets, reasonably close as in Libre Office, or full compatibility with the latest version? If the latter you must have one gnarly test suite. Would love to hear more about it.
AnyDSL (https://anydsl.github.io/) might be worth mentioning, it is a framework for creating domain specific languages using partial evaluation to give powerful abstractions to the user while still producing high-performance code for different target platforms.
On the website, you can find that they already have DSLs for ray tracing and image stencil codes as well as works for bioinformatics.
Not open source but I'm working on a system that turns a DSL (embedded into Jupyter notebooks) into either c or c with opencl for a niche finance application. Not a huge project (<10k lines of code). Nature of the problem means that DSL can be very simple (it's aimed at non-programmers). This approach also means that could easily add other targets - e.g. cuda or wasm.
I find the presentation style of subtitles with surrounding context extremely helpful. I often lose the ability to listen when concentrating; the subtitle context provides a superior recovery mechanism to the usual, try to work out what was missed. I look forward to purchasing augmented reality glasses at some point in the future which add this advantage to in-person interactions.
It’s the complete opposite of, and significantly more user friendly than, how zoom recordings synchronize meeting subtitles with audio/video playback.
In their case they only show the current line in a text message-like sidebar. As the content plays, it snaps the scroll position of the messages to current, hiding the rest. This breaks scrolling ahead or back while listening and you lose any context.
Hopefully I’m just missing a setting - I try to minimize the time I spend in there. Is there a way I can take their damn horse blinders off?
I too appreciate the subtitles and I tend to have them on when I watch+listen to a DVD. You can even click on the sentence and the video jumps to that point. However ... a little proof-reading would help here. To get things started gets transcribed into Ticketing, sir.
“Real 6120 also has an end-of-semester course project—in the self-guided version, your end-of-semester assignment is to change the world through the magic of compilers”
I wish they’d be more courses on compilers that also serve as language servers. The current world is moving towards a phase where a language needs IDE tooling such as compilers, linters, formatters, REPL, language servers, debuggers and profilers.
Would love to see a high level course that covers how to build the tooling for a great developer experience.
As i mentioned in another thread, a uni course that has students work on an existing codebase instead of writing something from scratch might be even better - though harder to grade.
Imagine if one could do a major feature contribution to clangd/pyls/rust-analyzer as part of their advanced compilers course. The student gets a better understanding of (open-source) software development practices, gains a deeper proficiency in the language (not just the APIs), the project brings new contributors and users (including the student herself) benefit from the new feature.
My uni, NC State, has a course (CSC 326) called Software Engineering that has students contribute to an existing project created by the staff in teams, with the best contributions being merged into master for the next semester students to work on.
If you have to stop "early" please get through the simple DCE and LVN passes.
A solid basic-block level DCE & LVN optimization along with a judicious "compile time evaluator" (simple constant folding) will let you produce code at around 80% of the level of LLVM.
Obviously, you're not going to get the magic that comes of LLVM's scalar analysis pass, but you're not going to be embarrassed with your performance, either.
I'm interested. I'm writing a compiler-interpreter now but my broad compiler-construction chops are spotty and going through this systematically might be useful.
I would love to do so! I'm a masters student who has been wishing there was an advanced compilers course in my program, there's only the undergrad level one, so this is perfect! Let me know how I can get in contact.
I’m interested. I may not have the most relevant background, though — I’m a data scientist with an econometrics education. The topic really does interest me, though.
Okay, I just created this on discord https://discord.gg/RBPmmdWg. Considering it might be more accessible for people. Wondering if I should post this out of the thread.
Crafting Interpreters has been great. It starts with an AST-walking interpreter (in Java), so you get to focus on parsing and the simplest form of interpretation, and then it moves to a bytecode interpreter (in C) so you get to learn about compiling to a more primitive target via a lower-level language (much of which should translate to a "real" compiler). Very approachable, has lots of side-context and an entertaining style. The language you interpret was custom-designed for the course, which helps a lot, and resembles a simplified JavaScript without all the messy parts.
Compilers books: I'd start with "Engineering a Compiler" by Keith Cooper and Linda Torczon. http://craftinginterpreters.com/ is also a pretty great, programming-oriented intro, which may be good to work through alongside. For more on the analysis & compiler optimization side, "SSA-based Compiler Design" (http://ssabook.gforge.inria.fr/latest/; GitHub Mirror: https://github.com/pfalcon/ssabook) is a good follow-up.
Particularly (in alphabetical order--I think these are all great, so including highlights of what I've liked about them):
- IU P423/P523: Compilers (Programming Language Implementation) - Jeremy Siek, with the course book "Essentials of Compilation: An Incremental Approach" (pretty interesting approach, with programming language features developed incrementally having a fully working compiler at each step, cf. http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf; implementation language Racket),
- KAIST CS420: Compiler Design - Jeehoon Kang (good modern treatment of SSA representation itself, including the use of block arguments, https://mlir.llvm.org/docs/Rationale/Rationale/#block-argume..., as well as SSA-based analysis and optimization; Rust as an implementation language),
- UCSD CSE 131: Compiler Construction - Joseph Gibbs Politz, Ranjit Jhala (great lecturers, both Haskell and OCaml edition were interesting; fun extra: one of the Fall 2019 lectures (11/26) has an interesting discussion of the trade-offs between traditional OOP and FP compiler implementation),
- UCSD CSE 231: Advanced Compiler Design - Sorin Lerner (after UCSD CSE 131: for more on analysis & optimization--data flow analysis, lattice theory, SSA, optimization; fun extra: the final Winter 2018 lecture highlighted one of my favorite papers, https://pldi15.sigplan.org/details/pldi2015-papers/31/Provab...),
- UW CSE CSEP 501: Compilers - Hal Perkins (nice balanced introduction, including x86-64 assembly code generation, with the aforementioned "Engineering a Compiler" used as the course textbook).
Thank you so much for this comment. It's incredibly difficult to find modern compiler learning resources – when I ask, I often get disparaging comments about the dragon book and not much more.
Please, consider putting this list on your blog, or somewhere more visible than a HN comment.
I took the undergraduate version of this course while at Cornell. It was a wonderful course. I taught me a lot how computers work, and how to organize a large (for me at the time) software project.
One of the most painful parts of the the project was using Scala which had ridiculously long compile times which made iterating quite a pain.
My favorite memory was that for one project we had to submit a program written meant to stress test everyone's parser.
Cornell student here that's taken a bunch of classes at all of the levels - in my experience, undergraduate classes/master's classes are kind of taught with the expectation that you're taking the course as a requirement and are much more evaluation heavy, and that you're likely taking a full course load of other courses as well. PhD level courses are generally assumed to be the only course that you're taking that semester and that you're taking it more because the material will be useful for your research rather than for the credential, so there's much less emphasis on evaluation; i.e. a smaller quantity homeworks that each much harder than an undergraduate version of the same class, and final evaluations are much closer to reading recent research papers and understanding them rather than a timed exam.
> the difference between masters and PhD has nothing to do with classes.
huh ? what's your education system ? in europe it's generally 3 year bachelor, then 2 year masters, then 3 years phd so when you have classes during your phd hopefully they are different that the one you get during your masters
Depends on the field. I studied philosophy (Ph.D. program, but finished with an M.A.), and the M.A. is often just a degree given after the first few years of the Ph.D. There are a handful of required first year courses (logic, basic metaphysics, philosophy of science and ethics), but otherwise students take whatever courses are relevant to their interests. It helps that philosophy typically has fewer classes with explicit prerequisites. If you take an advanced course, you might be lost, but it doesn't feel the same as not being knowing a formula or how to derive something.
Some courses are more basic, and some are more like collaborative research seminars. Advanced students gravitate towards the latter (and typically take fewer courses and/or audit more of them), but a first year with the right background can still take them. Conversely, an advanced student might take an introductory course in an area they lack background in to broaden their knowledge.
There also are terminal M.A. programs, but they're less common, and the majority of students who do Ph.Ds at prestigious institutions don't do one.
I am in the US, and at my school masters and PhD students all take the same classes.
It is not ideal because many of the masters students are coming from a non-CS background. Some of the graduate level CS courses I have taken were easier than 4th year level undergrad classes.
I honestly have never figured out European education systems. Sorry!
But this is Cornell, which I assume is like other research-oriented US universities. You have a 4 (or maybe 5) year undergraduate, leading to a Bachelors and then apply to graduate programs. The graduate schools may have specific Masters tracks, but not really any general Masters program. Instead, you spend the first year or two taking graduate classes until you either satisfy the assorted requirements or pass an oral qualification (They still have those, right?), after which you focus entirely on research, publications, and your dissertation defense. If you leave school before defending, you get the Masters as a sort of consolation prize---most US PhDs in my experience don't have Masters.
In the US, it's typically a 4 year undergrad, 2 year masters, and a ~5-7 year PhD depending on the program itself. From what I understand, while there are some classes you take as a PhD student, a majority of your work is doing research
There is no formal difference between masters and PhD courses. I think he calls it a PhD level course because it is "research oriented" and requires work that perhaps a Masters student looking to complete credits might not be interested in.
The evaluation criteria is also fairly subjective and rough, which might not work for a larger masters level course.
Funny you mention 6502, because as wonderful as its assembly is, its not that friendly for higher level languages. I tried writing a basic C like language for it with 16 pointer support (see LDA ($ZP),Y), and came to the haunting realization that function calls require pushing the de-reference of zero page pointers to the stack which change their de-referencing address space from 0-255 (eg. main) to 256-512 for subsequent stack frames.
I then realized that at a speed of 1MHz, that the pointer de-referencing, the stack frame magic, and endless copies to just parse basic expressions just made 6502 a pointless target for a C like language.
Question for students: how do you find synchronous courses with some discussions during class, vs asynchronous courses where discussions are limited to an online written forum or office hours?
Asynchronous discussions are durable as content (in the internet). They are also very useful for working students. But they need focus to retain them in memory.
I'm interested by the topic but I'm a junior backend/data eng, do you think it could be helpful for my profile to dig into the concepts of compilers other than for the culture itself?
There are various weird and wonderful niche languages out there, if you appreciate programming languages as an end in themselves. Two examples: Sentient [0] and Factor [1][2].
The simplest case where they prove useful is usually when your regex is longest than 20 characters. Write a parser once, and it'll be readable to people other than yourself (if it isn't your schema is bad)
Seems a pity to throw out the various advantages of the high-level solution just because your regex engine doesn't support a scalable syntax.
Advanced regex systems such as the Ragel parser-generator support a composition-friendly syntax, [0] but the average regex system doesn't, and I guess there's no quick fix for that.
A (not)compiler can be made to look for issues in your codebase that already exists. It would need all the tools of a compiler, just that the target is the same as the source language.
It looks like its more about generating IR. The professor wrote his own LLVM like tool called Brill, and use of the tool seems to be the focus of the course.
I’ve worked on multiple compilers (optimizations expert) at MSFT on VS and CUDA and gave developed a DSL and worked on Database Compilers. I can’t hire compiler people with right skills. We’re building an IDE and those parsers are written differently, and we use Scala packrat parser combinators. These courses teach very little judgement or industry relevant stuff. When do you use packrat parser vs LALR vs LL? Good luck hiring an engineer with advanced compiler courses knowing any of this. I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
I don't teach compiler courses, but I'm an academic, and I do keep in touch with industry people to find out what's important for my data analysis course. A couple of big problems with your proposal:
1. Time variation in "what industry wants". This year one thing's hot, the next year it's something else. A bare minimum requirement for any course is that a student taking it in their second year should learn something that's still relevant three years later when they're in the first year of their job.
2. Cross sectional variation. There's no such thing as "what industry wants". Every place wants something different, with only a small subset common to 80% of employers.
My two cents - what’s useful for employers is giving students the confidence to jump into the code and figure it out. By this measure, what’s relevant for a compilers course isn’t the material itself. It’s that it’s structured to force the students to figure certain things for themselves.
There are college hires who need to be spoon fed, and college hires who just need mentorship and direction. Group two are invaluable.
7 replies →
ibains made a specific point about language processing in IDEs being different to that of (traditional) compilers. Presumably there exists a state-of-the-art there just as there's a state-of-the-art for conventional compilers, but academia doesn't give it as much attention.
21 replies →
There is no such variance and new fashions in compilers every year. There are hardware compilers and tools including language servers primarily. Move to cloud requires source to source compilers for data snd transforms.
1 reply →
Hiring for exact skills is what you do when you need the skills ASAP and the actual supply of engineers is there. The time you waste looking for someone with the exact knowledge you need could have been spent getting a good compiler engineer and just teaching them what you know. And magically that good engineer will get better, it's pretty amazing stuff.
I've was told multiple times, University/School teaches you how to learn. You get your actual knowledge and skills from the job. What I learned in 3 years of university was nothing compared to what I learned in 6 months on the job.
At first, it sounds like "win/win" if universities trained their students to use the same tools companies want. Companies would save money for training, and graduates would be ready for their jobs.
But in longer term, it is actually "win/lose", because if three years later the fashionable tools change, the companies that optimize for saving money on training would simply fire their existing employees and hire new graduates.
For the students, it is better to be the kind of person who understands the deep principles and can learn new tools as needed.
And the companies have a choice to either offer job trainings, or offer higher salaries to people currently working in other companies who already have the needed skills. (Or whine about market shortage and import H1B servants.)
I will second this. I would much rather hire an engineer who has a related background and wants to learn over an engineer with the exact skillset but does not want to learn. It may take a few months, but the former engineer will pick up what you want them to know (if you mentor them correctly), and past that inversion point, they are immensely more valuable.
You are thinking about it the wrong way. In this niche, you will almost never find people with the exact skills you are looking for. Better to find people with tangentially related skills and experience and let them grow into the role. Academia is never going to be comprehensive enough to prepare students for every niche that has some demand. Rather, they should just focus on teaching a core that will help them learn what they need later on demand.
> When do you use packrat parser vs LALR vs LL?
If you need ok performance and good error recovery (especially if you are doing stuff in an IDE!), none of the above: recursive descent is not that hard to implement, and is incredibly flexible.
packrat parser combinators are recursive descent with infinite look ahead. That’s the problem, engineers are under educated
2 replies →
Do you think it is the responsibility of academia to teach "industry relevant stuff"? I agree that courses generally don't teach judgement well, but I do think that it is the workplace/industry's responsbility to train their engineers and not expect fresh grads to be fully equipped to work on something that specific.
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
I don't think that was what ibains was going for, though i don't fault you for seeing this in that comment. (Especially that i don't think this course suffers from that problem and generally i think things are rapidly improving on this front.)
That problem to me and i think to him too is that quite a lot of things that such courses tend to teach as well thought out, well working solutions and approaches just don't work very well and frequently i find comments on what's a 'good taste' solution and what isn't that are completely my understanding of the problem space which is why.
E.g. in parsing (as that's the topic he mentioned): First, lots of people just spend way too much time on it. And they focus on parts that are of zero use to beginners (like explaining all several grammar families) and then use obtuse parser generators that save no work and sometimes use them in bizarre way (like picking a lalr parser generator then hand editing the output to support a not-quite-lalr language). Meanwhile a recursive descent parser is easy to write, fast and gives pretty good error messages with _very_ little work. You do need to know enough about grammars to be able to write one down and know if it describes an ambiguous language etc so this should be taught, but you don't need to understand the language zoo well.
17 replies →
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
In college I learned a lot of math and how to do things like calculate stress and bending. I learned nothing about material finishes, material selection, fastener selection, tolerances, drafting, etc. But the latter was easy to pick up on the job, while learning mathematical analysis on the job just doesn't happen.
1 reply →
I definitely think there ought to be courses on such things available. There should be an educational route that is between pure academia which focusses on esoterica that is only relevant to research and vocational courses that don't include much innthe way of theory at all. The idea that industry ought to pay for it presumably comes from a background where universities are private an expensive. But I'd like to live in a world where these kind of courses were government subsidised and available for free or cheap.
2 replies →
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
I mean, there are colleges that cater specifically to industry. Digipen is a good example of this: they have very close relationships with game studios and shoehorn their curriculum into the requirements the state imposes on colleges.
These sorts of colleges are not particularly popular, however, so I think most prospective students understand the value of a more broad and timeless approach to software education.
I understand the problem.
I teach compilers at a big university. And I would love to hire graduates with compiler skills for my startup, but find it difficult.
There are several structural problems that conspire to keep students from acquiring knowledge in low-level programming domain like compilation: a course needs to fit with the rest of the curriculum, and with student interest. I cannot get students interested in lexing and parsing, because they hear all day how exciting deep learning and big data are. Lexing and parsing are deemed solved in the 1960s, a sentiment I disagree with. In addition, classical theoretical computer science (e.g. automata and languages) is rarely taught with enough depths in many universities, so students lack the technical background. You can cover this in a compilers course (I do) but it takes time, and the students see it as an ordeal. Compilers as a subject is not helped by the most widely recommended book being the Dragon Book which is dated and way too long to be useful for beginners. Compare the Dragon Book with the material available as introduction to machine learning ... Many of the existing textbooks are also not covering modern compilation themes, in particular JITs, and compilation for multi-core and GPUs. I'd say there is currently no good undergraduate textbook on compilers. I could probably, with a lot of work, cobble something reasonable together from my lecture notes, but I don't believe in books, I'd like to do a good MOOC, but building suitable software infrastructure requires more work than I am currently willing to put in.
My department tried hard to remove the compilers course, as "no longer relevant", and "too hard". I threatened to resign if it was not kept as a mandatory course. For now this worked.
I never took a compilers class during university study and it is my greatest regret. I've since been self learning and it hasn't been easy.
Thanks for advocating for the importance of compilers. This lack of knowledge has been a limiter for me many times. I'd really like to fill a skill gap and be able to write a DSL or a little language to solve a problem.
1 reply →
I never understood this. I took the compilers class at CMU, and I loved it. I went on to take a follow-up class implementing a compiler with higher-order types. Meanwhile, most of my classmates avoided compilers altogether.
I've worked in the industry since undergrad, and I've implemented several interpreters for DSLs. Sometimes I was asked to make it, but other times I had the flexibility to dream up a DSL on my own, and it ended up becoming indispensable to users. I've always loved building them, but to this day, parsing is still the most boring part to me.
I recently went through Bob Nystrom's Crafting Interpreters, and it was great fun. I implemented an interpreter in Rust. Material about compiling for GPUs would be interesting. But for me, personally, I really wish there were something updated for how IDEs use languages nowadays, in particular, walking through a good way to implement the language server protocol.
The calculator language that every compiler book starts with should be LSP-first.
A MOOC would be great, but I'd also be happy with a series of blog posts.
Does anyone know of good material bridging the gap between traditional compiler architecture and the LSP?
> Compilers as a subject is not helped by the most widely recommended book being the Dragon Book
Is it? Even I, an outsider [to the modern university courses], see that the baseline quite shifted to the TigerBook (aka Appel - Modern Compiler Implementation in [your choice]).
Is it at all possible that traditionally compilation is taught back to front? I am just a hobbyist, my CS degree did not include anything about compilation other than the theory of types of Grammar. I am the first to put my hand up and say I do not really know what I am talking about.
For me, compilers are interesting because of a. optimisations; at my level simple logic that is obviously advantageous to a CFG. b. interaction with the dependencies and operation of low level hardware; how does it REALLY work.
In the hobbyist pursuit of an understanding I have become belligerent to parsing. Parsing is interesting. But also not really. I would have benefited substantially from an abstract introduction via graphs, analysis and lowering. Both in terms of interest and fundamental understanding.
Thanks. I'm actually very interested in lexing and parsing, because that's probably 99% of the stuff a layman-programmer gets to do with compiler theory in job. I mean not everyone gets chance to write the backend part, but parsing skill is almost used universally.
2 replies →
Your HN profile is empty, so there seems to be no good way to check out your institution's program or to contact you.
No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.
As this is a PhD course, I'd expect the goal of it to be preparing students for research in the field, not writing parsers for some industrial company.
Also, there are definitely courses that teach you how to write a parser. But we usually don't call them advanced.
> No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.
Not if you're writing an IDE. Writing a batch mode parser that takes an entire correct source file and produces an AST is easy. Writing a parser that instaneously handle producing an updating AST while the user is typing in the middle of the program while still allowing semantic analysis to occur and without vomiting out a pile of extraneous errors or giving up entirely is a lot harder.
5 replies →
I saw this great interview recently with Anders Hejlsberg at MSFT on how modern compiler construction differs from what is taught in traditional University courses. Is this what you're alluding to?
After watching that interview, it's strange to read a comment like yours. While the architecture is completely different, it doesn't frankly seem like that big a leap to go from "senior engineer with X years working on thing that's tangentially related to compilers" to being able to be productive in a reasonable amount of time working with the new architecture. What am I missing?
https://youtu.be/wSdV1M7n4gQ
I will say that I asked Shriram Krishnamurti about why books on interpreters and compilers tend to avoid the question of parsing like the plague and found his response a little unsatisfactory.
All these celebrated texts like SICP, EOPL, and Lisp in Small Pieces avoid the question of parsing altogether by focusing on s-expression based languages.
His response was effectively that parsing is over-emphasized in other books, and it's truly orthogonal to other PL activities. Which I can understand, but in practice when I need a parser in my day job I usually don't have much say in the structure of the input "language" (since I'm usually parsing something coming from another source that I have little control over). And it would seem if you have an instructor in a compilers course with this point of view, what other class would give you an opportunity to learn parsing in a formal setting?
3 replies →
It's pretty hard to pick up compiler skills because very little of it is written down. It takes a lot of time (a few years) working with code bases and papers to absorb it.
3 replies →
Yes, more validation - compilers as taught in universities need improvement.
So, a startup of 10 people should hire and train people for months. Raise a $2M seed round, have 12-18 months to show ProductMarket fit and as a founder teach all engineers their jobs? How many engineers go into compilers after school? So university education is all they know. The industry is built on qualified engineers. NVIDIA ceo was ahead of curve on graphics card, google founders ahead in search, why are compilers ones behind the curve?
1 reply →
Huh, he's talking a lot about how doing everything incrementally is infeasible; isn't this how Rust's compiler/tooling infrastructure works?
Parsing is frontend of a compiler. That the compiler course focuses on compiler backend does not mean it's not relevant to the industry. I personally find aliasing, flow analysis, and and optimization very relevant to what we needed to do.
As for industry vs academia, wouldn't it be fair to say that professors should teach which ever is more advanced? I certainly don't believe that teaching Java generics, no matter how relevant that is, is relevant to a PhD program, which is supposed to push the state of the art of a subject.
> Good luck hiring an engineer with advanced compiler courses knowing any of this.
Have you tried raising the level of compensation you offer? There are engineers elsewhere doing this at other companies you could presumably get if you put your hands in your pockets?
“We can’t hire for x” almost always means “we can’t hire for x for what we’re willing to pay”
2 replies →
We pay very very well, all my peers from NVIDIA are above 700k. Why would you presume anything about salary?
6 replies →
College is for imparting foundational knowledge. It isn't vocational training. You should find capable employees and train them for your needs.
> These courses teach very little judgement or industry relevant stuff.
Hasn't that been the debate for, like, ever?
Should the universities teach you how to learn or should they teach you how to hit the ground running at your first post-graduation job?
I'd say anyone who took phd-level courses on compiler design shouldn't take too much training to become a valuable team member as they've proven they can learn what is important for the given task but maybe that's just my uninformed opinion.
I feel like "teach you to learn" vs "teach you useful skills"is a false dichotomy, and often just an excuse for poor teaching. They can and should do both. The reasom they don't is because they're so research focussed, which is fine as an option but a problem because there are no or vert few good options for high level academic courses that are more practically focussed.
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
Courses should be used to teach general concepts while industry should be used to show those how to apply those concepts. The choice between using a packrat parser, LALR, or LL would be easier to explain to someone who has been exposed to those concepts and understands the advantages and disadvantages of each approach compared to someone who has never heard of the terms.
Best practices change with time, but knowing the base concepts and how to apply them will help people adapt. If they're only exposed to what's considered relevant at this time, then they won't have the necessary exposure to concepts that may pertain to future best practice.
> I can’t hire compiler people with right skills.
You can hire bright, motivated and mature people and teach them the skills they need. Academia doesn't exist to provide profit-maximizing, ready inputs to business. They produce people who have the skills to (hopefully) be able to grow into any position.
I have some idea what I'm talking about, see my post https://news.ycombinator.com/item?id=25210523 and I don't kniow about why using a packrat over anything else. I pull something off the shelf, antlr for my latest project, then get on with it.
The parser is the most trivial side of things, if you consider that a key skill it comes across as being an issue from the 1970s, not today. Writing a RD by hand is not a big deal anyway as someone else said.
If your into optimising compilers you're focusing on a very odd area to criticise. I'd gave though more about dataflow using lattices or whatever, graph stuff etc.
So parsing aside, what are the 'relevant' skills you feel are missing?
(and what are 'Database Compilers'? Nearest I can think of that may fit are query optimisers)
What ressources do you recommend I read, or what I can do, to learn more about applicable compiler knowledge?
Prev post by me, if it's any help https://news.ycombinator.com/item?id=25210523
Any resources you can suggest to point interested engineers in the right direction?
Unless you're looking for an external consultant for the one month project, you're doing it wrong: https://www.reddit.com/r/ProgrammerHumor/comments/4k994j/if_...
As someone who is happy when he interviews a candidate who at least knows you don't start parsing with `string.split()` I gotta say, your bar is extremely high. Surely MSFT can afford some internal training?
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
I'm sorry if you've already answered, or if it comes up a lot, but could you toss up some buzzwords/more industry-standard things you're looking for? I was going to look into this course, and still probably will, but if you're saying it's not relevant I'd like to know what to be looking into instead?
Parser combinators allow elegant implementations of parsing but are ridiculously slow compared to some Knuth-optimized parsers.
As elegant as parser combinators are, it is not so easy to work out exactly what grammar they are implementing.
packrat parsers are fast (at expense of some memory)
I am currently writing a parser using combinator. How can I reach you to learn what is the proper method to use on each case?
perhaps you should guest lecture at a university and have them record all the lessons? your kind of experience is rare.
Those mean academics! How dare they teach anything else than what you need right now.
I second this (although I work more on translation and modeling than optimization). IME, it's much easier to find otherwise competent, savvy researchers and spin them onto compiler work than it is to take someone who's gone through an "advanced" course.
Is that such a critical hiring criteria? That seems like something I could learn in a week.
> That seems like something I could learn in a week.
So why not learn it in a week then apply for the job?
I would suggest the reason is... because you cannot learn it in a week.
3 replies →
> When do you use packrat parser vs LALR vs LL? Good luck hiring an engineer with advanced compiler courses knowing any of this
Honestly, I have no idea why you would want any of those. Can you explain in a few sentences?
I written a parser before and it was faster than everything I used that does the same thing or similar. And I haven't had any professional or teachers teach me parsing
I once wrote a json parser in .NET. It was faster than literally everything I tried but I already know it can't touch simdjson and I haven't compared it to the new dotnet parser but I imagine they do it faster. I'm actually confused how people write such shitty parsers when I barely know what I'm doing. Like are they using a link list or revisiting characters multiple times to parse? (I think two visits is acceptable. One to find the end of a number and another pass to convert the number to value although I'm sure it's not hard to do one pass)
This is such an empty criticism. Universities are not job training facilities. You want people who have the exactly right skillset? Hire smart kids and train them, you know, the way it's been done for hundreds of years.
I agree in general. I have a PhD in comp eng and know a lot of academia type subjects, but only a small portion is especially useful in a practical setting. Also a lot of academic knowledge seems needlessly obtuse. It becomes a lot clearer when looking at the practical problems that spurred the academic fields. At times I get the aha as to why this abstract concept is addressing a meaningful question, and if that's how academic education oriented itself, around these basic meaningful questions it would help people learn much more effectively.
> When do you use packrat parser vs LALR vs LL?
This is not something you can expects candidates to know.
Domain-specific compilers are a bigger field, where there's more opportunity to do something interesting
Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a domain-specific program into C
For example, NN-512 (https://NN-512.com) is a compiler that takes as input a simple text description of a convolutional neural net inference graph and produces as output a stand-alone C implementation of that graph
GNU Bison (https://www.gnu.org/software/bison/) is another example, where the output is a parser: Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (either in C, C++, or Java) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar
Halide (https://halide-lang.org/) is a language for fast, portable computation on images and tensors, and it originally generated human-readable C++ code as output (but not anymore, now it feeds directly into LLVM)
Can anyone provide other examples? (Code generators for network packet processing?)
> Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a program into C.
Don't you still need many of the techniques taught in this class to build something like that? The material doesn't look inherently targeted at low-level languages or machine-code as a target.
One way to think of it is to split the "compiler" problem into three big pieces. 1) Front-end, 2) back-end, 3) tooling.
Front end is lexing, parsing, building the AST (in whole or just keeping pieces of it around), semantic analysis. When that is done, you can say "Yup, valid program. It is possible to generate code." So, to your question, yes. Those techniques from the course are 100% in play.
Back-end is turning the AST into something you can optimized and generate code from. (Sometimes there is a "middle-end" in there....) High-level optimizations, low level optimizations, memory allocation, register allocation.
Tooling is everything that keeps your life from being miserable as a user -- starting with debug symbols and these days people expect library and package management, etc, etc.
So if you are interested in exploring the Next Great Syntax or some semantic problems that can be re-written into C, then doing a C-front is a great way to do that. Let the C compiler handle all the really-really-really-hard back-end issues for you, and avoid all that distraction. But.... expect that getting debug symbols from your spiffy language into the ready-to-link object file is going to be, as they say, "non-trivial". So... great for experiments and a great learning exercise, but it's hard to do a production compiler that way.
1 reply →
I have just spent the last 13 months significantly reworking a compiler that turns Excel-like formulas into JavaScript. The compiler is used by an app designer, and the formula language is meant to be easy to pick up for people used to spreadsheets. A significant difference compared to Excel is that our compiler features static typing, enabling helpful error messages to be produced at build-time.
(Spreadsheets, on the other hand, tend to try very hard to make sense of types at runtime, using type coercion to turn values of the wrong type into something of the expected type they can work with. That produces surprising results and confuses users. =IF(2, 3, 4) will generally produce 3, because 2 -- which should be a logical/boolean value -- is interpreted as TRUE. We think it's a better idea to warn the user that the first parameter is wrong, and to do that as soon as possible.)
By far the largest challenge with reworking the compiler has been the type system. I have expanded our type system from the four standard spreadsheet types to encompass more than 80, as well as added support for lambdas (for use with functions like SUMIF and FILTER), type inference, parameterized types and union types. The biggest addition is probably an imperative mode, that executes formulas with side-effects in response to events being fired. This is all in service of features we have wanted to add for years, that have been held back by our (previously) simplistic type system.
I did take a compiler class at university, more than 15 years ago, but while that proved very useful when writing the formula parser (an operator-precedence parser, initially straight out of a textbook), type theory was not part of the class. I had to pick up bits and pieces by reading the Java Language Specification (JLS) and by experimenting.
The Cornell class doesn't seem to feature type theory either, which I think is unfortunate. Now that the pendulum has swung, yet again, firmly in the direction of static typing, perhaps static typing will become a bigger part of CS compiler curriculums.
Your project sounds like fun. What/for whom is the project? What level of Excel formula compatibility are you shooting for, e.g. just in the neighborhood like Google Sheets, reasonably close as in Libre Office, or full compatibility with the latest version? If the latter you must have one gnarly test suite. Would love to hear more about it.
4 replies →
MATLAB Coder - Generate C and C++ code from MATLAB code https://www.mathworks.com/products/matlab-coder.html
Actually the whole family of embedded code generation products from MathWorks https://www.mathworks.com/solutions/embedded-code-generation...
AnyDSL (https://anydsl.github.io/) might be worth mentioning, it is a framework for creating domain specific languages using partial evaluation to give powerful abstractions to the user while still producing high-performance code for different target platforms. On the website, you can find that they already have DSLs for ray tracing and image stencil codes as well as works for bioinformatics.
Not open source but I'm working on a system that turns a DSL (embedded into Jupyter notebooks) into either c or c with opencl for a niche finance application. Not a huge project (<10k lines of code). Nature of the problem means that DSL can be very simple (it's aimed at non-programmers). This approach also means that could easily add other targets - e.g. cuda or wasm.
Awesome. Link?
1 reply →
Do you have a structured course for Domain-specific compilers available online?
A fantastic submission, very much appreciated.
I find the presentation style of subtitles with surrounding context extremely helpful. I often lose the ability to listen when concentrating; the subtitle context provides a superior recovery mechanism to the usual, try to work out what was missed. I look forward to purchasing augmented reality glasses at some point in the future which add this advantage to in-person interactions.
Yes!
It’s the complete opposite of, and significantly more user friendly than, how zoom recordings synchronize meeting subtitles with audio/video playback.
In their case they only show the current line in a text message-like sidebar. As the content plays, it snaps the scroll position of the messages to current, hiding the rest. This breaks scrolling ahead or back while listening and you lose any context.
Hopefully I’m just missing a setting - I try to minimize the time I spend in there. Is there a way I can take their damn horse blinders off?
I too appreciate the subtitles and I tend to have them on when I watch+listen to a DVD. You can even click on the sentence and the video jumps to that point. However ... a little proof-reading would help here. To get things started gets transcribed into Ticketing, sir.
I love this line...
“Real 6120 also has an end-of-semester course project—in the self-guided version, your end-of-semester assignment is to change the world through the magic of compilers”
I wish they’d be more courses on compilers that also serve as language servers. The current world is moving towards a phase where a language needs IDE tooling such as compilers, linters, formatters, REPL, language servers, debuggers and profilers.
Would love to see a high level course that covers how to build the tooling for a great developer experience.
As i mentioned in another thread, a uni course that has students work on an existing codebase instead of writing something from scratch might be even better - though harder to grade.
Imagine if one could do a major feature contribution to clangd/pyls/rust-analyzer as part of their advanced compilers course. The student gets a better understanding of (open-source) software development practices, gains a deeper proficiency in the language (not just the APIs), the project brings new contributors and users (including the student herself) benefit from the new feature.
Double bubble!
My uni, NC State, has a course (CSC 326) called Software Engineering that has students contribute to an existing project created by the staff in teams, with the best contributions being merged into master for the next semester students to work on.
3 replies →
Anyone interested in following the course together during the holidays? To keep each other motivated etc.
If you have to stop "early" please get through the simple DCE and LVN passes.
A solid basic-block level DCE & LVN optimization along with a judicious "compile time evaluator" (simple constant folding) will let you produce code at around 80% of the level of LLVM.
Obviously, you're not going to get the magic that comes of LLVM's scalar analysis pass, but you're not going to be embarrassed with your performance, either.
Yes! How should we organize this? Maybe start with an IRC channel?
Yes! an IRC channel would be great! Do you want to start it and post the info here? Or else I can start
6 replies →
> Anyone interested in following the course together during the holidays? To keep each other motivated etc.
Hey, don't forget my little "advent of code" too:
https://github.com/pfalcon/python-ast-hacking-challenges/iss...
"Convert Python source code to SSA form"
I'm interested. I'm writing a compiler-interpreter now but my broad compiler-construction chops are spotty and going through this systematically might be useful.
I would love to do so! I'm a masters student who has been wishing there was an advanced compilers course in my program, there's only the undergrad level one, so this is perfect! Let me know how I can get in contact.
I’m interested. I may not have the most relevant background, though — I’m a data scientist with an econometrics education. The topic really does interest me, though.
I'm a devops engineer. So not very related either!
I'm game. My background is in math and not CS, so I can't say I'm certain I have the prerequisite knowledge.
I would love to join as well. Loop me in too, if there is IRC channel
Created a discord channel https://discord.gg/RBPmmdWg
Do you have a group for us to join?
Okay, I just created this on discord https://discord.gg/RBPmmdWg. Considering it might be more accessible for people. Wondering if I should post this out of the thread.
On the related note, what would you guys recommend as an introduction to compilers, say for beginners?
Crafting Interpreters has been great. It starts with an AST-walking interpreter (in Java), so you get to focus on parsing and the simplest form of interpretation, and then it moves to a bytecode interpreter (in C) so you get to learn about compiling to a more primitive target via a lower-level language (much of which should translate to a "real" compiler). Very approachable, has lots of side-context and an entertaining style. The language you interpret was custom-designed for the course, which helps a lot, and resembles a simplified JavaScript without all the messy parts.
Author here. Also you can read the whole book online for free: http://craftinginterpreters.com/
Engineering a compiler is by far the best introductory book that is also modern.
Advanced compiling for high performance is one of not many good accounts of scheduling for non-toy architectures.
Muchnick's book is a good encyclopedia of optimisations although the code is unreadable and buggy.
cs4410[1] is really really good, and cse131[2] is basically the same thing but with videos. I highly recommend them both.
[1]: https://course.ccs.neu.edu/cs4410/
[2]: https://ucsd-cse131-f19.github.io/
Compilers books: I'd start with "Engineering a Compiler" by Keith Cooper and Linda Torczon. http://craftinginterpreters.com/ is also a pretty great, programming-oriented intro, which may be good to work through alongside. For more on the analysis & compiler optimization side, "SSA-based Compiler Design" (http://ssabook.gforge.inria.fr/latest/; GitHub Mirror: https://github.com/pfalcon/ssabook) is a good follow-up.
Further readings: Book recommendations in https://github.com/MattPD/cpplinks/blob/master/compilers.md#... as well as program analysis resources (in particular lattice theory, type systems and programming languages theory, related notation): https://gist.github.com/MattPD/00573ee14bf85ccac6bed3c0678dd...
Courses: I can recommend the following: https://github.com/MattPD/cpplinks/blob/master/compilers.md#...
Particularly (in alphabetical order--I think these are all great, so including highlights of what I've liked about them):
- IU P423/P523: Compilers (Programming Language Implementation) - Jeremy Siek, with the course book "Essentials of Compilation: An Incremental Approach" (pretty interesting approach, with programming language features developed incrementally having a fully working compiler at each step, cf. http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf; implementation language Racket),
- KAIST CS420: Compiler Design - Jeehoon Kang (good modern treatment of SSA representation itself, including the use of block arguments, https://mlir.llvm.org/docs/Rationale/Rationale/#block-argume..., as well as SSA-based analysis and optimization; Rust as an implementation language),
- UCSD CSE 131: Compiler Construction - Joseph Gibbs Politz, Ranjit Jhala (great lecturers, both Haskell and OCaml edition were interesting; fun extra: one of the Fall 2019 lectures (11/26) has an interesting discussion of the trade-offs between traditional OOP and FP compiler implementation),
- UCSD CSE 231: Advanced Compiler Design - Sorin Lerner (after UCSD CSE 131: for more on analysis & optimization--data flow analysis, lattice theory, SSA, optimization; fun extra: the final Winter 2018 lecture highlighted one of my favorite papers, https://pldi15.sigplan.org/details/pldi2015-papers/31/Provab...),
- UW CSE CSEP 501: Compilers - Hal Perkins (nice balanced introduction, including x86-64 assembly code generation, with the aforementioned "Engineering a Compiler" used as the course textbook).
Thank you so much for this comment. It's incredibly difficult to find modern compiler learning resources – when I ask, I often get disparaging comments about the dragon book and not much more.
Please, consider putting this list on your blog, or somewhere more visible than a HN comment.
1 reply →
I've been reading "Modern Compiler Design" currently, not too sure how beginner friendly it is though.
Also, haven't made it too far into the book so can't really give a recommendation either way.
Stanford Compilers Course
Even if you don't do the course, the first paper they list is a fascinating must-read:
Producing Wrong Data Without Doing Anything Obviously Wrong! Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. ASPLOS 2009.
https://dl.acm.org/doi/10.1145/2528521.1508275
I happened to be at that ASPLOS and I thought it was fascinating. Good to see it recognized as one of the most important papers in compilers.
Has anyone taken Compilers course from David Beazley: https://www.dabeaz.com/compiler.html. I'd love to hear your feedback, please
I took the undergraduate version of this course while at Cornell. It was a wonderful course. I taught me a lot how computers work, and how to organize a large (for me at the time) software project.
One of the most painful parts of the the project was using Scala which had ridiculously long compile times which made iterating quite a pain.
My favorite memory was that for one project we had to submit a program written meant to stress test everyone's parser.
The course materials look good.
But what is a "PhD level course"? All I've seen is undergrad and graduate, and the difference between masters and PhD has nothing to do with classes.
Cornell student here that's taken a bunch of classes at all of the levels - in my experience, undergraduate classes/master's classes are kind of taught with the expectation that you're taking the course as a requirement and are much more evaluation heavy, and that you're likely taking a full course load of other courses as well. PhD level courses are generally assumed to be the only course that you're taking that semester and that you're taking it more because the material will be useful for your research rather than for the credential, so there's much less emphasis on evaluation; i.e. a smaller quantity homeworks that each much harder than an undergraduate version of the same class, and final evaluations are much closer to reading recent research papers and understanding them rather than a timed exam.
> the difference between masters and PhD has nothing to do with classes.
huh ? what's your education system ? in europe it's generally 3 year bachelor, then 2 year masters, then 3 years phd so when you have classes during your phd hopefully they are different that the one you get during your masters
Depends on the field. I studied philosophy (Ph.D. program, but finished with an M.A.), and the M.A. is often just a degree given after the first few years of the Ph.D. There are a handful of required first year courses (logic, basic metaphysics, philosophy of science and ethics), but otherwise students take whatever courses are relevant to their interests. It helps that philosophy typically has fewer classes with explicit prerequisites. If you take an advanced course, you might be lost, but it doesn't feel the same as not being knowing a formula or how to derive something.
Some courses are more basic, and some are more like collaborative research seminars. Advanced students gravitate towards the latter (and typically take fewer courses and/or audit more of them), but a first year with the right background can still take them. Conversely, an advanced student might take an introductory course in an area they lack background in to broaden their knowledge.
There also are terminal M.A. programs, but they're less common, and the majority of students who do Ph.Ds at prestigious institutions don't do one.
I am in the US, and at my school masters and PhD students all take the same classes.
It is not ideal because many of the masters students are coming from a non-CS background. Some of the graduate level CS courses I have taken were easier than 4th year level undergrad classes.
3 replies →
I honestly have never figured out European education systems. Sorry!
But this is Cornell, which I assume is like other research-oriented US universities. You have a 4 (or maybe 5) year undergraduate, leading to a Bachelors and then apply to graduate programs. The graduate schools may have specific Masters tracks, but not really any general Masters program. Instead, you spend the first year or two taking graduate classes until you either satisfy the assorted requirements or pass an oral qualification (They still have those, right?), after which you focus entirely on research, publications, and your dissertation defense. If you leave school before defending, you get the Masters as a sort of consolation prize---most US PhDs in my experience don't have Masters.
In the US, it's typically a 4 year undergrad, 2 year masters, and a ~5-7 year PhD depending on the program itself. From what I understand, while there are some classes you take as a PhD student, a majority of your work is doing research
3 replies →
There is no formal difference between masters and PhD courses. I think he calls it a PhD level course because it is "research oriented" and requires work that perhaps a Masters student looking to complete credits might not be interested in.
The evaluation criteria is also fairly subjective and rough, which might not work for a larger masters level course.
Well, I don't feel like cleaning the basement or finishing my yard work before the snow, so I might just do this course over the holiday instead.
Any idea if I can jump right into this after my only experience being Crafting Interpreters?
Of course. Why not?
They should renumber it CS 6502.
Funny you mention 6502, because as wonderful as its assembly is, its not that friendly for higher level languages. I tried writing a basic C like language for it with 16 pointer support (see LDA ($ZP),Y), and came to the haunting realization that function calls require pushing the de-reference of zero page pointers to the stack which change their de-referencing address space from 0-255 (eg. main) to 256-512 for subsequent stack frames.
I then realized that at a speed of 1MHz, that the pointer de-referencing, the stack frame magic, and endless copies to just parse basic expressions just made 6502 a pointless target for a C like language.
It was fun though.
This course is asynchronously delivered.
Question for students: how do you find synchronous courses with some discussions during class, vs asynchronous courses where discussions are limited to an online written forum or office hours?
Asynchronous discussions are durable as content (in the internet). They are also very useful for working students. But they need focus to retain them in memory.
Not a problem at all. For non-introductory courses, this works well.
But needs to have at least some sort of synchronous at some point. In my one class, we use Notion for discussing so it has a live sense of discussion.
Listening to a Russian theoretical physicist mumble about how trivial various results are over a zoom call really makes my day, put it that way.i
I'm interested by the topic but I'm a junior backend/data eng, do you think it could be helpful for my profile to dig into the concepts of compilers other than for the culture itself?
What would be some cool use cases to design your own compiler?
There are various weird and wonderful niche languages out there, if you appreciate programming languages as an end in themselves. Two examples: Sentient [0] and Factor [1][2].
[0] https://sentient-lang.org/
[1] https://factorcode.org/
[2] Factor's home-page does a poor job of presenting examples to give a flavour for the language, so here's one on GitHub: https://github.com/factor/factor/blob/master/basis/roman/rom...
Sentient looks neat! Does it use the same strategies as Prolog under the hood?
2 replies →
The simplest case where they prove useful is usually when your regex is longest than 20 characters. Write a parser once, and it'll be readable to people other than yourself (if it isn't your schema is bad)
Seems a pity to throw out the various advantages of the high-level solution just because your regex engine doesn't support a scalable syntax.
Advanced regex systems such as the Ragel parser-generator support a composition-friendly syntax, [0] but the average regex system doesn't, and I guess there's no quick fix for that.
[0] http://thingsaaronmade.com/blog/a-simple-intro-to-writing-a-...
A (not)compiler can be made to look for issues in your codebase that already exists. It would need all the tools of a compiler, just that the target is the same as the source language.
Nice, although I'd have wanted a segment on compiler testing.
Indeed; I previously had these papers on the list but had to take them out for time this semester:
- Finding and understanding bugs in C compilers. Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. PLDI 2011. https://dl.acm.org/citation.cfm?id=1993532 - Compiler validation via equivalence modulo inputs. Vu Le, Mehrdad Afshari, and Zhendong Su. PLDI 2014. https://dl.acm.org/citation.cfm?id=2594334
There are just too many fun things to teach.
More recent paper: yarpgen
https://github.com/intel/yarpgen/blob/main/papers/yarpgen-oo...
Oh, compiler construction :) I really enjoyed that lecture during my M.Sc.
"parallelization, just-in-time compilation, and garbage collection" sound fun, too. Especially JIT.
It's great that there's a lecture on dynamic compilation - so often overlooked.
This is amazing. Thank you for sharing.
No code generation?
It looks like its more about generating IR. The professor wrote his own LLVM like tool called Brill, and use of the tool seems to be the focus of the course.
That doesn't sound difficult? if (x86) write byte X, Y, Z else if arm write byte J K L
What a treat. Thanks for this submission.