Comment by geofft

6 years ago

One of the nice things about code being text is that you can copy and paste unparseable subsets of code without anything getting in your way. For instance, if you need to move an if/else out of a function, you can move the if statement, reindent the body, and then move the else block. If you have syntax highlighting on, it might briefly mis-highlight (e.g., it may not know what to do with the else block when it's briefly unpaired) but it will let you do it, and it will fix itself once the program is properly parseable again.

Compare with, say, a WYSIWYG rich text / HTML editor. If you want to reorganize a bulleted list, it's many more steps, because the tool isn't really set up to accommodate a momentarily-invalid state. I think that's the big difference between syntax-highlighting something that fundamentally remains text and switching the representation to not-text.

This also lets you do various sorts of text manipulation like conflicted merges. There are lots of arguments for an AST-aware merge, but in case of a conflict, a text-based merge system that inserts standard conflict markers will still usually leave you with a file that parses well enough to be syntax-highlighted, even if it won't compile with the conflict markers still in place. Or even imagine converting a program from one language to another (e.g., a shell script that outgrew shell). You can stick the invalid code in your text editor, ignore the highlighting, and turn it line-by-line into the new language.

I think all the highlighting examples in this article work as overlays, just like syntax highlighting, so they'd work acceptably well in these briefly-invalid states. In some cases the overlay will fail to highlight, or it will need aggressive error recovery that would be dubious in an actual compiler or interpreter (imagine, say, "if you see a conflict marker, skip the <<< portion and parse the >>> portion to find what locals exist and what their types are, then go back and try to highlight the <<< portion"), but since the highlighter isn't the real interpreter, that's fine.
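To make the "overlay" point concrete, here is a minimal sketch of a purely lexical highlighter that keeps working on a file with conflict markers and an unpaired else. The token classes, the `TOKEN_RE` pattern, and the `highlight` helper are all made up for illustration; real editors use far richer grammars, and this does none of the smarter error recovery imagined above.

```python
import re

# Hypothetical token classes for a tiny Python-like language. Conflict
# marker lines get their own class, so they are tagged rather than
# derailing the rest of the file.
TOKEN_RE = re.compile(r"""
    (?P<conflict>^(?:<{7}|={7}|>{7}).*$)      # git-style conflict marker line
  | (?P<comment>\#.*$)                        # comment to end of line
  | (?P<string>"(?:\\.|[^"\\\n])*")           # double-quoted string
  | (?P<keyword>\b(?:if|else|elif|for|while|def|return)\b)
  | (?P<number>\b\d+\b)
  | (?P<name>[A-Za-z_]\w*)
  | (?P<space>\s+)
  | (?P<other>.)                              # anything else, one char at a time
""", re.MULTILINE | re.VERBOSE)


def highlight(text):
    """Return (kind, lexeme) pairs. Never raises on unparseable input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


# A mid-merge file: unresolved conflict markers and an 'else' with no
# matching 'if' in its own branch. No parser would accept this.
SAMPLE = """\
<<<<<<< HEAD
    if x > 1:
=======
    else:
>>>>>>> feature
        return "done"
"""

if __name__ == "__main__":
    for kind, lexeme in highlight(SAMPLE):
        if kind != "space":
            print(f"{kind:9} {lexeme!r}")
```

Because every character falls into some class (the catch-all `other` at worst), the overlay always covers the whole file, and the keywords and strings on either side of the markers are still tagged correctly.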

> the tool isn't really set up to accommodate a momentarily-invalid state.

I've struggled to explain to students, teachers and others my frustrations with anything that isn't plaintext code. This is it - thank you!

I think it's also a neat learning concept about why it's important one is _able_ to make mistakes when writing code. So many are overwhelmed by the flexibility or fragility of syntaxes but there's actually a lot of power in that.

Years back, a mentor of sorts very strongly convinced me that "degenerate" cases, where code doesn't produce a valid AST, should be considered something of the "default case" from the standpoint of code editing and source control. We spend a lot more time on work-in-progress code than we ever do on finished, compiling code. Invalid states often aren't as "brief" as we think they are, and there are far too many reasons to want to save, and even commit to source control, work-in-progress code (including things like "it's the end of the day and I want to make sure I have this backed up" and "maybe my coworker can spot why this isn't parsing because my tired eyes are not seeing it").

> If you have syntax highlighting on, it might briefly mis-highlight (e.g., it may not know what to do with the else block when it's briefly unpaired) but it will let you do it, and it will fix itself once the program is properly parseable again.

This intuition, that syntax highlighting token streams are already the most generic "semantic" tool we have readily available, that they are very resilient to work-in-progress states, and that they are very fast (because we use them in real time in editors), led me to experiment with a token-stream-based diff tool. [1]

I got some really good results in my experiments with it. It gives you character-level diffs (as opposed to line-based diffs) that were more semantically meaningful, and faster to compute, than those from the other character-based diff tools I compared it to. You could probably use it as a diff tool with git projects today if you wanted, but it would mostly just be a UI toy, since git is a snapshot-based rather than a patch-based source control system. (I explored the idea because I was curious whether it might be useful to patch-based darcs. Darcs kept exploring the idea of implementing a character-based patch format in addition to, or as a replacement for, its line-based patch format. So far as I saw it never did, but if it did, this tool would potentially be quite powerful there.) It's a neat toy/experiment though.
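For a rough idea of what a token-stream diff looks like, here is a small sketch using Python's standard `difflib`. This is not how tokdiff itself is implemented; the crude regex tokenizer and the `tokens` and `token_diff` helpers are stand-ins invented for the example, where a real tool would borrow a syntax highlighter's lexer.

```python
import difflib
import re

# Crude stand-in for a highlighter's lexer: identifiers, numbers,
# whitespace runs, and any other single character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.", re.DOTALL)


def tokens(text):
    """Split text into lexemes; works on code that does not parse."""
    return TOKEN_RE.findall(text)


def token_diff(old, new):
    """Return (tag, old_text, new_text) for each changed run of tokens."""
    a, b = tokens(old), tokens(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, "".join(a[i1:i2]), "".join(b[j1:j2])))
    return changes


if __name__ == "__main__":
    old = "total = count + 1\n"
    new = "total = counter + 1\n"
    print(token_diff(old, new))
    # → [('replace', 'count', 'counter')]
```

A plain character diff would report this change as inserting "er"; diffing tokens reports that one identifier was replaced by another, which is closer to what the edit meant. Since the tokenizer never needs a valid parse, the same function works on half-edited code.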

[1] https://github.com/WorldMaker/tokdiff