Comment by frou_dh
1 day ago
It's so nice when an editor can do completely accurate syntax highlighting for a language. I think there is a subconsciously disturbing effect from being presented with false-positive and false-negative colouring here and there, which is what traditional "good-enough" hacky syntax highlighting tends to produce.
Yeah, after writing this highlighter, I started noticing what I consider bugs in other highlighters.
For example, although Vim's syntax highlighting helped me learn shell, it highlights numbers like the 42 in 'echo 42' in a special way, which is misleading, because shell doesn't have numbers. (YSH does have numbers, but not in 'echo 42' either!)
There are also language design issues. Shell allows MULTIPLE here docs, and I claim that ZERO syntax highlighters handle it correctly - https://github.com/oils-for-unix/oils.vim/blob/main/demo/bad...
(YSH removes here docs in favor of Python-like multi-line strings)
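To make the multiple-here-doc problem concrete, here is a minimal Python sketch of the bookkeeping involved. It is not how Vim, oils.vim, or any real shell parser works; the function name `split_here_docs` and the labels are made up for illustration, and it ignores many shell details (quoting rules inside bodies, line continuations, here docs inside command substitutions). The point it shows is that the bodies arrive later, in the order the operators appeared, so a highlighter needs a queue rather than a single "in here doc" flag.

```python
import re

# Matches a here-doc operator and its delimiter: <<EOF, <<-EOF, <<'EOF', <<"EOF".
# The lookbehind/lookahead keep it from matching the <<< here-string operator.
HEREDOC_OP = re.compile(r"(?<!<)<<(?!<)(-?)\s*(['\"]?)(\w+)\2")

def split_here_docs(script):
    """Label each line as 'code', 'body:<DELIM>' or 'end:<DELIM>'.

    Simplified sketch: delimiters are plain words, optionally quoted.
    """
    pending = []  # (strip_tabs, delimiter) pairs still waiting for a body
    result = []
    for line in script.splitlines():
        if pending:
            strip_tabs, delim = pending[0]
            # With <<-, leading tabs are stripped before comparing.
            candidate = line.lstrip("\t") if strip_tabs else line
            if candidate == delim:
                pending.pop(0)
                result.append((line, "end:" + delim))
            else:
                result.append((line, "body:" + delim))
            continue
        result.append((line, "code"))
        # One command line may open several here docs; queue them in order.
        for m in HEREDOC_OP.finditer(line):
            pending.append((m.group(1) == "-", m.group(3)))
    return result

SCRIPT = "cat <<A <<B\none\nA\ntwo\nB\necho done"
KINDS = [kind for _, kind in split_here_docs(SCRIPT)]
```

Here `KINDS` labels the line `one` as the body of `A` and `two` as the body of `B`, even though both operators sit on the first line.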
---
But the "surprise" in this article is that Vim is powerful, and you can write a good syntax highlighter or a bad one. There are many possible "programs" to write in this paradigm.
I'd also say "completely accurate" highlighting doesn't really exist in practice, and is even problematic in theory.
Tree-sitter grammars are not completely faithful to the original language, because the metalanguage is limited. And highlighters have to deal with incomplete code, so it's not clear what "two parsers being the same" means.
Vim has the best syntax highlighting engine out there.
The approach of regions and match items which can contain each other in a hierarchy can handle anything.
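As a rough illustration of the "regions that contain each other" idea, here is a toy model in Python. It is only a sketch of the concept behind Vim's `:syn region` with `start=`, `end=` and `contains=`, not Vim's actual matching algorithm; the `Region` class, the `highlight` function and the two example regions are assumptions made up for this example.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Region:
    name: str
    start: str  # regex that opens the region
    end: str    # regex that closes the region
    contains: list = field(default_factory=list)  # regions allowed inside

def highlight(text, regions, top_level):
    """Return one (char, innermost_region_or_None) pair per character."""
    by_name = {r.name: r for r in regions}
    stack, out, i = [], [], 0
    while i < len(text):
        # Inside a region, its own end pattern is tried first.
        if stack:
            m = re.compile(stack[-1].end).match(text, i)
            if m and m.end() > i:
                out.extend((c, stack[-1].name) for c in m.group())
                stack.pop()
                i = m.end()
                continue
        # Otherwise try to open any region allowed in the current context.
        allowed = stack[-1].contains if stack else top_level
        for name in allowed:
            m = re.compile(by_name[name].start).match(text, i)
            if m and m.end() > i:
                stack.append(by_name[name])
                out.extend((c, name) for c in m.group())
                i = m.end()
                break
        else:
            # Plain character: it belongs to the innermost open region.
            out.append((text[i], stack[-1].name if stack else None))
            i += 1
    return out

# A double-quoted string may contain $(...), which may contain strings again.
REGIONS = [
    Region("dq", r'"', r'"', ["subst"]),
    Region("subst", r"\$\(", r"\)", ["dq"]),
]
TEXT = 'echo "a $(ls "b") c"'
LABELS = [name for _, name in highlight(TEXT, REGIONS, top_level=["dq"])]
```

The inner `"b"` is labelled as a string inside the substitution, and the trailing ` c"` goes back to the outer string, which a flat list of regexps cannot express.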
By the way, I use Vim to highlight the code that CGIT serves in response to web requests.
It's a big shame, as my preference for how to see code would 100% fall into a "literate style" if I could get it. I'd love an even more dynamic view than that style, if I could. But I'm fairly sure that 100% correct syntax highlighting would not be possible in that world, especially with some of the more complicated syntaxes out there.
I'm also curious how often you have used something that didn't get syntax highlighting correct. Even using some of the more advanced cweb features of org-mode, it typically gets things more correct than not. And I don't think it is using anything more than regexps? (I have not checked how the Tree-sitter stuff interacts with cweb across many blocks. Will try to look into that.)
Something that seems quite common on the false-negative side is type names not being highlighted at all when they are the names of user-defined types, even though they're being used in type positions in the code. Dumb highlighting will just have a fixed list of type names it knows about, because it is not as aware of the positional aspect of usages.
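The difference between the two approaches can be sketched in a few lines of Python. This is a toy example over a made-up, TypeScript-like syntax where a type follows a colon; `KNOWN_TYPES` and both function names are hypothetical, and a real highlighter would need an actual grammar rather than one regexp.

```python
import re

KNOWN_TYPES = {"number", "string", "boolean"}  # hypothetical fixed list

def types_by_fixed_list(src):
    """'Dumb' approach: any word from the fixed list counts as a type."""
    return [w for w in re.findall(r"[A-Za-z_]\w*", src) if w in KNOWN_TYPES]

def types_by_position(src):
    """Positional approach: whatever identifier follows ':' is a type."""
    return re.findall(r":\s*([A-Za-z_]\w*)", src)

SRC = "let p: Point = origin(); let n: number = 1; let number = 2;"

FIXED = types_by_fixed_list(SRC)
POSITIONAL = types_by_position(SRC)
```

In this toy input the fixed list misses the user-defined `Point` (a false negative) and also colours the variable named `number` (a false positive), while the positional rule picks up `Point` and only the `number` that sits in a type position.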
And this is an example where I feel this is strikingly like proper name usage in language. Everyone has a different set of proper names that they have ingrained in their mind for so long that, hearing them, they will jump out differently than other proper names. We literally ingrain a fixed list of names in our brains starting at a very young age.