Comment by onion2k

4 years ago

A few hours later, another programmer came up with the prototype of a much faster terminal renderer, proving that for an experienced programmer a terminal renderer is a fun weekend project, far from being a multiyear research undertaking.

I have no idea if this is the case here, and I suspect it might not be, but pretty much every time I've seen a developer complain that something is slow and then 'prove' it can be faster by making a proof of concept, the only reason theirs is faster is that it doesn't implement the important-but-slow bits and ignores most of the edge cases. You shouldn't automatically assume something is actually bad just because someone shows a better proof-of-concept 'alternative'. They may have just ignored half the stuff it needs to do.

This particular case was discussed at length on Reddit and on YC News. The general consensus was that the Microsoft developers simply didn't have performance in their vocabulary, and couldn't fathom it being a solvable problem, despite having a trivial scenario on their hands with no real complexity to it.

The "complaining developer" produced a proof of concept in just two weekends that notably had more features[1] and was more correct than the Windows Terminal!

RefTerm 2 vs Windows Terminal in action: https://news.ycombinator.com/item?id=27775268

[1] Features relevant to the debate at any rate, which was that it is possible to write a high-performance terminal renderer that also correctly renders Unicode. He didn't implement a lot of non-rendering features, but those are beside the point.

  • And the experienced developer is Casey Muratori who is somewhat well known for being a very experienced developer. That makes it less likely that he doesn't know what he's talking about and is skipping over hard/slow features.

    • And he had a condescending tone from the beginning (as he always does). Maybe if he was more respectful / likable, the developers would have responded better.

      27 replies →

    • However, his experience, in games and game development tools AFAIK, might not be fully applicable to the development of mainstream commercial software that has to try to be all things to all people, including considerations like internationalization, accessibility, and backward compatibility. The performance difference that he demonstrated between Windows Terminal and refterm is certainly dramatic, but I wouldn't be surprised if there's something he's overlooking.

      51 replies →

  • I found it very funny that the Hindi sample text on display, in the YouTube refterm demo, means “You can wake up someone who is sleeping, but how do wake up someone who is hell bent on pretending to sleep?”.

  • A bit off topic, but has anybody followed the performance issues of Microsoft Flight Simulator 2020? For more than half a year it struggled with performance because it was CPU-heavy, loading only one core, etc. It barely ran on my i5 6500. Fast forward half a year: they want to release it on Xbox, MS/Asobo move a lot of computation onto the GPU, and the game starts running smoothly on the very same i5 with quality settings maxed out.

    You just begin to wonder how these things happen. You would think top programmers work at these companies. Why would they not start with the right approach, loading the GPU first? Why did it take them so long to finally do it correctly? Why waste time by not doing it right from the beginning?

    • It's a pretty straightforward case of prioritization. There are always more things to do on a game project than you have people and time to do them.

      The game ran well enough, so the people who could optimize things by moving them from the CPU to the GPU were doing other things instead. Later, performance became a noticeable problem for the dev team, through customer feedback and the need to ship in more resource-constrained environments (VR and Xbox), and that person could then do the work to improve performance.

      It's also handy to have a reference CPU implementation both to get your head around a problem and because debugging on the GPU is extremely painful.

      To go further down the rabbit hole it could be that they were resource constrained on the GPU and couldn't shift work there until other optimizations had been made. And so on with dependencies to getting a piece of work done on a complex project.

      4 replies →

    • It may sound oversimplified, but IME PC games are only optimized to the point where they run well on the development team's beefy PCs (or, in the best case, on some artificial 'minimal requirement' PC with graphics details turned to low, but this minimal-requirement setup usually isn't taken very seriously by the development team).

      When porting to a game console you can't simply define some random 'minimal requirement' hardware, because the game console is that hardware. So you start looking for more optimization opportunities to make the game run smoothly on the new target hardware, and some of those optimizations may also make the PC version better.

      1 reply →

    • Because a rule of thumb is not to focus too much on performance at the beginning of a project. Better a completed project with some performance issues than half a product with hyper speed. The key thing in development is to find some kind of balance among all these attributes (stability, performance, look, reusability, etc.). In the case of Flight Simulator, I'm not sure what the motives were; they surely had some serious time constraints. I think they did an acceptable job there.

      4 replies →

    • I'll reiterate a rant about Flight Simulator 2020 here because it's on-topic.

      It was called "download simulator" by some, because even the initial installation phase was poorly optimised.

      But merely calling it "poorly optimised" isn't really sufficient to get the point across.

      It wasn't "poor", or "suboptimal". It was literally as bad as possible without deliberately trying to add useless slowdown code.

      The best equivalent I can use is Trivial FTP (TFTP). It's used only in the most resource-constrained environments, where even buffering a few kilobytes is out of the question: embedded microcontrollers in NIC boot ROMs, that kind of thing. It's literally maximally slow: it ping-pongs for every block, and it uses small blocks by default. If you do anything to a network protocol at all, it's a step up from TFTP. (Just adding a few packets' worth of buffering and windowing dramatically speeds it up, and this enhancement thankfully did make it into a recent update of the standard.)

      People get bogged down in these discussions around which alternate solution is more optimal. They're typically arguing over which part of a Pareto frontier they think is most applicable. But TFTP and Microsoft FS2020 aren't on the Pareto frontier. They're in the exact corner of the diagram, where there is no curve. They're at a singularity: the maximally suboptimal point (0,0).

      This line of thinking is similar to "Toward a Science of Morality" by the famous atheist Sam Harris. He starts with a definition of "maximum badness", and defines "good" as the direction away from it in the solution space. Theists and atheists don't necessarily have to agree on the specific high-dimensional solution vector, but they have to agree that there is an origin, otherwise no meaningful discussion is possible.

      Microsoft Terminal wasn't at (0,0) but it was close. Doing hilariously trivial "optimisations" would allow you to move very much further in the solution space towards the frontier.

      The Microsoft Terminal developers (mistakenly) assumed that they were already at the Pareto frontier, and that the people that opened the Github Issue were asking them to move the frontier. That does usually require research!

      2 replies →

    • The damn downloader in MSFS is the most infuriating thing. In Canada, on either of the main ISPs, I top out at 40-ish Mbps, whereas Steam and everything else gets close to the full 500 Mbps. It also only downloads sequentially, pausing to decrypt each tiny file. And the updates are huge, so it takes a good long while to download 2+ GB.

    • I followed MSFS performance pretty closely since before FS5.

      What's happening is this:

      - the FS developers are using the fastest possible machines, beyond what non-overclockers can get

      - the developers refused to prioritize the tile stuttering problem evident in all versions. Ignoring that for over 30 years is just plain wilful unless ...

      - I'm pretty sure the DoD funded FS (hence the non-consumer license for Prepar3D), and I think there was a requirement for performance as a "consumer game" to not be too perfect, hence the tile stuttering.

      This is even more likely since Jane's had full-page ads for military sims using the FS engine back in the day. IOW, FS was too killer for its own good.

      (I actually did my Instrument rating practical preparation on FS5, and passed in the first attempt at a Class B airport. I don't think DoD liked where the public FS game was headed, especially FS 2000.)

  • Sometimes you don't know you have a performance problem until you have something to compare it to.

    Microsoft's greatest technical own goal of the 2010s was WSL 2.

    The original WSL was great in most respects (authentic to how Windows works; just as Windows NT has a "Windows 95" personality, Windows NT can have a "Linux" personality) but had the problem that filesystem access went through the Windows filesystem interface.

    The Windows filesystem interface is a lot slower for metadata operations (e.g. small files) than the Linux filesystem interface and is unreformable because the problem is the design of the internal API and the model for security checking.

    Nobody really complained that metadata operations in Windows were slow; they just worked around it. Some people, though, were doing complex build procedures inside WSL (building a Linux kernel, say), and then it was clear there was a performance problem relative to Linux.

    For whatever reason, Microsoft decided this was unacceptable, so they came out with WSL 2 which got them solidly into Kryptonite territory. They took something which third party vendors could do perfectly well (install Ubuntu in a VM) and screwed it up like only Microsoft can (attempt to install it through the Windows Store, closely couple it to Windows so it almost works, depend on legitimacy based on "it's from Microsoft" as opposed to "it works", ...)

    Had Microsoft just accepted that metadata operations were a little bit slow, most WSL users would have accepted it, the ones who couldn't would run Ubuntu in a VM.

    • WSL2 worked for me in a way that WSL1 did not and it had to do with build times while doing tutorial projects. I am not an expert, but my own experience was that it was a massive improvement.

      2 replies →

  • A pretty one-sided view. I use Windows Terminal because it supports multiple tabs: multiple cmds and some WSL bashes.

    I don't care at all if this or that terminal uses a bit more RAM or is a few milliseconds faster.

    • Then they should have said so on the issue ("We don't value performance and won't spend any resources to fix this") rather than do the dance of giving bullshit reasons for not doing it.

      Anyway, resource usage matters quite a lot. If your terminal uses CPU like an AAA game running in the background, you will notice fan noise, degraded performance, and potentially crashes from overheating everywhere else on the computer.

    • > I don't care at all if this or that terminal uses a bit more RAM or is a few milliseconds faster

      Did you watch the video? The performance difference is huge! 0.7 seconds vs 3.5 minutes.

  • > The "complaining developer" produced a proof of concept in just two weekends...

    That developer was also rather brusque in the GitHub issue and could use a bit more humility and emotional intelligence (which, by the way, isn't on the OP blog post's chart of a "programmer's lifecycle"). The same could be said of the MS side.

    Instead of both sides asserting (or "proving") that they're "right" could they not have collaborated to put together an improvement in Windows Terminal? Wouldn't that have been better for everyone?

    FWIW, I do use windows terminal and it's "fine". Much better than the old one (conhost?).

    • > could they not have collaborated to put together an improvement in Windows Terminal?

      My experience with people that want to collaborate instead of just recognizing and following good advice is that you spend a tremendous amount of effort just to convince them to get their ass moving, then find out they were not capable of solving the problem in the first place, and it’s frankly just not worth it.

      Much more fun to just reimplement the thing and then say “You were saying?”

      3 replies →

    • It’s often easier to put together something from scratch, if you’re trying to prove a point, than it is to fix a fundamentally broken architecture.

That sounds like you've never seen performance of a heavily worked-on subsystem increase by 10x because one guy who was good at that kind of stuff spent a day or two focused on the problem.

I've seen that happen at least 10 times over my career, including at very big companies with very bright people writing the code. I've been that guy. There are always these sorts of opportunities in all but the most heavily optimized codebases, most teams either a) just don't have the right people to find them, or b) have too much other shit that's on fire to let anyone except a newbie look at stuff like performance.

  • More generally, in my experience performance isn't looked at because it's "good enough" from a product point of view.

    "Yes it's kinda slow, but not enough so customers leave so who cares." Performance only becomes a priority when it's so bad customers complain loudly about it and churn because of it.

    • There’s a bit of incentive misalignment when commercial software performance is concerned. If we presume customers with tighter budgets tend to be more vocal and require more support, and customers on slower machines are often customers on tighter budgets, the business as a whole might actually not mind those customers leaving as it’d require less support resources spent on customers who are more difficult to upsell to.

      Meanwhile, the majority of customers with faster machines are not sensitive enough to feel or bother about the avoidable lag.

      1 reply →

    • That is probably why they have this law called Wirth's law about the Wintel ecosystem: "What Andy giveth, Bill taketh away."

      Or Gates's law: "The speed of software halves every 18 months."

    • There's also sometimes an incentive to slow things down: if an operation is too fast, the client will perceive that he paid too much money for something that takes no time, i.e. work that seems effortless seems unimportant.

      1 reply →

  • I've made such optimizations and others made them in my code, so:

    c) slow code happens to everyone, sometimes you need fresh pair of eyes.

    • Absolutely: what I should have said is that I've been the one to cause performance problems, I've been the one to solve them, I've been the manager who refused to allocate time to them because they were not important enough, and I've been the product owner who made the call to spend eng hours on them because they were. There are many systemic reasons why this stuff does not get fixed and it's not always "they code like crap", though sometimes that is a correct assessment.

      But show me a codebase that doesn't have at least a factor of 2 improvement somewhere and does not serve at least 100 million users (at which point any perf gain is worthwhile), and I'll show you a team that is being mismanaged by someone who cares more about tech than user experience.

    • "Need" especially, because often those fresh eyes just don't have any of the political history, so they face no fallout from cutting swaths through other people's code, which would be a problem for someone established in the organisation.

  • I’ve seen it at least as many times, too. Most of the time, the optimization is pretty obvious. Adding an index to a query, or using basic dynamic programming techniques, or changing data structures to optimize a loop’s lookups.

    I can’t think of a counter example, actually (where a brutally slow system I was working on wasn’t fairly easily optimized into adequate performance).

    • It is nice when a program can be significantly sped up by a local change like that but this is not always the case.

      To go truly fast, you need to unleash the full potential of the hardware and doing it can require re-architecting the system from the ground up. For example, both postgres and clickhouse can do `select sum(field1) from table group by field2`, but clickhouse will be 100x faster and no amount of microoptimizations in postgres will change that.

      1 reply →

    • Yah, I was going to say something like this. I've fixed these problems a few times, and I don't really think any of them were particularly hard. That's because if you're the first person to look at something with an eye to performance, there is almost always some low-hanging fruit that will gain a lot of perf. Being the 10th person to look at it, or attacking something that is widely viewed as algorithmically at its limit, OTOH, is a different problem.

      I'm not even sure it takes an "experienced" engineer; one of my first Linux patches simply removed a goto, which dropped an exponential factor from something and changed it from "slow" to imperceptible.

I don't know about you, but I was really laughing out loud reading that GitHub conversation.

GPUs: able to render millions of triangles with complex geometrical transformations and non-trivial per-pixel programs in real time

MS engineers: drawing colored text is SLOW, what do you expect

P.S. And yes, I know, text rendering is a non-trivial problem. But it is a largely solved problem. We have text editors that can render huge files with real-time syntax highlighting, browsers that can quickly lay out much more complex text, and, obviously, Linux and Mac terminal emulators that somehow have no issue whatsoever rendering large amounts of colored text.
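One concrete reason terminal text can be fast: on a monospace grid, each distinct glyph only has to be rasterized once and can then be reused from a cache. Here is a minimal sketch of the idea; rasterize() is a hypothetical stand-in for a real font API (DirectWrite, FreeType, etc.), not any actual library call:

```python
# Sketch of a glyph cache for a cell-grid terminal renderer (illustrative only).
# rasterize() is a fake stand-in for an expensive font-rasterization call.

def rasterize(codepoint: int) -> bytes:
    # Pretend this is the slow part: hitting the platform font API.
    return bytes([codepoint % 256] * 16)  # fake 16-byte glyph bitmap

class GlyphCache:
    def __init__(self) -> None:
        self._bitmaps: dict[int, bytes] = {}
        self.rasterize_calls = 0  # count how often we pay the slow path

    def get(self, codepoint: int) -> bytes:
        bitmap = self._bitmaps.get(codepoint)
        if bitmap is None:
            self.rasterize_calls += 1
            bitmap = rasterize(codepoint)
            self._bitmaps[codepoint] = bitmap
        return bitmap

# An 80x24 screen has 1920 cells, but typical text uses only a handful of
# distinct glyphs, so almost every cell is a cache hit.
cache = GlyphCache()
screen = ("hello world " * 160)[:1920]  # exactly 1920 cells
for ch in screen:
    cache.get(ord(ch))
print(cache.rasterize_calls)  # 8: the distinct characters in "hello world "
```

With a fixed font and size, the cache never needs invalidating, which is exactly the special case the discussion below is about.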

  • To be fair to the MS engineers, from their background experience with things like DirectText, they would have an ingrained rule-of-thumb that text is slow.

    That's because it is slow in the most general case: If you have to support arbitrary effects, transformations, ligatures, subpixel hinting, and smooth animations simultaneously, there's no quick and simple approach.

    The Windows Terminal is a special case that doesn't have all of those features: No animation and no arbitrary transforms dramatically simplifies things. Having a constant font size helps a lot with caching. The regularity of the fixed-width font grid placement eliminates kerning and any code path that deals with subpixel level hinting or alignment. Etc...

    It really is a simple problem: it's a grid of rectangles with a little sprite on each one.

    • Supercomputer or not, it's a terminal.

      In the real-CRT-terminal days of the 1970's & 1980's of course the interface to the local or remote mainframe or PC was IO-bound but not much else could slow it down.

      UI elements like the keyboard/screen combo have been expected to perform at the speed of light for decades using only simple hardware to begin with.

      The UX of a modern terminal app would best be not much different than a real CRT unit unless the traditional keyboard/display UI could actually be improved in some way.

      Even adding a "mouse" didn't slow down the Atari 400 (which was only an 8-bit personal computer) when I programmed it to use the gaming trackball to point & click plus drag & drop. That was using regular Atari Basic, no assembly code. And I'm no software engineer.

      A decade later once the mouse had been recognized and brought into the mainstream it didn't seem to slow down DOS at all, compared to a rodent-free environment.

      Using modern electronics, surely there should not be any perceptible lag compared to non-intelligent CRTs over dial-up.

      Unless maybe the engineers are not as advanced as they used to be decades ago.

      Or maybe the management/approach is faulty, all it takes is one non-leader in a leadership position to negate the abilities of all talented operators working under that sub-hierarchy.

    • Exactly. Fast terminal rendering on bitmap displays has been a solved problem for at least 35+ years. Lower resolutions, sure, but also magnitudes slower hardware.

  • It's more subtle than that. What the Microsoft engineers are saying is that the console's current approach to drawing text is inherently slow in this particular case, due to the way the text drawing library it's based on uses the GPU. The proposed solution requires the terminal to have its own text drawing code specific to the task of rendering a terminal, including handling all the nasty subtleties and edge cases of Unicode, which must be maintained forever. This is not trivial at all; every piece of code ever written to handle this seems to end up having endless subtle bugs involving weird edge cases (remember all those stories about character strings that crash iPhones and other devices - and the open source equivalents are no better). It's relatively easy to write one that seems to work for the cases that happen to be tested by the developer, but that's only a tiny part of the work.

    • I'm fairly sure no one in question wrote a new font renderer; they just rendered the available glyphs upfront with a system library, uploaded the result to the GPU, and let it be used as a bitmap.

      Text rendering is still done mostly on the CPU side in the great majority of applications, since vector graphics are hard to do efficiently on GPUs.

  • Simply shaping text using state of the art libraries (like harfbuzz) can take an INCREDIBLE amount of time in some cases. If you're used to rendering text in western character sets you may think it can always be fast, but there are cases where it's actually quite slow! You get a sense for this if you try to write something like a web browser or a word processor and have to support people other than github posters.

    Of course in this case it seems like it was possible to make it very fast, but people who think proper text rendering is naturally going to be somewhat slow aren't always wrong.

    Saying that text rendering is "largely solved" is also incorrect. There are still changes and improvements being made to the state of the art and there are still unhappy users who don't get good text rendering and layout in their favorite applications when using a language other than English.

    • You are right in the general case. But terminals are a specific niche that doesn't require the full range of text-rendering edge cases a browser, WYSIWYG editor, etc. has to handle. A terminal renders "strictly"* monospaced fonts, which makes it trivial to cache and parallelize.

      * as it was brought up, one might use a non-monospace font, but that case can just use the slow path and let the “normal” people use a fast terminal

I understand the scepticism about such claims, but Casey's renderer is not a toy, and it handles a number of quite difficult test cases correctly. He solicited feedback from a sizeable community to try and break his implementation. The code is available here: https://github.com/cmuratori/refterm

From the refterm README:

refterm is designed to support several features, just to ensure that no shortcuts have been taken in the design of the renderer. As such, refterm supports:

* Multicolor fonts

* All of Unicode, including combining characters and right-to-left text like Arabic

* Glyphs that can take up several cells

* Line wrapping

* Reflowing line wrapping on terminal resize

* Large scrollback buffer

* VT codes for setting colors and cursor positions, as well as strikethrough, underline, blink, reverse video, etc.
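For a sense of scale on that last item: the SGR subset of VT codes ("ESC [ ... m") is mostly mechanical bookkeeping. A minimal sketch of the idea (my own illustration, not refterm's actual code):

```python
# Minimal sketch (not refterm's code) of handling the SGR subset of VT escape
# codes: "ESC [ <params> m" updates the current text attributes, which get
# stamped onto every cell written until the next change.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Attrs:
    fg: int = 7              # default foreground palette index
    bg: int = 0              # default background palette index
    underline: bool = False
    reverse: bool = False

def apply_sgr(attrs: Attrs, params: list[int]) -> Attrs:
    """Apply one SGR parameter list to the current attributes."""
    for p in params or [0]:  # a bare "ESC [ m" means reset, same as 0
        if p == 0:
            attrs = Attrs()
        elif p == 4:
            attrs = replace(attrs, underline=True)
        elif p == 7:
            attrs = replace(attrs, reverse=True)
        elif 30 <= p <= 37:  # basic foreground colors
            attrs = replace(attrs, fg=p - 30)
        elif 40 <= p <= 47:  # basic background colors
            attrs = replace(attrs, bg=p - 40)
        # a real terminal handles many more: blink, strikethrough, 256/24-bit color...
    return attrs

a = apply_sgr(Attrs(), [31, 4])  # red foreground, underlined
print(a.fg, a.underline)         # 1 True
```

The parser's state is tiny and per-cell work is a few comparisons, which is why this part of a terminal is rarely the bottleneck.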

  • The really hard part of writing a terminal emulator, at least from my experience working on Alacritty, is fast scrolling with fixed regions (think vim).

    Plenty of other parts of terminal emulators are tricky to implement performantly; ligatures are one that Alacritty hasn't got yet.

    • Thanks for the insight.

      I have never written a terminal emulator, so could you maybe summarize why fast scrolling with fixed regions is so hard to implement?

Reading the thread itself, it’s a bit of both. Windows Terminal is complex, ClearType is complex and Unicode rendering is complex. That said… https://github.com/cmuratori/refterm does exist, does not support ClearType, but does claim to fully support Unicode. Unfortunately, Microsoft can’t use the code because (a) it’s GPLv2 and (b) it sounds like the Windows Terminal project is indeed a bit more complicated than can be hacked on over a weekend and would need extensive refactoring to support the approach. So it sounds a bit more like a brownfield problem than simply ignoring half the things it needs to do, though it probably does that too.

  • > Unfortunately, Microsoft can’t use the code

    As good as Casey Muratori is, Microsoft is more than big enough to have the means to take his core ideas and implement them themselves. It may not take them a couple of weekends, but they should be able to spend a couple of experienced man-months on this.

    The fact that they don't can only mean they don't care. Maybe the people at Microsoft care, but clearly the organisation as a whole has other priorities.

    Besides, this is not the first time I've seen Casey complain about performance in a Microsoft product. Last time it was about boot times for Visual Studio, which he uses to debug code. While reporting performance problems was possible, the form only had "less than 10s" as the shortest boot time you could tick. Clearly, they considered that if VS booted in 9 seconds or less, you didn't have a performance problem at all.

  • > Unfortunately, Microsoft can’t use the code

    I commented on a separate issue re: refterm

    --- start quote ---

    Something tells me that the half-a-dozen to a dozen of Microsoft developers working on Windows terminal:

    - could go ahead and do the same "doctoral research" that Casey Muratori did and retrace his steps

    - could pool together their not insignificant salaries and hire Casey as a consultant

    - ask their managers and let Microsoft spend some of those 15.5 billion dollars of net income on hiring someone like Casey who knows what they are doing

    --- end quote ---

  • > Unfortunately, Microsoft can’t use the code because (a) it’s GPLv2

    One thing to remember is that it is always possible and acceptable to contact the author of a GPL-licensed piece of code to enquire whether they would consider granting you a commercial license.

    It may not be worthwhile but if you find exactly what you're looking for and that would take you months to develop yourself then it may very well be.

    • Not always. GPL-licensed code does not have to have a single "the author". There may be hundreds of copyright holders involved (IIRC, Netscape spent years looking for people who had to agree when it planned to change its license, and rewriting parts written by people who didn't).

      6 replies →

  • > (a) it’s GPLv2

    Why is that a problem? A GPLv2 terminal would not be a business problem for Microsoft. People would still have to buy licenses for Windows. Maybe they would lose a little face, but arguably they have already done so.

    At least it's not GPLv3, which this industry absolutely and viscerally hates (despite having no problem with Apache 2.0 for some reason; Theo de Raadt is at least consistent).

    • If Microsoft embedded the GPLv2 terminal into Windows, they would arguably have to release Windows as GPLv2 (or under a compatible license). I assume they don't want that.

      They can alternatively buy a commercial license, as another user said below.

      2 replies →

  • > Unfortunately, Microsoft can’t use the code because (a) it’s GPLv2

    That's not unfortunate. Having people work on competing Free Software is a good thing. It would be even better if Microsoft adopted this code and complied with the terms of the GPL license. Then we wouldn't have to deal with problems like these, because they'd be nipped in the bud. And we would set a precedent that takes care of a lot of other problems, like malware, telemetry, and abuse of users' freedoms.

It's the hardest thing about building perf-related PoCs. Every time I've built a prototype to prove out an optimization, I've spent the entire duration of the project watching the benefit shrink, stressing that it would dwindle to nothing by the end. So far, I've been lucky and careful enough that I haven't convinced a team to expend massive resources on a benefit that turned out to be fictional, but I've had it happen enough times at the scale of a person-day or two that it always worries me.

Counterexample: WireGuard. Turns out OpenVPN was massive and slow for no reason and it only took one (talented and motivated) man to make a much better version.

> > for an experienced programmer a terminal renderer is a fun weekend project and far away from being a multiyear long research undertaking.

> You shouldn't automatically assume something is actually bad just because someone shows a [vastly] better proof-of-concept 'alternative'.

Apparently you should. I can confirm that the first quote is an appropriate assessment of the difficulty of writing a terminal renderer. Citation: I did pretty much exactly the same thing, for pretty much exactly the same reasons, when (IIRC gnome-)terminal was incapable of handling 80*24*60 = 115200 esc-[-m sequences per second, and I am still using the resulting terminal emulator as a daily driver years later.
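For anyone who wants to poke at that workload themselves, here is a rough generator (my own sketch, not the commenter's emulator) for one frame of the 80x24 everything-recolored case; emitting 60 such frames per second yields exactly the 115200 escape sequences per second mentioned above:

```python
# Build one 80x24 frame in which EVERY cell gets its own "ESC [ 38;5;N m"
# color escape. At 60 frames per second that is 80 * 24 * 60 = 115200 SGR
# sequences per second for a terminal to chew on.

def sgr_frame(cols: int = 80, rows: int = 24) -> str:
    out = []
    for row in range(rows):
        for col in range(cols):
            color = 16 + (row * cols + col) % 216  # walk the 256-color cube
            out.append(f"\x1b[38;5;{color}m#")
        out.append("\x1b[0m\n")  # reset attributes at the end of each line
    return "".join(out)

frame = sgr_frame()
print(frame.count("\x1b[38;5;"))  # 1920 color escapes per frame
print(1920 * 60)                  # 115200 per second at 60 fps
```

Piping this output to a terminal in a loop gives a crude throughput feel; a renderer that chokes on it is doing far more work per cell than it needs to.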

>> I have no idea if this is the case here, and I suspect it might not be, but pretty much every time I've seen a developer complain that something is slow and then 'prove' that it can be faster by making a proof-of-concept the only reason theirs is faster is because it doesn't implement the important-but-slow bits and it ignores most of the edge cases.

Even in those cases it usually turns out that the handling of edge cases was considered reason enough to sacrifice performance rather than finding a better solution to the edge case. Handling edge cases probably should not cost 10x average performance.

This seems referenced in the repo itself, see the “feature support” section [1].

That being said, is anyone aware of a significant missing feature that would impact performance?

[1]: https://github.com/cmuratori/refterm#feature-support

  • Screen reader support[0] may have a noticeable performance cost.

    [0] https://github.com/microsoft/terminal/issues/10528#issuecomm...

    • Can you explain how screen reader support could possibly have a noticeable performance cost?

      The screen reader code should be doing absolutely nothing if it's not enabled - and even if it is, I can't imagine how it could affect performance anyway. For plain text, such as a terminal, all it does is grab text and parse into words (and then the part where it reads the words, but that's separate from the terminal) - I don't see how this is any more difficult than just taking your terminal's array of cell structs, pulling out the characters into a dynamic array, and returning a pointer to that.

Not necessarily. Often, especially in big corporations, programmers are incentivized to deliver things quickly rather than to provide the optimal solution. Not because they are bad at programming, but because they have quotas and deadlines to meet. Just remember the story of how, in the first Excel version, a dev hard-coded some of the cell-dimension calculations because they were under pressure to close as many tasks as fast as possible.

The one example that comes to mind is file system search.

I am writing this application that displays the file system in the browser in a GUI much like Windows Explorer or OSX Finder. It performs file system search substantially faster than Windows Explorer. Windows Explorer is written in a lower level language with decades of usage and experience where my application is a one man hobby project written in JavaScript (TypeScript).

The reason why the hobby project is so substantially faster than a piece of core technology of the flagship product of Microsoft is that it does less.

First, you have to understand how recursive tree models work. You have a general idea of how to access nodes in the tree, but you have no idea what's there until you are close enough to touch it. File system access performance is limited by both the hardware on which the file system resides and the logic of the particular file system type. Those constraints erode away some of the performance benefits of using a lower-level language. Whatever operations you wish to perform must be applied individually to each node, because you have no idea what's there until you are touching it.

Second, because the operations are applied individually to each node, it's important to limit what those operations actually are. My application only searches for a string fragment, the absence of a string fragment, or a regular expression match. Wildcards and other extended search syntax are not supported. If you have to parse a rule each time before applying it to a node's string identifier, those are additional operations performed at each and every node in the designated segment of the tree.
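The "parse the rule once, keep the per-node test cheap" idea can be sketched in a few lines. This is a hypothetical Python illustration over an in-memory tree (the actual app described above is TypeScript walking a real file system):

```python
# Illustration of per-node cost in recursive tree search: do ALL rule parsing
# once, up front, so the work applied at each node is a single cheap test.

from typing import Callable, Iterator

# A node is (name, children); a toy stand-in for a directory tree.
Tree = tuple[str, list]

def make_predicate(fragment: str, negate: bool = False) -> Callable[[str], bool]:
    # Every decision about the search rule is made here, exactly once.
    if negate:
        return lambda name: fragment not in name
    return lambda name: fragment in name

def search(node: Tree, pred: Callable[[str], bool]) -> Iterator[str]:
    name, children = node
    if pred(name):  # the only per-node work: one predicate call
        yield name
    for child in children:
        yield from search(child, pred)

tree: Tree = ("root", [
    ("src", [("main.ts", []), ("util.ts", [])]),
    ("docs", [("readme.md", [])]),
])

print(list(search(tree, make_predicate(".ts"))))  # ['main.ts', 'util.ts']
```

Supporting wildcards or extended syntax would push parsing work into `pred` (or require a more expensive compiled matcher), which is exactly the cost the comment is avoiding.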

For those familiar with the DOM in the browser it also has the same problems because it’s also a tree model. This is why querySelectors are so incredibly slow compared to walking the DOM with the boring old static DOM methods.

> pretty much every time I've seen a developer complain that something is slow and then 'prove' that it can be faster by making a proof-of-concept the only reason theirs is faster is because it doesn't implement the important-but-slow bits and it ignores most of the edge cases

It's still a good place to start a discussion though. In such a case, apparently someone believes strongly that things can be made much faster, and now you can either learn from that person or explain to them what edge cases they are missing.

This.

My time library is so much faster and smaller than yours. Timezones? Nah, didn't implement it.

My font rendering is so much simpler and faster than yours. Nah, only 8-bit encodings. Also no RTL. Ligatures? Come on.

The list goes on.