Comment by jodrellblank

2 days ago

It's well known, but this video[1] is a proof-of-concept demonstration from four years ago. Casey Muratori called out Microsoft's new Windows Terminal for slow performance, and people argued that it wasn't possible, practical, or maintainable to make a faster terminal, that his claims of "thousands of frames per second" were hyperbolic, and one person said it would be a "PhD-level research project".

In response, Casey spent less than a week making RefTerm, a skeleton proto-terminal with the same constraints the Microsoft people had: using Windows APIs for things, using DirectWrite with GPU rendering, handling terminal escape codes, colours, blinking, custom fonts, missing-glyph font fallback, line wrap, scrollback, Unicode and right-to-left Arabic combining characters, etc. RefTerm had 10x the throughput of Windows Terminal and ran at 6,000-7,000 frames per second. It was single-threaded, not profiled, not tuned, used no advanced algorithms, and didn't cheat by skipping work such as sending some of the data to /dev/null. All it had to speed it up was simple code without tons of abstractions, plus a Least Recently Used (LRU) glyph cache to avoid re-rendering common characters, written the first way he thought of.

Around that time he did a video series on that YouTube channel about optimization, arguing that even talking about 'optimization' was too hopeful and we should be talking about 'non-pessimization': most software is not slow because it has unavoidable complexity and abstractions needed to help maintenance, it's slow because it's choked by a big pile of do-nothing code and abstraction layers added for ideological reasons, which hurt maintenance as well as performance.
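An LRU glyph cache is a simple idea: render each glyph once, keep the most recently used bitmaps around, and evict the least recently used one when the cache is full. The Python sketch below only illustrates the idea and is not RefTerm's actual implementation; the (character, font, size) key and the `rasterize` callback are hypothetical stand-ins for a real font-rendering call.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class GlyphCache:
    """LRU cache from a glyph key to its rendered bitmap (illustrative sketch)."""

    def __init__(self, capacity: int, rasterize: Callable[[Hashable], bytes]):
        self.capacity = capacity
        self.rasterize = rasterize  # hypothetical expensive font-rendering call
        self.entries: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self.entries:
            # Hit: mark the entry as most recently used and reuse its bitmap.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Miss: render once and remember the result.
        self.misses += 1
        bitmap = self.rasterize(key)
        self.entries[key] = bitmap
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry (the front of the OrderedDict).
            self.entries.popitem(last=False)
        return bitmap


def fake_rasterize(key: Hashable) -> bytes:
    # Stand-in for real glyph rasterization.
    return f"bitmap:{key}".encode()


cache = GlyphCache(capacity=2, rasterize=fake_rasterize)
for ch in "abab":
    cache.get((ch, "Consolas", 14))  # hypothetical (char, font, size) key
cache.get(("c", "Consolas", 14))  # cache is full, so "a" (least recently used) is evicted
```

In a terminal most characters on screen repeat constantly, so after the first frame nearly every lookup is a hit and the expensive rendering call almost never runs.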

[1] https://www.youtube.com/watch?v=hxM8QmyZXtg - "How fast should an unoptimized terminal run?"

This video[2] is another one with specific details: Jason Booth talking about his experience of game development, with practical examples of changing data layout and C++ code so it does less work, is more cache-friendly, has better memory access patterns, and runs orders of magnitude faster without adding much complexity, sometimes even removing complexity.

[2] https://www.youtube.com/watch?v=NAVbI1HIzCE - "Practical Optimizations"

6 comments


sgarland 2 days ago

I simultaneously love and hate watching Casey Muratori. Love because he routinely does things like this, hate because I have conversations like this entirely too often at work, except no one cares.

jodrellblank 2 days ago

Someone recently posted their word game Cobble[1] on HN. The game gives you some letters, and the challenge is to find two English words which together use up all the given letters, with the combined two words as short as possible.

A naive brute-force solver takes the Cobble wordlist of 64k words, compares every word against every other word (64k x 64k, about 4 billion iterations), and in the inner loop body loops over the combined characters. If the combined words average 10 characters, that's about 40 billion operations just for the code structure, plus character testing and counting and data structures to store the counts. That's seconds or minutes of work for a puzzle that feels like any modern computer should solve it in microseconds.
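To make the arithmetic concrete, here is a minimal Python sketch of that naive solver. It assumes a simplified reading of the rules (every given letter must appear in at least one of the two words; letter counts and extra letters are ignored), which may not match Cobble exactly, and the tiny word list is made up for illustration. With n words the nested loops run n x n times, so 65,536 words means about 4.3 billion pair checks before any character work.

```python
def covers(word_a: str, word_b: str, letters: set[str]) -> bool:
    # Inner loop: walk the combined characters and record which ones appear.
    seen = set()
    for ch in word_a + word_b:
        seen.add(ch)
    return letters <= seen  # every puzzle letter appears in at least one word


def naive_solve(words: list[str], letters: str):
    """Brute force: try every word against every other word, O(n^2) pairs."""
    target = set(letters)
    best = None
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i == j:
                continue  # a pair needs two different words
            if covers(a, b, target):
                if best is None or len(a) + len(b) < len(best[0]) + len(best[1]):
                    best = (a, b)
    return best


words = ["cat", "dog", "catalog", "god", "act", "good"]
print(naive_solve(words, "catdog"))  # ('cat', 'dog')
```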

It's always mildly interesting to me how a simple-to-explain problem, a tiny amount of data, and four lines of nested loops can generate enough work to choke a modern CPU for minutes. Then consider how much work 3D games do in milliseconds. It highlights how impressive the algorithmic research of the 1960s was in finding ways to get early computers to do anything in a reasonable time, let alone find fast paths through complex problem patterns. Or perhaps how remarkable it is that, of all the zillions of possible problems which could exist, any can be approached by human minds and computers at all.

[1] https://news.ycombinator.com/item?id=44588699

NohatCoder 2 days ago
Of course finding the optimal solution to a Cobble puzzle does not actually require the computation you describe. We can in a single pass find a limited set of candidate words and work out a solution with those.
- jodrellblank 2 days ago
  
  Sure; given Casey Muratori's point that people argue with him that no normal developer needs to worry about performance, that computers are fast enough and performance is a niche concern, I'm just musing on how little data it takes (64k is nothing to a modern person) and how abruptly anyone who wants a fast answer has to switch to thinking about performance: pre-processing the list, sorting more promising candidates first, using a faster language, noticing that it's embarrassingly parallel, etc.
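NohatCoder's single-pass idea can be made concrete in several ways; the Python sketch below is one possible approach, not necessarily what they had in mind, and it assumes a simplified reading of the rules (every puzzle letter must appear in at least one of the two words). Each word is reduced to a bitmask of which puzzle letters it contains. Only the two shortest words per mask can matter, so one pass over the list shrinks it to at most 2 x 2^k candidates for k distinct puzzle letters, and the brute-force pairing then runs over that small set.

```python
from collections import defaultdict


def coverage_mask(word: str, letters: str) -> int:
    # Bit i is set when the word contains letters[i]; other characters are ignored.
    mask = 0
    for i, ch in enumerate(letters):
        if ch in word:
            mask |= 1 << i
    return mask


def filtered_solve(words: list[str], letters: str):
    letters = "".join(sorted(set(letters)))  # deduplicate puzzle letters
    full = (1 << len(letters)) - 1
    # Single pass: for each distinct coverage mask keep only its two shortest words.
    # Two per mask is enough, because the best pair might use two words that
    # cover the same subset of letters.
    buckets: dict[int, list[str]] = defaultdict(list)
    for w in words:
        kept = buckets[coverage_mask(w, letters)]
        kept.append(w)
        kept.sort(key=len)
        del kept[2:]
    # At most 2 * 2**len(letters) candidates remain, so pairing them is cheap.
    candidates = [(m, w) for m, ws in buckets.items() for w in ws]
    best = None
    for i, (mask_a, a) in enumerate(candidates):
        for mask_b, b in candidates[i + 1:]:
            if mask_a | mask_b == full:
                if best is None or len(a) + len(b) < len(best[0]) + len(best[1]):
                    best = (a, b)
    return best


words = ["cat", "dog", "catalog", "god", "act", "good"]
print(filtered_solve(words, "catdog"))  # ('cat', 'dog')
```

Swapping any word for the shortest word with the same mask keeps the pair valid and never makes it longer, so the filtered search finds the same optimal total length as the full O(n^2) search; ties may resolve to a different pair.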

phtrivier 2 days ago

I would have loved to live in a universe where we could replace the Windows Terminal with RefTerm, if only to measure how many hours would pass before a Fortune 500 company had to halt operations because RefTerm does not properly reimplement one of the subtle bugs creeping from one of the bazillion features that had made WinTerm slow over the years. [1]

[1] https://xkcd.com/1172/

jodrellblank 2 days ago

I sighed when I read your comment, which is exemplary of what Casey Muratori was ranting against: casual, lazy dismissal of the idea that software can be faster, based on misunderstanding and lack of knowledge and/or interest, throwing out the first objection that comes to mind as if it's an impassable obstacle. There weren't a bazillion features making WinTerm slow over the years, because Windows Terminal was a new product for Windows 10, released in 2019.[1] It had plenty of problems (Casey calls out that it didn't render right-to-left Arabic combining glyphs), and it wasn't a perfect, highly polished program from the outset. And it was an optional download; Fortune 500s wouldn't run it if they didn't want to.

RefTerm was explicitly not a production-quality terminal and was not intended to replace Windows Terminal. RefTerm was a lower bound on the performance of an untuned single-threaded terminal. It was a proof of concept that if Microsoft had spent money and engineering skill on performance, they could have profiled, used fancy algorithms and shortcuts, reimplemented slow Windows APIs with faster local ones, used threading, and improved on RefTerm's performance. It was proof that "significantly faster terminals are unrealistic" is not true, that the casual dismissals of why it's impossible are not the reasons for the slowness, and that 10x better is an easily achievable floor, not a distant, unreachable ceiling.

As a result of Casey's public shaming, Windows Terminal developers did improve performance.

[1] https://en.wikipedia.org/wiki/Windows_Terminal