Comment by mike_hearn

7 hours ago

1024x768 on Windows 3.1? Impossible, I suspect. The highest res you'd get in that era was 800x600 if you had luxurious hardware - and even that might have been Windows 95+ - with 640x480 for most people. These numbers are still burned into my brain.

I just took some heap snapshots of this HN discussion page. Each JS heap snapshot was about 25MB and that does not include Blink native state or GPU bitmaps. The tab itself is taking ~100MB.

This isn't rendered bitmaps. Those get passed to the GPU process and uploaded to VRAM. That is just the in-memory state required to render a very simple textual page with no images on it.

Where does it all go? Looking at the 25MB, the top memory user is anchor elements: 0.75MB is just the memory needed to represent links. This is where we start to get a clue as to the problem. An HN comments thread has a lot of links next to each comment, but they're very repetitive.

How would one implement this in a classical desktop toolkit? When we compare, we can see where the memory goes.

In the desktop era you would have implemented this with virtual scrolling in a customized list view - because it's easy and because the culture is to always do that. The APIs, sample code etc all push you in this direction. Additionally, there's much greater use of custom widgets that draw themselves on demand. Each comment would be a simple in-memory data structure, with zero RAM spent on UI. Each entry in the view would be an item in a listbox (comments don't stack horizontally, so a listbox suffices) and Windows would request a paint of the entry as it scrolled into view. You'd have implemented a custom widget to render the text and the clickable links. The per-comment links (parent | context etc) would have been measured up front and then drawn with a single GDI DrawText. Mouse clicks would have been mapped back to the underlying text by using the measurements and then dispatched. The text would have been stored in a compact, typed, reflection-free in-memory data structure that stored only the comment, the UNIX time and an integer user id whose name would be looked up in a hashmap.
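To make the "compact, typed, reflection-free" idea concrete, here's a minimal sketch in TypeScript (field names and sample data are hypothetical; a real desktop program would use a struct, but the shape is the same):

```typescript
// Hypothetical compact comment store: each comment holds only its text,
// a UNIX timestamp, and a small integer user id. Usernames live exactly
// once, in a map keyed by that id.
interface Comment {
  text: string;     // the comment body
  unixTime: number; // seconds since epoch
  userId: number;   // key into the username map
}

const usernames = new Map<number, string>([
  [1, "mike_hearn"],
  [2, "dang"],
]);

const comments: Comment[] = [
  { text: "1024x768 on Windows 3.1? Impossible, I suspect.", unixTime: 1700000000, userId: 1 },
];

// The username is resolved just-in-time while painting, so it is never
// duplicated into every comment's UI state.
function renderHeader(c: Comment): string {
  return `${usernames.get(c.userId)} | ${new Date(c.unixTime * 1000).toISOString()}`;
}
```

Nothing here is retained per-widget; the UI layer reads these records on demand and throws the rendered output away.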

That approach is extremely memory efficient. The program itself translates business-level data structures to drawing calls just-in-time, and maps mouse clicks back to business-level operations with custom code. The OS helps out by supplying clipping, text rendering and pre-canned widgets, but devs have access to all the APIs from low to high level. However, it has downsides too. There is not necessarily any accessibility tree for screen readers, and things like changing the font size or selecting comment text are treated as nice-to-haves.

In a later era, maybe Swing or JavaFX or Qt, you'd no longer measure and draw the text yourself or map mouse clicks. Data structures in RAM are less efficient now and GC lets garbage hang around. But the list view would still be virtualized by default. In a virtualized list view, as the user scrolls, widgets slide off the top and are immediately repositioned at the bottom, so it's a bit like an escalator. As a list item is scrolled into view it's asked to mutate its internal state to match an entry in an underlying list of business objects. In web terms, the div is moved downwards in an absolutely positioned sense and then the text properties of the sub-divs are adjusted. As a result there are only a few UI objects being managed by the toolkit at once. The engine isn't expected to keep thousands of widgets off screen.
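The escalator bookkeeping can be sketched without any toolkit or DOM at all; the numbers below (row height, viewport size) are invented for illustration, and real toolkits mutate a fixed pool of widgets in place rather than building a fresh array:

```typescript
// Virtual-scrolling arithmetic: only ~(viewport / rowHeight) + 1 slots
// exist at any moment, no matter how long the underlying list is.
const ROW_HEIGHT = 24; // px; uniform rows assumed for simplicity

interface Slot {
  itemIndex: number; // which business object this slot currently shows
  top: number;       // absolute y position (position:absolute, in web terms)
}

function layoutSlots(scrollTop: number, viewportHeight: number, itemCount: number): Slot[] {
  const first = Math.floor(scrollTop / ROW_HEIGHT);
  const visible = Math.ceil(viewportHeight / ROW_HEIGHT) + 1; // +1 for a partially shown row
  const slots: Slot[] = [];
  for (let i = first; i < Math.min(first + visible, itemCount); i++) {
    slots.push({ itemIndex: i, top: i * ROW_HEIGHT });
  }
  return slots;
}
```

With a 100px viewport over 10,000 items, only six slots ever exist; scrolling just changes which `itemIndex` each one points at.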

The web doesn't support this stuff well because it was never meant to be a GUI toolkit, so the path of least resistance is to:

• Generate one set of unique DOM elements for every business object.

• Use strings for everything. JS isn't strongly typed and browsers don't support any binary data transfer formats, so you can't store user IDs as small integers and do just-in-time hashmap lookups to render the usernames.

• Duplicate everything incessantly. Every comment has a bunch of anchors and the data for all of them is nearly identical (href, id, CSS class), but the web doesn't make it easy to deduplicate this. You could do it in theory but it's not trivial and not how anyone is taught to write apps.
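As a rough illustration of the duplication point: the per-comment links differ only in one integer, so in principle you could store a link template once and expand it just-in-time. A hypothetical sketch (the URL shapes loosely mirror HN's but are made up):

```typescript
// Instead of retaining a full anchor (href, id, CSS class) per comment,
// keep one shared template per link kind and only the integer id per
// comment, expanding the strings at render time.
const LINK_TEMPLATES = [
  { label: "parent",  href: (id: number) => `item?id=${id}` },
  { label: "context", href: (id: number) => `context?id=${id}` },
];

function linksFor(commentId: number): { label: string; href: string }[] {
  return LINK_TEMPLATES.map(t => ({ label: t.label, href: t.href(commentId) }));
}
```

This is exactly the deduplication the paragraph above says is possible in theory - and exactly what nobody is taught to do, because the DOM wants a concrete anchor element per link anyway.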

To fix this the web would need to support what older toolkits called owner draw controls, but I think browser vendors are reluctant to do this sort of thing because the telos of the web is to treat web devs like children - they aren't trusted with low level features because they might misuse them or get them wrong, or because it's hard to implement and browser devs don't want to support it. The web's approach to multi-threading or native code is a good example of this mentality in action. Owner draw controls and virtualized list views expect more of devs, and the HTML designers don't want to expect more; they want to expect less. So we get web pages that take gigabytes of RAM.
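For contrast, the heart of an owner-draw control is tiny: keep the rectangles you measured at layout time and hit-test clicks against them yourself, with no retained widget per link. A DOM-free sketch (all names hypothetical):

```typescript
// Owner-draw hit-testing: the control measured its link runs once while
// painting; a click is mapped back to a business-level target by scanning
// those rectangles - no anchor elements anywhere.
interface Rect { x: number; y: number; w: number; h: number }
interface LinkRun { rect: Rect; targetId: number }

function hitTest(runs: LinkRun[], x: number, y: number): number | null {
  for (const run of runs) {
    const { rect } = run;
    if (x >= rect.x && x < rect.x + rect.w && y >= rect.y && y < rect.y + rect.h) {
      return run.targetId; // dispatch to a business-level click handler
    }
  }
  return null; // click landed on plain text
}
```

That's the level of trust the old toolkits extended: you get the measurements and the events, and the mapping in between is your problem.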