The Website Obesity Crisis (2015)

2 years ago (idlewords.com)

It's not just web pages. It's easier to build bloated things in less time than non-bloated things. For-profit companies (rightly, I'd argue) don't value the Craft as much as the Bottom Line.

My home page, though admittedly sparse and functional, is a 39 KB download in total, including 9 link icons and a blurred full-page background image. (It turns out blurring the image allows incredible levels of JPEG compression with few artifacts.) I worked a bit to get it there. And that's nothing compared with what some demoscene folks can do. :)

The other day, in one of my "get off my lawn" moments, I declared to the Rails app I was messing with, "Every web framework sucks!" And then I quickly amended the sentiment to include every test framework and build framework, for good measure.

I mean, they have their place for rapid development, of course. But the accompanying bloat and dependency surface hurts my soul.

  • What’s your homepage URL, tho? (You don’t give it, and I can’t find it on your HN profile.)

    FTR, here is a complete Solitaire game in ~30 kB over-the-wire: https://FreeSolitaire.win/

    Last time I checked, an update for Microsoft Solitaire was 30 megabytes…

    • https://beej.us/ But, like I said, not super impressive.

      I still maintain, though, that if folks spent the time on it, they could get the same effect they were looking for with 10% of the bandwidth.

    • Microsoft used to deliver Solitaire as a little toy for computers that had 20 MB hard disks.

As the article's author likes to rant about articles that rant about bloat while being bloated themselves, here's an opportunity to rant about an article ranting about rants about bloat being bloated, being bloated itself! The page has 1 MB of rather unnecessary illustrations (many of which are just amusing pictures), which is more than most of the books it compares other pages to.

It is surprisingly common for the content of such articles to contradict their presentation. Well, the article itself already mentions this, but it's still strange how common it seems to be.

Edit: though this is not exactly an article, but rather the "text" version of a presentation/talk, and the illustrations are slides, so it probably wasn't meant to look (or be) quite like that.

I wrote this seven years ago and haven't studied the issue much since. Can people who are up to speed on modern web design comment on what's changed (for better or worse) in the interim? Do the tools and frameworks still change every few months, or have things settled down a bit?

  • React is now the default choice. Google made everyone aware of page speed with their Core Web Vitals — speed/UX metrics affecting SEO. There’s noise about tools and practices to reduce page weight with JS-heavy sites, but in practice it still goes up linearly: https://httparchive.org/reports/page-weight. Your piece is still relevant.

  • There is a recent resurgence of backend-driven website interactivity micro-frameworks like Phoenix LiveView, Rails Stimulus/Hotwire, and most recently (and most generic) htmx. These all advocate for writing minimal (sometimes zero) custom JavaScript and driving interactivity through partial HTML fragments returned by the backend. They also allow for ultra-simple tooling setup, like just throwing a <script> tag into the page instead of having to set up an npm project.

    htmx btw is about 12 KB gzipped and enables a surprising amount of interactivity on pages.
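
    For a feel of what that looks like, here's a minimal sketch (the CDN URL and the /fragments/news endpoint are placeholders of my own, not anything from the parent comment): a button that fetches an HTML fragment from the backend and swaps it into the page, with no hand-written JavaScript.

      <!-- Load htmx itself: one script tag, no build step. -->
      <script src="https://unpkg.com/htmx.org"></script>

      <!-- On click, GET a partial HTML fragment from the backend
           and swap it into #news-panel. -->
      <button hx-get="/fragments/news"
              hx-target="#news-panel"
              hx-swap="innerHTML">
        Load news
      </button>
      <div id="news-panel"></div>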

How exactly is it that webpages with a couple MB of JavaScript or CSS or whatever so reliably cause our CPUs and fans to go nuts when much larger and more complex programs on our machines don't?

I just don't have a good intuition for what's happening here. In these discussions, people always talk like a webpage with 1 MB of JavaScript is a monstrosity, which, yeah, makes sense from an absolute perspective; it takes a lot of lines of source code to fill a text file up to 1 MB. But from a relative perspective, I have a bunch of programs on my machine that take up hundreds of MB of storage, and some do heavy scientific computations, but my laptop fan isn't pegged out most of the time, until I visit a page on reddit.com (so now I always make sure to use old.reddit.com instead).

I have a graduate-student understanding of computer systems, but again I just don't have a strong intuition for what's happening here. Can someone explain?

  • I mean, there's a difference between the size of a codebase and its efficiency. I could write a ten line program that pegs a CPU (see the sketch below), and a ten thousand line program that doesn't cause any trouble at all.

    • Sure, I get that. But is the code behind something like reddit.com really that poorly written? Like a buggy, half-completed homework assignment that I turned in as a freshman CS student? If so, that seems to be an entirely different problem from the one discussed here.
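
      To make the sketch mentioned above concrete (purely illustrative - not a claim about how reddit is actually written): a handful of lines that alternately write a style and read a layout property force the browser to recalculate layout on every pass, which is enough to keep one core busy and the fan spinning.

        // Illustrative only: paste into a devtools console and watch CPU usage climb.
        const box = document.createElement('div');
        document.body.appendChild(box);
        setInterval(() => {
          for (let i = 0; i < 5000; i++) {
            box.style.width = (i % 500) + 'px'; // invalidate layout...
            void box.offsetHeight;              // ...then force a synchronous reflow
          }
        }, 0);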

I almost never have my phone connected to cellular data. I use a cheap pre-paid plan and have kept the same 2 gigs for over 2 years. The first time I used cellular data was to check in to a clinic (January 2021). Just checking in at their website used somewhere between 3 and 6 megabytes of data - something I could theoretically have done, faster, with a single message to an email address or phone number.

> Here’s a self-righteous blogger who likes to criticize others for having bloated websites. And yet there's a gratuitous 3 megabyte image at the top of his most recent post.

Heh ;)

Funnily enough, he fixed it, but now has an almost 200 KB PNG in the header: https://idlewords.com/images/toothfish.png

At least his server seems to be so slow, or so HN-hugged, that I could actually watch the images load in slowly.

Need a browser plugin that tells you whether the page you are viewing is above or below the current average page size on the Internet. Then stigmatise sites that are over the average and somehow reward those that are below it.
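
The measurement half of that is already doable with the Resource Timing API; a rough sketch follows (the 2 MB "average" is just an assumed threshold, and cross-origin resources report a transferSize of 0 unless the server sends Timing-Allow-Origin, so third-party weight gets undercounted):

  // Rough page-weight check; paste into a devtools console.
  const AVERAGE_BYTES = 2 * 1024 * 1024; // assumed threshold, not an official figure
  const entries = [
    ...performance.getEntriesByType('navigation'),
    ...performance.getEntriesByType('resource'),
  ];
  const total = entries.reduce((sum, e) => sum + (e.transferSize || 0), 0);
  console.log((total / 1024).toFixed(0) + ' KB transferred - ' +
              (total > AVERAGE_BYTES ? 'above' : 'below') + ' the assumed average');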

Maciej has definitely pushed back the obesity on Pinboard even further since his 2015 talks: now most features don't work anymore :) /s (almost).

Anyhow: an old classic and a very good article. The (2015) is missing from the title.

This is a great deck, very entertaining to read, and I like the commentary a lot. On the 'Heavy Clouds' topic, I think about this a lot: I wonder how many new CS grads, bootcamp graduates, or otherwise new web devs have been taught that 'the' way to spin up a website is on a 'Heavy Cloud', thus normalizing all of the so-called 'web scale' autoscaling and whatnot - learning HOW to engage with 'Heavy Clouds' but never really understanding when or why they should.

I worked on bbc.co.uk back in the early days (late 90s), and the homepage of each site had to be under 70k, including images, or the operations people wouldn't make it live. I think we can afford a bit more now, but I miss that discipline. When building sites now I'm still focused on page weight - only vanilla JS, hand-coded HTML - but given what clients request in terms of design, I'm lucky if I can get homepage weight below 500k (desktop) these days.
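
That kind of hard limit can still be enforced today, for example with a Lighthouse budgets file checked in CI; here's a sketch of what I believe the format looks like (the numbers are arbitrary, and sizes are in kilobytes if I remember correctly):

  [
    {
      "path": "/*",
      "resourceSizes": [
        { "resourceType": "total",  "budget": 500 },
        { "resourceType": "script", "budget": 150 },
        { "resourceType": "image",  "budget": 250 }
      ],
      "resourceCounts": [
        { "resourceType": "third-party", "budget": 5 }
      ]
    }
  ]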

I can remember the last time I was really, truly excited about a new technology: Excel 2007 was upping the row limit from 65k to just over a million! I would be able to unleash such monsters on the world, I was surely living in the world of big data!

I am typing this comment on a machine far and away more powerful than the one I had back then, yet I'm not doing anything much differently and I don't notice performance improvements at all... the internet is almost certainly slower, and I think desktop performance is starting to wane as well.

A lot of what is written in the article still holds true, despite its age. Take the tweet as an example:

> If you open that tweet in a browser, you'll see the page is 900 KB big.

If you look at how big the page is now, you'll get something along the lines of this: https://gtmetrix.com/reports/twitter.com/aJKMCxUq/

  2.00 MB (7.56 MB uncompressed)
  1.68 MB of JavaScript
  0.18 MB of fonts
  0.08 MB of HTML
  0.04 MB of images
  ... (some other requests)
  (185 total page requests)

On other sites you see similar amounts of data, though sometimes more custom fonts, or images, or other media.

I used to think that this was because the developers just don't care, or that designers and product managers go wild (which I still do, admittedly), but in recent years I've felt more and more that it's because the way browsers work is flawed. Nobody seems to complain (much) that browser X's install size is Y MB, yet for whatever reason each site is treated as its own unique universe where you often re-download what is mostly the same thing over and over, the wasted bandwidth accumulating over time.

Here's a slightly crazy thought experiment:

  - what if all of the frameworks/libraries decoupled the framework/library code from the application code (e.g. React, Angular, Vue), essentially you'd have react-version-X.js and my-site-com-app.js
  - maybe even do this for the most popular libraries and component frameworks, like react-primevue-version-X.js
  - what if browsers shipped the most popular framework/library code versions in them, so those would never need to be downloaded from the visited sites, but would be available locally already
  - what if browsers did the same for most of the popular freely available fonts, too - enough to placate the designers, say, the top 100 most popular fonts in each category (serif, sans serif, monospace, ...)
  - everything else can be downloaded the old way, or maybe browsers can provide a package manager of sorts (e.g. site requested Alpine version X, this will be downloaded and re-used for other sites)
  - maybe mandate that only X updates per year are allowed per framework/library or other resource type, to fight off bloat from trigger-happy teams that want to release often

A little bit like a CDN, except baked into the browser (or selectable as an install option). Of course, this will never happen: it would require a lot of work on the part of the browsers, it would create an "in crowd" of supported resources with everything else having lower chances of becoming as popular, and people could never agree on which resources are popular enough to pre-install. But my argument is that we're stuck with the same popular Google Fonts on most sites anyway, as well as a few large JavaScript frameworks that everyone uses regardless, so a lot of the current bloat makes little sense.

Then again, one can also imagine a world where users are given the choice to view only downscaled images on the sites they visit (which, IIRC, Opera's mobile browsers offered a while back), or a bunch of other simple options, like never downloading custom (non-icon) fonts. But that's not quite the world we live in - especially in regard to JavaScript, which you often cannot disable and still hope that the site will keep working, because it won't.

Worse yet, optimizations that are cool and useful - like how Google Fonts splits a font into multiple files based on character sets, so only the ones actually needed get downloaded - are hard to pull off yourself for arbitrary fonts. For example: https://fonts.googleapis.com/css2?family=Open+Sans&display=s... This last bit is also why my own site is unreasonably bloated (aside from the fact that I chose a non-web-safe font in the first place, to look more like the fancy sites): https://gtmetrix.com/reports/kronis.dev/l0nIApXL/
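
For reference, the mechanism behind that splitting is the unicode-range descriptor in @font-face: the browser only downloads a subset file if the page actually uses characters in its range. A hand-rolled sketch (the file names and ranges are only illustrative):

  /* Each @font-face rule points at a subset file; the browser fetches it
     only when the page uses characters covered by its unicode-range. */
  @font-face {
    font-family: "My Font";
    src: url("/fonts/my-font-latin.woff2") format("woff2");
    unicode-range: U+0000-00FF; /* basic Latin + Latin-1 */
    font-display: swap;
  }
  @font-face {
    font-family: "My Font";
    src: url("/fonts/my-font-cyrillic.woff2") format("woff2");
    unicode-range: U+0400-04FF; /* Cyrillic */
    font-display: swap;
  }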

  • If you take your plan (sans the final point) and replace "shipped with the browser download" with "only downloaded the first time it is needed", then you've basically described how pre-webpack websites all worked. Everyone was putting the CDN URL for jQuery in their HTML head, and browsers read it from their local cache for every site after the first.

    I think it's still perfectly possible to do that with React (roughly like the sketch at the end of this comment), but everyone wants to write JSX and transpile it instead of doing it procedurally, so it gets webpacked anyway. Caveat: I haven't touched React in about 4 years, so I don't know if that's still true.

    Your final point about disallowing more than X updates per year is unworkable due to the unpredictability of security patches.
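
    For what it's worth, the no-build-step React pattern mentioned above looks roughly like this (the unpkg URLs and version are illustrative; React 18's UMD builds expose React and ReactDOM as globals):

      <!-- Library code from a CDN (cacheable), application code written
           procedurally with createElement instead of JSX. -->
      <script src="https://unpkg.com/react@18/umd/react.production.min.js"></script>
      <script src="https://unpkg.com/react-dom@18/umd/react-dom.production.min.js"></script>
      <div id="root"></div>
      <script>
        const e = React.createElement;
        function Counter() {
          const [count, setCount] = React.useState(0);
          return e('button', { onClick: () => setCount(count + 1) },
                   'Clicked ' + count + ' times');
        }
        ReactDOM.createRoot(document.getElementById('root')).render(e(Counter));
      </script>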