Comment by tannhaeuser

1 month ago

Guess what, you're not required to open <html>, <head>, or <body> either. It all follows from SGML tag inference rules, and the rules aren't that difficult to understand. What makes them appear magical is WHATWG's verbose, ad-hoc presentation of the parsing algorithm, which explicitly lists, e.g., the elements that close their parents; those lists were originally captured from SGML but became unmaintained as new elements were added. This already started to happen in the very first revision after Ian Hickson's initial procedural HTML parsing description ([1]).

I also wish people would stop calling every element-specific behavior of HTML parsers "liberal" or "tag soup". Yes, WHATWG HTML does define error recovery rules, and HTML introduced historic blunders to accommodate inline CSS and inline JS, but almost always what's being complained about is just SGML empty elements (aka HTML void elements) or tag omission (as described above), by folks not doing their homework.

[1]: https://sgmljs.sgml.net/docs/html5.html#tag-omission (see also XML Prague 2017 proceedings pp. 101ff)

HTML becomes pretty delightful for prototyping when you embrace this. You can open up an empty file and start typing tags with zero boilerplate. Drop in a script tag and forget about getElementById(); every id attribute already defines a global JavaScript variable, so go to town. Today the specs guarantee consistent behavior, so this doesn't introduce compatibility issues like it did in the bad old days of IE6. You can make surprisingly powerful stuff in a single-file application with no fluff.

I just wish browsers weren't so anal about making you load things from http://localhost instead of file:// directly. Someone ought to look into fixing the security issues of file:// URLs so browsers can relax about that.
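
Until then, the usual way around the file:// restriction without a local server is to ship the data as a classic script rather than fetching a JSON file, since fetch() of a local file is typically refused while a plain <script src> is not. A minimal sketch, assuming two files next to index.html included as <script src="data.js"></script> followed by <script src="app.js"></script>; the file names and the APP_DATA global are made up for illustration. As far as I know, module scripts (type="module") are fetched with CORS and are typically blocked from file:// as well, so this relies on classic scripts.

```javascript
// --- data.js (hypothetical file name) ---------------------------------
// A classic script: it can be loaded from file:// with a plain
// <script src="data.js"></script>, unlike fetch("data.json").
// APP_DATA is a made-up global name; anything unique works.
globalThis.APP_DATA = [
  { id: "a1", label: "First item", qty: 3 },
  { id: "b2", label: "Second item", qty: 5 },
];

// --- app.js (hypothetical file name) ----------------------------------
// Included after data.js. Scripts without async/defer run in document
// order, so APP_DATA is already defined when this file runs.
const rowsById = new Map(APP_DATA.map((row) => [row.id, row]));
const totalQty = APP_DATA.reduce((sum, row) => sum + row.qty, 0);
```

The price is that the data file is code rather than JSON, so it has to come from a source you trust.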

  • Welcome, kids, to how all web development was done 25-30 years ago. You typed up HTML, threw in some scripts (once JavaScript became a thing) and off you went. No CMS, no frameworks. I know a guy who wrote a fully functional client-side banking back-office app in IE4 JS by posting into different frames and observing the DOM returned by the server. In 1999. It worked a treat on network speeds and workstation capabilities you literally can’t imagine today.

    Things do not have to be complicated. That abstraction layer you are adding sure is elegant, but is it also necessary? Does it add more value than it consumes, not just at the time of coding but throughout the entire lifecycle of the system? People have piled abstraction on top of hardware from day one, but one has to ask if and when we got past the point of diminishing returns. Kubernetes was supposed to be the thing that makes managing VMs simple. Now there are things supposedly making managing Kubernetes simple. Maybe, just maybe, this computer stuff is inherently complicated and we’re just adding to it by hoping all of it can eventually be made “simple”? Just look at the messages around vibe coding…

    • yeh, the good old (tm) days :-))

      Today you first need AI to figure out what the JS-framework-of-the-week is, then you need AI to generate all the boilerplate code, and then you use AI to debug all the stuff you created :-)

  • A workaround for the file:// security restriction is to use a JavaScript file for data (an initialized array) rather than something more natural like JSON.

    Apparently JavaScript got grandfathered in as ok for direct access!

    • once i had to import some xml and just put it in a hidden div since html allows any tag names XD

  • Wow, I had never heard of that ID -> variable feature

    • Yeah it was hard to believe when I first learned about it, but it's true. I think I first found out when I forgot to put in a getElementById call and my code still worked.

    • More specifically it becomes a property of window, which is the global object.

      So <div id="hello"> becomes accessible as window["hello"], which means you can just directly write hello.innerText = "Hi!".

      Since this may conflict with any of the hundreds of other properties on window, it's generally not something that should be used.

      Historically it wasn't too uncommon to see it, but since it doesn't work well with TypeScript, it's very rare now.

    • Also, window.document.forms gives you direct access to all forms, a "name" attribute automatically exposes an element as a property of its parent (a named control becomes a property of its form), and "this" is bound to the current element in inline event handlers.

      The DOM API may have been very messy at creation, but it is also very handy and powerful, especially for binding to a live programming visual environment with instant remote update capabilities.

    • It's been there since the beginning but it has several exceptions, like it's not available in strict mode and modules. Ask your ChatGPT if implied globals are right for you.

  • > Someone ought to look into fixing the security issues of file:// URLs

    If you mean full sandboxing of applications with a usable capability system, then yeah, someone ought to do that. But I wouldn't hold my breath; there's a reason nobody has done it yet.

  • Yes, I love quickly creating tools in a single file; if a tool gets really complex, I'll switch to a SvelteKit static site. I have a default CSS file I use for all of them to make it even quicker and to keep them from looking so much like AI slop.

    I think every dev should have a tools.TheirDomain.zzz where they put the different tools they create. You can make so many static tools, and I feel like everyone creates these from time to time when prototyping things. There are so many free options for static hosting, and you can write bash deploy scripts so quickly with AI that it's literally just ./deploy.sh to deploy. (I also recommend writing some reusable logic for saving to localStorage/IndexedDB so it's even nicer.)

    Mine for example is https://tools.carsho.dev (100% offline/static tools, no monetization)
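
On the id-to-global feature discussed above: in browsers, <div id="hello"> really is reachable as window.hello, or just hello, but only when nothing else already claims that name. Declared globals and built-in window properties take precedence, so an element with id="name" or id="status" will not show up under that name. If you want to keep the brevity without depending on this, a small explicit lookup helper is one option. A minimal sketch; byId is a hypothetical name, and the doc parameter exists only so the helper can be exercised outside a browser.

```javascript
// Hypothetical helper: explicit element lookup instead of relying on
// named access (window.hello). Throws a clear error when the id is
// missing or misspelled, instead of handing back undefined or an
// unrelated window property.
function byId(id, doc = globalThis.document) {
  const el = doc.getElementById(id);
  if (el === null) {
    throw new Error(`no element with id="${id}"`);
  }
  return el;
}

// Usage in the page (browser only):
// byId("hello").textContent = "Hi!";
```

It costs a few characters per lookup, but it also works unchanged in modules and type-checks cleanly in TypeScript.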

This is what I complain about:

https://nvd.nist.gov/vuln/detail/CVE-2020-26870

https://sirre.al/2025/08/06/safe-json-in-script-tags-how-not...

https://bughunters.google.com/blog/5038742869770240/escaping...

None of those problems exist in XHTML.
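
The second link is about embedding JSON in inline <script> elements. The common mitigation is to escape < (and, defensively, > and &) as \u escapes: they decode to the same characters in JSON.parse or in a JS string literal, but the HTML tokenizer never sees </script> or <!-- inside the element. A minimal sketch; jsonForScriptTag is a hypothetical name, and it assumes the output goes between <script> and </script>, not into an attribute.

```javascript
// Hypothetical helper: serialize a value as JSON that can be placed
// directly between <script> and </script> in an HTML page.
function jsonForScriptTag(value) {
  return JSON.stringify(value)
    // "<" can only occur inside JSON strings, so replacing it with the
    // equivalent \u003c escape keeps the data identical while making
    // "</script" and "<!--" impossible in the output.
    .replace(/</g, "\\u003c")
    // Defensive extras: ">" and "&" are cheap to escape the same way.
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    // Raw U+2028/U+2029 are valid in JSON but were not valid in JS string
    // literals before ES2019, so escape them for older engines.
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

// Usage when generating a page (server side or build step):
const payload = { note: "</script><script>alert(1)</script>" };
const html = `<script>const DATA = ${jsonForScriptTag(payload)};</script>`;
```

This only addresses the script-tag case; data that ends up in attributes or in a sanitizer round trip needs its own handling.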

  • I guess you're replying to my comment because you were triggered by my last sentence. I wasn't criticizing you specifically, but yeah, in another comment you wrote

    > It probably didn't help that XHTML did not offer any new features over tag-soup HTML syntax.

    which unfortunately reeks of exactly the kind of roundabout HTML criticism that is not so helpful IMO. We have to face the possibility that most HTML documents have already been written at this point, at least if you value text by humans.

    The CVEs you're referencing are due to said historic blunders allowing inline JS or otherwise tunneling foreign syntax in markup constructs (mutation XSSs are only triggered by serialising and reparsing HTML as part of bogus sanitizer libs anyway).

    If you look at my past comments, you'll notice I'm staunchly critical of inline JS and CSS (which should always be placed in external "resources") and go as far as saying CSS or other ad-hoc item-value syntax should not even exist when attributes already serve this purpose.

    The remaining CVE is made possible by Hickson's overly liberal rules for what's allowed or needs escaping in attributes vs SGML's much stricter rules.

    • Inline JS or CSS is fine if typed directly by humans. It's only a problem when generated. Generated resources should always be in separate files.

      I like the flexibility of being able to make one file HTML apps with inline resources when I'm not generating code. But there should be better protections against including inline scripts in generated code unintentionally.

Omitting <body> can lead to weird surprises. I once had some JavaScript mysteriously breaking because document.body was null during inline execution.

Since then I always write <body> explicitly even though it is optional.
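
That matches how the parser works: with the <body> start tag omitted, an inline script that appears before any body content is placed in <head>, and the body element has not been created yet when it runs, so document.body is null. Writing <body> explicitly, or moving the script after the content, avoids it; for code that has to live in <head>, deferring the work is another option. A minimal sketch; withBody is a hypothetical name, and the doc parameter is only there so it can be tested outside a browser.

```javascript
// Hypothetical helper: call fn(body) as soon as document.body exists.
// With <body> omitted, an inline script that the parser puts in <head>
// runs before the body element is created, so document.body is null there.
function withBody(fn, doc = globalThis.document) {
  if (doc.body) {
    // Body already parsed (e.g. the script sits inside <body>): run now.
    fn(doc.body);
    return;
  }
  // Otherwise wait for parsing to finish; by DOMContentLoaded the HTML
  // parser has normally created the body element.
  doc.addEventListener("DOMContentLoaded", () => fn(doc.body), { once: true });
}

// Usage in the page (browser only):
// withBody((body) => body.append("ready"));
```

For external scripts, <script defer src="..."> gives the same ordering without a helper.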