Comment by tannhaeuser

11 hours ago

Guess what, you're not required to open <html>, <head>, or <body> either. It all follows from SGML tag inference rules, and the rules aren't that difficult to understand. What makes them appear magical is WHATWG's verbose ad-hoc parsing algorithm presentation explicitly listing eg. elements that close their parents originally captured from SGML but having become unmaintained as new elements were added. This already started to happen in the very first revision after Ian Hickson's initial procedural HTML parsing description ([1]).

I'd also wish people would stop calling every element-specific behavior HTML parsers do "liberal and tag-soup"-like. Yes WHATWG HTML does define error recovery rules, and HTML had introduced historic blunders to accomodate inline CSS and inline JS, but almost always what's being complained about are just SGML empty elements (aka HTML void elements) or tag omission (as described above) by folks not doing their homework.

[1]: https://sgmljs.sgml.net/docs/html5.html#tag-omission (see also XML Prague 2017 proceedings pp. 101ff)

HTML becomes pretty delightful for prototyping when you embrace this. You can open up an empy file and start typing tags with zero boilerplate. Drop in a script tag and forget about getElementById(); every id attribute already defines a JavaScript variable name directly, so go to town. Today the specs guarantee consistent behavior so this doesn't introduce compatiblity issues like it did in the bad old days of IE6. You can make surprisingly powerful stuff in a single file application with no fluff.

I just wish browsers weren't so anal about making you load things from http://localhost instead of file:// directly. Someone ought to look into fixing the security issues of file:// URLs so browsers can relax about that.

  • A workaround for the file:// security deny is to use a JavaScript file for data (initialized array) rather than something more natural like JSON.

    Apparently JavaScript got grandfathered in as ok for direct access!

  • Wow, I had never heard of that ID -> variable feature

    • More specifically it becomes a property of window, which is the global object.

      So <div id="hello"> becomes accessible as window["hello"], which means you can just directly write hello.innerText = "Hi!".

      Since this may conflicts with any of the hundreds of other properties on window, it's generally not something that should be used.

      Historically it wasn't too uncommon to see it, but since it doesn't work well with typescript, it's very rare now.

      1 reply →

    • Yeah it was hard to believe when I first learned about it, but it's true. I think I first found out when I forgot to put in a getElementById call and my code still worked.

    • It's been there since the beginning but it has several exceptions, like it's not available in strict mode and modules. Ask your ChatGPT if implied globals are right for you.

    • Also window.document.forms gets you direct access to all forms, "name" automatically attach an attribute to the parents and "this" rebind to the current element on inline event handler.

      The DOM API may have been very messy at creation, but it is also very handy and powerful, especially for binding to a live programming visual environment with instant remote update capabilities.

      2 replies →

  • > Someone ought to look into fixing the security issues of file:// URLs

    If you mean full sandboxing of applications with a usable capability system, then yeah, someone ought to do that. But I wouldn't hold my breath, there's a reason why nobody did yet.

  • Yes i love quickly creating tools in a single file, if the tool gets really complex I'll switch to a sveltekit Static site. I have a default css file I use for all of them to make it even quicker and not look so much like AI slop.

    I think every dev should have a tools.TheirDomain.zzz where they put different tools they create. You can make so many static tools and I feel like everyone creates these from time to time when they are prototyping things. There's so many free options for static hosting and you can write bash deploy scripts so quickly with AI, so its literally just ./deploy.sh to deploy. (I also recommend writing some reusable logic for saving to local storage/indexedDB so its even nicer.)

    Mine for example is https://tools.carsho.dev (100% offline/static tools, no monetization)

This is what I complain about:

https://nvd.nist.gov/vuln/detail/CVE-2020-26870

https://sirre.al/2025/08/06/safe-json-in-script-tags-how-not...

https://bughunters.google.com/blog/5038742869770240/escaping...

None of those problems exist in XHTML.

  • I guess you're replying to my comment because you were triggered by my last sentence. I wasn't criticizing you specifically, but yeah, in another comment you're writing

    > It probably didn't help that XHTML did not offer any new features over tag-soup HTML syntax.

    which unfortunately reaks of exactly the kind of roundabout HTML criticism that is not so helpful IMO. We have to face the possibility that most HTML documents have already been written at this point, at least if you value text by humans.

    The CVEs you're referencing are due to said historic blunders allowing inline JS or otherwise tunneling foreign syntax in markup constructs (mutation XSSs are only triggered by serialising and reparsing HTML as part of bogus sanitizer libs anyway).

    If you look at past comments of mine, you'll notice I'm staunchly criticizing inline JS and CSS (should always be placed in external "resources") and go as far as saying CSS or other ad-hoc item-value syntax should not even exist when attributes already serve this purpose.

    The remaining CVE is made possible by Hickson's overly liberal rules for what's allowed or needs escaping in attributes vs SGML's much stricter rules.

    • Inline JS or CSS is fine if typed directly by humans. It's only a problem when generated. Generated resources should always be in separate files.

      I like the flexibility of being able to make one file HTML apps with inline resources when I'm not generating code. But there should be better protections against including inline scripts in generated code unintentionally.

Omitting <body> can lead to weird surprises. I once had some JavaScript mysteriously breaking because document.body was null during inline execution.

Since then I always write <body> explicitly even though it is optional.