Comment by delaminator

2 days ago

Go back a bit further for why.

Netscape Navigator did, in fact, reject invalid HTML. Then along came Internet Explorer, which chose to render invalid HTML on a do-what-I-mean basis. People, my young naive self included, moaned about NN being too strict, and NN eventually switched to the tag-soup approach. XHTML 1.0 arrived in 2000, attempting to reform HTML by recasting it as an XML application. The idea was to impose XML's strict parsing rules: well-formed documents only, close all your tags, lowercase element names, quote all attributes, and if the document is malformed, the parser must stop and display an error rather than guess. The XHTML successor effort (XHTML 2.0) was abandoned in 2009. When HTML5 was being drafted from 2004 onwards, the WHATWG had to formally specify how browsers should handle malformed markup, essentially codifying IE's error-recovery heuristics as the standard.

But not closing <p> etc. has always been valid HTML. Going back to SGML, closing tags could be optional (depending on the DTD), and Netscape supported this from the beginning.

Leaving out closing tags is possible when the parsing is unambiguous. E.g. <p>foo<p>bar is unambiguous because p elements do not nest, so each one is closed automatically by the next <p>.
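To illustrate the "implied end tag" rule, here is a toy sketch (not a real HTML parser, and the function name is made up): when a new <p> start tag arrives while a p element is still open, the parser closes the open one first, which is exactly why <p>foo<p>bar is unambiguous.

```python
import re

def parse_paragraphs(html):
    """Toy sketch: return (tag, text) pairs, auto-closing open <p> elements."""
    out = []
    current = None  # text of the currently open <p>, or None
    for token in re.split(r'(<[^>]+>)', html):
        if token == '<p>':
            if current is not None:
                out.append(('p', current))   # implied </p>: p elements don't nest
            current = ''
        elif token == '</p>':
            if current is not None:
                out.append(('p', current))   # explicit close
                current = None
        elif token and current is not None:
            current += token                 # accumulate text content
    if current is not None:
        out.append(('p', current))           # implied </p> at end of input
    return out

print(parse_paragraphs('<p>foo<p>bar'))
# → [('p', 'foo'), ('p', 'bar')]
```

Note the same result comes out whether the end tags are written or omitted, which is precisely what makes the omission safe.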

The question of invalid HTML is a separate issue. E.g. you can't nest a p inside an i according to the spec, so how should a browser render that? Or a lexical error, like illegal characters in an unquoted attribute value.

This is where it gets tricky. Render anyway, skip the invalid HTML, or stop rendering with an error message? HTML did not specify what to do with invalid input, so any of these was legal. Browsers chose the "render anyway" approach, but this led to different output in different browsers, since there was no agreement on how to render invalid HTML.

The difference between Netscape and IE was that Netscape would more often skip rendering invalid HTML, whereas IE would always render the content.

The article itself falsifies this explanation; IE wasn't released until August 1995. The HTML draft specs published prior to this already specified that these tags didn't need closing; these simply weren't invalid HTML in the first place.

The oldest public HTML documentation there is, from 1991, demonstrates that <li>, <dt>, and <dd> tags don't need to be closed! And the oldest HTML DTD, from 1992, explicitly specifies that these, as well as <p>, don't need closing. Remember, HTML is derived from SGML, not XML; and SGML, unlike XML, allows for the possibility of tags with optional close. The attempt to make HTML more XML-like didn't come until later.
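For illustration, this is roughly how an SGML-era HTML DTD expresses optional closing: in an element declaration, the two characters after the name give the minimization rules for the start and end tag, and `O` (omissible) in the second position means the end tag may be left out. The exact content models below are paraphrased from memory of the HTML 2.0 DTD, so treat them as a sketch rather than a verbatim quote:

```
<!ELEMENT P  - O (%text;)*  -- end tag omissible: paragraph  -->
<!ELEMENT LI - O %flow;     -- end tag omissible: list item  -->
<!ELEMENT DT - O (%text;)*  -- end tag omissible: def. term  -->
<!ELEMENT DD - O %flow;     -- end tag omissible: definition -->
```

XML deliberately dropped this minimization machinery, which is why XHTML had to require every tag to be closed.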

Optional tags have always been allowed in HTML, for the simple if debatable reason (hence XHTML) that some humans still author documents by hand, knowingly skip Markdown et al., and want to write as few characters as possible (I do!).

This is clear in Tim Berners-Lee's seminal, pre-Netscape "HTML Tags" document [0], through HTML 4 [4] and (as you point out) through the current living standard [5].

[0] https://www.w3.org/History/19921103-hypertext/hypertext/WWW/...

[4] https://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.1

[5] https://html.spec.whatwg.org/multipage/syntax.html#optional-...

NN did not reject invalid HTML. It could not incrementally render tables, while IE could. That's all.

Because table layout was common, a missing </table> was a common error that resulted in a blank page in NN. That was a completely unintentional bug.

Optional closing tags were inherited from SGML, and were always part of HTML. They're not even an error.

I didn't know that Navigator was ever strict. And a slightly funny story about the time I complained that they hadn't been strict...

Around 2000, I was meeting with Tim Berners-Lee, and I mentioned I'd been writing a bunch of Web utility code. He wanted to see, so I handed him some printed API docs I had with me. (He talked and read fast.)

Then I realized he was reading the editorializing in my permissive parser docs, about how browser vendors should've put a big error/warning message on the window for invalid HTML.

Which suddenly felt presumptuous of me, to be having opinions about Web standards, right in front of Tim Berners-Lee at the time.

(My thinking with the prominent warning message that every visitor would see, in mid/late-'90s, was that it would've been compelling social pressure at the time. It would imply that this gold rush dotcom or aspiring developer wasn't good at Web. Everyone was getting money in the belief that they knew anything at all about Web, with little way to evaluate how much they knew.)

Former NCSA employee here. The fuck they did. Netscape caught us out time and again for accepting SGML garbage that we didn’t handle properly. It’s a big part of why Netscape won that round of the browser wars. Such recovery then wound up in tools that generated web pages for you and it was all over but the crying. JavaScript was just the last straw. Which I tried to talk them into adopting but got no traction.

I have bad memories of Netscape 4 and IE4 (I think those were the versions) which both allowed invalid HTML but had different rules for doing it. Accidentally missed off a closing table tag once, and one browser displayed the remainder of the page, but the other didn't.