Comment by chrismorgan

1 day ago

> People being too lazy to close the <br /> tag was apparently a gateway drug into absolute mayhem.

Your chronology is waaaaaaaaaaaay off.

<BR> came years before XML was invented. It was a tag that didn’t permit children, so writing it <BR></BR> would have been crazy, and inventing a new syntax like <BR// or <BR/> would have been crazy too. Spelling it <BR> was the obvious and reasonable choice.

The <br /> or <br/> spelling was added to HTML after XHTML had already basically lost, as a compatibility measure for porting back to HTML, since those enthusiastic about XHTML had taken to writing it and it was nice having a compatible spelling that did the same in both. (In XHTML you could also write <br></br>, but that was incorrect in HTML; and if you wrote <br /> in HTML it was equivalent to <br /="">, giving you one attribute with name "/" and value "".) There were a few growing pains there, such as how <input checked> used to mean <input checked="checked"> (it was actually the attribute name that was being omitted, not the value!), except… oh why am I even writing this, messy messy history stuff, engines doing their own thing blah blah blah, these days it’s <input checked="">.

Really, the whole <… /> thing is more an artefact of an arguably-misguided idea after a failed reform. The absolute mayhem came first, not last.

> I would hate to have to write a parser that's tolerant enough to deal with all the garbage people throw at it.

The HTML parser is magnificent, by far the best spec for something reasonably-sized that I know of. It’s exhaustively defined in terms of state machines. It’s huge, far larger than one would like it to be because of all this compatibility stuff, but genuinely easy to implement if you have the patience. Seriously, go read it some time, it’s really quite approachable.
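
To give a flavour of the state-machine style, here is a rough Python sketch of a tiny slice of the tokenizer, start tags only. It is not the spec's algorithm: the state names are loosely modelled on the spec's, but the function name tokenize_start_tags is made up for this example, and end tags, comments, character references, single-quoted values, the "after attribute name" state and all parse-error reporting are left out. It is only meant to show why <br>, <br/> and <br /> come out as the same tag, and why <input checked> and <input checked=""> are the same thing these days.

```python
def tokenize_start_tags(html):
    """Return (name, attrs, self_closing) for each start tag found in html.

    Hypothetical helper for illustration; a heavily simplified subset of
    the tokenizer states, not a conforming implementation.
    """
    tokens = []
    state = "data"
    name = ""
    attrs = {}
    attr_name = attr_value = ""
    self_closing = False

    def commit_attr():
        # Store the pending attribute; the first occurrence of a name wins.
        nonlocal attr_name, attr_value
        if attr_name and attr_name not in attrs:
            attrs[attr_name] = attr_value
        attr_name = attr_value = ""

    def emit():
        nonlocal state
        commit_attr()
        tokens.append((name, dict(attrs), self_closing))
        state = "data"

    i = 0
    while i < len(html):
        ch = html[i]
        i += 1
        if state == "data":
            if ch == "<":
                state = "tag_open"
        elif state == "tag_open":
            if ch.isascii() and ch.isalpha():
                name, self_closing = ch.lower(), False
                attrs.clear()
                state = "tag_name"
            else:
                state = "data"  # end tags, comments etc. are ignored here
        elif state == "tag_name":
            if ch.isspace():
                state = "before_attr_name"
            elif ch == "/":
                state = "self_closing_start_tag"
            elif ch == ">":
                emit()
            else:
                name += ch.lower()
        elif state == "before_attr_name":
            if ch.isspace():
                pass
            elif ch == "/":
                state = "self_closing_start_tag"
            elif ch == ">":
                emit()
            else:
                attr_name = ch.lower()
                state = "attr_name"
        elif state == "attr_name":
            if ch.isspace():
                commit_attr()
                state = "before_attr_name"
            elif ch == "/":
                commit_attr()
                state = "self_closing_start_tag"
            elif ch == ">":
                emit()  # name with no value: the value is the empty string
            elif ch == "=":
                state = "before_attr_value"
            else:
                attr_name += ch.lower()
        elif state == "before_attr_value":
            if ch.isspace():
                pass
            elif ch == '"':
                state = "attr_value_double_quoted"
            elif ch == ">":
                emit()
            else:
                attr_value = ch
                state = "attr_value_unquoted"
        elif state == "attr_value_double_quoted":
            if ch == '"':
                commit_attr()
                state = "before_attr_name"
            else:
                attr_value += ch
        elif state == "attr_value_unquoted":
            if ch.isspace():
                commit_attr()
                state = "before_attr_name"
            elif ch == ">":
                emit()
            else:
                attr_value += ch
        elif state == "self_closing_start_tag":
            if ch == ">":
                self_closing = True
                emit()
            else:
                state = "before_attr_name"
                i -= 1  # stray "/": reconsume this character as attribute input
    return tokens


print(tokenize_start_tags('<BR><br/><br /><input checked><input checked="">'))
```

The only trace the slash leaves is the self-closing flag on the token. For a void element like br the tree builder just acknowledges it, and on non-void HTML elements it is ignored, so nothing about the resulting element changes either way.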