Comment by PaulHoule

5 hours ago

Circa '99 a high fraction (50%-ish) of HTML in the field was invalid, so a new web browser had to parse invalid HTML the same way Netscape did, which was one more reason we didn't get competitive web browsers.

HTML 5 specified exactly how "invalid" HTML is parsed so now there is no such thing as invalid HTML. XHTML was one of those things that never quite worked:

https://friendlybit.com/html/why-xhtml-is-a-bad-idea/

> there is no such thing as invalid HTML

There is. There are things that are still considered invalid, like nesting form elements for instance.

(This doesn't take away from your argument, though, since you were focusing on the parsing aspect.)
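To make the parsing/validity distinction concrete, here is a minimal sketch of a conformance-style check for nested forms. It uses Python's standard `html.parser`, which only reports tags as it sees them and does not implement the HTML5 tree-building algorithm; `NestedFormChecker` and `find_nested_forms` are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class NestedFormChecker(HTMLParser):
    """Records the position of every <form> opened inside another <form>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of currently open <form> tags
        self.errors = []    # (line, column) of each nested <form> start tag

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            if self.depth > 0:
                self.errors.append(self.getpos())
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "form" and self.depth > 0:
            self.depth -= 1


def find_nested_forms(markup):
    """Return the positions of nested <form> start tags in markup."""
    checker = NestedFormChecker()
    checker.feed(markup)
    checker.close()
    return checker.errors


nested = "<form><input name='a'><form><input name='b'></form></form>"
flat = "<form><input name='a'></form><form><input name='b'></form>"
print(len(find_nested_forms(nested)), len(find_nested_forms(flat)))
```

A spec-following parser would still build a tree from the nested markup (as far as I recall, the inner `<form>` start tag is a parse error and is ignored), so a check like this flags an authoring mistake rather than something the browser can't handle.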

  • The things that are invalid should all have defined behaviour. For example, a <label> is not allowed to contain two form controls, but is defined as applying to the first such control.

    As far as parse errors are concerned, https://html.spec.whatwg.org/multipage/parsing.html#parse-er... says:

    > This specification defines the parsing rules for HTML documents, whether they are syntactically correct or not. Certain points in the parsing algorithm are said to be parse errors. The error handling for parse errors is well-defined (that's the processing rules described throughout this specification), but user agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification.
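The `<label>` case above can be sketched roughly as follows: pick the first labelable descendant as the labeled control and report any further ones as authoring errors. This is an illustration under stated assumptions, not the spec's algorithm: it ignores the `for` attribute, custom elements, and nested labels, it relies on Python's `html.parser` rather than an HTML5 tree builder, and `LabelInspector` and `inspect_label` are hypothetical names.

```python
from html.parser import HTMLParser

# Built-in labelable elements (form-associated custom elements are ignored here).
LABELABLE = {"button", "input", "meter", "output", "progress", "select", "textarea"}


class LabelInspector(HTMLParser):
    """Collects labelable elements that appear inside a <label>."""

    def __init__(self):
        super().__init__()
        self.in_label = False
        self.controls = []  # (tag, id) pairs, in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label":
            self.in_label = True
        elif self.in_label and tag in LABELABLE:
            # <input type="hidden"> is not labelable, so skip it.
            if tag == "input" and (attributes.get("type") or "").lower() == "hidden":
                return
            self.controls.append((tag, attributes.get("id")))

    def handle_endtag(self, tag):
        if tag == "label":
            self.in_label = False


def inspect_label(markup):
    """Return (labeled_control, extra_controls) for a fragment with one <label>."""
    inspector = LabelInspector()
    inspector.feed(markup)
    inspector.close()
    if not inspector.controls:
        return None, []
    return inspector.controls[0], inspector.controls[1:]


control, extras = inspect_label('<label>Name <input id="a"> <input id="b"></label>')
print(control, extras)
```

Here the first control is what the label applies to (defined behaviour), while a non-empty `extras` list is what a validator would complain about (invalid markup).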

    • > The things that are invalid should all have defined behaviour

      100% agree.

      And then I guess the philosophical question is "What's invalid when everything is defined?"
