Comment by conartist6
9 hours ago
Just gonna drop this here : ) https://docs.bablr.org/guides/cstml
CSTML is my attempt to fix all these issues with XML and revive the idea of HTML as a specific subset of a general data language.
As you mention, one of the major lessons from the success of JSON was to keep the syntax stupid-simple: easy to parse, easy to handle. Namespaces were probably the feature that got the most rework.
In theory it could also revive the ability we had with XHTML/XSLT to describe a document in a minimal, fully-semantic DSL, only generating the HTML tag structure as needed for presentation.
Unfortunately, I disagree that your syntax is "stupid-simple." But the disagreement highlights an impedance mismatch between XML users and JSON users.
JSON treats text as one of several equally-supported datatypes, and quotes all strings. Great if your data is heavily structured, and text is short and mixed with other types of data. Awful if your data is text.
XML and other SGML apps put the text first and foremost. Anything that's not text needs to be tagged, maybe with an attribute to indicate the intended type. It's annoying to express lots of structured, short-valued data. But it's simple and easy for text markup where the text predominates.
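To make the mismatch concrete, here is a minimal Python sketch (an illustration, not taken from CSTML or either commenter) that encodes the same short paragraph with one emphasized word both ways. JSON has no standard shape for mixed content, so the list-of-runs layout below is an assumption. The point is that every run of text gets its own pair of quotes, while the markup version tags only the emphasis.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JSON encoding of mixed content: a list of text runs and
# tagged objects. Every run of text must be its own quoted string.
runs = ["Parsing is ", {"em": "mostly"}, " about text."]
as_json = json.dumps({"p": runs})

# Markup: text is the default, and only the emphasis needs a tag.
as_xml = "<p>Parsing is <em>mostly</em> about text.</p>"


def json_text(node) -> str:
    """Flatten the hypothetical JSON encoding back to plain text."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(json_text(n) for n in node)
    return "".join(json_text(v) for v in node.values())


def xml_text(doc: str) -> str:
    """Concatenate all character data in an XML fragment."""
    return "".join(ET.fromstring(doc).itertext())


print(as_json)  # {"p": ["Parsing is ", {"em": "mostly"}, " about text."]}
print(as_xml)
print(as_json.count('"'), as_xml.count('"'))  # 10 0
assert json_text(json.loads(as_json)) == xml_text(as_xml)
```

Both carry the same text; the difference is which side pays the syntax cost. `json_text` and `xml_text` are hypothetical helper names.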
CSTML at first glance seems to fall into the JSON camp. Quoting every string literal makes plenty of sense in JSON, but not in the HTML/text-markup world you seem to want to play in.
Yeah "impedance mismatch" is a good way of putting it.
I wouldn't say we fall into the JSON camp at all, though; we're quite squarely in the XML-ish camp! We just wrap the inner text in quotes to make sure there's no confusion between the formatting of the text stored IN the document and the formatting of the document itself. HTML is hiding a lot of complexity here: https://blog.dwac.dev/posts/html-whitespace/. We're actually doing exactly what the author of that detailed investigation recommends.
You can see how it plays out when CSTML is used to store an HTML document: https://github.com/bablr-lang/bablr-docs/blob/1af99211b2e31f.... Having the string wrappers makes it possible to precisely control the spaces and newlines shown to the user while still having normal pretty-formatting. Compare this to a competing product, srcML, which uses XML containers for parse trees and no wrapper strings. Take a look at the example document here: https://www.srcml.org/about.html. A simple example is three screens wide because they can't put in line breaks and indentation without changing the inner text!
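A minimal Python sketch of that whitespace problem, assuming a toy convention where only single-quoted text counts as content (this is not actual CSTML syntax). With the standard library's `xml.etree.ElementTree`, indenting the XML changes the extracted text; with quoted text, indentation outside the quotes is free formatting.

```python
import re
import xml.etree.ElementTree as ET


def xml_text(doc: str) -> str:
    # In XML, every character between tags is content, including indentation.
    return "".join(ET.fromstring(doc).itertext())


def quoted_text(doc: str) -> str:
    # Toy convention (NOT real CSTML syntax): only text inside single quotes
    # is content, so whitespace outside the quotes is just formatting.
    return "".join(re.findall(r"'([^']*)'", doc))


xml_compact = "<p><b>Hi</b> there</p>"
xml_pretty = "<p>\n  <b>Hi</b>\n  there\n</p>"

toy_compact = "<p><b>'Hi'</b>' there'</p>"
toy_pretty = "<p>\n  <b>\n    'Hi'\n  </b>\n  ' there'\n</p>"

print(repr(xml_text(xml_compact)))     # 'Hi there'
print(repr(xml_text(xml_pretty)))      # '\n  Hi\n  there\n'
print(repr(quoted_text(toy_compact)))  # 'Hi there'
print(repr(quoted_text(toy_pretty)))   # 'Hi there'
```

`xml_text` and `quoted_text` are hypothetical helpers. A real format would also need an escape for quotes inside strings, which this toy regex ignores.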
As to the simplicity of the syntax, I think you would understand what I mean if you were writing a parser.
It's particularly gratifying that you can easily interpret CSTML with a stream parser that never has to look ahead. XML cannot work this way because this particular case is ambiguous: `<Name`
What does `Name` mean in this fragment of syntax? Is it the name of a namespace, or the name of a node? We won't know until we look ahead and see whether the next character is `:`.
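A toy Python sketch of that lookahead (hypothetical helpers with simplified name characters, not a real XML parser): after `<`, an XML-style reader has to consume the whole identifier and peek at the next character before it knows whether it read a namespace prefix or an element name. A prefix-first spelling such as `:Namespace: <Name />` can instead be classified from its first character.

```python
NAME_CHARS = "_-."


def classify_xml_name(src: str, pos: int) -> tuple[str, str, int]:
    """Classify the identifier starting at src[pos], just after a '<'.

    Returns (kind, name, next_pos). Simplified: real XML name rules are richer.
    """
    end = pos
    # Consume the whole identifier first; its meaning is still unknown here.
    while end < len(src) and (src[end].isalnum() or src[end] in NAME_CHARS):
        end += 1
    name = src[pos:end]
    # One character of lookahead past the name decides the question.
    if end < len(src) and src[end] == ":":
        return ("namespace prefix", name, end + 1)
    return ("element name", name, end)


def classify_prefix_first(src: str, pos: int) -> str:
    """Hypothetical prefix-first syntax: the first character decides."""
    if src[pos] == ":":
        return "namespace prefix"
    if src[pos] == "<":
        return "element"
    return "other"


print(classify_xml_name("<Name />", 1))      # ('element name', 'Name', 5)
print(classify_xml_name("<Name:Foo />", 1))  # ('namespace prefix', 'Name', 6)
print(classify_prefix_first(":Namespace: <Name />", 0))   # namespace prefix
print(classify_prefix_first(":Namespace: <Name />", 12))  # element
```

Real XML parsers handle this peek without trouble; the sketch only shows where the decision is deferred in one syntax and immediate in the other.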
That's why we write `<Namespace:Name />` as `:Namespace: <Name />`: there's no point in the left-to-right parse at which the meaning is ambiguous. Finally, CSTML has no entity lookups, so there's no need to download a DTD to parse it correctly.
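The entity point can be checked with Python's standard `xml.etree.ElementTree` (which is built on expat). XML predefines only five named entities; anything else, such as HTML's `&nbsp;`, has to be declared in a DTD before a parser can resolve it. This sketch uses an internal DTD subset so nothing is fetched; in practice the declarations often live in an external DTD file. `parse_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_text(doc: str):
    """Return the root element's text, or None if the parser rejects doc."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


# XML predefines only five entities: &lt; &gt; &amp; &quot; &apos;.
print(parse_text("<p>a &amp; b</p>"))  # a & b

# HTML's &nbsp; is not one of them; without a declaration the parse fails.
print(parse_text("<p>a&nbsp;b</p>"))  # None

# Declaring it in a DTD (here an internal subset) makes the same body parse.
with_dtd = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]><p>a&nbsp;b</p>'
print(repr(parse_text(with_dtd)))  # 'a\xa0b'
```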
I realised the other day that some of my test code has 'jumped' rather than 'jumps' in the intended pangram. Glad to see I'm not alone. :^)
Haha, yeah, someone pointed that out to me and I decided to leave it. I just needed a sentence; I'm not actually trying to show off every glyph in a font.
That was my reasoning for not fixing it, too. Fair!