Comment by zoogeny

5 months ago

This looks like it loses the distinction between attributes and nested tags?

As in, I don't see a difference between `(attr "val")`, which expresses an attribute key/value pair, and `(thing "world")`, which expresses a tag/content relationship. Even if the rule were "if the first element of the list is itself a list, interpret it as a set of attribute key/value pairs", it would still be ambiguous with:

    (foo (bar "baz") "content")

which could serialize to either:

    <foo bar="baz">content</foo>

or:

    <foo><bar>baz</bar>content</foo>

In fact, this ambiguity between attributes and children has always been one of the head scratching things for me about XML. Well, the thing I've always disliked the most is namespaces but that is another matter.

There's no ambiguity. The first element is a symbol that's the name of a tag. If the second element is a list of two-element lists, each pairing a symbol with a string, it's the attribute list. If it's one of the other recognized types, it's part of the tag's contents.

See a grammar for the representation at https://docs.racket-lang.org/xml/index.html#%28def._%28%28li...
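To make the positional rule concrete, here is a minimal Python sketch under loudly stated assumptions: Python lists stand in for Racket lists, plain strings stand in for both symbols (tag and attribute names) and text, and entities, CDATA, and empty-element shorthand are ignored. The names `is_attr_list` and `xexpr_to_xml` are hypothetical and not part of Racket's `xml` library.

```python
from xml.sax.saxutils import escape, quoteattr

def is_attr_list(x):
    """True if x looks like an attribute list: a list of [name, value] string pairs."""
    return isinstance(x, list) and all(
        isinstance(p, list) and len(p) == 2
        and isinstance(p[0], str) and isinstance(p[1], str)
        for p in x
    )

def xexpr_to_xml(x):
    """Serialize a simplified X-expression: a str is text, a list is [tag, attrs?, *content]."""
    if isinstance(x, str):
        return escape(x)
    tag, rest = x[0], x[1:]
    attrs = []
    # Only the position right after the tag name may hold the attribute list.
    if rest and is_attr_list(rest[0]):
        attrs, rest = rest[0], rest[1:]
    attr_text = "".join(f" {name}={quoteattr(value)}" for name, value in attrs)
    body = "".join(xexpr_to_xml(child) for child in rest)
    return f"<{tag}{attr_text}>{body}</{tag}>"

print(xexpr_to_xml(["foo", ["bar", "baz"], "content"]))    # <foo><bar>baz</bar>content</foo>
print(xexpr_to_xml(["foo", [["bar", "baz"]], "content"]))  # <foo bar="baz">content</foo>
```

The rule stays unambiguous because a child element always starts with a tag name (a string here), so it can never look like a list of pairs.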

Most Scheme tools for working with XML use a different layout where a list starting with the symbol @ indicates attributes. See https://en.wikipedia.org/wiki/SXML for it.
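For comparison, a rough Python sketch of the `@` convention, where the attribute list is marked explicitly instead of being recognized by its shape. Assumptions: Python lists stand in for Scheme lists, strings stand in for symbols and text, every attribute is a `[name, value]` pair of strings, and `sxml_to_xml` is a hypothetical helper, not the API of any Scheme SXML library.

```python
from xml.sax.saxutils import escape, quoteattr

def sxml_to_xml(node):
    """Serialize a simplified SXML node: a str is text, a list is [tag, (@ ...)?, *children]."""
    if isinstance(node, str):
        return escape(node)
    tag, rest = node[0], node[1:]
    attrs = []
    # The attribute list is whichever list directly after the tag starts with "@".
    if rest and isinstance(rest[0], list) and rest[0] and rest[0][0] == "@":
        attrs, rest = rest[0][1:], rest[1:]
    attr_text = "".join(f" {name}={quoteattr(value)}" for name, value in attrs)
    body = "".join(sxml_to_xml(child) for child in rest)
    return f"<{tag}{attr_text}>{body}</{tag}>"

print(sxml_to_xml(["foo", ["@", ["bar", "baz"]], "content"]))  # <foo bar="baz">content</foo>
print(sxml_to_xml(["foo", ["bar", "baz"], "content"]))         # <foo><bar>baz</bar>content</foo>
```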

  • I see, so my example should be:

        (foo (bar "baz") "content")
    

    vs

        (foo ((bar "baz")) "content")
    

    where the first would be a nested `bar` tag and the second would be a single `bar="baz"` attribute.

    I would prefer the differentiation to be more explicit than the position and/or structure of the list, so the @ symbol modifier for the attribute list in other tools makes sense.

    The sibling comment with a map with an :attrs key feels even better. I don't often work in languages with pattern matching or that kind of thing, but if I wanted to know whether a particular element had one or more attributes, being able to check a dictionary key just feels like a nicer anchor point to match against.

> In fact, this ambiguity between attributes and children has always been one of the head scratching things for me about XML. Well, the thing I've always disliked the most is namespaces but that is another matter.

Just remember that it's a markup language, and then it's not head-scratching at all: the text is the text being marked up, and the attribute values are attributes of the markup, things like colour and font.

When it was co-opted to store structured data, those people didn't obey this rule (which would make everything attributes).

Namespaces had a very cool use in XHTML: you could just embed an SVG or MathML directly in your HTML and the browser would render it. This feature was copied into HTML5.
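A sketch of what namespaces buy a parser in that mixed document: each element is tagged with the vocabulary it belongs to, so SVG elements can be told apart from XHTML ones even when nested inside them. This uses Python's standard `xml.etree.ElementTree`, which reports namespaced tags as `{uri}localname`; the sample document and the `element_namespaces` helper are illustrative, and browser rendering is of course not shown.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

doc = f"""<html xmlns="{XHTML}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG}" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""

def element_namespaces(xml_text):
    """Return (local name, namespace URI) for every element, in document order."""
    result = []
    for el in ET.fromstring(xml_text).iter():
        if el.tag.startswith("{"):
            uri, _, local = el.tag[1:].partition("}")  # "{uri}local" -> uri, local
        else:
            uri, local = "", el.tag  # element in no namespace
        result.append((local, uri))
    return result

for local, uri in element_namespaces(doc):
    print(local, "->", uri)
```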

  • When you say "those people", you mean people like me who (used to) have to work out how to model structured data using XML. I think the attribute vs. child distinction makes sense in a very flat hierarchy where you are marking up text, but it quickly devolves into ambiguity for many other use cases.

    I mean, if I'm modeling a <Person> node in some structured format, making a decision about "what is the attribute of the person node" vs "what is a property of the specific Person" isn't an easy call to make in all cases. And then there are cases where an attribute itself ought to have some kind of hierarchy. Even the text example works here: I have a set of font properties and it would make sense to maybe have:

        <font>
            <color>...</color>
            <family>...</family>
        </font>
    

    Rather than a series of `fontFamily`, `fontSize`, etc. attributes. This is especially true when those attributes are complex objects nested several levels deep. You end up forced to turn things that ought to be attributes into children because you want to model the nested structure of the attributes themselves. Then you end up with some kind of wrapper structure, with one section for metadata and another for the real content.

    I just don't think the distinction works well for an extensible markup language where the nesting of elements is more or less the entire point.

    It is much easier to write out, though, which is why you often see `<Element content=" ... " />` patterns all over the place.

    • When using XML for structured data the intended way, everything that is a string value (as opposed to a node hierarchy) would be an attribute. There's no prose being marked up, so there would be no text content.
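A sketch of that rule applied to the font example from this thread: string values become attributes, nested structures become child elements, and no element gets text content. The mapping and the `to_element` helper are hypothetical illustrations; the serialization uses Python's standard `xml.etree.ElementTree`, which keeps attributes in insertion order on Python 3.8+.

```python
import xml.etree.ElementTree as ET

def to_element(tag, data):
    """Hypothetical mapping: string values -> attributes, nested dicts -> child elements."""
    el = ET.Element(tag)
    for key, value in data.items():
        if isinstance(value, dict):
            el.append(to_element(key, value))  # hierarchy stays hierarchy
        else:
            el.set(key, str(value))            # leaf values become attributes
    return el

person = {
    "name": "Ada",
    "font": {"family": "serif", "color": "black", "size": {"value": "12", "unit": "pt"}},
}
print(ET.tostring(to_element("Person", person), encoding="unicode"))
# <Person name="Ada"><font family="serif" color="black"><size value="12" unit="pt" /></font></Person>
```

Here the nested `size` shows how an "attribute that needs its own hierarchy" ends up as a child element under this rule, while every leaf string remains an attribute.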