Comment by GuB-42
9 hours ago
I think JSON has the opposite problem: it is too simple. The lack of comments in particular is bad for many common uses of the format today.
I know some implementations of JSON support comments and other things, but that is not true JSON, in the same way that most simple XML implementations are not true XML. That's why I say "opposite problem": XML is too complex, and most practical uses of XML rely on incomplete implementations, while many practical uses of JSON rely on extended implementations.
By the way, this is not a problem for what JSON was designed for, a text interchange format with JS as the language of choice, but it has gone beyond its design: configuration files, data stores, etc.
A lot of people dislike that decision not to include comments in JSON, but I think while shocking it was and is totally correct.
In a programming language it's usually free to have comments because the comment is erased before the program runs; we usually render comments in grey text because they can't change the meaning of the program.
In a data language you have no such luxury. In a data language there's no comment erasure happening between the producer and the consumer, so comments are just dangerous as they would without doubt evolve into a system of annotations -- an additional layer of communication which would then not be standardized at all and which then would grow into a wild west of nonstandard features and compatibility workarounds.
I don't dislike the decision at all, FWIW! For data interchange it's totally reasonable. But it does make JSON ill-suited for a bunch of applications that JSON has been forcefully and unfortunately applied to.
> so comments are just dangerous as they would without doubt evolve into a system of annotations -- an additional layer of communication which would then not be standardized at all and which then would grow into a wild west of nonstandard features and compatibility workarounds
IIRC Douglas Crockford explicitly stated that he saw people initially using comments for a purpose like ad hoc preprocessor directives.
Could you imagine hitting a rest api and like 25% of the bytes are comments? lol
Worse than that - people will start tagging "this value is a Date" via comments, and you'll need to parse ad-hoc tags in the comments to decode the data. People already do tagging in-band, but at least it's in-band and you don't have to write a custom parser.
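A minimal Python sketch of that kind of in-band tagging, using only the standard `json` module's `default` and `object_hook` hooks. The `{"$date": ...}` wrapper is a hypothetical convention chosen for illustration (MongoDB's Extended JSON uses a similar shape); the point is that the tag is ordinary JSON, so any compliant parser can still read it without a custom comment parser.

```python
import json
from datetime import date

# Hypothetical in-band convention: a date travels as {"$date": "YYYY-MM-DD"}.
def encode(obj):
    """json.dumps `default` hook: turn date objects into tagged dicts."""
    if isinstance(obj, date):
        return {"$date": obj.isoformat()}
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")

def decode(d):
    """json.loads `object_hook`: turn tagged dicts back into date objects."""
    if set(d) == {"$date"}:
        return date.fromisoformat(d["$date"])
    return d

payload = json.dumps({"name": "launch", "when": date(2024, 5, 1)}, default=encode)
print(payload)          # {"name": "launch", "when": {"$date": "2024-05-01"}}
record = json.loads(payload, object_hook=decode)
print(record["when"])   # 2024-05-01
```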
HTML and JS both have comments, I don't see the problem
> Could you imagine hitting a rest api and like 25% of the bytes are comments? lol
That's pretty much what already happens. Getting a numeric value like "120" by serializing it through JSON takes three bytes. Getting the same value through a less flagrantly wasteful format would take one.
I guess that's more than 25%. In the abstract ASCII integers are about 50% waste. ASCII labels for the values you're transferring are 100% waste; those labels literally are comments.
If you're worried about wasting bandwidth on comments, JSON shouldn't be a format you ever consider, for any purpose.
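The byte counts above can be checked with Python's standard library. Here `struct.pack("B", ...)` stands in for "a less flagrantly wasteful format", which is only an assumption about what such a format might look like, and the `"temperature"` label is hypothetical.

```python
import json
import struct

value = 120

as_json = json.dumps(value).encode("ascii")   # b"120": one byte per decimal digit
as_byte = struct.pack("B", value)             # one unsigned byte (0..255)
labeled = json.dumps({"temperature": value}).encode("ascii")  # label sent every time

print(len(as_json), len(as_byte), len(labeled))  # 3 1 20
```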
lol
> In a programming language it's usually free to have comments because the comment is erased before the program runs
That's inherent to the language specification, but it isn't inherent to the document. You have to have a system with rules that require that erasure.
Nothing prevents one from mandating a system that strips those comments out of JSON. You could even "compile" JSON to, I don't know, BSON or msgpack or something.
Just as nothing prevents one from creating tooling to, say, extract type annotations from comments in a dynamically typed language.
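Along the lines of mandating a comment-stripping step (essentially what Crockford suggested with JSMin), here is a rough Python sketch that removes `//` and `/* */` comments outside string literals and then hands the result to the standard parser. `strip_comments` is a hypothetical helper, not a spec-blessed preprocessor, and it does not handle other JSON5-style extensions such as trailing commas.

```python
import json

def strip_comments(text: str) -> str:
    """Remove // line comments and /* block */ comments outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:   # copy the escaped char (e.g. \") verbatim
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end   # keep the newline itself
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            out.append(" ")               # a comment acts as whitespace
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

source = '''
{
  // retry budget for the hypothetical "fetch" job
  "retries": 3, /* seconds */ "timeout": 30,
  "url": "http://example.com/a//b"
}
'''
config = json.loads(strip_comments(source))
print(config)  # {'retries': 3, 'timeout': 30, 'url': 'http://example.com/a//b'}
```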
> In a data language there's no comment erasure happening between the producer and the consumer, so comments are just dangerous as they would without doubt evolve into a system of annotations -- an additional layer of communication which would then not be standardized at all and which then would grow into a wild west of nonstandard features and compatibility workarounds.
But there's nothing stopping you from commenting your JSON now. There's no obligation to use every field. There can't be, because the transfer format is independent of the use to which the transferred data is put after transfer.
And an unused field is a comment.
If this would 'without doubt' evolve into a system of annotations, JSON would already have a system of annotations.
> that decision not to include comments in JSON, but I think while shocking it was and is totally correct.
Yaml is fugly, but it emerged from JSON being unsupportive of comments. Now we’re stuck with two languages for configuring infrastructure: a beautiful one that lacks comments and is therefore unusable, and another where I can never format a list correctly on the first try, but comments are OK.
JSON is obviously perfectly usable, given how widely it's used. Even Douglas Crockford suggested just using a JSON interpreter that strips out comments, if you need them.
And if you want something like JSON that allows comments, and you aren't working on the web, Lua tables are fine.
> while shocking it was and is totally correct
Agreed. Consider how comments have been abused in HTML, XML, and RSS.
Any solution or technology that can be abused will be abused if there are no constraints.
No, it was obviously and flagrantly incorrect, as evidenced by the success of interchange formats that do allow for comments, including many real world systems that pragmatically allow comments even when JSON says they shouldn't. This is Stockholm Syndrome.
But what can we expect from a spec that somehow deems comments bad but can't define what a number is?
How do you feel numbers are ill defined in json? The syntactical definition is clear and seems to yield a unique and obvious interpretation of json numbers as mathematical rational numbers.
A given programming language may not have a built in representation for rational numbers in general. That isn't the fault of json.
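Python's standard `json` module shows both sides of this. By default, numbers with a fraction or exponent become IEEE-754 floats, but the documented `parse_float` hook lets you keep the exact decimal value instead (integers already decode to arbitrary-precision `int`). A small sketch with made-up values:

```python
import json
from decimal import Decimal

doc = '{"price": 0.1, "big": 12345678901234567890.5}'

# Default: numbers with a fraction or exponent are decoded as binary floats.
lossy = json.loads(doc)
# parse_float receives the number's exact text and can build a Decimal instead.
exact = json.loads(doc, parse_float=Decimal)

print(Decimal(lossy["big"]))             # 12345678901234567168 (nearest double)
print(exact["big"])                      # 12345678901234567890.5
print(exact["price"] * 3 == Decimal("0.3"))  # True
print(lossy["price"] * 3 == 0.3)             # False
```

The JSON text itself is unambiguous; the loss happens in the host language's default mapping.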
As long as they stay comments there's no harm. As soon as they become struct tags and stripping comments affects the document's meaning you lose the plot.
I've said it before, but I maintain that XML has only two real problems:
1. Attributes should not exist. They make the document suddenly have two dimensions instead of one, which significantly increases complexity. Anything that could be an attribute should actually be a child element.
2. There should be one generic close tag, `</>`, which closes the most recently opened element; named close tags burn a significant amount of space on useless syntax. Other than that and the self-closing `<tag />` (which itself is less useful without attributes), there isn't much that you need. Maybe a document close tag like `<///>`.
You'll notice that, yes, JSON solves both of those things. That's a part of why it's so popular. The other is just that a lot more effort was put into maximizing the performance of JavaScript than shredding XML, and XSLT, the intended solution to this problem, is infamous at this point.
The problem of comments is kind of a non-issue in practice, IMO. You can just add a `"_COMMENT"` element or similar. Sure, yes, it will get parsed. But you shouldn't have so many comments that they cause a genuine performance issue.
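A sketch of that convention in Python, assuming a hypothetical `_COMMENT` key that the consumer strips after parsing. The key name is whatever the two ends agree on; JSON itself gives it no special meaning.

```python
import json

config_text = '''
{
  "_COMMENT": "timeout is in seconds; keep it below the load balancer's idle timeout",
  "timeout": 30,
  "retries": 3
}
'''

def drop_comment_keys(obj, key="_COMMENT"):
    """Recursively remove the agreed-upon comment key from parsed JSON."""
    if isinstance(obj, dict):
        return {k: drop_comment_keys(v, key) for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [drop_comment_keys(v, key) for v in obj]
    return obj

config = drop_comment_keys(json.loads(config_text))
print(config)  # {'timeout': 30, 'retries': 3}
```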
However, JSON still has two problems:
1. Schema support. You can't validate a file before deserializing it in your application. JSON Schema does exist, but its support is still thin, IMX.
2. Many serializers are pretty bad with tabular data, and nearly all of them are bad with tabular data by default. So sometimes it's a data serialization format that's bad at serializing bulk data. Yeah, XML is worse at this. Yeah, you can use the `"colNames": ["id", ...], "rows": [ [1,...],[2,...] ]` method or go columnar with `"id": [1,2,...], "name": [...], "createDate": [...]`, but you had better be sure both ends can support that format.
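The two workarounds mentioned in item 2 can be sketched in a few lines of Python. The field names follow the comment's example, the records are made up, and the code assumes every record has the same keys as the first one. Note that `from_rows` only works because the receiver already knows which layout the sender chose.

```python
import json

records = [
    {"id": 1, "name": "Ada", "createDate": "2024-01-02"},
    {"id": 2, "name": "Grace", "createDate": "2024-03-04"},
]

def to_rows(recs):
    """Header-plus-rows layout: column names once, then one array per record."""
    cols = list(recs[0])
    return {"colNames": cols, "rows": [[r[c] for c in cols] for r in recs]}

def to_columns(recs):
    """Columnar layout: one array per column."""
    return {c: [r[c] for r in recs] for c in recs[0]}

def from_rows(doc):
    """Rebuild record objects; only possible if the receiver knows the layout."""
    return [dict(zip(doc["colNames"], row)) for row in doc["rows"]]

for layout in (records, to_rows(records), to_columns(records)):
    text = json.dumps(layout)
    print(len(text), text)
```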
For both issues, there seem to be attempts at a resolution. OpenAPI 3.1 includes JSON Schema. The most popular JSON parsers seem to be adding tabular data support. I guess we'll see.
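To make the schema point concrete, here is a deliberately tiny hand-rolled checker in Python. It is not a JSON Schema implementation: it understands only the `type`, `required`, and `properties` keywords, and its `integer` check is stricter than recent JSON Schema drafts, which accept `1.0` as an integer. The schema and document are made up.

```python
import json

# Hypothetical schema in JSON Schema syntax; only "type", "required" and
# "properties" are honoured by the checker below.
SCHEMA = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "tags": {"type": "array"},
    },
}

TYPES = {"object": dict, "array": list, "string": str,
         "boolean": bool, "null": type(None)}

def errors(value, schema, path="$"):
    """Yield human-readable violations of this small keyword subset."""
    expected = schema.get("type")
    if expected == "integer":     # bool is an int subclass in Python; exclude it
        ok = isinstance(value, int) and not isinstance(value, bool)
    elif expected == "number":
        ok = isinstance(value, (int, float)) and not isinstance(value, bool)
    elif expected is not None:
        ok = isinstance(value, TYPES[expected])
    else:
        ok = True
    if not ok:
        yield f"{path}: expected {expected}, got {type(value).__name__}"
        return
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                yield f"{path}: missing required property {key!r}"
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                yield from errors(value[key], sub, f"{path}.{key}")

doc = json.loads('{"id": "7", "tags": []}')
print(list(errors(doc, SCHEMA)))
# ["$: missing required property 'name'", '$.id: expected integer, got str']
```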
I disagree on several points here:
1. I think attributes absolutely should exist. They're great for describing metadata related to the tag: e.g. element ID, language, datatype, source annotation, namespacing. They add little in complexity.
2. The point of a close tag with a name is to make it unambiguous what it's trying to close off.
It sounds to me like what you want is not a better XML, but just s-exprs. Which is fine, but not quite solving the same problem.
3. As far as schema support, it seems to me that JSON Schema is well-established and perfectly cromulent – so much so that YAML authors are trying to use it to validate their stuff (the poor bastards) – and XML schema validation, while robust, is a complex and fragmented landscape around DTD, XSD, RELAX-NG, and Schematron. So although XML might have the edge, it's a more nuanced picture than XML proponents are claiming.
4. As far as tabular data, neither XML nor JSON were built for efficient tabular data representation, so it shouldn't be a surprise that they're clunky at this. Use the right tool for the job.
> 1. I think attributes absolutely should exist. They're great for describing metadata related to the tag: e.g. element ID, language, datatype, source annotation, namespacing. They add little in complexity.
No, they're barely adequate for those purposes. And you could (and if you have an XSD, you probably should) still replace them with elements. If you argue that you can't, then you're arguing that JSON does not function. You can just inline metadata alongside data. That works just fine. That's the thing about metadata. It's data!
You don't need attributes. Having worked in information systems for 25 years now, they are the most heavily, heavily, heavily misused feature of XML and they are essentially always wrong.
Because when someone represents data like this:
You can write an XSD with the full set of rules for schema validation.
On the other hand, if you do this:
Well, now you're a bit stuck. You can make the XSD look at basic data types, and that's it. You can never use complex types. You can never use multiple values if you need them, or if you do, you'll have to make your attribute a delimited string. You can't use order. You're limiting your ability to extend or advance things.
That's the problem with XML. It's so flexible it lets developers be stupid, while also claiming strictness and correctness as goals.
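A hypothetical pair of documents in the spirit of this comment, read with Python's `xml.etree.ElementTree`. The element names and the `;` delimiter are made up; the sketch only shows why repeated values fit child elements naturally but force an attribute into an ad hoc delimited string.

```python
import xml.etree.ElementTree as ET

# Same person, two phone numbers: once as child elements, once as an attribute.
as_elements = ET.fromstring(
    "<person><name>Ada</name>"
    "<phone>555-0100</phone><phone>555-0199</phone></person>"
)
as_attributes = ET.fromstring('<person name="Ada" phone="555-0100;555-0199"/>')

# Child elements can repeat, keep their document order, and could nest further.
phones_from_elements = [p.text for p in as_elements.findall("phone")]

# An element may carry each attribute name only once and its value is a plain
# string, so a second value means inventing a delimiter the schema cannot see.
phones_from_attributes = as_attributes.get("phone").split(";")

print(phones_from_elements)    # ['555-0100', '555-0199']
print(phones_from_attributes)  # ['555-0100', '555-0199']
```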
> 2. The point of a close tag with a name is to make it unambiguous what it's trying to close off.
Sure, but since closing tags in the proper order is mandatory, the name isn't actually adding any information at all. The only thing it does is introduce trivial syntax errors.
Because the truth is that this is 100% unambiguous in XML because the rules changed:
The reason SGML had a problem with the generic close tag was that SGML didn't require a closing tag at all. That was a problem. It didn't have `<tag />`. It let you say `<tag1><tag2>...</tag1>` or `<tag1><tag2>...</>`.
Named closing tags had more of a point when we were actually writing XML by hand and didn't have text editors that could find the open and close tags for you, but that is solved. Now we have syntax highlighting and hierarchical code folding in any text editor, never mind dedicated XML editors.
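The claim that a generic `</>` is unambiguous once every element must be closed can be checked with a toy parser: a stack of open elements decides what each `</>` closes. This Python sketch handles a hypothetical minimal dialect (open tags, `</>`, and text only; no attributes, self-closing tags, comments, or entities).

```python
import re

# Token alternatives, tried in order: generic close, open tag, run of text.
TOKEN = re.compile(r"</>|<([A-Za-z][\w-]*)>|([^<]+)")

def parse(text):
    """Parse the toy dialect where '</>' closes the most recently opened element.

    Returns a list of (tag, children) tuples; text children are plain strings.
    """
    root = ("#root", [])
    stack = [root]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        if m.group(0) == "</>":
            if len(stack) == 1:
                raise ValueError("'</>' with no open element")
            stack.pop()                  # always closes the innermost element
        elif m.group(1):
            node = (m.group(1), [])
            stack[-1][1].append(node)
            stack.append(node)
        else:
            stack[-1][1].append(m.group(2))
        pos = m.end()
    if len(stack) != 1:
        raise ValueError(f"unclosed element <{stack[-1][0]}>")
    return root[1]

print(parse("<tag1><tag2>hello</>world</>"))
# [('tag1', [('tag2', ['hello']), 'world'])]
```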
> 3. As far as schema support, it seems to me that JSON Schema is well-established and perfectly cromulent
Then my guess is that you have worked exclusively in the tech industry for customers that are also exclusively in the tech industry. If you have worked in any other business with any other group of organizations, you would know that the rest of the world is absolute chaos. I think I've seen 3 examples of a published JSON Schema, and hundreds of JSON interfaces without one.
> 4. As far as tabular data, neither XML nor JSON were built for efficient tabular data representation, so it shouldn't be a surprise that they're clunky at this. Use the right tool for the job.
No, I think you're looking at what the format was intended to do 25 years ago and trying to claim that that should not be extended or improved ever. You're ignoring what it's actually being used for.
Unless you're going to make data queries return large tabular data sets to the user interface as more or less SQLite or DuckDB databases so the browser can freely manipulate them for the user... you're kind of stuck with XML or JSON or CSV. All of which suck for different reasons.
Attributes exist due to its origin as a markup language. XML is actually (big surprise) a pretty good markup language, where the tags are sort of like function calls and the attributes are args, with little to no information to be gleaned out of the text. The big sin was to say "hey, the tooling is getting pretty good for these SGML-like markup languages. Let's use it as a structured data interchange format. It's almost the same thing." Now all the data is in the text, and the attributes are not just superfluous but actively harmful, as there is a weird extra data axis that people will aggressively use.
Hard disagree about attributes, each tag should be a complete object and attributes describe the object.
But objects can also be containers and that's what nesting is for. There shouldn't ever be two dimensions in the way you're describing. The pattern of
is the root of most XML evil. Now you have to know if myobject is a container or a franken-object with a strict sub-schema in order to parse it. The biggest win of JSON is that .loads/.dump make it really obvious that it's for serializing complete objects where a lot of tooling surrounding XML makes you poke at the document tree.
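For comparison, a minimal Python round trip of a complete object with `json.dumps`/`json.loads` (`json.dump` writes to a file; `dumps` returns a string). The `Server` dataclass is a made-up example.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Server:  # hypothetical object type
    host: str
    port: int
    tags: list

original = Server("db1", 5432, ["primary", "eu"])
text = json.dumps(asdict(original))      # whole object out...
restored = Server(**json.loads(text))    # ...whole object back in

print(text)                  # {"host": "db1", "port": 5432, "tags": ["primary", "eu"]}
print(restored == original)  # True
```

There is no tree to walk and no choice between attribute and child element to make: the object's fields map one-to-one onto JSON keys.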