There's an older pure Python version but it's no longer maintained - the author of that recently replaced it with a Python library wrapping the C# code.
This looks to me like the perfect opportunity for a language-independent conformance suite - a set of tests defined as data files that can be shared across multiple implementations.
This would not only guarantee that the existing C# and TypeScript implementations behaved exactly the same way, but would also make it much easier to build and then maintain more implementations across other languages.
That new Python library is https://pypi.org/project/fractured-json/ but it's a wrapper around the C# library and says "You must install a valid .NET runtime" - that makes it mostly a non-starter as a dependency for other Python projects because it breaks the ability to "pip install" them without a significant extra step.
And OK it's not equivalent to a formal proof, but passing 1,000+ tests that cover every aspect of the specification is pretty close from a practical perspective, especially for a visual formatting tool.
Well yeah, but then any discrepancies that are found can be discussed (to decide which of the behaviors is the expected one) and then added as a test for all existing and future implementations.
I tokenized these and they seem to use around 20% less tokens than the original JSONs. Which makes me think a schema like this might optimize latency and costs in constrained LLM decoding.
I know that LLMs are very familiar with JSON, and choosing uncommon schemas just to reduce tokens hurts semantic performance. But a schema that is sufficiently JSON-like probably won't disrupt model path/patterns that much and prevent unintended bias.
These JSON files are actually readable, congrats.
I’m wondering whether this could be handled via an additional attached file instead. For example, I could have mycomplexdata.json and an accompanying mycomplexdata.jsonfranc. When the file is opened in the IDE, the IDE would merge the two automatically.
That way, the original JSON file stays clean and isn’t polluted with extra data.
Works right up until you get an entity where the field `comments` is suddenly relevant and then you need to go change everything everywhere. Much better to use the right tool for the job, if you want JSONC, be explicit and use JSONC.
Personally, I think if your JSON needs comments then it's probably for config or something the user is expected to edit themselves, and at that point you have better options than plain JSON and adding commentary to the actual payload.
If it's purely for machine consumption then I suspect you might be describing a schema and there are also tools for that.
This is interesting.
I’d very much like to see a code formatter do that kind of thing; currently formatters are pretty much inflexible, which makes getting structure out of a formatted code sometimes hard.
I just built a C++ formatter that does this (owned by my employee, unfortunately). There's really only two formatting objects: tab-aligned tables, and single line rows. Both objects also support a right-floating column/tab aligned "//" comment.
Both objects desugar to a sequence of segments (lines).
The result is that you can freely mix expression/assignment blocks & statements. Things like switch-case blocks & macro tables are suddenly trivial to format in 2d.
Because comments are handled as right floating, all comments nicely align.
I vibe coded the base layer in an hour. I'm using with autogenerated code, so output is manually coded based on my input. The tricky bit would be "discovering" tables & block. I'd jus use a combo of an LSP and direct observation of sequential statements.
Right. In my previous work, I wrote a custom XML formatter for making it look table-like which was our use case. Of course, an ideal solution would have been to move away from XML, but can't run away from legacy.
This is pretty cool, but I hope it isn't used for human-readable config files. TOML/YAML are better options for that. Git diff also can be tricky with realignment, etc.
I can see potential usefulness of this is in debug mode APIs, where somehow comments are sent as well and are rendered nicely. Especially useful in game dev jsons.
This looks very readable. The one example I didn't like is the expanded one where it expanded all but 1 of the elements. I feel like that should be an all or norhing thing, but there's bound to be edge cases.
And BTW, thanks for supporting comments - the reason given for keeping comments out of standard Json is silly ( "they would be used for parsing directives" ).
It's a pretty sensible policy, really. Corollary to Hyrum's Law - do not permit your API to have any behaviours, useful or otherwise, which someone might depend on but which aren't part of your design goals. For programmers in particular, who are sodding munchkins and cannot be trusted not to do something clever but unintended just because it solves a problem for them, that means aggressively hamstringing everything.
A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
> A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
While I admire his design goals, people will just work around it in a pinch by adding a "comment" or "_comment" or "_comment_${random_uuid}", simply because they want to do the job they need.
If your screwdriver bends like a rubber when prying, damn it, I'll just put a screw next to it, so it thinks it is used for driving screws and thus behaves correctly.
On one hand, it has made json more ubiquitous due to it's frozen state. On another hand, it forces everyone to move to something else and fragments progress. It would be much easier for people to move to json 2.0 rather than having hundreds of json + x standards. Everyone is just reinventing json with their own little twist that I feel sad that we haven't standardized to a single solution that doesn't go super crazy like xml.
I don't disagree with the choice, but seeing how things turned out I can't just help but look at the greener grass on the other side.
JSON is used as config files and static resources all the time. These type of files really need comments. Preventing comments in JSON is punishing the wide majority to prevent a small minority from doing something stupid. But stupid gonna stupid, it's just condescending from Mister JSON to think he can do anything about it.
It looks like there are two maintained implementations of this at the moment - one in C# https://github.com/j-brooke/FracturedJson/wiki/.NET-Library and another in TypeScript/JavaScript https://github.com/j-brooke/FracturedJsonJs. They each have their own test suite.
There's an older pure Python version but it's no longer maintained - the author of that recently replaced it with a Python library wrapping the C# code.
This looks to me like the perfect opportunity for a language-independent conformance suite - a set of tests defined as data files that can be shared across multiple implementations.
This would not only guarantee that the existing C# and TypeScript implementations behaved exactly the same way, but would also make it much easier to build and then maintain more implementations across other languages.
Interestingly the now-deprecated Python library does actually use a data-driven test suite in the kind of shape I'm describing: https://github.com/masaccio/compact-json/tree/main/tests/dat...
That new Python library is https://pypi.org/project/fractured-json/ but it's a wrapper around the C# library and says "You must install a valid .NET runtime" - that makes it mostly a non-starter as a dependency for other Python projects because it breaks the ability to "pip install" them without a significant extra step.
This is a good idea, though I don’t think it would guarantee program equivalence beyond the test cases.
Depends on how comprehensive the test suite is.
And OK it's not equivalent to a formal proof, but passing 1,000+ tests that cover every aspect of the specification is pretty close from a practical perspective, especially for a visual formatting tool.
1 reply →
Well yeah, but then any discrepancies that are found can be discussed (to decide which of the behaviors is the expected one) and then added as a test for all existing and future implementations.
This is great! The more human-readable, the better!
I've also been working in the other direction, making JSON more machine-readable:
https://github.com/kstenerud/bonjson/
It has EXACTLY the same capabilities and limitations as JSON, so it works as a drop-in replacement that's 35x faster for a machine to read and write.
No extra types. No extra features. Anything JSON can do, it can do. Anything JSON can't do, it can't do.
Is there an option for it to read the contents from a pipe? that's by far my biggest use for the jq app.
You can (usually) specify the input file name as “-“ (single hyphen) to read from stdin
There's a C# CLI app in the repo: https://github.com/j-brooke/FracturedJson/blob/main/Fracture...
It looks like both the JavaScript version and the new Python C# wrapper have equivalent CLI tools as well.
this would be amazing to be chained with jq, that was my first thought as well.
I tokenized these and they seem to use around 20% less tokens than the original JSONs. Which makes me think a schema like this might optimize latency and costs in constrained LLM decoding.
I know that LLMs are very familiar with JSON, and choosing uncommon schemas just to reduce tokens hurts semantic performance. But a schema that is sufficiently JSON-like probably won't disrupt model path/patterns that much and prevent unintended bias.
Minified json would use even less tokens
These JSON files are actually readable, congrats. I’m wondering whether this could be handled via an additional attached file instead. For example, I could have mycomplexdata.json and an accompanying mycomplexdata.jsonfranc. When the file is opened in the IDE, the IDE would merge the two automatically.
That way, the original JSON file stays clean and isn’t polluted with extra data.
While I wish JSON formally supported comments, it seems more sensible (compatible) to just nest them inside of a keyed list or object as strings.
Works right up until you get an entity where the field `comments` is suddenly relevant and then you need to go change everything everywhere. Much better to use the right tool for the job, if you want JSONC, be explicit and use JSONC.
Personally, I think if your JSON needs comments then it's probably for config or something the user is expected to edit themselves, and at that point you have better options than plain JSON and adding commentary to the actual payload.
If it's purely for machine consumption then I suspect you might be describing a schema and there are also tools for that.
This is interesting. I’d very much like to see a code formatter do that kind of thing; currently formatters are pretty much inflexible, which makes getting structure out of a formatted code sometimes hard.
I just built a C++ formatter that does this (owned by my employee, unfortunately). There's really only two formatting objects: tab-aligned tables, and single line rows. Both objects also support a right-floating column/tab aligned "//" comment.
Both objects desugar to a sequence of segments (lines).
The result is that you can freely mix expression/assignment blocks & statements. Things like switch-case blocks & macro tables are suddenly trivial to format in 2d.
Because comments are handled as right floating, all comments nicely align.
I vibe coded the base layer in an hour. I'm using with autogenerated code, so output is manually coded based on my input. The tricky bit would be "discovering" tables & block. I'd jus use a combo of an LSP and direct observation of sequential statements.
You built it, but your employee owns it? That sounds highly unusual.
Right. In my previous work, I wrote a custom XML formatter for making it look table-like which was our use case. Of course, an ideal solution would have been to move away from XML, but can't run away from legacy.
This is pretty cool, but I hope it isn't used for human-readable config files. TOML/YAML are better options for that. Git diff also can be tricky with realignment, etc.
I can see potential usefulness of this is in debug mode APIs, where somehow comments are sent as well and are rendered nicely. Especially useful in game dev jsons.
Yaml is the worst. Humans and LLMs alike get it wrong. I used to laugh at XML but Yaml made me look at XML wistfully.
Yaml - just say Norway
The Norway issue is a bit blown out of proportion seeing as the country should really be a string `"no"` rather than the `no` value
2 replies →
Just say Norway to YAML.
This is a reference to YAML parsing the two letter ISO country code for Norway:
As equivalent to a boolean falsy value:
It is a relatively common source of problems. One solution is to escape the value:
More context: https://www.bram.us/2022/01/11/yaml-the-norway-problem/
4 replies →
This looks very readable. The one example I didn't like is the expanded one where it expanded all but 1 of the elements. I feel like that should be an all or norhing thing, but there's bound to be edge cases.
Nice.
And BTW, thanks for supporting comments - the reason given for keeping comments out of standard Json is silly ( "they would be used for parsing directives" ).
It's a pretty sensible policy, really. Corollary to Hyrum's Law - do not permit your API to have any behaviours, useful or otherwise, which someone might depend on but which aren't part of your design goals. For programmers in particular, who are sodding munchkins and cannot be trusted not to do something clever but unintended just because it solves a problem for them, that means aggressively hamstringing everything.
A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
> A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
While I admire his design goals, people will just work around it in a pinch by adding a "comment" or "_comment" or "_comment_${random_uuid}", simply because they want to do the job they need.
If your screwdriver bends like a rubber when prying, damn it, I'll just put a screw next to it, so it thinks it is used for driving screws and thus behaves correctly.
1 reply →
On one hand, it has made json more ubiquitous due to it's frozen state. On another hand, it forces everyone to move to something else and fragments progress. It would be much easier for people to move to json 2.0 rather than having hundreds of json + x standards. Everyone is just reinventing json with their own little twist that I feel sad that we haven't standardized to a single solution that doesn't go super crazy like xml.
I don't disagree with the choice, but seeing how things turned out I can't just help but look at the greener grass on the other side.
> A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
Better not let me near your JSON files then. I pound in wall anchors with the bottom of my drill if my hammer is not within arms reach.
JSON is used as config files and static resources all the time. These type of files really need comments. Preventing comments in JSON is punishing the wide majority to prevent a small minority from doing something stupid. But stupid gonna stupid, it's just condescending from Mister JSON to think he can do anything about it.
XML people were doing crazy things in the Java/.NET world and "<!--[if IE 6]>" was still a thing in HTML when JSON was being designed.
I also would have wanted comments, but I see why Crockford must have been skeptical. He just didn't want JSON to be the next XML.
Unrelated: why spaces inside the parentheses? It’s not the first time I see this, but this is incorrect!
JSON doesn't have parentheses, but it does have braces and brackets. The JSON spec specifically allows spaces.
> Insignificant whitespace is allowed before or after any token.
1 reply →
Great. Now integrate this into every JSON library and tool so I get to see it's output more often
I think integration into jq would be both powerful and sufficient.
Powerful but not sufficient. There’s plenty of us who don’t use jq for various reasons.