Comment by jviotti
10 months ago
Very cool! How are the balloons transferring telemetry back to earth for analysis, etc?
Asking because my research at the University of Oxford was on hyper space-efficient data transfer from remote locations at a fraction of the cost.
The result was an award-winning technology (https://jsonbinpack.sourcemeta.com) to serialise plain JSON that was proven to be more space-efficient than every tested alternative (including Protocol Buffers, Apache Avro, ASN.1, etc) in every tested case (https://arxiv.org/abs/2211.12799).
If it's interesting, I'd love to connect and discuss (jv@jviotti.com) how at least the open-source offering could help.
It surprised me how popular this message got. I love nerding out about binary serialization and space-efficiency, and it's great to see I'm not the only one :)
If you want to go deeper, I published two (publicly available) in-depth papers studying the current state of JSON-compatible binary serialization that you might enjoy. They cover, in a lot of detail, technologies like Protocol Buffers, CBOR, MessagePack, and others mentioned in this thread:
- https://arxiv.org/abs/2201.02089
- https://arxiv.org/abs/2201.03051
Hope they are useful!
> JSON BinPack is space-efficient, but what about runtime-efficiency?
> When transmitting data over the Internet, time is the bottleneck, making computation essentially free in comparison.
I thought this was an odd sales pitch from the JSON BinPack site, given that a central use case is IoT, which frequently runs on batteries or in power-constrained environments where there's no such thing as "essentially free".
Fair point! "Embedded" and "IoT" are overloaded terms. For example, "IoT" devices range from extremely low-powered microcontrollers to Linux-based boards with plenty of power, and they are all considered "embedded". I'll take note and improve the wording.
That said, the production-ready implementation of JSON BinPack is designed to run on low-powered devices and still provide those same benefits.
A lot of the current work is happening in https://github.com/sourcemeta/jsontoolkit, a dependency of JSON BinPack that implements a state-of-the-art JSON Schema compiler (I'm a TSC member of JSON Schema, by the way). The goal is fast, efficient schema evaluation within JSON BinPack on low-powered devices, unlike the current prototype, which requires schema evaluation to resolve logical schema operators. That's just one example of the complex runtime-efficiency tracks we are pursuing.
> batteries or power-constrained environments
I would imagine that CPUs are much more power-efficient than a satellite transmitter? I guess you'd have to balance the additional computational energy required against the energy saved by transmitting less.
Yeah, it very much depends, given how huge the "embedded/IoT" spectrum is. Each use case has its own unique constraints, which makes it very hard to give general advice.
For sure, but radio transmitter time is almost always much more expensive than CPU time! It's 4-20 mA vs 180 mA on an ESP32; having the radio on adds roughly a 160 mA load! As long as every seven milliseconds of compressing saves a millisecond of transmission, your compression algorithm comes out ahead.
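A back-of-the-envelope sketch of that break-even point, using the rough ESP32-style figures above (the exact currents are assumptions and vary by board and radio):

```cpp
#include <iostream>

int main() {
  // Illustrative figures only: current draw while compressing (radio off)
  // versus current draw while the radio is transmitting.
  const double cpu_milliamps = 20.0;
  const double radio_milliamps = 180.0;

  // Compressing for t_cpu milliseconds pays off whenever
  //   cpu_milliamps * t_cpu < radio_milliamps * t_airtime_saved,
  // i.e. each millisecond of airtime saved is worth this many
  // milliseconds of CPU time:
  std::cout << "break-even: " << radio_milliamps / cpu_milliamps
            << " ms of CPU per ms of airtime saved\n";
  return 0;
}
```

With those figures the break-even is about 9:1, so the conservative 7:1 rule of thumb above still comes out ahead.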
Sounds like you are pretty familiar with satellite transmission at the hardware level. If so, I would love to chat and pick your brain on it. I don't know much about the hardware constraints myself.
> on an esp32;
Ironically, the main criticism I've heard of these is how power-inefficient they are :P
Let's definitely talk; we're using protobufs right now. I'll send an email.
This looks promising! One of the important aspects of Protocol Buffers, Avro, etc. is how they deal with evolving schemas and backwards/forwards compatibility. I don't see anything in the docs addressing that. Is it possible for old services to handle new payloads and new services to handle old payloads, or do senders and receivers need to be rewritten each time the schema changes?
Good question! Unlike Protocol Buffers and Apache Avro, which each have their own specialised schema languages created by them, for them, JSON BinPack taps into the popular, industry-standard JSON Schema language.
That means that you can use any tooling/approach from the wide JSON Schema ecosystem to manage schema evolution. A popular one from the decentralised systems world is Cambria (https://www.inkandswitch.com/cambria/).
That said, I do recognise that schema evolution tech in the JSON Schema world is not as great as it should be. I'm a TSC member of JSON Schema, and a few of us are definitely thinking hard about this problem and trying to make it even better than the competition.
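As a minimal, hypothetical sketch of what a backwards-compatible change can look like in plain JSON Schema (the property names are made up for illustration): version 2 of a schema adds an optional "humidity" property to a version 1 schema that only required "temperature".

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["temperature"],
  "properties": {
    "temperature": { "type": "number" },
    "humidity": { "type": "number" }
  }
}
```

Old payloads (without "humidity") still validate against this new schema, and because JSON Schema allows additional properties by default, new payloads also validate against the old schema as long as it doesn't set "additionalProperties": false.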
A lot of people already think about this problem with respect to API compatibility for REST services using the OpenAPI spec, for example. It's possible to have a JSON Schema which is backwards compatible with previous versions. I'm not sure how backwards-compatible the resulting JSON BinPack schemas are, however.
Great seeing you over here Michael :) For other people reading this thread, Michael and I are collaborating on a paper covering the schema compiler I've been working on for JSON BinPack. Funny coincidence!
Do you have any info on how your system stacks up to msgpack? (https://msgpack.org/index.html)
Asking because we use msgpack in production at work and it can sometimes be a bit slower to encode/decode than is ideal when dealing with real-time data.
We do! See https://benchmark.sourcemeta.com for a live benchmark and https://arxiv.org/abs/2211.12799 for a more detailed academic benchmark.
The TLDR is that if you use JSON BinPack in schema-less mode, it's still more space-efficient than MessagePack, but not by a huge margin (it depends on the type of data, of course). But if you start passing a JSON Schema along with your data, the results become way smaller.
Please reach out to jv@jviotti.com. I would love to discuss your use case more.
Why this over a compact, data-specific format? JSON feels like an unnecessary limitation for this company's use case. I am having a hard time believing it is more space-efficient than a purpose-built format.
Compared to other serialisation formats, JSON BinPack analyses your data and derives custom encoding rules that are specific to the data at hand, given all the context it has about it. That's why the static analysis part is by far the most complex part of JSON BinPack, and why I'm building so much advanced JSON Schema tooling in https://github.com/sourcemeta/jsontoolkit (e.g. check the huge API surface for JSON Schema in the docs: https://jsontoolkit.sourcemeta.com/group__jsonschema.html).
Of course there is still a lot to do, but the idea is that what you get with JSON BinPack is extremely close to what you would have produced by manually encoding your data, except that you don't have to worry about doing the encoding yourself :) You get the best of both worlds: the niceties of JSON and the space-efficiency of manual encoding.
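To make that concrete with a hypothetical example (this is the general idea, not JSON BinPack's actual wire format): if a schema declares a bounded integer, a schema-aware encoder can store the value as a single-byte offset from the minimum, instead of the ASCII digits plus framing that textual JSON needs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of schema-derived encoding (not JSON BinPack's actual
// rules): a schema such as
//   { "type": "integer", "minimum": 100, "maximum": 355 }
// guarantees that (value - minimum) fits in one byte, so that single byte is
// all that needs to go on the wire.
std::uint8_t encode_bounded(std::int64_t value,
                            std::int64_t minimum,
                            std::int64_t maximum) {
  assert(value >= minimum && value <= maximum);
  assert(maximum - minimum <= 255);
  return static_cast<std::uint8_t>(value - minimum);
}

std::int64_t decode_bounded(std::uint8_t wire, std::int64_t minimum) {
  return minimum + static_cast<std::int64_t>(wire);
}
```

Because both sides share the schema, the decoder already knows the minimum and needs no per-message metadata, which is the sense in which the output gets close to a hand-rolled, purpose-built format.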
From the OP:
> Our payload uses a satellite transceiver for communications
That's the hardware. I meant on the software side, through the transceiver. If you transfer fewer bits through the satellite transceiver, I believe you can probably reduce costs.
Sounds cool. How does it differ from CBOR?
CBOR is a schema-less binary format. JSON BinPack supports both a schema-less mode (like CBOR) and a schema-driven mode (with JSON Schema). Even in schema-less mode, JSON BinPack is more space-efficient than CBOR. See https://benchmark.sourcemeta.com for a live benchmark and https://arxiv.org/abs/2211.12799 for a more detailed academic benchmark.
Thanks for linking the benchmarks. I appreciate the work on shaving additional bytes, especially in cases where every byte matters. The real savings seem to be in the schema-driven mode. Comparing a "realistic", schema-less payload for a general storage use case (e.g. the config examples), it looks pretty even with CBOR. Edit: my bad, BinPack gets more efficient with larger payloads: https://benchmark.sourcemeta.com/#jsonresume