Comment by elcritch

3 days ago

> Well, CBOR is MessagePack. Carsten Bormann forked MessagePack

Sure, that’s sort of true but missing context. Bormann (and others) wanted to add things such as separate string and byte sequence types. The MessagePack creator refused for years. Fair enough it’s his format. But it frustrated the community dealing with string vs bytes issues. It also highlights a core philosophical difference of a mostly closed spec vs an extensible first one.

> changed some of the tag values, wrote a standard around it, and submitted it to the IETF against the wishes of MessagePack's creators.

That’s just incorrect and a childish way to view it in my opinion.

The core philosophy and mental models are different in key aspects.

MessagePack is designed as a small self mostly closed format. It uses a simple TLV format with a couple hundred possible user extensions and some clever optimizations. The MP “spec” focuses on this.

CBOR re-envisioned the core idea of MessagePack from the ground up as an extensible major/minor tag system. It’s debatable how much CBOR is a fork of MPack vs a new format with similarities.

The resulting binary output is pretty similar with similar benefits but the core theoretical models are pretty different. The IETF standard bears little to no resemblance to the MessagePack specification.

> The facilities are the same (well, the tag is 8 bytes instead of 1 byte, but w/e); it's TLV all the way down (Bormann ripped this also).

The whole point of CBOR is that the tags go from 1-8 bytes. The parser designs end up fairly different due to the different tag formats. I’ve written and ported parsers for both.
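For illustration, here is a minimal sketch of reading the head of a CBOR data item, assuming the layout in RFC 8949: a 3-bit major type and 5-bit additional information in the initial byte, followed by 0, 1, 2, 4 or 8 argument bytes. `read_cbor_head` is a hypothetical helper name, not an API from any CBOR library.

```python
def read_cbor_head(buf, pos=0):
    """Decode one CBOR head and return (major_type, argument, next_pos).

    argument is None when the additional information is 31, which means
    "indefinite length" for major types 2-5 and "break" for major type 7.
    This sketch does not validate which major types may use it.
    """
    initial = buf[pos]
    major = initial >> 5       # top 3 bits: major type 0..7
    info = initial & 0x1F      # low 5 bits: additional information
    pos += 1
    if info < 24:
        # The argument is stored directly in the initial byte.
        return major, info, pos
    if info <= 27:
        # 24..27: the argument follows in 1, 2, 4 or 8 big-endian bytes.
        size = 1 << (info - 24)
        if pos + size > len(buf):
            raise ValueError("truncated CBOR head")
        return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size
    if info == 31:
        return major, None, pos
    # 28..30 are reserved in RFC 8949.
    raise ValueError("reserved additional information value %d" % info)


# 0x19 0x03 0xe8 is the unsigned integer 1000 (RFC 8949 example).
print(read_cbor_head(bytes.fromhex("1903e8")))
```

The same routine serves every major type, which is the main structural difference from a MessagePack decoder that dispatches on a table of distinct one-byte tag values.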

It’s not like the MessagePack creator invented TLV formats either. He just created an efficient and elegant one that’s pretty general. No one says he ripped off “TLV”.

You can’t just take a message pack parser and turn it into a CBOR one by changing some values. I’ve tried and it turns out poorly and doesn't support much of CBOR.

> This refers to CBOR's indefinite length types, but awkwardly, streaming is a protocol level feature, not a data format level feature.

The indefinite length format is very useful in the embedded space. I’ve hit limits with MessagePack before on embedded projects because you need to know the length of an array upfront. I wished I’d had CBOR instead.

This can also be useful for data processing applications. For example streaming the conversion of a large XML file into a more concise CBOR format would be much more memory efficient. At large scale that’s pretty handy.
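A minimal sketch of the indefinite-length case, assuming the RFC 8949 encoding (initial byte `0x9f` opens an array of unknown length, `0xff` is the "break" stop code). `encode_uint` and `stream_array` are hypothetical names used only for this example, and only unsigned integers are handled.

```python
def encode_uint(n):
    """Encode a non-negative integer as a CBOR major type 0 item."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n < 24:
        return bytes([n])  # value fits in the initial byte
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < (1 << (8 * size)):
            return bytes([info]) + n.to_bytes(size, "big")
    raise ValueError("integer too large for a CBOR uint")


def stream_array(items):
    """Yield the bytes of an indefinite-length CBOR array chunk by chunk.

    items can be any iterable, including a generator whose length is
    not known until it is exhausted.
    """
    yield b"\x9f"             # major type 4, additional info 31
    for n in items:
        yield encode_uint(n)  # each element is emitted as it arrives
    yield b"\xff"             # "break" stop code closes the array


# The producer never needs to count the elements in advance.
readings = (x * 500 for x in range(3))  # stand-in for sensor data
print(b"".join(stream_array(readings)).hex())
```

With a definite-length array the count has to be written before the first element, so the producer must either buffer everything or know the count ahead of time.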

> > However, MessagePack sacrifices human-readability
>
> This, of course, applies to CBOR as well.

For the binary format, yes. However the CBOR specification defines an official human readable text format for debugging and documentation purposes. There is also a schema system like json-schema but for CBOR: CDDL, specified separately in RFC 8610.

Turns out “just some specs” can actually be pretty valuable.


camgunz 3 days ago

I am really glad you replied.

> Sure, that’s sort of true but missing context. Bormann (and others) wanted to add things such as separate string and byte sequence types. The MessagePack creator refused for years. Fair enough it’s his format. But it frustrated the community dealing with string vs bytes issues.

msgpack-ruby added string support less than a month after cbor-ruby's first commit [0] [1]. The spec was updated over two months before [2]. Awful lot of work if this were really just about strings.

> It also highlights a core philosophical difference of a mostly closed spec vs an extensible first one.

MP has always been extensible, via ext types.
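As a rough sketch of what an ext value looks like on the wire, assuming the ext family in the MessagePack spec (fixext 1/2/4/8/16 at `0xd4`-`0xd8`, ext 8/16/32 at `0xc7`-`0xc9`, with application type codes 0 to 127). `pack_ext` is a hypothetical helper, not a function from any MessagePack library.

```python
# Tag bytes for the fixed-size ext formats, keyed by payload length.
FIXEXT_TAGS = {1: 0xD4, 2: 0xD5, 4: 0xD6, 8: 0xD7, 16: 0xD8}


def pack_ext(type_code, payload):
    """Encode an application-defined MessagePack ext value."""
    if not 0 <= type_code <= 127:
        # Negative type codes are reserved for predefined types.
        raise ValueError("application ext types are 0..127")
    n = len(payload)
    if n in FIXEXT_TAGS:
        head = bytes([FIXEXT_TAGS[n]])           # fixext: length implied by tag
    elif n < (1 << 8):
        head = bytes([0xC7, n])                  # ext 8: 1-byte length
    elif n < (1 << 16):
        head = b"\xc8" + n.to_bytes(2, "big")    # ext 16: 2-byte length
    elif n < (1 << 32):
        head = b"\xc9" + n.to_bytes(4, "big")    # ext 32: 4-byte length
    else:
        raise ValueError("payload too large for an ext value")
    # Layout is always: tag (+ length), then type code, then payload.
    return head + bytes([type_code]) + payload


print(pack_ext(5, b"\x01\x02").hex())  # fixext 2 with type 5
```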

> That’s just incorrect

I am entirely correct [3].

> MessagePack is designed as a small self mostly closed format.

Isn't it a lot of effort to get an IETF standard changed? Isn't that the benefit of a standard? You keep saying "mostly closed" like it's bad. Data format standards in particular really shouldn't change: who knows how many zettagottabytes there are stored in previous versions?

> It’s debatable how much CBOR is a fork of MPack vs a new format with similarities.

cbor-ruby is literally a fork of msgpack-ruby. The initial commit [0] contains headers like:

    /*
     * CBOR for Ruby
     *
     * Copyright (C) 2013 Carsten Bormann
     *
     *    Licensed under the Apache License, Version 2.0 (the "License").
     *
     * Based on:
     ******/
    /*
     * MessagePack for Ruby
     *
     * Copyright (C) 2008-2013 Sadayuki Furuhashi

> The resulting binary output is pretty similar with similar benefits

This is the whole game isn't it? The binary output is pretty similar? These are binary output formats!

> but the core theoretical models are pretty different.

I think you're giving a little too much credence to the "theoretical model". It's not more elegant to do what cbor-ruby does [4] vs. what MP does [5] (this is my lib). I literally just use the tag value, or for fixed values I OR them together. The format is designed for you to do this. What's more elegant than a simple, predefined value?
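A small sketch of that approach, assuming the tag values from the MessagePack spec. This is not the code from cmp or msgpack-ruby; `pack_small` is a hypothetical function that covers only a handful of small values.

```python
def pack_small(value):
    """Encode a few small Python values as MessagePack."""
    # Plain predefined tag values.
    if value is None:
        return b"\xc0"
    if value is True:       # bools are checked before ints on purpose
        return b"\xc3"
    if value is False:
        return b"\xc2"
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return bytes([value])            # positive fixint: 0xxxxxxx
        if -32 <= value < 0:
            return bytes([value & 0xFF])     # negative fixint: 111xxxxx
        if 0 <= value <= 0xFF:
            return bytes([0xCC, value])      # uint 8: tag, then the value
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data  # fixstr: tag OR length
        if len(data) <= 0xFF:
            return bytes([0xD9, len(data)]) + data   # str 8
    if isinstance(value, bytes) and len(value) <= 0xFF:
        return bytes([0xC4, len(value)]) + value     # bin 8
    if isinstance(value, list) and len(value) <= 15:
        # fixarray: tag OR element count, then the elements.
        body = b"".join(pack_small(v) for v in value)
        return bytes([0x90 | len(value)]) + body
    raise ValueError("value not covered by this sketch")


print(pack_small(["hi", 200, None]).hex())
```

Every branch is either a fixed tag byte or a tag OR'd with a small length, with no bit-field decomposition step.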

> The whole point of CBOR is that the tags go from 1-8 bytes.

The tags themselves are only 1 byte, until you get to extension types.

> The parser designs end up fairly different due to the different tag formats.

The creator of CBOR disagrees: cbor-ruby was a fork of msgpack-ruby with the tag values changed.

> No one says he ripped off “TLV”.

Don't conflate the general approach with literally forking an existing project.

> You can’t just take a message pack parser and turn it into a CBOR one by changing some values.

This is a strawman. My claim has been about the origins of CBOR, not how one can transmute an MP codec to a CBOR codec.

> I’ve hit limits with MessagePack before on embedded projects because you need to know the length of an array upfront.

When everything's fine, sure this works. If there are any problems whatsoever, you're totally screwed. Any protocol that supports streaming handles this kind of thing. CBOR doesn't. That's bad!

> For example streaming the conversion of a large XML file into a more concise CBOR format would be much more memory efficient.

It's probably faster to feed it through zstd. Also I think you underestimate how involved it'd be to round-trip a rich XML document to/from CBOR/MP.

> However the CBOR specification defines an official human readable text format for debugging and documentation purposes.

Where? Are you talking about Diagnostic Notation [6]? Hmm:

"Note that this truly is a diagnostic format; it is not meant to be parsed. Therefore, no formal definition (as in ABNF) is given in this document. (Implementers looking for a text-based format for representing CBOR data items in configuration files may also want to consider YAML [YAML].)"

YAML!? Anyway, it literally doesn't define it.

[0]: https://github.com/msgpack/msgpack-ruby/commit/60e846aaaa638...

[1]: https://github.com/cabo/cbor-ruby/commit/5aebd764c3a92d40592...

[2]: https://github.com/msgpack/msgpack/commit/5dde8c4fd0010e1435...

[3]: https://github.com/msgpack/msgpack/issues/129#issuecomment-1...

[4]: https://github.com/cabo/cbor-ruby/blob/5aebd764c3a92d4059236...

[5]: https://github.com/camgunz/cmp/blob/master/cmp.c#L30

[6]: https://www.rfc-editor.org/rfc/rfc8949.html#name-diagnostic-...