Comment by lifthrasiir

5 months ago

Not so much. (This response seems to repeat, please refrain from ignoring details in such a bold claim.) MP extension types are opaque and applications can do nothing about them. CBOR tags are extensions to the existing data model and can be processed to some extent---for example unknown tags don't prevent implementations from inspecting their contents. And I don't think MP had any sort of registry for extension types, they are more like "reserved for future expansion" instead of "less used but still spec-worthy types to be defined with a reasonable proposal".

> Not so much. (This response seems to repeat, please refrain from ignoring details in such a bold claim.)

I said what I meant; feel free to disagree but this policing is pretty condescending. I don't need to constantly repeat that the fundamental data format is lifted from MP and the extra features/process Bormann added on top of it are uniformly poorly thought out.

Just like CBOR's tags, extension types are additional, non-core types. Bormann renamed them and bumped up the size so you can have way more of them in CBOR, but the tag also takes up more space, and since the odds of needing billions of extension types are basically zero, it's not a good tradeoff.

> MP extension types are opaque and applications can do nothing about them. CBOR tags are extensions to the existing data model and can be processed to some extent

I think CBOR's "have your cake and eat it too" design has confused you. Yes CBOR establishes a tag registry, but implementations are free to ignore all tags. In practice what this means is if you can control the receiver you can use whatever tags you want, and if you don't control the receiver you have to either avoid tags or potentially limit your audience (i.e. do I use the "Standard date/time string" and eat the extra int8 or do I just send it as a string and note it as a date/time in my docs). You might think, "oh pish posh what can't process a date/time string", but the answer is "many embedded devices you'd want to use CBOR on". It's yet another feature added with no consideration for real world use cases.
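
To put a number on the tagged-versus-plain choice above, here is a rough sketch (not from either commenter) that hand-encodes the same timestamp both ways. The helper names `cbor_text` and `cbor_tag0` are made up for illustration, and the encoder only handles text strings under 256 bytes.

```python
def cbor_text(s: str) -> bytes:
    """Encode a CBOR text string (major type 3); sketch limited to < 256 bytes."""
    data = s.encode("utf-8")
    n = len(data)
    if n < 24:
        head = bytes([0x60 | n])   # length fits in the initial byte
    elif n < 256:
        head = bytes([0x78, n])    # one extra length byte
    else:
        raise ValueError("sketch only handles strings under 256 bytes")
    return head + data


def cbor_tag0(item: bytes) -> bytes:
    """Prefix an already-encoded item with tag 0 (major type 6, value 0 = 0xc0)."""
    return b"\xc0" + item


stamp = "2013-03-21T20:04:00Z"
plain = cbor_text(stamp)    # just a string; the docs say it is a date/time
tagged = cbor_tag0(plain)   # tag 0, "Standard date/time string"
print(len(plain), len(tagged))
```

The tagged form costs exactly one extra byte on the wire; whether the receiver does anything with that byte is the part under dispute.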

> for example unknown tags don't prevent implementations from inspecting their contents. And I don't think MP had any sort of registry for extension types, they are more like "reserved for future expansion" instead of "less used but still spec-worthy types to be defined with a reasonable proposal".

You fundamentally misunderstand MP's extension types. Instead of guessing you can read about them in the MP spec [0]:

---

Extension types

MessagePack allows applications to define application-specific types using the Extension type. Extension type consists of an integer and a byte array where the integer represents a kind of types and the byte array represents data.

Applications can assign 0 to 127 to store application-specific type information. An example usage is that application defines type = 0 as the application's unique type system, and stores name of a type and values of the type at the payload.

MessagePack reserves -1 to -128 for future extension to add predefined types. These types will be added to exchange more types without using pre-shared statically-typed schema across different programming environments.

[0, 127]: application-specific types

[-128, -1]: reserved for predefined types

Because extension types are intended to be added, old applications may not implement all of them. However, they can still handle such type as one of Extension types. Therefore, applications can decide whether they reject unknown Extension types, accept as opaque data, or transfer to another application without touching payload of them.

---

[0]: https://github.com/msgpack/msgpack/blob/master/spec.md#exten...
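
As a rough illustration of the last paragraph of that excerpt, a receiver could handle extension types along these lines. This is only a sketch: `read_ext` and `handle_ext` are hypothetical names, only the fixext and ext 8 encodings are parsed, and the handler for type -1 just reads the 4-byte seconds field of a timestamp 32.

```python
from typing import Callable, Dict, Tuple

# Payload sizes for the fixext 1/2/4/8/16 lead bytes.
FIXEXT_SIZES = {0xD4: 1, 0xD5: 2, 0xD6: 4, 0xD7: 8, 0xD8: 16}


def read_ext(buf: bytes) -> Tuple[int, bytes]:
    """Parse one fixext or ext 8 item and return (type, payload)."""
    lead = buf[0]
    if lead in FIXEXT_SIZES:
        size, pos = FIXEXT_SIZES[lead], 1   # type byte follows the lead byte
    elif lead == 0xC7:
        size, pos = buf[1], 2               # ext 8: one length byte, then type
    else:
        raise ValueError("not a fixext/ext 8 item")
    if len(buf) < pos + 1 + size:
        raise ValueError("truncated extension item")
    ext_type = int.from_bytes(buf[pos:pos + 1], "big", signed=True)
    return ext_type, buf[pos + 1:pos + 1 + size]


def handle_ext(buf: bytes,
               handlers: Dict[int, Callable[[bytes], object]],
               reject_unknown: bool = False):
    """Decode known types; reject unknown ones or pass them through untouched."""
    ext_type, payload = read_ext(buf)
    if ext_type in handlers:
        return handlers[ext_type](payload)
    if reject_unknown:
        raise ValueError(f"unknown extension type {ext_type}")
    return ext_type, payload                # opaque: keep it or forward it


handlers = {-1: lambda p: int.from_bytes(p, "big")}   # timestamp 32 seconds
print(handle_ext(bytes.fromhex("d6ff00000001"), handlers))
print(handle_ext(bytes.fromhex("c705009401020304"), handlers))
```

Types 0 to 127 land in the application's own handler table; anything missing from the table is either rejected or carried along as `(type, payload)`.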

  • > I said what I meant; feel free to disagree but this policing is pretty condescending.

    Sorry it came across that way, but when the same thing repeats three times (I think) I have to note that something is off in your messaging. I'll try to be more cautious in the future.

    > You fundamentally misunderstand MP's extension types. Instead of guessing you can read about them in the MP spec [0]:

    Maybe my line of thought is confusing to you, but I have read all of that in order to avoid relying on my fragile recollection. And they are qualitatively different to me. You can't really do that much with the encoded bytes `c7 05 00 94 01 02 03 04` if the application-specific type 0 is unknown, even though `94 01 02 03 04` is a valid MP sequence and the author probably intended it to be read as one. So tag-unaware tools like diagnostics or compression algorithms would have to guess. The equivalent CBOR bytes `c0 84 01 02 03 04` clearly express such intent. If there is no such intent, you can put a byte string instead (`c0 45 84 01 02 03 04`).
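
A rough sketch of what a tag-unaware tool sees for those three byte strings. The tiny `decode_cbor` below is a hypothetical helper that only understands small unsigned integers, byte strings, arrays, and tags with arguments under 24, which is enough for these examples; tag 0 appears only because the bytes above use it.

```python
def decode_cbor(buf: bytes, pos: int = 0):
    """Decode one CBOR item at pos; return (value, next_pos)."""
    major, arg = buf[pos] >> 5, buf[pos] & 0x1F
    if arg >= 24:
        raise ValueError("sketch only handles arguments below 24")
    pos += 1
    if major == 0:                      # unsigned integer
        return arg, pos
    if major == 2:                      # byte string: contents stay opaque
        return buf[pos:pos + arg], pos + arg
    if major == 4:                      # array of arg items
        items = []
        for _ in range(arg):
            item, pos = decode_cbor(buf, pos)
            items.append(item)
        return items, pos
    if major == 6:                      # tag: decode the enclosed item anyway
        inner, pos = decode_cbor(buf, pos)
        return ("tag", arg, inner), pos
    raise ValueError(f"major type {major} not covered by this sketch")


# MessagePack ext 8: lead byte, length, type, then an uninterpreted payload.
mp = bytes.fromhex("c705009401020304")
mp_type, mp_payload = mp[2], mp[3:3 + mp[1]]

tagged_array, _ = decode_cbor(bytes.fromhex("c08401020304"))
tagged_bytes, _ = decode_cbor(bytes.fromhex("c0458401020304"))

print(mp_type, mp_payload.hex())
print(tagged_array)
print(tagged_bytes)
```

The MP payload and the CBOR byte-string variant both come back as raw bytes, while the tagged array comes back with its structure intact even though the decoder attaches no meaning to the tag.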

    As you have acknowledged, the tag registry has its pros and cons. It might not be obvious which tag should be used in a given use case. Tags are prone to be ill-designed and stuck forever (this already happened for IPv4/v6 tags, to be clear). But the registry means that spec development can happen in a distributed manner and for more specific situations. I mean, the only extension type ever defined by MP is a timestamp. It even doesn't have other obvious tags like UUID. Is it justified?

    • The registry isn't useful for this. Either you're defining a format to be consumed by a generic decoder and therefore can't rely on tags in the registry, or you're defining a format to be consumed by a custom decoder you control, so it can understand whatever tags/extension types you make it understand. The registry is strictly a negative because--again--you can't rely on it, and it requires extensions to go through the registration process. You can't define application-specific types in CBOR.

      > It even doesn't have other obvious tags like UUID. Is it justified?

      Yes; UUIDs are huge 128-bit values and many popular embedded platforms are 32-bit. If your app needs them in MP that's what extension types are for.

      ---

      I think maybe what makes us talk past each other is: there's no use-case for a generic CBOR (or MP) decoder on its own. JSON/XML/HTML won in that space (you know things are bad when there are more public XML APIs than public CBOR APIs). There's no serious use-case for a "tag-unaware diagnostic" tool for CBOR or MP APIs. You will always build things on top of the CBOR/MP decoder, there will always be API docs, or reverse engineering the wire format is trivial. CBOR really wants this to not be true; it really wants to be the binary JSON despite the fact this is more or less an oxymoron. The questions that illustrate the difference are:

      - how does the format avoid forcing things on you that you don't need

      - how does the format provide for extension

      MP's answers to these questions are:

      - be very conservative about what's required of implementations

      - extension types

      CBOR's answers to these questions are:

      - interact with IETF

      - interact with IETF

      Different people will react to that differently, but that's the bottom line.