Comment by cbhl

11 years ago

It's worth noting that XMPP was based on XML. New hotness is JSON, or maybe even compressed binary protocols.

I don't know if you're being ironic about JSON. Note that jkarneges, who comments elsewhere in this thread, is the creator of Psi [1], arguably the best XMPP-focused messaging client.

The value/burden of XML has always been a topic of debate for XMPP. In retrospect, I think it contributed to its lack of appeal, though the extensibility and readability (ehm, arguably) it provided were unique back then.

I've long wondered about which alternative base protocols could be used in its place. JSON is OK, but may be as much a fad as XML. I've wondered if ASN.1 could be used, but ProtoBufs sound like they're a better fit [2] in that they're simpler, more space-efficient, and backwards-and-forwards compatible (and thus extensible, XMPP's main feature). In fact, it's what Google already uses itself.

[1] http://psi-im.org/ [2] https://groups.google.com/forum/#!topic/protobuf/eNAZlnPKVW4

  • What is the purpose of layering your chat protocol over another protocol at all?

    SMTP has no "base protocol" in this sense. HTTP, nothing (unless you count RFC 822).

    It's hard to think these protocols would have had the same lifetime if they were based on XML, JSON, or protobufs. (Yeah, HTTP over XML, that should be enough to give you nightmares. But welcome to DAV and XMPP.)

  • If you're looking for a happy medium between the readability of JSON and XML and the efficiency of ASN.1 and protobufs, take a look at canonical S-expressions[1].

    There's an advanced representation, which looks like this: (message (header (sender "Billy Joe Bob") (sent "2015-03-26T12:02:00Z")) (body "Hey guys! Let's meet up for lunch!")). It's possible to encode any byte string using Base64 or hex. It's also possible to encode types with data: (message (header (sender "Billy Joe Bob") (sent "2015-03-26T12:02:00Z")) (body [text/html]"<p>Hey guys! Let's meet up for lunch!</p>"))

    While there are multiple advanced encodings for the same data (e.g. foo or "foo" or |Zm9v| or #666f6f#), there is a _single_ canonical encoding for any datum: the messages above would be (7:message(6:header(6:sender13:Billy Joe Bob)(4:sent20:2015-03-26T12:02:00Z))(4:body34:Hey guys! Let's meet up for lunch!)) and (7:message(6:header(6:sender13:Billy Joe Bob)(4:sent20:2015-03-26T12:02:00Z))(4:body[9:text/html]41:<p>Hey guys! Let's meet up for lunch!</p>)).
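
    As a rough illustration of the canonical form described above, here is a minimal Python sketch. The names `canonical`, `_atom`, and `Hinted` are made up for this example and do not come from any S-expression library; it assumes atoms are strings or bytes and that each length prefix counts UTF-8 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hinted:
    """Hypothetical helper: an atom with a display hint such as text/html."""
    hint: str
    value: str


def _atom(text):
    # A canonical atom is "<byte length>:<raw bytes>", with no quoting or escaping.
    if isinstance(text, str):
        raw = text.encode("utf-8")
    elif isinstance(text, (bytes, bytearray)):
        raw = bytes(text)
    else:
        raise TypeError(f"unsupported atom type: {type(text).__name__}")
    return str(len(raw)).encode("ascii") + b":" + raw


def canonical(node):
    """Encode nested lists of str/bytes/Hinted as a canonical S-expression."""
    if isinstance(node, (list, tuple)):
        return b"(" + b"".join(canonical(child) for child in node) + b")"
    if isinstance(node, Hinted):
        return b"[" + _atom(node.hint) + b"]" + _atom(node.value)
    return _atom(node)


message = [
    "message",
    ["header", ["sender", "Billy Joe Bob"], ["sent", "2015-03-26T12:02:00Z"]],
    ["body", Hinted("text/html", "<p>Hey guys! Let's meet up for lunch!</p>")],
]
encoded = canonical(message)
```

    Because the prefix is computed with `len()` on the encoded bytes, arbitrary binary data needs no Base64 or hex step in this form.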

    A huge advantage of this canonical encoding is that it's amenable to cryptographic hashing and signing; a weakness of JSON is that one has to layer requirements atop JSON itself (e.g. alphabetising object properties) in order for two parties to be able to hash the same datum and get the same value.
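
    The JSON side of that comparison can be sketched in Python. The convention used here (sorted keys, fixed separators, UTF-8) is only one ad hoc choice that both parties would have to agree on, and the function names are invented for this example; it also ignores harder cases such as number formatting.

```python
import hashlib
import json

# The same datum, built with different key orders.
a = {"sender": "Billy Joe Bob", "body": "lunch?"}
b = {"body": "lunch?", "sender": "Billy Joe Bob"}


def naive_digest(obj):
    # Serialises keys in insertion order, so equal objects can hash differently.
    return hashlib.sha256(json.dumps(obj).encode("utf-8")).hexdigest()


def agreed_digest(obj):
    # Extra rules layered on top of JSON: sorted keys and fixed separators.
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


same_naive = naive_digest(a) == naive_digest(b)
same_agreed = agreed_digest(a) == agreed_digest(b)
```

    Here `same_naive` comes out false and `same_agreed` true, which is the extra layer of requirements the paragraph above refers to.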

    Another advantage of canonical S-expressions is that it's straightforward to define a mapping between them and HTML: "<p class='foo'>This is a <em>nifty</em> paragraph.<br /></p>" could be represented as ((p (class foo)) "This is a " (em nifty) " paragraph." (br)). There are other possible mappings between S-expressions and HTML, of course, but I like that one. Another might be (p (/ (class foo)) "This is a " (em nifty) " paragraph." (br)).
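
    A minimal sketch of the first mapping, with the S-expression written as nested Python lists. The `render` function and the `VOID` set are assumptions made for this example, not part of any existing library, and only the element shapes used above are handled.

```python
from html import escape

# Elements rendered as self-closing when they have no children (assumed subset).
VOID = {"br", "hr", "img"}


def render(node):
    """Render a nested-list S-expression as an HTML string."""
    if isinstance(node, str):
        return escape(node)
    head, *children = node
    if isinstance(head, str):
        tag, attrs = head, []       # (em nifty): bare tag name
    else:
        tag, *attrs = head          # ((p (class foo)) ...): tag plus attribute pairs
    attr_text = "".join(f" {name}='{escape(value)}'" for name, value in attrs)
    if tag in VOID and not children:
        return f"<{tag}{attr_text} />"
    inner = "".join(render(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


html_text = render(
    [["p", ["class", "foo"]], "This is a ", ["em", "nifty"], " paragraph.", ["br"]]
)
```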

    [1] http://people.csail.mit.edu/rivest/Sexp.txt

    • > there is a _single_ canonical encoding for any datum: the messages above would be (7:message(6:header(6:sender13:Billy Joe Bob)(4:sent20:2015-03-26T12:02:00Z))(4:body34:Hey guys! Let's meet up for lunch!))

      This reminds me a lot of bencode, with the advantage for bencode that it doesn't need any fiddling for non-printable characters: no more base64, no more hex.
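
      For comparison, a small bencode encoder sketch in Python. The function name is made up for this example; it covers the four bencode types (byte strings, integers, lists, dictionaries with keys sorted as raw bytes) and assumes text is encoded as UTF-8.

```python
def bencode(value):
    """Encode ints, str/bytes, lists and dicts using bencode."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Length-prefixed raw bytes: non-printable data needs no escaping.
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])  # keys must appear in sorted order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"unsupported type: {type(value).__name__}")


encoded = bencode(["message", {"sender": "Billy Joe Bob", "body": b"\x00\x01"}])
```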

  • XML is horrendous. Especially to parse/scrape. JSON on the other hand is a breeze.

    • Only if you don't understand XML.

      * XML has a formal, class-based description language (XML Schema) with strong typing, polymorphism, and - best of all - self-descriptiveness.

      * Languages like Java have a seamless, bidirectional mapping to XML Schema.

      * XML has a ridiculously powerful and elegant transformation language (XSLT) which makes scraping, selective data extraction and processing trivial.

      The problem with XML is that people who require instant satisfaction are not willing to invest the time to understand it, and the mature tooling ecosystem around it.

      The XML ecosystem solves problems, and contains solutions to problems, that the JSON / JavaScript ecosystem can only dream of, and is hell-bent on partially re-inventing.

      If you need strong typing and self-descriptiveness, you're out of luck with JSON. Binding JSON to a strongly typed language like Java or Haskell is a total ball-drag compared to XML + Schema.

  • I don't see why I can't use XML, JSON, MsgPack or YAML.

    Couldn't the parsing be a pluggable component? Just set a standard on how data is structured and let third-parties figure out how data is parsed.
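
    One way to picture that idea, as a rough Python sketch: the structure (a message with a recipient and a body) is fixed, and each wire format is just a decoder registered under a content type. The registry, the content-type keys, and the field names are all assumptions made up for this example, not anything XMPP defines.

```python
import json
import xml.etree.ElementTree as ET


def decode_json(raw):
    data = json.loads(raw)
    return {"to": data["to"], "body": data["body"]}


def decode_xml(raw):
    elem = ET.fromstring(raw)
    return {"to": elem.get("to"), "body": elem.findtext("body")}


# Hypothetical registry: third parties could add MsgPack, YAML, and so on.
DECODERS = {
    "application/json": decode_json,
    "application/xml": decode_xml,
}


def decode(content_type, raw):
    """Turn a wire payload into the agreed structure, whatever the syntax."""
    return DECODERS[content_type](raw)


from_json = decode("application/json", '{"to": "alice", "body": "hi"}')
from_xml = decode("application/xml", "<message to='alice'><body>hi</body></message>")
```

    Both calls yield the same dictionary, so everything above the decoder never sees which syntax was on the wire.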

    • And that would improve the XMPP adoption and experience by ... ?

      Are you saying mom and dad aren't using XMPP because the message is sent using XML based stanzas? Facebook is ditching XMPP because of the X?

      It doesn't matter?

The funny thing is that XMPP was created when XML was the current hotness.

SMTP survived despite changing fads. If we're ever going to standardize IM (or anything), we have to accept that protocols may use older technology. Someday JSON will be old too. Let's not make these mistakes again.

  • At the time when XMPP was getting standardized and was still mostly known as Jabber, I started implementing a Jabber chat client with a friend.

    The problem with XMPP isn't that it was based on XML. No, what made it annoying was that the dudes who made it decided that instead of basing it on exchanging individual messages, i.e. XML documents like everyone else does, everything must instead be put inside a so-called XML stream.

    IIRC it basically meant that the exchange started with a start tag that wasn't terminated until the connection was closed. Since nothing at the time was designed to work with unfinished XML documents (remember the end tag doesn't come until you're done), all the convenient standard XML tools/libraries wouldn't work.
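
    For readers who haven't seen the problem, here is a sketch of consuming such a never-closed document with an incremental parser, using `XMLPullParser` from Python's standard library (a present-day tool, not something available back then). The tag names are simplified stand-ins without the real XMPP namespaces, and `iter_stanzas` is a name invented for this example.

```python
import xml.etree.ElementTree as ET


def iter_stanzas(chunks):
    """Yield each complete child of the root element as soon as it closes."""
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the still-open root has just ended.
                    yield elem


# Network reads split at arbitrary points; the root tag is never closed.
chunks = [
    "<stream><message to='alice'><bo",
    "dy>hi</body></message><presence/>",
]
stanzas = list(iter_stanzas(chunks))
tags = [stanza.tag for stanza in stanzas]
```

    A DOM-style parser would wait for `</stream>` before returning anything, which is the mismatch described above. This sketch also never frees finished stanzas from the root, which a long-lived connection would need to do.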

    So I don't think XMPP is a stellar piece of work. But it's of course much better than some proprietary crap, and it's sad to see it lose support. I imagine, though, that to Google and Facebook, who both probably couldn't care less about interoperability, an open XMPP interface is more of a liability (spammers, letting people skip their ads) than something they get much perceived value out of.

    • Even though it seems like a strange decision, using XML for the stream framing made the protocol nicely pure. It theoretically meant you didn't need to write a parser (this was a rare thing for a network protocol). In practice, though, you're right, most parsers at the time didn't work well with network streams.

      Of course, lack of adoption by the big providers was almost certainly political rather than technical.

Maybe we need a new standard JSON protocol. I bet part of XMPP falling out of favor was the XML.

Messaging is one of those areas that is actually pretty simple, but corporations that want to own channels have munged it up pretty badly into a complex mess. Companies even have a couple or a few internally.

Side note: AOL/Time Warner actually owns the IM patent from ICQ (http://edition.cnn.com/2002/TECH/biztech/12/19/internet.aol....)

  • Yeah, I don't think the issue is with the implementation; what companies didn't like is that it can't be controlled. We either build systems to use ourselves or let private companies control our communications.