Comment by bythreads
5 months ago
Just a question: were you trying to optimize for speed or for size?
I never tried CBOR when looking for a sub-10 ms solution for WebSocket comms; my use case was not bound by data size but entirely by speed (local network, not internet).
However, it all came down to a surprising realisation: "compression on both ends is the primary performance culprit."
Optimizing the hell out of the protocol over WebSockets got me to a fairly OK response time; just using string JSON and zero compression blew it out of the water.
So the result was that the data load was faster and easier to debug going with JSON strings than with any other optimization (message sizes were in the 10-50 MB realm).
The number of shoddy WS server implementations and gzip operations in the communications pipeline is mind-blowing. I'd be interested in hearing how pure JSON with zero compression/binary transforms performed :)
Virtually every use case of zlib (which can be used to implement gzip) should be replaced with zlib-ng, in fact. The stock zlib is too slow for modern computers. If you have the right workload (no streaming, fits in memory, etc.), then libdeflate is even faster. Compression can't be a bottleneck when you've got a correct library.
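To see where the time goes with the stock library, here's a minimal sketch using Python's stdlib zlib (zlib-ng and libdeflate expose similar C APIs but aren't in the stdlib; the payload shape is made up):

```python
import json
import time
import zlib

# Hypothetical payload: a repetitive JSON document, roughly what a
# chatty WebSocket API might push down the wire.
payload = json.dumps(
    [{"id": i, "name": "user%d" % i, "active": True} for i in range(50_000)]
).encode("utf-8")

for level in (1, 6, 9):  # 6 is zlib's default; 9 is what many servers misuse
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(compressed) == payload  # round-trip sanity check
    print(f"level {level}: {len(compressed)} bytes in {elapsed * 1000:.1f} ms")
```

The gap between level 1 and level 9 is usually a large constant factor for a modest size win, which is the point being made about misconfigured servers.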
zlib-rs (the Rust port, which exposes a zlib-compatible API) is now faster in most cases.
Fun fact: turning off permessage-deflate is nearly impossible when browsers are in the mix. The best option is to strip the header in a reverse proxy, since many browsers ignore the setting and emit the header despite your configs. Add to that that most servers assume the client knows what it is doing and honor the header request, while not allowing a global override to turn it off.
It gives you a fun ball of yarn to untangle.
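For reference, one way to do that stripping in an nginx reverse proxy is to blank out the client's extension offer before it reaches the upstream (treat this as a sketch; `backend` and the location path are placeholders):

```nginx
location /ws {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    # Blank out the extension offer so the upstream never sees
    # "permessage-deflate" and therefore never negotiates compression.
    proxy_set_header Sec-WebSocket-Extensions "";
}
```

Since the server never echoes the extension back in its handshake response, the browser falls back to uncompressed frames regardless of what it offered.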
Patently untrue.
Seriously, try to DEFLATE a 50 MB compressed JSON structure vs. just piping it down the wire on a high-bandwidth connection, and try to measure that. (In real life, with a server and a browser; browsers are super slow at this.)
> browsers are super slow at this
No. DEFLATE is an asymmetric compression algorithm, meaning that decompression is disproportionately faster (at least 100 MB/s in my experience) than compression. It is mostly the server's fault for using too high and ineffective a compression setting, or for using an inefficient library like the stock zlib.
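The asymmetry is easy to observe with stdlib zlib (a sketch; the payload is synthetic and deliberately compressed at the expensive level 9):

```python
import time
import zlib

# A compressible but non-trivial payload (~17 MB of repetitive JSON-ish text).
data = b'{"key": "value", "n": 12345} ' * 600_000

start = time.perf_counter()
compressed = zlib.compress(data, 9)  # deliberately expensive setting
compress_s = time.perf_counter() - start

start = time.perf_counter()
restored = zlib.decompress(compressed)
decompress_s = time.perf_counter() - start

assert restored == data  # round-trip sanity check
print(f"compress:   {len(data) / compress_s / 1e6:.0f} MB/s")
print(f"decompress: {len(data) / decompress_s / 1e6:.0f} MB/s")
```

On typical hardware the decompression throughput is a large multiple of the compression throughput, which is why a slow receive path usually points at the sender's settings rather than the decompressor.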
> The compression can't be a bottleneck when you've got a correct library.
It absolutely can, though. You're not going to do memory compression using zlib, regardless of its flavour.
In this context, of course. It is not a general statement ;-)
ramdisk!
I think one important thing to realize is that using CBOR or MessagePack does not involve compression (unless you add it as another layer, the same way you would for JSON).
CBOR and MessagePack are more compact, but they achieve this not by compression but by adding less noise in between your data when placing it on the wire.
E.g., instead of (in JSON) outputting a ", then going through every UTF-8 code point, checking whether it needs escaping and escaping it, and then placing another ", they place a type tag plus a length hint and then just memcpy the UTF-8 to the wire (assuming they can rely on the input being valid UTF-8).
The only thing that goes a bit in the direction of compression is that you can encode an integer as a tiny, short, or long field. But even that is still way faster than converting it to its decimal US-ASCII representation...
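Both points above (length-prefixed strings and variable-width integers) can be sketched with a few lines of stdlib Python implementing CBOR's head encoding per RFC 8949; the function names are mine:

```python
import struct

def _head(major: int, n: int) -> bytes:
    """CBOR head: a 3-bit major type plus a tightly packed unsigned argument."""
    if n < 24:
        return bytes([(major << 5) | n])          # "tiny": fits in the head byte
    if n < 0x100:
        return bytes([(major << 5) | 24, n])      # 1 extra byte
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)  # 2 bytes
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)  # 4 bytes
    return bytes([(major << 5) | 27]) + struct.pack(">Q", n)      # 8 bytes

def encode_uint(n: int) -> bytes:
    return _head(0, n)  # major type 0: unsigned integer

def encode_text(s: str) -> bytes:
    utf8 = s.encode("utf-8")
    # Major type 3 (text string): head with the byte length, then the raw
    # UTF-8 bytes copied verbatim -- no per-character escaping pass.
    return _head(3, len(utf8)) + utf8

print(encode_uint(10).hex())    # 0a      (one byte total)
print(encode_uint(500).hex())   # 1901f4  (head + 2-byte big-endian value)
print(encode_text("hi").hex())  # 626869  (head with length 2, then "hi")
```

Note that the string path is a single memcpy after the one-to-five-byte head, which is exactly the "less noise" being described.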
That doesn't mean they are guaranteed to always be faster, though. There are some heavily, absurdly optimized JSON libraries using all kinds of trickery like SIMD, versus many "straightforward" implementations of CBOR and MessagePack.
Similarly, your data might already be in JSON, in which case cross-encoding it is likely to outweigh any gains.
Is your data already JSON at rest? Because encoding/decoding CBOR should easily beat encoding/decoding JSON.