Comment by naikrovek

4 days ago

people are just straight up afraid to write their own binary formats, aren't they.

it's not hard, it's exactly like creating your own text format but you write binary data instead of text, and you can't read it with your eyes right away (but you can after you've looked at enough of it). there is nothing to fear or to even worry about; just try it. look up how things like TLV (type-length-value) work on wikipedia. you can do just about anything you would ever need with plain binary TLV and it's gonna perform like you wouldn't believe.

https://en.wikipedia.org/wiki/Type%E2%80%93length%E2%80%93va...
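A minimal sketch of the TLV idea in Python, assuming a hypothetical layout of a 1-byte type tag, a 4-byte little-endian length, and then the value bytes. The names `encode_tlv` and `decode_tlv` are illustrative rather than taken from any library, and a real format would choose its own tag and length sizes.

```python
import struct

# Hypothetical layout (an assumption for this sketch, not a standard):
#   1-byte type tag | 4-byte little-endian length | `length` bytes of value
HEADER = struct.Struct("<BI")


def encode_tlv(records):
    """Serialize an iterable of (tag, value_bytes) pairs into one byte string."""
    out = bytearray()
    for tag, value in records:
        out += HEADER.pack(tag, len(value))  # fixed-size header
        out += value                          # raw payload, no escaping needed
    return bytes(out)


def decode_tlv(data):
    """Parse a TLV byte string back into a list of (tag, value_bytes) pairs."""
    records = []
    offset = 0
    while offset < len(data):
        if offset + HEADER.size > len(data):
            raise ValueError(f"truncated header at offset {offset}")
        tag, length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(data):
            raise ValueError(f"truncated value at offset {start}")
        records.append((tag, data[start:end]))
        offset = end  # jump straight to the next record
    return records
```

For example, `encode_tlv([(1, b"alice"), (2, (42).to_bytes(4, "little"))])` produces 19 bytes that `decode_tlv` turns back into the same pairs. Because every record carries its own length, a reader that doesn't recognize a tag can skip exactly `length` bytes and keep going, which is a large part of why TLV formats are easy to extend.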

binary formats are always going to be 1-2 orders of magnitude faster than plain text formats, no matter which plain text format you're using. writing a viewer so you can easily read the data isn't zero-effort like it is for JSON or XML where any existing text editor will do, but it's not exactly hard, either. your binary format reading code is the core of what that viewer would be.
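To illustrate the point that the reading code is the core of a viewer, here is a sketch that walks the same hypothetical layout (1-byte tag, 4-byte little-endian length, value) and renders each record as one line of text. `iter_records`, `render`, and the `TAG_NAMES` table are made-up names for this example.

```python
import struct

# Assumed layout, same hypothetical scheme: 1-byte tag | 4-byte LE length | value
HEADER = struct.Struct("<BI")

# Hypothetical tag names; a real format would document its own.
TAG_NAMES = {1: "name", 2: "age"}


def iter_records(data):
    """Yield (offset, tag, value) for each record: this is the 'reading code'."""
    offset = 0
    while offset < len(data):
        # unpack_from raises struct.error if fewer than 5 header bytes remain
        tag, length = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        # a truncated final value is yielded with whatever bytes remain
        yield offset, tag, data[start:start + length]
        offset = start + length


def render(data):
    """Turn a binary blob into one human-readable line per record."""
    lines = []
    for offset, tag, value in iter_records(data):
        name = TAG_NAMES.get(tag, f"unknown({tag})")
        lines.append(f"{offset:06x}  {name:<12} len={len(value):<4} {value.hex(' ')}")
    return lines
```

Printing `"\n".join(render(blob))` gives a hex offset, a tag name, the length, and the value bytes for each record; everything beyond `iter_records` is just formatting.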

once you write and use your own binary format, existing binary formats you come across become a lot less opaque, and it starts to feel like you're developing a mild superpower.


markisus 4 days ago

CBOR has some stuff that is nice but would be annoying to reimplement, like using more bytes to store large numbers than small ones. If you need a quick multipurpose binary format, CBOR is pretty good. The only alternative I’d roll by hand is to just memcpy the bytes of a C struct directly to disk and hope that I never encounter a system with different endianness.
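The variable-length integer trick can be sketched following CBOR's encoding of unsigned integers (major type 0 in RFC 8949): values below 24 fit in the initial byte, and larger values get 1, 2, 4, or 8 big-endian bytes after it. This covers only unsigned integers and is not a full CBOR implementation; the function names are hypothetical.

```python
def cbor_encode_uint(n):
    """Encode a non-negative int as a CBOR unsigned integer (major type 0)."""
    if not 0 <= n < 2**64:
        raise ValueError("major type 0 holds 0 <= n < 2**64")
    if n < 24:
        return bytes([n])  # small values live in the initial byte itself
    # additional-info values 24..27 mean 1, 2, 4 or 8 big-endian bytes follow
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < 1 << (8 * size):
            return bytes([info]) + n.to_bytes(size, "big")
    raise AssertionError("unreachable")


def cbor_decode_uint(data):
    """Decode a major type 0 item; return (value, number of bytes consumed)."""
    initial = data[0]
    if initial >> 5 != 0:
        raise ValueError("not a CBOR unsigned integer")
    info = initial & 0x1F  # low 5 bits: the "additional information"
    if info < 24:
        return info, 1
    sizes = {24: 1, 25: 2, 26: 4, 27: 8}
    if info not in sizes:
        raise ValueError(f"unsupported additional info {info}")
    size = sizes[info]
    if len(data) < 1 + size:
        raise ValueError("truncated integer")
    return int.from_bytes(data[1:1 + size], "big"), 1 + size
```

With this scheme 10 encodes to a single byte, while 1000000 takes five (`1a 00 0f 42 40`).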

neutrinobro 4 days ago
These days you don't have to worry about endianness much (unless you're dealing with raw network packets). However, you do need to worry about byte padding. Different compilers/systems will place padding between the members of your struct differently (depending on their types and ordering), and if you are not careful the in-memory or on-disk offsets of struct fields can differ between systems. Most systems align 8-byte members such as doubles to an 8-byte boundary, but that isn't guaranteed.
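Python's `struct` module can show the padding issue directly: the `@` prefix uses native byte order, sizes, and alignment (the way the local C compiler lays out a struct), while `<` uses little-endian with standard sizes and no padding. The record below (a 1-byte flag, a 4-byte int, an 8-byte double) is a hypothetical example, and writing fields through an explicit format like `PORTABLE` is one common way to sidestep both the padding and the endianness problems of memcpy'ing a struct.

```python
import struct

# Hypothetical record: 1-byte flag, 4-byte signed int, 8-byte double.
# '@' = native byte order, sizes and alignment, i.e. how the local C compiler
#       would lay out the equivalent struct (padding included).
# '<' = little-endian, standard sizes, no padding: identical bytes everywhere.
NATIVE = struct.Struct("@Bid")
PORTABLE = struct.Struct("<Bid")


def padding_bytes():
    """How many padding bytes the native layout adds on this machine."""
    return NATIVE.size - PORTABLE.size


def write_record(flag, count, value):
    """Serialize with an explicit byte order instead of memcpy'ing a struct."""
    return PORTABLE.pack(flag, count, value)


def read_record(data):
    """Inverse of write_record; reads the same bytes the same way on any host."""
    return PORTABLE.unpack(data)
```

On common 64-bit desktop platforms `NATIVE.size` is typically 16 (three padding bytes after the flag so the int starts at offset 4), versus 13 for `PORTABLE`.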
- markisus 4 days ago
  
  Yeah, I try to make sure I do the extern "C". I’m also on x86, so I just pretend that alignment is not an issue, and I think it works.

hvb2 4 days ago

I assume you mean as an exercise? Not for actual use in any production system?

If you did mean for production use, I assume you also implement your own encryption, encoding schemes and everything else?

naikrovek 4 days ago
i write my own binary formats because they're fast and small. yes, in production. partly because it's just as easy as anything else for me now, partly because it doesn't require any dependencies at all, and partly to show others just how easy it is, because i think people are unnecessarily afraid of this.
no, i don't write my own encoding or encryption.
why the hell would anyone use json for everything, and why would someone who doesn't do that earn your derision?
- hvb2 4 days ago
  
  I didn't say anywhere that we should use json for everything.
  I think most people would go with something standard and documented. If you work in a team, it helps if you can hire people who are familiar with the tech or can read up on it easily.
  And in general, unless you can show that your formatter is an actual hot path in need of optimization, you've just added another piece of code in need of care and feeding for no real gain.
  Most devs/applications are fine with protobuf or even JSON performance, and hand-rolling their own format to solve that problem is not something they can or should do.
  If you write something like that just to prove a point, good for you. Also, I would never want to be on the same team.
  