Comment by Negitivefrags
11 hours ago
Why is it such a terrible idea?
No need to add complexity, dependencies and reduced performance by using these libraries.
Lots of reasons:
The code is not portable between architectures.
You can’t actually define your data structure. You can pretend with your compiler’s version of “pack” with regrettable results.
You probably have multiple kinds of undefined behavior.
Dealing with compatibility between versions of your software is awkward at best.
You might not even get amazing performance. mmap is not a panacea. Page faults and TLB flushing are not free.
You can’t use any sort of advanced data types — you get exactly what C gives you.
Forget about enforcing any sort of invariant at the language level.
I've written a lot of code using that method, and never had any portability issues. You use types with the number of bits in their names, like uint32_t.
Hell, I've slung C structs across the network between 3 CPU architectures. And I didn't even use htons!
Maybe it's not portable to some ancient architecture, but none that I have experienced.
If there is undefined behavior, it's certainly never been a problem either.
And I've seen a lot of talk about TLB shootdown, so I tried to reproduce those problems but even with over 32 threads, mmap was still faster than fread into memory in the tests I ran.
Look, obviously there are use cases for libraries like that, but a lot of the time you just need something simple, and writing some structs to disk can go a long way.
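For anyone who hasn't seen the approach being argued about, a minimal sketch of "writing some structs to disk" might look like the following. The `record` struct and the `save_record`/`load_record` helpers are made up for illustration; the file is only readable by a build with the same layout and byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record: every field has an explicit bit width. */
struct record {
    uint32_t id;
    uint32_t flags;
    uint64_t timestamp;
};

/* Fail the build if the layout is not the 16 bytes we expect. */
static_assert(sizeof(struct record) == 16, "unexpected padding in struct record");

/* Write one record as raw bytes. Returns 0 on success, -1 on failure. */
int save_record(const char *path, const struct record *r)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(r, sizeof *r, 1, f);
    if (fclose(f) != 0)
        return -1;
    return n == 1 ? 0 : -1;
}

/* Read one record back as raw bytes. Returns 0 on success, -1 on failure. */
int load_record(const char *path, struct record *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(out, sizeof *out, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

Nothing here records a version or a byte order in the file, which is where the compatibility objections upthread come from.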
Some people also don't use protective gear when going downhill biking, it is a matter of feeling lucky.
C allows most of this, whereas C++ doesn't allow pointer aliasing without a compiler flag, tricks and problems.
I agree you can certainly just use bytes of the correct sizes, but often to get the coverage you need for the data structure you end up writing some form of wrapper or fixup code, which is still easier and gives you the control versus most of the protobuf-like stuff that introduces a lot of complexity and tons of code.
That seems highly unlikely. Let's assume that all compilers use the exact same padding in C structs, that all architectures use the same alignment, and that endianness is made up, that types are the same size across 64 and 32 bit platforms, and also pretend that pointers inside a struct will work fine when sent across the network; the question remains still: Why? Is THIS your bottleneck? Will a couple memcpy() operations that are likely no-op if your structs happen to line up kill your perf?
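As a rough sketch of the "couple memcpy() operations" alternative: copy each field into a byte buffer at a fixed offset, so compiler padding never reaches the wire. The `point` struct, `POINT_WIRE_SIZE`, and the two helpers are hypothetical names. Note that this only fixes padding and alignment; the bytes are still in the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory struct; compilers usually pad after `kind`. */
struct point {
    uint16_t kind;
    uint32_t x;
    uint32_t y;
};

/* Wire size is the sum of the field widths: 2 + 4 + 4 bytes, no padding. */
enum { POINT_WIRE_SIZE = 10 };

static_assert(sizeof(struct point) >= POINT_WIRE_SIZE, "struct smaller than its fields");

/* Copy each field to its fixed wire offset (host byte order). */
void point_encode(uint8_t out[POINT_WIRE_SIZE], const struct point *p)
{
    memcpy(out + 0, &p->kind, sizeof p->kind);
    memcpy(out + 2, &p->x, sizeof p->x);
    memcpy(out + 6, &p->y, sizeof p->y);
}

/* Copy each field back out of the wire buffer (host byte order). */
void point_decode(struct point *p, const uint8_t in[POINT_WIRE_SIZE])
{
    memcpy(&p->kind, in + 0, sizeof p->kind);
    memcpy(&p->x, in + 2, sizeof p->x);
    memcpy(&p->y, in + 6, sizeof p->y);
}
```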
No defined binary encoding, no guarantee about concurrent modifications, performance trade-offs (mmap is NOT always faster than sequential reads!) and more.
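For context, the mmap side of that comparison is roughly the following on a POSIX system. This is a sketch only; `map_file_readonly` and `unmap_file` are hypothetical helper names, and nothing here protects against another process modifying the file while it is mapped.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure or for an empty file;
 * on success stores the file size in *len_out. */
const void *map_file_readonly(const char *path, size_t *len_out)
{
    assert(path != NULL && len_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}

/* Release a mapping returned by map_file_readonly. */
void unmap_file(const void *p, size_t len)
{
    munmap((void *)p, len);
}
```

Pages are only read in when first touched, so the cost shows up as page faults during access rather than inside a read call.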
Doesn't that just describe low level file IO in general?
Because a struct might not serialize the same way from a CPU architecture to another.
The sizes of ints, the byte order and the padding can be different for instance.
C has had fixed size int types since C99. And you've always been able to define struct layouts with perfect precision (struct padding is well defined and deterministic, and you can always use __attribute__((packed)) and bit fields for manual padding).
Endianness might kill your portability in theory, but in practice, nobody uses big endian anymore. Unless you're shipping software for an IBM mainframe, little endian is portable.
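To make the layout point concrete, here is a small sketch of the options mentioned: natural padding, the GCC/Clang `packed` attribute, and explicit manual padding, with compile-time checks. The struct names are made up, and `__attribute__((packed))` is a compiler extension rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler inserts padding after `tag` so that
 * `length` is aligned (3 bytes on common ABIs). */
struct natural_hdr {
    uint8_t  tag;
    uint32_t length;
};

/* Packed layout (GCC/Clang extension): no padding, `length` is unaligned. */
struct packed_hdr {
    uint8_t  tag;
    uint32_t length;
} __attribute__((packed));

/* Manual padding: the gap is spelled out, so the layout is explicit
 * without relying on an extension. */
struct explicit_hdr {
    uint8_t  tag;
    uint8_t  pad[3];
    uint32_t length;
};

/* Compile-time checks: the build fails if the layout is not what we claim. */
static_assert(sizeof(struct packed_hdr) == 5, "packed_hdr should have no padding");
static_assert(offsetof(struct packed_hdr, length) == 1, "length should follow tag");
static_assert(sizeof(struct explicit_hdr) == 8, "explicit_hdr should be 8 bytes");
static_assert(offsetof(struct explicit_hdr, length) == 4, "length should be at offset 4");
```

Taking the address of a packed member and dereferencing it as a `uint32_t *` can fault on targets that require alignment, so the explicit-padding form is usually the safer of the two.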
You just define the structures in terms of some e.g. uint32_le etc types for which you provide conversion functions to native endianness. On a little endian platform the conversion is a no-op.
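A minimal sketch of what such a type could look like, assuming a byte-array wrapper named `uint32_le` with hypothetical `le32_load`/`le32_store` helpers. On little endian targets, optimizing compilers typically reduce these to a plain load or store.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value stored little endian. Wrapping a byte array gives the
 * type alignment 1, and the struct wrapper stops accidental arithmetic
 * on the raw representation. */
typedef struct {
    uint8_t b[4];
} uint32_le;

static_assert(sizeof(uint32_le) == 4, "uint32_le must be exactly 4 bytes");

/* Convert from the stored little endian form to a native integer. */
static inline uint32_t le32_load(uint32_le v)
{
    return ((uint32_t)v.b[0])
         | ((uint32_t)v.b[1] << 8)
         | ((uint32_t)v.b[2] << 16)
         | ((uint32_t)v.b[3] << 24);
}

/* Convert from a native integer to the stored little endian form. */
static inline uint32_le le32_store(uint32_t x)
{
    uint32_le v = { { (uint8_t)(x),
                      (uint8_t)(x >> 8),
                      (uint8_t)(x >> 16),
                      (uint8_t)(x >> 24) } };
    return v;
}
```

A struct built only from types like this has the same bytes on every platform, at the cost of going through the accessors for every field.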
It can be made to work (as you point out), and the core idea is great, but the implementation is terrible. You have to stop and think about struct layout rules rather than declaring your intent and having the compiler check for errors. As usual C is a giant pile of exquisitely crafted footguns.
A "sane" version of the feature would provide for marking a struct as intended for ser/des at which point you'd be required to spell out every last alignment, endianness, and bit width detail. (You'd still have to remember to mark any structs used in conjunction with mmap but C wouldn't be any fun if it was safe.)