Comment by mawww

4 months ago

C and C++ do have very different memory models, C essentially follows the "types are a way to decode memory" model while C++ has an actual object model where accessing memory using the wrong type is UB and objects have actual lifetimes. Not that this would necessarily lead to performance differences.

When people claim C++ to be faster than C, that is usually understood as C++ provides tools that makes writing fast code easier than C, not that the fastest possible implementation in C++ is faster than the fastest possible implementation in C, which is trivially false as in both cases the fastest possible implementation is the same unmaintainable soup of inline assembly.

The typical example used to claim C++ is faster than C is sorting, where C due to its lack of templates and overloading needs `qsort` to work with void pointers and a pointer to function, making it very hard on the optimiser, when C++'s `std::sort` gets the actual types it works on and can directly inline the comparator, making the optimiser work easier.

Try putting objects into two linked lists in C using sys/queue.h and in C++ using the STL. Try sorting the linked lists. You will find C outperforms C++. That is because C’s data structures are intrusive, such that you do not have external nodes pointing to the objects to cause an extra random memory access. The C++ STL requires an externally allocated node that points to the object in at least one of the data structures, since only 1 container can manage the object lifetimes to be able to concatenate its node with the object as part of the allocation. If you wish to avoid having object lifetimes managed by containers, things will become even slower, because now both data structures will have an extra random memory access for every object. This is not even considering the extra allocations and deallocations needed for the external nodes.

That said, external comparators are a weakness of generic C library functions. I once manually inlined them in some performance critical code using the C preprocessor:

https://github.com/openzfs/zfs/commit/677c6f8457943fe5b56d7a...

  • It seems like your argument is predicated on using the C++ STL. Most people don’t for anything that matters and it is trivial to write alternative implementations that have none of the weaknesses you are arguing. You have created a bit of a strawman.

    One of the strengths of C++ is that it is well-suited to compile-time codegen of hyper-optimized data structures. In fact, that is one of the features that makes it much better than C for performance engineering work.

    • Most C++ code I have seen uses the STL. As for “hyper optimized” data structures, you already have those in C. See the B-Tree code those binary search routine I patched to run faster. Nothing C++ adds improves upon what you can do performance wise in C.

      You have other sources of slow downs in C++, since the abstractions have a tendency to hide bloat, such as excessive dynamic memory usage, use of exceptions and code just outright compiling inefficiently compared to similar code in C. Too much inlining can also be a problem, since it puts pressure on CPU instruction caches.

      6 replies →

  • Unfortunately Stepanov and the STL are widely misunderstood. Stepanov core contributions is the set of concepts underlying the STL and the iterator model for generic programming. The set of algorithms and datastructures in the STL was only supposed to be a beginning, was never supposed to be a finished collection. Unfortunately many, if not most treat it that way.

    But if you look beyond, you can find a whole world that extend the stl. If you are not happy, say, with unordered_map, you can find more or less drop in replacements that use the same iterator based interface, preserve value semantics and use the a common language to describe iterator and reference invalidation.

    Regarding your specific use case, if you want intrusive lists you can use boost.intrusive which provides containers with STL semantics except it leaves ownership of the nodes to the user. The containers do not even need to be lists: you can put the same node in multiple lists linked list, binary trees (multiple flavors), and hash maps (although this is not fully intrusive) at the same time.

    These days I don't generally need boost much, but I still reach for boost.intrusive quite often.

    • I have met a number of people who will not use the boost libraries. It has been so long that I have long forgotten their reasons. My guess is that it had to do with binary compatibility issues.

  • Except, nothing forbids me to use two linked lists in C++ using sys/queue.h, that is exactly one of the reason why Bjarne built C++ on top of C, and also unfortunely a reason why we have security pain points in C++.

    • Yet the C++ community is continually trying to get people to stay away from anything involving C. That said, newer C headers using _Generic for example are not usable from C++.

      4 replies →

In my experience, templates usually cause a lot of bloat that slows things down. Sure, in microbenchmarks it always looks good to specialize everything at compile time, whether this is what you want in a larger project is a different question. And then, also a C compiler can specialize a sort routine for your types just fine. It just needs to be able too look into it, i.e. it does not work for qsort from the libc. I agree to your point that C++ comes with fast implementations of algorithms out-of-the-box. In C you need to assemble a toolbox yourself. But once you have done this, I see no downside.