Comment by nindalf
6 months ago
> And it's not clear if memory safety is the largest source of problems building software today.
The Chromium team found that
> Around 70% of our high severity security bugs are memory unsafety problems (that is, mistakes with C/C++ pointers). Half of those are use-after-free bugs.
Chromium Security: Memory Safety (https://www.chromium.org/Home/chromium-security/memory-safet...)
Microsoft found that
> ~70% of the vulnerabilities Microsoft assigns a CVE each year continue to be memory safety issues
A proactive approach to more secure code (https://msrc.microsoft.com/blog/2019/07/a-proactive-approach...)
It’s possible you hadn’t come across these studies before. But if you have, and you didn’t find them convincing, what did they lack?
- Were the codebases not old enough? They’re anywhere between 15 and 30 years old, so probably not.
- Did the codebases not have enough users? I think both have billions of active users, so I don’t think so.
- Was it a “skill issue”? Are the developers at Google and Microsoft just not that good? Maybe they didn’t consider good design and architecture at any point while writing software over the last couple of decades. Possible!
There’s just one problem with the “skill issue” theory though. Android, presumably staffed with the same calibre of engineers as Chrome and also written in C++, likewise found that 76% of its vulnerabilities were related to memory safety. We’ve got consistency, if nothing else. And then, in recent years, something remarkable happened.
> the percentage of memory safety vulnerabilities in Android dropped from 76% to 24% over 6 years as development shifted to memory safe languages.
Eliminating Memory Safety Vulnerabilities at the Source (https://security.googleblog.com/2024/09/eliminating-memory-s...)
They stopped writing new C++ code and the memory safety vulnerabilities dropped dramatically. Billions of Android users are already benefiting from much more secure devices, today!
You originally said
> And it's not clear if memory safety is the largest source of problems building software today.
It is possible to defend this by saying “what matters in software is product market fit” or something similar. That would be technically correct, while side stepping the issue.
Instead I’ll ask you: do you still think it is possible to write secure software in C++ just by trying a little harder, through “good design and architecture”, as your previous comment implied?
Two of the biggest use cases for modern C++ are video games and HFT, where memory safety is of absolutely minimal importance (unless you're writing some shitty DRM/anticheat). I work in HFT using modern C++ and bugs related to memory safety are vanishingly rare compared to logic and performance bugs.
The importance of memory safety depends on whether your code must accept untrusted inputs or not.
Basically 99% of networked applications that don't talk to a trusted server and all OS level libraries fall under that category.
Your HFT code is most likely not connecting to an exchange that is interested in exploiting your trading code so the exploit surface is quite small. The only potential exploit involves other HFT algorithms trying to craft the order books into a malicious untrusted input to exploit your software.
Meanwhile if you are Google and write an android library, essentially all apps from the play store are out to get you.
Basically C++ code is like an infant that needs to be protected from strangers.
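To make the untrusted-input point concrete, here's a minimal sketch (the 2-byte length-prefix message format and the function name are made up purely for illustration): in a memory-safe language, a lying length field in attacker-controlled data becomes a handled error rather than an out-of-bounds read.

    // Hypothetical: parsing an untrusted, attacker-controlled buffer whose
    // 2-byte big-endian length prefix claims how much payload follows.
    fn parse_message(input: &[u8]) -> Option<&[u8]> {
        let len = u16::from_be_bytes([*input.get(0)?, *input.get(1)?]) as usize;
        // `get` returns None instead of reading past the end of the buffer,
        // so a lying length field cannot turn into an out-of-bounds read.
        input.get(2..2 + len)
    }

    fn main() {
        // Claims 1024 bytes of payload but only carries 3.
        let malicious = [0x04u8, 0x00, 0xde, 0xad, 0xbe];
        assert_eq!(parse_message(&malicious), None);

        // Well-formed: length 2, payload [0xde, 0xad].
        let honest = [0x00u8, 0x02, 0xde, 0xad];
        assert_eq!(parse_message(&honest), Some(&[0xdeu8, 0xad][..]));
    }

The equivalent C++ mistake, trusting the length and indexing with it, is exactly the kind of bug that only matters once strangers get to choose the input.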
Databases are a perfect example of an open-ended complexity space. SQL is a Turing-complete language and your users are programming their workloads against your database kernel. You (as a developer) know nothing about those workloads, nor do you know what your users will want to do next, and you basically have to write the code so that it can support virtually any workload that could possibly exist. It's almost as if you're writing a compiler, with a virtual machine inside its own OS, but with one big difference: the ability to scale across millions of users (and data). There's probably not much software like that in the world.
And yet, no matter how complex database engines really are, my experience has been the same: the number of bugs related to memory-safety were extremely rare.
Very much this. For some reason people assume that security/exploits are what the quote below is referring to, as if that's the end goal software is trying to solve.
> it's not clear if memory safety is the largest source of problems building software today
I've recently become interested in HFT. Are there introductory resources you'd recommend from an industry point of view?
Books, repositories, anything practical.
> Around 70% of our high severity security bugs are memory unsafety problems
> ~70% of the vulnerabilities Microsoft assigns a CVE
> 76% of vulnerabilities
What is the difference between the first two (emphasis added) and what you said? Just as a thought experiment...
If I measure a single factor to the exclusion of all others, I can find whatever I want in any set of data. Your point may be valid, but it is not what they published, and without the full dataset we cannot validate your claim. What I can validate is that what you claim is not what they claim.
To answer the question in your final paragraph: yes, it is, but it requires the same cultural shift it would take to write the same code in Rust or Swift or Golang or whatever other memory-safe language you want to pick.
If Rust were in fact viable for such a large project, how's the Servo project going? Still the resounding success it was expected to be? Rust in the kernel? That going well?
The jury is still out on whether Rust will be mass adopted and able to usurp C/C++ in the domains where C/C++ dominate. It may get there, but I would much, much rather start a new project in C++20 than in Rust, and I would still be able to make it memory safe. And yes, it is a "skill issue", but purely because legacy C++ is still being taught and still being accepted as new code in a codebase.
Rules for writing memory-safe C++ have not just been around for decades, they have been statically checkable for over a decade. For a large project there are too many errors to apply them universally to existing code without years of work, but if you submit new code using the old practices you should be held financially and legally responsible, just like an actual engineer in another field would be.
It's because we are lax about standards that it's even an issue.
As a note, if you see an Arc<Mutex<>> in Rust outside of some very specific library code, whoever wrote that code probably wouldn't be able to write the same code in a memory- and thread-safe manner, and it is also an architectural issue.
Arc and Mutex are synchronisation primitives that are meant to be used to build data structures, not to appear in "userspace" code. It's a strong code smell that is generally accepted in Rust. Arc probably shouldn't need to exist at all, because reaching for it is a clear indication that nobody thought about the ownership semantics of the data in question. Maybe it is required for some data structures, but you should very likely not be typing it into general code.
If Arc<Mutex<>> is littered throughout your Rust codebase, you probably should have written that code in C#/Java/Go/pick your poison...
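For what it's worth, here's a rough sketch of the alternative I'm arguing for: give the state a single owner and pass messages instead of sharing a lock. The `Stats` and `Command` types are invented purely for illustration.

    use std::sync::mpsc;
    use std::thread;

    struct Stats {
        count: u64,
    }

    enum Command {
        Increment,
        Report(mpsc::Sender<u64>),
    }

    fn main() {
        let (tx, rx) = mpsc::channel();

        // One thread owns `Stats` outright: no lock, no shared ownership.
        let owner = thread::spawn(move || {
            let mut stats = Stats { count: 0 };
            for cmd in rx {
                match cmd {
                    Command::Increment => stats.count += 1,
                    Command::Report(reply) => {
                        let _ = reply.send(stats.count);
                    }
                }
            }
        });

        // Other threads hold only a cheap Sender handle, never the data itself.
        let workers: Vec<_> = (0..4)
            .map(|_| {
                let tx = tx.clone();
                thread::spawn(move || {
                    for _ in 0..100 {
                        tx.send(Command::Increment).unwrap();
                    }
                })
            })
            .collect();
        for w in workers {
            w.join().unwrap();
        }

        let (reply_tx, reply_rx) = mpsc::channel();
        tx.send(Command::Report(reply_tx)).unwrap();
        println!("count = {}", reply_rx.recv().unwrap());

        drop(tx); // closing the channel lets the owning thread exit its loop
        owner.join().unwrap();
    }

Whether this beats Arc<Mutex<Stats>> depends on the workload, but it forces you to answer the ownership question instead of deferring it to a lock.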
This whole concept that code should be architected as "libraries" and "userspace" is such a C++ism.
It's a really weird concept that probably comes only from having this extremely complex language where even the designers expect some parts of it are too weird for "normal programmers". But then they imagine some advanced class of programmer, the "library programmers", who can deal with such complexity.
The more modern way of designing software is to stick to the YAGNI principle: design your code to be simple and straightforward, and only extract out datastructures into separate libraries if and when they prove to be needed.
Not to mention, the position that shared ownership should just not exist at all is self-evidently absurd. The lifetime of an object can very well be a dynamic property of your program, and a concurrent one. A language that lacks std::shared_ptr / Arc is simply not a complete language; there will be algorithms that you just can't express.
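A small sketch of the kind of lifetime I mean, where the last thread to finish, not any single owner, decides when the data is freed (the `Config` type is made up for illustration):

    use std::sync::Arc;
    use std::thread;

    // Stand-in for data whose lifetime is dynamic and concurrent: it must
    // live until the *last* interested thread is done with it.
    struct Config {
        name: String,
    }

    fn main() {
        let config = Arc::new(Config { name: "prod".to_string() });

        let handles: Vec<_> = (0..3)
            .map(|i| {
                let cfg = Arc::clone(&config);
                thread::spawn(move || {
                    println!("worker {i} sees config {}", cfg.name);
                })
            })
            .collect();

        // The creating scope can drop its handle immediately; the Config is
        // freed only when the last worker finishes, whichever one that is.
        drop(config);

        for h in handles {
            h.join().unwrap();
        }
    }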
So you strongly believe that the programmer should implement .map on arrays and hashmaps etc themselves? Well you will love C code then.
The point of library code is to implement these things once in a safe and efficient manner and reuse the implementation.
Sometimes there are more domain or even company specific things that should be implemented exactly once and reused.
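Concretely, something like this is only possible because `map` and `collect` were written once, generically, in the standard library and reused by every collection (the values here are arbitrary):

    use std::collections::HashMap;

    fn main() {
        // `map` is defined once on iterators, not re-implemented per collection.
        let doubled: Vec<i32> = [1, 2, 3].iter().map(|x| x * 2).collect();
        assert_eq!(doubled, vec![2, 4, 6]);

        let scores = HashMap::from([("alice", 1), ("bob", 2)]);
        let bonuses: HashMap<&str, i32> =
            scores.iter().map(|(name, s)| (*name, s + 10)).collect();
        assert_eq!(bonuses["bob"], 12);
    }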
Nobody said there are different tiers of developers like "library developers" and "normal developers". Those are different types of programming that a single developer can do, but they fundamentally require different thought patterns. Designing data structures and algorithms is a lot more CS, whereas general programming is much more akin to plumbing. If you think library code isn't needed, it's because you overlook the library code you already use.
There are some things that are not YAGNI. If you have those in place, then the rest of your code really can be written that way, because you genuinely won't need anything more.
It's not that shared_ptr isn't needed, it's that people don't use it only where necessary; they use it because it's convenient not to think at all, and because the necessary library code isn't there. I stand firm that seeing std::shared_ptr/Arc (or even std::unique_ptr/Box) in general code is a code smell. The fact that you yourself said there are certain algorithms that cannot be expressed without it means you agree: the algorithm should be implemented exactly once and reused. If it's only used once then sure, it can be abstracted when needed, but that doesn't mean you shouldn't have to justify why it's there.
A million times more systems were infiltrated due to PHP SQL injection bugs than were infiltrated via Chromium use-after-free bugs.
Let's keep some sanity and perspective here, please. C++ has many long-standing problems, but banging on the "security" drum will only drive people away from alternative languages. (Everyone knows that "security" is just a fig leaf they use to strong-arm you into doing stuff you hate.)