Comment by pron

2 months ago

The fact that what happens can be reasoned about in the semantics of the source language, rather than being UB, doesn't necessarily make the problem "a ton more benign". After all, a program written in Assembly has no UB and all of its behaviours can be reasoned about in the source language, but I'd hardly trust Assembly programs to be more secure than C programs [1]. What makes the difference isn't that it's UB but, as you pointed out, the type safety. But while the less deterministic nature of a "malloc-level" UAF does make it more "explosive", it can also make it harder to exploit reliably. It's hard to compare the danger of a less likely RCE with that of a more likely data leak.

On the other hand, the more empirical, though qualitative, claim made by matklad in the sibling comment may have something to it.

[1]: In fact, take any C program with UB, compile it, and get a dangerous executable. Now disassemble the executable, and you get an equally dangerous program, yet it doesn't have any UB. UB is problematic, of course, partly because at least in C and C++ it can be hard to spot, but it doesn't, in itself, necessarily make a bug more dangerous. If you look at MITRE's top 25 most dangerous software weaknesses, the top four (in the 2025 list) aren't related to UB in any language (by the way, UAF is #7).

matklad 2 months ago

>If you look at MITRE's top 25 most dangerous software weaknesses, the top four (in the 2025 list) aren't related to UB in any language (by the way, UAF is #7).

FWIW, I don't find this argument logically sound, in context. This is data aggregated across programming languages, so it could simultaneously be true that, conditioned on using a memory-unsafe language, you should worry mostly about UB, while, at the same time, UB doesn't matter much in the grand scheme of things, because hardly anyone is using memory-unsafe programming languages.

There were reports from Apple, Google, Microsoft and Mozilla about vulnerabilities in browsers/OSes (so, C++ stuff), and I think there UB hovered between 50% and 80% of all security issues?

And the present discussion does seem overall conditioned on using a manually-memory-managed language :0)

pron 2 months ago

You're right. My point was that there isn't necessarily a connection between UB-ness and danger, but I ran two separate arguments together:
1. In the context of languages that can have OOB and/or UAF, OOB/UAF are very dangerous, but not necessarily because they're UB; they're dangerous because they cause memory corruption. I expect that OOB/UAF are just as dangerous in Assembly, even though they're not UB in Assembly. Conversely, other C/C++ UBs, like signed overflow, aren't nearly as dangerous.
2. Separately from that, I wanted to point out that there are plenty of super-dangerous weaknesses that aren't UB in any language. So some UBs are more dangerous than others and some are less dangerous than non-UB problems. You're right, though, that if more software were written with the possibility of OOB/UAF (whether they're UB or not in the particular language) they would be higher on the list, so the fact that other issues are higher now is not relevant to my point.
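To make the signed-overflow part of point 1 concrete, here is a minimal C sketch (function names are hypothetical, chosen for illustration). Unsigned arithmetic is defined to wrap, while signed overflow is UB, so the defined way to handle it is to check before adding. Note that neither function writes outside its own variables: a wrong arithmetic result by itself doesn't corrupt memory the way an OOB write or UAF does.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1,
   so this never has UB, even for add_wrapping(UINT_MAX, 1u). */
unsigned int add_wrapping(unsigned int a, unsigned int b) {
    return a + b;
}

/* Signed overflow is UB in C, so test for it *before* adding.
   Returns false (leaving *out untouched) if a + b is not representable. */
bool add_checked(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* the addition would overflow */
    }
    *out = a + b;
    return true;
}
```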

kibwen 2 months ago

> In fact, take any C program with UB, compile it, and get a dangerous executable. Now disassemble the executable, and you get an equally dangerous program, yet it doesn't have any UB.

I'd put it like this:

Undefined behavior is a property of an abstract machine. When you write any high-level language with an optimizing compiler, you're writing code against that abstract machine.

The goal of an optimizing compiler for a high-level language is to be "semantics-preserving": whatever assembly code eventually gets spit out at the end of the process must exhibit the runtime behaviors that the abstract machine guarantees for the source program.

When you write high-level code that exhibits UB for a given abstract machine, what happens is that the compiler can no longer guarantee that the resulting assembly code is semantics-preserving.

uecker 2 months ago

Since it has UB, it is easy for the compiler to guarantee that the resulting code is semantics-preserving: anything the code does is OK.
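The classic signed-overflow example illustrates both of the last two points; this is a minimal sketch with hypothetical function names. Because the abstract machine has no meaning for `x + 1` when `x == INT_MAX`, a compiler may assume that case never happens and fold the first function to `return true`. Whether a particular compiler actually does this depends on the compiler and flags (GCC and Clang typically do at -O2), so treat that as illustrative rather than guaranteed.

```c
#include <limits.h>
#include <stdbool.h>

/* UB when x == INT_MAX (signed overflow), so an optimizing compiler may
   treat x + 1 > x as always true. Only call this with x < INT_MAX. */
bool succ_is_greater_ub(int x) {
    return x + 1 > x;
}

/* Same intent, defined for every int: INT_MAX is the only value
   whose successor is not representable. */
bool succ_is_greater_defined(int x) {
    return x < INT_MAX;
}
```

For every `x` below `INT_MAX` the two functions agree; they differ only at the input where the first one has no defined meaning, which is exactly the case the compiler is allowed to ignore.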