Comment by nayuki

2 months ago

> The compiler has no way of knowing that the memory would be undefined

Yes it would. -fsanitize=address does a bunch of instrumentation - it allocates shadow memory to keep track of what main memory is defined, and it checks every read and write address against the shadow memory. It is a combination of compile-time instrumentation and run-time checking. And yes, it is expensive, so it should be used for debugging and not the final release.

https://clang.llvm.org/docs/AddressSanitizer.html , https://learn.microsoft.com/en-us/cpp/sanitizers/asan?view=m...

bri3d 2 months ago

I tried this with clang ASAN. Nothing happens. It won't catch this bug. ASAN detects the presence of incorrect behavior, not the absence of correct behavior.

There's no use-after-free, use-after-return, use-after-scope, or OOB access here. It's a case of "an allocated stack variable is dynamically read without being initialized only in a runtime case," which afaik no standard analyzer will catch.

The best way to identify this would be to require all locals to be initialized as a matter of policy (very unlikely to fly in a games studio, especially back then, due to the perceived performance overhead), or to debug with some form of stack initialization enabled, like "-ftrivial-auto-var-init=pattern". That doesn't catch the issue statically, but it does make it show up pretty quickly in QA (I tested).

nayuki 2 months ago

Thanks for the investigation. Oops, it seems like MSan (memory sanitizer) is the appropriate tool that detects uninitialized reads? https://stackoverflow.com/questions/68576464/clang-sanitizer...
I only use UBSan and ASan on my own programs because I tend not to make mistakes about initialization. So my knowledge is incomplete with respect to auditing other people's code, which can have different classes of errors than mine.
Thank goodness that every language that is newer than C and C++ doesn't repeat these design mistakes, and doesn't require these awkward sanitizer tools that are introduced decades after the fact.

maccard 2 months ago

This codebase predates ASAN by the best part of a decade.

hoten 2 months ago

You both may be right. It could be that ASAN is not instrumenting scanf (or some other standard library function), though it certainly has been since 2015. https://github.com/google/sanitizers/issues/108

The simpler policy of "don't allow uninitialized locals when declared" would also have caught it with the tools available when the game was made (though a bit ham-fisted).

nayuki 2 months ago
The problem is that after calling scanf(), the number of variables that are defined varies at run time. For example:
int x, y, z; int n = scanf("%d %d %d", &x, &y, &z);
At compile time, you can make no inferences about which of x, y, and z are defined, because that depends on the returned value n. There are several ways to deal with this.
One is to insist on definite assignment: if the compiler cannot prove that all of them are always assigned, it treats them as "possibly undefined" and errors out.
Another way is to avoid passing references and instead allow multiple returns, like Python (this is pseudocode):
x, y, z = scanf("%d %d %d")
In that case, if the hypothetical `scanf()` returns a tuple with fewer or more than 3 elements, the unpacking will fail at run time and crash exactly at that line.
Another way is like Java, which allows only a single return value and has no pointers to local variables, so it can do neither what C does nor what Python does. This can be painful for the programmer, of course.
- twic 2 months ago
  
  I interpret "don't allow uninitialized locals when declared" as meaning that this call:
  int n = scanf("%d %d %d", &x, &y, &z);
  Would be caught, because it takes the addresses of uninitialized variables. To be allowed, the programmer would have to initialize the variables beforehand.
  
- hoten 2 months ago
  
  The idea is that ASAN would replace scanf with a function that does additional bookkeeping when writing to whatever arbitrary memory locations the inputs dictate at runtime.
  That's probably what the PR resolving the issue I linked to does, though I didn't check.