Comment by amenghra

1 day ago

IMHO, if something isn’t part of the contract, it should be randomized. Eg if iteration order of maps isn’t guaranteed in your language, then your language should go out of its way to randomize it. Otherwise, you end up with brittle code: code that works fine until it doesn’t.

There are various compiler options like -ftrivial-auto-var-init to initialize uninitialized variables to specific (or random) values in some situations, but overall, randomizing (or zeroing) the full content of the stack in each function call would be a horrendous performance regression and isn't done for this reason.

  • There are fast instructions (e.g., REP STOSx, AVX zero stores, dc zva) and tricks (MTE, zero pages), but no magic CPU instruction exists that transparently and efficiently randomizes or zeros the stack on function calls. You think there would be one and I bet there are on some specialized high-security systems, but I'm not sure even where you would find such a product. Telecom certainly isn't it.

    • There are proposed cpu architectures that work that way, like the Mill <https://millcomputing.com/>. Where most cpus support multiple calling conventions the Mill enforces a single calling convention in hardware. There is a hardware `call` instruction that does all the work directly, along with a corresponding `ret` instruction for returning from a function call. It also uses its equivalent of the TLB to ensure that each function is only granted permission to read from that portion of the stack which contains its arguments; any attempt to read outside that region would result in a permission error that causes the read to return a NaR (Not a Result, akin to a floating point NaN).

      As an additional protection, new stack frames are implicitly zeroed as they are created. I assume this is done by filling the CPU cache with zeros for those addresses before continuing to execute the called function. No need to wait for actual zeros to be written to main memory.

      https://millcomputing.com/wiki/Protection#Protecting_Stacks

      1 reply →

    • You couldn't do random, but with a predictable performance hit to memory, cache and write-line use stack addresses COULD be isolated for a program, for a library, etc.

      It'd be expensive though; every context switch would require it's own stack and pushing / restoring one more register. There's GOOD reason programs don't work that way and are supposed to not rely on values outside of properly initialized (and not later clobbered) memory.

      2 replies →

    • CPUs already special case xor reg,reg as zeroing out the register, breaking any data dependency on it. If zeroing bits of the stack were common enough, I'd believe CPUs could be made that handled it efficiently (they already special case the stack; push/pop)

  • I'm a bit distant from this stuff, but it looks like C++26 will have something like -ftrivial-auto-var-init enabled by default. See the "safe by default" section of [1].

    For reference, the actual proposal that was accepted into C++26 is [2]. It discusses performance only in general, and it refers to an earlier analysis [3] for more details. This last reference describes regressions of around 0.5% in time and in code size. Earlier prototypes suggested larger regressions (perhaps even "horrendous") but more emphasis on compiler optimizations has brought the regression down considerably.

    Of course one's mileage may vary, and one might also consider a 0.5% regression unacceptable. However, the C++ committee seems to have considered this to be an acceptable tradeoff to remove a frequent cause of undefined behavior from C++.

    [1]: https://herbsutter.com/2024/08/07/reader-qa-what-does-it-mea...

    [2]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p27...

    [3]: https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1...

  • Microsoft's Visual C++ compiler has the /Ge compiler option ( see https://learn.microsoft.com/en-us/cpp/build/reference/ge-ena... ) Deprecated since VC2005.

    This compiler option causes the compiler to emit a call to a stack probe function to ensure that a sufficient amount of stack space is available.

    Rather than just probe once for each stack page used, you can substitute a function that *FILLS* the stack frame with a particular value - something like 0xBAADF00D - one could set the value to anything you wanted at runtime.

    This would get you similar behaviour to gcc/clang's -ftrivial-auto-var-init

    Windows has started to auto-initialize most stack variables in the Windows kernel and several other areas.

        The following types are automatically initialized:
        
            Scalars (arrays, pointers, floats)
            Arrays of pointers
            Structures (plain-old-data structures)
        
        The following are not automatically initialized:
        
            Volatile variables
            Arrays of anything other than pointers (i.e. array of int, array of structures, etc.)
            Classes that are not plain-old-data
    
    
        During initial testing where we forcibly initialized all types of data on the stack we saw performance regressions of over 10% in several key scenarios.
    
        With POD structures only, performance was more reasonable. Compiler optimizations to eliminate redundant stores (both inside basic blocks and between basic blocks) were able to further drop the regression caused by POD structures from observable to noise-level for most tests.
    
        We plan on revisiting zero initializing all types (especially now that our optimizer has more powerful optimizations), we just haven’t gotten to it yet.
    

    see https://web.archive.org/web/20200518153645/https://msrc-blog...

Randomization at this level would be too expensive. There are tools that do this for debug purposes, and your stuff runs a lot slower in that mode.

  • I had to Google to find the tid bit that I read about Perl years ago. I think this will affect iteration order of dicts.

        > Nov 22, 2012 — Perl 5.18 will introduce per process hash randomization and almost certainly will feature a new hash function.

  • it probably shouldn’t be a “release” thing. actually, certainly. i do wonder how many bugs would never have seen the light of day, if someone’s “set” actually turned out to be a sequence (i.e. allowed duplicate values) resulting in a debug build raising an assert.

    • Debug builds are worthless for catching issues. How many people actually run them? Perhaps developers run debug builds of individual binaries they're working on when they're trying to repro a bug, but my experience at every company of every size and position in the stack (including the Windows team) is that no one does their general purpose use on a debug build.

      2 replies →

Regarding contracts, there's an additional lesson here, quoting from the source:

> This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior.

I suppose this is why Linux kernel maintainers insist on never breaking user space.

Nope. You have to remember https://www.hyrumslaw.com/

  With a sufficient number of users of an API,
  it does not matter what you promise in the contract:
  all observable behaviors of your system
  will be depended on by somebody.

If you promise randomization, then somebody will depend on that :)

And then you can never remove it!

  • Semi-related: this type of thing is actually covered in the Site Reliability Engineering book by Google. They highlighted a case of a system that outperformed its SLO, so people depended on it having 100% uptime. They "fixed" this by injecting errors to go closer to their SLA, forcing downstream engineers to deal with the fact that the dependent services would sometimes fail for no reason.

    I know it's easier said than done everywhere, just found it to be an interesting parallel.

one might argue that one of the advantages of languages like C is that you only pay for the features you choose to use, no unnecessary overhead like initializing unused variables

  • You can pay for those features in debug mode or in chaos monkey mode. It's okay to continue to not pay for them in release mode. Heck, Rust has this approach when it comes to handling integer overflow - fully checked in debug mode, silent wraparound in release mode.

    • In Ada you can pay for integer overflow checks (runtime) if you want to. With Ada SPARK you can prove that your code does not contain integer overflows so that you don't need runtime checks.

      1 reply →

  • However, the compiler does not tell you this. We're back to the problem that it's possible to have a "working" C program that relies on UB and will therefore break at some point, but the tools will not yell at you for doing this. Whereas in Java or C# you get warnings or errors for using maybe-uninitialized variables.

    Also, scanf should be deprecated. Terrible API. Never use scanf or sscanf etc. We managed to get "gets()" deprecated, time to spread that to other parts of the API.

    atoi() or atof() etc. work OK, but really you need a parser.

I agree, this can also detect brittle tests (e.g, test methods/classes that only pass if executed in a particular order). But applying it for all data could be expensive computation-wise

Not really the ethos of C(++), though of course this particular bug would be easily caught by running a debug build (even 20 years ago). However, this being a game "true" debug builds were probably too slow to be usable. That was at least my experience doing gamedev in that timeframe. Then again code holding up for 20 years in that line of biz is more than sufficient anyway :)

  • When I was doing gamedev about 5 years ago, we were still debugging with optimisation on. You get a class of bugs just from running in lower frame rates that don't happen in release.

I once updated a little shy of 1mloc of Perl 5.8 code to run on Perl 5.32 (ish). There were, overall, remarkably few issues that cropped up. One of these issues (that showed itself a few times) was more or less exactly this: the iteration order through a hash is not defined. It has never been defined, but in Perl 5.8 it was consistent: for the same insertion order of the same set of keys, a hash would always iterate in the same way. In a later Perl it was deliberately randomised, not just once, but on every iteration through the hash.

It turned out there a few places that had assumed a predictable - not just stable, but deterministic - hash key iteration order. Mostly this showed up as tests that failed 50% of the time, which suggested to me a rough measure of how annoying an error is to track down is inversely correlated with how often the error appears in tests.

(Other issues were mostly due to the fact that Perl 5 is all but abandoned by its former community: a few CPAN modules are just gone, some are so far out of date that they can't be coerced to still work with other modules that have been updated over time. )

Then you are wasting runtime clock cycles randomizing lists.

  • Not necessarily; you can do a thing where it's randomized during development, testing and fuzzing but not in production builds or benchmarks so that the obvious "I rely on internal map order" bugs are spotted right away.

  • You can get it pretty much for free by using a random salt with your hash function. This is also useful for avoiding DOS attacks using deliberate hash collisions to trigger quadratic behavior in your hash tables.

  • Any sane language would design a list iterator to follow the order of the list. No, the difference is when you're iterating over orderless hash-based sets or maps/dictionaries. Many languages choose to leave the iteration order undefined. I think Python did that up to a point, but afterward they defined dictionaries (but not sets) to be iterated over in the order that keys were added. Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.

    • > Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.

      Most modern langages do that as part of hashdos mitigation, Python did that until it switched to a naturally ordered hashmap, then made insertion order part of the spec. Importantly iteration order remains consistent with a process (possibly on a per-hashmap basis).

      Notably, Go will randomise the starting point of hashmap iteration on each iteration.