Comment by Panzerschrek

6 hours ago

Can it handle self-modifying code?

Why only x86_64? It would make more sense to convert 32-bit programs, such as many old games.

I think self-modifying code outside of JIT runtimes is pretty rare these days compared to the 80s or 90s. Most .text sections are read-only now, and security requirements aren't going to loosen that.

Consider reading the linked article, where this is explicitly addressed:

> Self Modifying and JIT-Compiled Code. Elevator, like all fully static binary rewriters, does not support self modifying or just-in-time-compiled code.

  • So they don't have to handle the really hard case.

    In x86 land, it's hard to find the instruction boundaries statically because, for historical reasons going back to the 8-bit era, x86 instructions are variable-length and don't have alignment restrictions. This is what makes translation ambiguous.

    If you start at the program entry point and examine reachable instructions, you can find the instruction boundaries. Debuggers and disassemblers do this. Most of the time it works, but you may have to recognize things such as C++ vtables. Debug info helps there. There may still be ambiguity. Elevator seems to be about generating all the possible code options to resolve that ambiguity by brute-force case analysis.

    Unlike some architectures, x86 doesn't have explicit code/data separation. So they have to try instruction decoding on all data built into the executable. They cull obvious mistranslations. Yet, as someone mentioned, they still have a 50x space expansion. Most of that will be unreachable, mistranslated code.

    You can't look at a static executable which uses pointers to functions and say "that data cannot possibly be code", without constraining what those pointers point to. That involves predicting run-time behavior, which may not be possible.
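To make the boundary problem concrete, here is a minimal sketch on a made-up variable-length instruction set. The opcodes, the `IMAGE` bytes, and all function names are hypothetical and are not real x86 encodings or anything taken from Elevator. It contrasts following control flow from the entry point with trying a decode at every byte offset.

```python
# Toy variable-length ISA (hypothetical, NOT real x86 encodings).
# opcode byte -> (mnemonic, total length in bytes)
OPCODES = {
    0x00: ("halt", 1),
    0x01: ("nop", 1),
    0x02: ("load", 2),  # load imm8
    0x03: ("jmp", 2),   # jmp abs8, never falls through
    0x04: ("jz", 2),    # jz abs8, may jump or fall through
}

def decode(image, addr):
    """Decode one instruction at addr; return None if the bytes are invalid."""
    if addr >= len(image) or image[addr] not in OPCODES:
        return None
    name, length = OPCODES[image[addr]]
    if addr + length > len(image):
        return None
    operand = image[addr + 1] if length == 2 else None
    return name, length, operand

def recursive_traversal(image, entry):
    """Follow direct control flow from the entry point; returns {addr: mnemonic}."""
    found, worklist = {}, [entry]
    while worklist:
        addr = worklist.pop()
        if addr in found:
            continue
        insn = decode(image, addr)
        if insn is None:
            continue
        name, length, operand = insn
        found[addr] = name
        if name in ("jmp", "jz"):
            worklist.append(operand)        # branch target
        if name not in ("jmp", "halt"):
            worklist.append(addr + length)  # fall-through successor
    return found

def superset_decode(image):
    """Try a decode at every byte offset and keep everything that decodes."""
    result = {}
    for addr in range(len(image)):
        insn = decode(image, addr)
        if insn is not None:
            result[addr] = insn[0]
    return result

# Offsets 0-8 are code; offsets 9-10 are data that happens to decode as "jz 0".
IMAGE = bytes([0x02, 0x03, 0x04, 0x06, 0x01, 0x00, 0x02, 0x01, 0x00, 0x04, 0x00])

reachable = recursive_traversal(IMAGE, 0)
everything = superset_decode(IMAGE)
print(sorted(reachable))   # offsets proven reachable from the entry point
print(sorted(everything))  # every offset where some instruction decodes
```

In this toy image, offset 1 is the immediate of the `load` at offset 0, yet it also decodes as a `jmp`. The traversal never visits it, but the every-offset decode keeps it, along with the trailing data bytes. The toy has no indirect jumps. Once function pointers exist, the traversal can no longer prove that offset 9 is data, which is why the brute-force approach has to keep such candidates.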

> Can it handle self-modifying code

If it did, it wouldn't be "fully static" anymore. It's fundamentally contradictory.
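A rough illustration of why, using a hypothetical toy machine (the `poke` instruction, `PROGRAM`, and both function names are invented for this sketch and do not describe how Elevator works): a translation made ahead of time keeps running the old instruction after the program overwrites it.

```python
def interpret(program):
    """Execute straight from mutable memory, so stores into code take effect."""
    memory = list(program)
    acc, pc = 0, 0
    while memory[pc][0] != "halt":
        op, arg = memory[pc]
        if op == "add":
            acc += arg
        elif op == "poke":
            index, new_insn = arg
            memory[index] = new_insn  # rewrites an instruction in place
        pc += 1
    return acc

def run_static_translation(program):
    """Translate once up front; later stores never reach the translated code."""
    memory = list(program)
    translated = tuple(memory)  # frozen snapshot stands in for the rewritten binary
    acc, pc = 0, 0
    while translated[pc][0] != "halt":
        op, arg = translated[pc]
        if op == "add":
            acc += arg
        elif op == "poke":
            index, new_insn = arg
            memory[index] = new_insn  # only the original bytes change
        pc += 1
    return acc

# Hypothetical self-modifying program: the first instruction replaces the second.
PROGRAM = [
    ("poke", (1, ("add", 100))),
    ("add", 1),
    ("halt", None),
]

print(interpret(PROGRAM))               # runs the modified instruction
print(run_static_translation(PROGRAM))  # runs the stale, pre-translated one
```

The two results differ (100 versus 1). Closing that gap means noticing the store and retranslating at run time, at which point the tool is a dynamic translator.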

On the greenfield x86 development side: self-modifying code, while possible, is generally terrible because it thrashes cache lines and hurts pipeline and branch-prediction performance. It also violates W^X, so it generally has to live in JIT-compatible memory pages. So avoid it almost always. It was kind of a thing in 486 and P5 days, for example using code immediates as inner-loop variables, but not so much now.

There are a lot of crufty x86 edge cases to handle to achieve perfect(ish) emulation or translation.

  • > It was kind of a thing in 486 and P5...

    After those machines, starting with the Pentium Pro and its look-ahead instruction decoding, storing into code became a major loss. Superscalar x86 CPUs have the hardware to detect and handle stores into code, but it requires bringing the CPU to a clean halt, almost like an exception or interrupt, discarding pipelined work that's already been done, and then restarting the pipeline and refetching the instructions ahead. All the performance gains of superscalar hardware are lost for a while.

    There are RISC architectures where self-modifying code isn't supported and code pages must be read-only. The CPU then doesn't need the machinery for detecting a store into code and aborting look-ahead. MacOS has enforced that rule since the PowerPC era.