This is really interesting, and I'm surprised that I had never looked at JIT compiling as self-modifying code (SMC). Also that I had never heard of copy-and-patch.
There are whole classes of problems that can be more easily solved with SMC. That's part of what got me into FPGAs back in the 90s, before I abandoned them due to their lack of exponential growth and their proprietary placement and routing tools.
This could have implications for faster in-app scripting like in games. Also for building more powerful shaders. I wonder if there are analogs of the article's mprotect(ret, 256, PROT_READ | PROT_EXEC) calls for GPUs.
Related:
Copy-and-Patch: Fast compilation for high-level languages and bytecode (2020) https://news.ycombinator.com/item?id=28547057 - Sept 2021 (7 comments)
This blog goes from 0 to 100 really, really quickly. I have no idea what I am looking at. I suppose it is not meant for beginners, but it claims to be a tutorial.
> but it claims to be a tutorial.
It is, but only for someone who wants to do JIT work without writing assembly code and who can read assembly code back into C (or can automate that part).
Instead of doing all manual register allocations in the JIT, you get to fill in the blanks with the actual inputs after a more (maybe) diligent compiler has allocated the registers, pushed them and all that.
There's a similar set of implementation techniques in Apache Impala, where the JIT only invokes the library functions when generating JIT code, instead of writing inline JIT operations, so that they can rely on shorter compile times for the JIT and deeper optimization passes for the called functions.
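To make the "generated code only calls precompiled library functions" idea concrete, here is a rough Python analogy. It is not Impala's implementation: the `LIB` table, the tuple-based expression format, and the `emit`/`jit` helpers are all made up for illustration. The only point is that the code generated at runtime contains nothing but calls, so generation stays cheap while the callees can be optimized ahead of time.

```python
import operator

# Hypothetical "library" of precompiled functions. In a real engine these
# would be optimized native routines built ahead of time.
LIB = {"add": operator.add, "mul": operator.mul, "lt": operator.lt}


def emit(expr):
    """Turn a nested (op, lhs, rhs) tuple into source made only of calls."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return f"{op}({emit(lhs)}, {emit(rhs)})"
    return str(expr)  # a parameter name or a numeric literal


def jit(expr, params):
    """Generate and compile a function whose body is just library calls."""
    src = f"lambda {', '.join(params)}: {emit(expr)}"
    fn = eval(compile(src, "<jit>", "eval"), dict(LIB))
    return fn, src


# a + b * 2 < 10
pred, src = jit(("lt", ("add", "a", ("mul", "b", 2)), 10), ["a", "b"])
print(src)         # lambda a, b: lt(add(a, mul(b, 2)), 10)
print(pred(1, 2))  # True
print(pred(5, 3))  # False
```

The generator never has to know how `add` or `lt` work internally, which is the trade-off described above: short JIT compile times, with the heavy optimization done once on the library side.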
https://www.reddit.com/r/restofthefuckingowl/
There are some experiments in using copy-and-patch for the R language (after Python): https://dl.acm.org/doi/10.1145/3759548.3763370
From a master's thesis: https://www.itspy.cz/wp-content/uploads/2025/09/it_spy_2025_...
It features self-modifying code: it can repatch emitted instructions at runtime based on the current value type.
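For anyone wondering what "repatch based on the current value type" might look like, here is a toy Python sketch. It is not taken from the thesis or the paper. The `Site` class and its handlers are hypothetical, and swapping a method reference stands in for overwriting machine code in place. The shape is the usual one: a generic handler observes the operand types, patches in a specialized version, and a guard in the specialized version falls back if the types change.

```python
class Site:
    """One emitted 'add' instruction whose handler can be repatched."""

    def __init__(self):
        self.handler = self.generic  # starts out unspecialized
        self.patches = 0             # how many times the site was rewritten

    def run(self, a, b):
        return self.handler(a, b)

    def generic(self, a, b):
        # Slow path: inspect the operand types and repatch this site.
        if type(a) is int and type(b) is int:
            self.handler = self.int_add
            self.patches += 1
        elif type(a) is str and type(b) is str:
            self.handler = self.str_concat
            self.patches += 1
        return a + b

    def int_add(self, a, b):
        if type(a) is not int or type(b) is not int:
            self.handler = self.generic  # guard failed: undo the patch
            return self.generic(a, b)
        return a + b

    def str_concat(self, a, b):
        if type(a) is not str or type(b) is not str:
            self.handler = self.generic  # guard failed: undo the patch
            return self.generic(a, b)
        return a + b


site = Site()
print(site.run(1, 2), site.handler.__name__)      # 3 int_add
print(site.run(3, 4), site.handler.__name__)      # 7 int_add
print(site.run("a", "b"), site.handler.__name__)  # ab str_concat
```

In a real JIT the specialized handlers would skip the dynamic dispatch that `a + b` still does here, which is where the speedup would come from.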
The accompanying post "How It Works" is worth reading alongside this tutorial:
https://transactional.blog/copy-and-patch/
(key terms: abus[e|ing]: 4, force: 3, trick: 1, chance: 1)
I think this technique also lies at the heart of the Cranelift project.
https://cranelift.dev/
Cranelift does not use copy-and-patch. Consider, for example, this file, which implements part of the instruction generation logic for x64: https://github.com/bytecodealliance/wasmtime/blob/main/crane...
Copy-and-patch is a technique for reducing the amount of effort it takes to write a JIT by leaning on an existing AOT compiler's code generator. Instead of generating machine code yourself, you can get LLVM (or another compiler) to generate a small snippet of code for each operation in your internal IR. Then codegen is simply a matter of copying the precompiled snippet and patching up the references.
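A minimal sketch of the copy and patch steps, in Python so it runs anywhere. Assumptions: the stencil bytes are hand-written x86-64 encodings (`mov eax, imm32`, `add eax, imm32`, `ret`) standing in for snippets a real setup would extract from compiler output, and the `STENCILS` table, the op names, and `compile_ops` are hypothetical. The result is only a byte string; actually running it would additionally need executable memory (the article's mmap/mprotect part), which is left out here.

```python
import struct

# Hypothetical stencils: machine-code bytes for each IR op, plus the offset
# of a 4-byte hole for an immediate operand (None means no hole).
STENCILS = {
    "load": (bytes([0xB8, 0, 0, 0, 0]), 1),  # mov eax, imm32
    "add":  (bytes([0x05, 0, 0, 0, 0]), 1),  # add eax, imm32
    "ret":  (bytes([0xC3]), None),           # ret
}


def compile_ops(ops):
    """Concatenate one stencil per op, filling in each hole as we go."""
    out = bytearray()
    for name, operand in ops:
        template, hole = STENCILS[name]
        start = len(out)
        out += template  # copy the precompiled snippet
        if hole is not None:
            # patch: write the operand as a little-endian signed 32-bit value
            struct.pack_into("<i", out, start + hole, operand)
    return bytes(out)


code = compile_ops([("load", 40), ("add", 2), ("ret", None)])
print(code.hex())  # b8280000000502000000c3
```

Note that `compile_ops` knows nothing about instruction encoding beyond where the holes are, which is the whole appeal: the compiler that produced the stencils did the hard part.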
The more resources are poured into a JIT, the less likely it is to use copy-and-patch. You get more control and flexibility by doing codegen yourself.
But see also Deegen for a pretty cool example of trying to push this approach as far as possible: https://aha.stanford.edu/deegen-meta-compiler-approach-high-...
IIRC Cranelift doesn't use copy-and-patch. It uses e-graphs [0] as part of its optimization pipeline, though.
Closest thing in (relatively) recent news that uses copy-and-patch I can think of is CPython's new JIT.
[0]: https://github.com/bytecodealliance/rfcs/pull/27
My understanding is that e-graphs take care of selecting the best patch (by examining many options in parallel) but fundamentally it is still copy-and-patch.
Question: what else (apart from assembler) could this be a good idea for?
I think WASM, but could it also work for a custom bytecode? And, more importantly, for a set of host-native functions (say I write some Rust functions that somehow exploit this idea)?