← Back to context

Comment by ack_complete

4 days ago

There's an annoying corner case when using SetWindowLongPtr/GetWindowLongPtr() -- Windows sends WM_GETMINMAXINFO before WM_NCCREATE. This can be worked around with a thread local, but a trampoline inherently handles it. Trampolines are also useful for other Win32 user functions that don't have an easy way to store context data, such as SetWindowsHookEx(). They're also slightly faster, though GetWindowLongPtr() at least seems able to avoid a syscall.

The code as written, though, is missing a call to FlushInstructionCache() and might not work in processes that prohibit dynamic code generation. An alternative is to just pregenerate an array of trampolines in a code segment, each referencing a mutable pointer in a parallel array in the data segment. These can be generated straightforwardly with a little template magic. This adds size to the executable unlike an empty RWX segment, but doesn't run afoul of any dynamic codegen restrictions or require I-cache flushing. The number of trampolines must be predetermined, but the RWX segment has the same limitation.

I wasn't aware of the thread local trick, I solve this problem by not setting WS_VISIBLE and calling SetWindowPos & ShowWindow after CreateWindow returns (this solves some other problems as well..)

FlushInstructionCache isn't needed on x86_64. I-cache and D-cache are coherent.

  • I'm not convinced this is always guaranteed for a Windows x64 program. When running on bare x64 hardware, FlushInstructionCache() does seem to be an (inefficient) noop on Windows 11 x64, but when running in emulation on Windows 11 ARM64, it's running a significantly larger amount of ARM64 native code -- it looks like it might be ensuring that stale JIT code is flushed.