← Back to context

Comment by amluto

1 year ago

Plain int3 is a footgun: the CPU does not keep track of the address of the int3 (at least not until FRED), and it reports the address after int3. It’s impossible to reliably undo that in software, and most debuggers don’t even try, and the result is a failure to identify the location of the breakpoint. It’s problematic if the int3 is the last instruction in a basic block, and even worse if the optimizer thinks that whatever is after the int3 is unreachable.

If Rust’s standard library does this, please consider using int3;nop instead.

Good to know! I've seen the pattern of "int3; nop" before, but I've never seen the explanation for why. I'd always assumed it involved the desire to be able to live-patch a different instruction over it.

In Rust, we're using the `llvm.debugtrap` intrinsic. Does that DTRT?

The "canonical" INT 3 is a single byte opcode (CCh), so the debugger can just subtract 1 from the address pushed on the stack to get the breakpoint location.

There is another encoding (CD 03), but no assembler should emit it. It used to be possible for adversarial code to confuse debug interrupt handlers with this, but this should be fixed now.

  • This would involve the debugger actually being structured in a way that makes this make sense. A debugger like GCC has a gnarly data structure that represents the machine state, and it contains things like EIP/RIP. There is a command 'backtrace' that takes the machine state and attempts to generate a backtrace. And there's a command 'continue' that resumes execution.

    int3 is a "trap". continue will resume execution at the instruction after int3, as intended. But backtrace should, by some ill-defined magic, generate the backtrace as though RIP was (saved RIP - 1). And the condition for doing this isn't something that is (AFAIK) representable at all in GCC's worldview. Sure, GCC knows, or at least ought to know [0], that it gained control because of vector 3, and the Intel and AMD manuals say that vector 3 is a trap. But there isn't a bit in memory or anything you would see in 'info regs' that will say "hey, this is a 'trap', and backtraces and such should be done as though RIP was actually RIP-1".

    Maybe the right solution would be to split the program counter, from the perspective of the debugger, into two fields: program counter for backtracing, and program counter for resumption.

    And yes, I know that GCC gets this wrong. Been there, seen the failures. I have not checked, but I expect that LLDB works exactly like GCC in this regard.

    [0] ptrace on Linux exposes the vector number, somewhat awkwardly. Or you can infer it from the fact that the signal was SIGTRAP.