Comment by shakna

1 year ago

I started building a Forth recently, but decided that instead of writing an interpreter or transpiler or whatever, I'd emit machine-code bytes into memory and just execute them directly.

This non-optimising JIT has been far, far easier than all the scary articles and comments I've seen led me to believe.

I'm already in the middle of making it work on both Aarch64 and RISC-V, a couple weeks in.

We did a similar approach back in the day, when going through the Tiger language[0], on the Java version.

Our approach was to model the compiler IR as assembly macros and follow the classic UNIX compiler build pipeline. So even though it wasn't the most performant compiler in the world, our toy compiler still generated real executables in the end.

[0] - https://www.cs.princeton.edu/~appel/modern

I did this for WebAssembly WAT (an IR that is syntactically similar to Lisp) by mapping the AST for my Lisp more or less directly to the WAT IR, then emitting the bytecode from there. It was pretty fun.
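To make the "AST maps more or less directly to WAT" point concrete, here is a minimal sketch in C. It is not the commenter's code: the `Node` and `Out` types and the `to_wat` function are hypothetical, and it only handles 32-bit integer constants, addition, and multiplication. It prints WAT's folded (parenthesised) expression form, which is why the tree walk is almost a one-to-one copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST for a tiny Lisp: integer constants, + and *. */
typedef enum { NODE_CONST, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;               /* used by NODE_CONST only */
    const struct Node *lhs;  /* used by NODE_ADD / NODE_MUL */
    const struct Node *rhs;
} Node;

/* Fixed-size output buffer; `overflow` is set if text had to be dropped. */
typedef struct {
    char buf[256];
    size_t len;
    int overflow;
} Out;

static void put(Out *o, const char *s) {
    size_t n = strlen(s);
    if (o->len + n >= sizeof o->buf) { o->overflow = 1; return; }
    memcpy(o->buf + o->len, s, n + 1);  /* copies the NUL too */
    o->len += n;
}

/* Walk the tree and append the folded WAT expression for it. */
void to_wat(const Node *n, Out *o) {
    char num[32];
    assert(n != NULL && o != NULL);
    switch (n->kind) {
    case NODE_CONST:
        snprintf(num, sizeof num, "(i32.const %d)", n->value);
        put(o, num);
        break;
    case NODE_ADD:
    case NODE_MUL:
        put(o, n->kind == NODE_ADD ? "(i32.add " : "(i32.mul ");
        to_wat(n->lhs, o);
        put(o, " ");
        to_wat(n->rhs, o);
        put(o, ")");
        break;
    }
}
```

With this sketch, the Lisp expression `(+ 1 (* 2 3))` comes out as `(i32.add (i32.const 1) (i32.mul (i32.const 2) (i32.const 3)))`, which a WAT assembler can then turn into the binary format.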

I mean, it’s not hard as such, the encodings of some instruction sets are just ass, with 32- and 64-bit x86 as the foremost example and Thumb-2 not far behind it. Also, if you’re dynamically patching existing code, you’ll have to contend with both modern OSes (especially “hardening” patches thereto) making your life harder in bespoke incompatible ways (see: most of libffi) and modern CPUs being very buggy around self-modifying code. Other than that, it just takes quite a bit of tedious but straightforward work to get anywhere.

  • I haven't had any issues with the OS.

    I mmap, insert, mark as executable and done. Patchjumping and everything "just works".

    I'm not modifying my own process, so there's no hardening issues. Just modifying an anonymous memory map.
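For readers who haven't seen it, the "mmap, insert, mark as executable" sequence can be sketched roughly like this on a POSIX system. This is an illustration, not the commenter's code: `run_machine_code` and `jit_demo` are made-up names, the demo bytes only cover x86-64 and AArch64, and some platforms (Apple Silicon macOS, or sandboxes that forbid executable mappings) may refuse the `mprotect` step, in which case the sketch just returns -1.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Copy `len` bytes of machine code into a fresh anonymous mapping, make it
 * executable, call it as int(*)(void), and store its return value in *out.
 * Returns 0 on success, -1 if the OS refused any step. */
int run_machine_code(const uint8_t *code, size_t len, int *out) {
    assert(code != NULL && out != NULL && len > 0);

    /* 1. mmap: anonymous and private, writable but not yet executable. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    /* 2. insert: write the instruction bytes. */
    memcpy(mem, code, len);

    /* 3. mark as executable, dropping write permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Required on AArch64 and RISC-V so the instruction cache sees the new
     * bytes; effectively a no-op on x86. GCC/Clang builtin. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    /* 4. done: jump into the mapping. */
    int (*fn)(void) = (int (*)(void))(uintptr_t)mem;
    *out = fn();

    munmap(mem, len);
    return 0;
}

#if defined(__x86_64__)
/* mov eax, 42 ; ret */
static const uint8_t demo_code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};
#define HAVE_DEMO_CODE 1
#elif defined(__aarch64__)
/* mov w0, #42 ; ret  (0x52800540, 0xD65F03C0, little-endian) */
static const uint8_t demo_code[] = {0x40, 0x05, 0x80, 0x52,
                                    0xC0, 0x03, 0x5F, 0xD6};
#define HAVE_DEMO_CODE 1
#else
#define HAVE_DEMO_CODE 0
#endif

/* Returns 42 if the demo ran, -1 if unsupported or refused by the OS. */
int jit_demo(void) {
#if HAVE_DEMO_CODE
    int result = 0;
    if (run_machine_code(demo_code, sizeof demo_code, &result) != 0) return -1;
    return result;
#else
    return -1;
#endif
}
```

Mapping the region read-write first and flipping it to read-execute afterwards avoids ever holding a page that is writable and executable at the same time, which is the combination hardened systems are most likely to reject.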

Very interesting, care to share the source?

  • Oh, it's still a while off that. I do plan to make it public at some point, but when I'm actually happy the code isn't completely vomit.

    But for a simple taste, the push to stack function currently looks like this. (All the emit stuff just writes bytes into a mmap that gets executed later.)

        void compile_push_literal(Value val) {
        #if ARCH_X86_64
            /* mov rdi, val   (first integer argument in the System V ABI) */
            emit_bytes((uint8_t[]){X86_MOV_RDI_IMM64_0, X86_MOV_RDI_IMM64_1}, 2); emit_uint64_le(val);
            /* mov rax, &push */
            emit_bytes((uint8_t[]){X86_MOV_RAX_IMM64_0, X86_MOV_RAX_IMM64_1}, 2); emit_uint64_le((uint64_t)push);
            /* call rax */
            emit_bytes((uint8_t[]){X86_CALL_RAX_0, X86_CALL_RAX_1}, 2);
        #elif ARCH_ARM64
            /* x0 = val, built 16 bits at a time: movz, then three movk */
            uint64_t imm = val;
            emit_uint32_le(ARM64_MOVZ_OP | (ARM64_REG_X0 << 0) | ((imm & 0xFFFF) << 5));
            emit_uint32_le(ARM64_MOVK_OP_LSL16 | (ARM64_REG_X0 << 0) | (((imm >> 16) & 0xFFFF) << 5));
            emit_uint32_le(ARM64_MOVK_OP_LSL32 | (ARM64_REG_X0 << 0) | (((imm >> 32) & 0xFFFF) << 5));
            emit_uint32_le(ARM64_MOVK_OP_LSL48 | (ARM64_REG_X0 << 0) | (((imm >> 48) & 0xFFFF) << 5));
            /* x1 = &push, same movz/movk sequence */
            uint64_t func_addr = (uint64_t)push;
            emit_uint32_le(ARM64_MOVZ_OP | (ARM64_REG_X1 << 0) | ((func_addr & 0xFFFF) << 5));
            emit_uint32_le(ARM64_MOVK_OP_LSL16 | (ARM64_REG_X1 << 0) | (((func_addr >> 16) & 0xFFFF) << 5));
            emit_uint32_le(ARM64_MOVK_OP_LSL32 | (ARM64_REG_X1 << 0) | (((func_addr >> 32) & 0xFFFF) << 5));
            emit_uint32_le(ARM64_MOVK_OP_LSL48 | (ARM64_REG_X1 << 0) | (((func_addr >> 48) & 0xFFFF) << 5));
            /* blr x1 */
            emit_uint32_le(ARM64_BLR_OP | (ARM64_REG_X1 << 5));
        #elif ARCH_RISCV64
            /* a0 = val, t0 = &push */
            emit_load_imm_riscv(val, RISCV_REG_A0, RISCV_REG_T1);
            emit_load_imm_riscv((uint64_t)push, RISCV_REG_T0, RISCV_REG_T1);
            /* jalr ra, 0(t0): I-type layout imm[11:0] | rs1 | funct3 | rd | opcode */
            emit_uint32_le((0 << 20) | (RISCV_REG_T0 << 15) | (RISCV_F3_JALR << 12) | (RISCV_REG_RA << 7) | RISCV_OP_JALR);
        #endif
        }
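As a rough idea of what the emit helpers behind a function like this might look like, here is a sketch that writes into a plain array instead of an mmap'd region. The helper names `emit_bytes`, `emit_uint32_le`, and `emit_uint64_le` come from the snippet above, but their bodies here are guesses. Everything else (`code_buf`, `emit_reset`, `emit_len`, `arm64_mov_wide`, `emit_arm64_load_imm64`, `riscv_jalr`) is hypothetical, and the numeric opcodes are the standard base encodings for MOVZ/MOVK (64-bit) and JALR rather than the commenter's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain array standing in for the executable mapping (assumption). */
static uint8_t code_buf[256];
static size_t code_len = 0;

void emit_reset(void) { code_len = 0; }
size_t emit_len(void) { return code_len; }

/* Append raw bytes at the current write position. */
void emit_bytes(const uint8_t *bytes, size_t n) {
    assert(code_len + n <= sizeof code_buf);
    for (size_t i = 0; i < n; i++) code_buf[code_len++] = bytes[i];
}

/* Append a 32-bit value, least significant byte first. */
void emit_uint32_le(uint32_t v) {
    uint8_t b[4];
    for (int i = 0; i < 4; i++) b[i] = (uint8_t)(v >> (8 * i));
    emit_bytes(b, sizeof b);
}

/* Append a 64-bit value, least significant byte first. */
void emit_uint64_le(uint64_t v) {
    uint8_t b[8];
    for (int i = 0; i < 8; i++) b[i] = (uint8_t)(v >> (8 * i));
    emit_bytes(b, sizeof b);
}

/* AArch64 move-wide encoding: base opcode | hw (shift / 16) in bits 22:21,
 * 16-bit immediate in bits 20:5, destination register in bits 4:0. */
uint32_t arm64_mov_wide(uint32_t base_op, unsigned rd, unsigned hw,
                        uint16_t imm16) {
    return base_op | ((uint32_t)(hw & 3u) << 21) | ((uint32_t)imm16 << 5) |
           (rd & 31u);
}

/* Load a full 64-bit immediate into Xrd: one MOVZ, then three MOVKs.
 * This folds the repeated four-instruction pattern into a loop. */
void emit_arm64_load_imm64(unsigned rd, uint64_t imm) {
    emit_uint32_le(arm64_mov_wide(0xD2800000u, rd, 0, (uint16_t)imm));
    for (unsigned hw = 1; hw < 4; hw++) {
        emit_uint32_le(arm64_mov_wide(0xF2800000u, rd, hw,
                                      (uint16_t)(imm >> (16 * hw))));
    }
}

/* RISC-V JALR (I-type): imm[11:0] | rs1 | funct3 = 0 | rd | opcode 0x67. */
uint32_t riscv_jalr(unsigned rd, unsigned rs1, int32_t imm12) {
    return (((uint32_t)imm12 & 0xFFFu) << 20) | ((rs1 & 31u) << 15) |
           ((rd & 31u) << 7) | 0x67u;
}
```

For example, `riscv_jalr(1, 5, 0)` encodes `jalr ra, 0(t0)`, and `emit_arm64_load_imm64(0, v)` emits the same four-instruction sequence the ARM64 branch above writes out by hand for `x0`.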

    • This is super cool!

      Creating an assembler with Lisp syntax and then using that to bootstrap a Lisp compiler (with Lisp macros instead of standard assembler macros) is one of those otherwise pointless educational projects I’ve been wanting to do for years. One day perhaps.
