Comment by nospice
1 day ago
"Zero heap allocations: Parser results are stack-allocated using OxCaml unboxed records and local lists" - honest question, why?
On almost any platform on which you want to run a HTTP server - including bare metal - it usually doesn't matter if you keep state near the stack pointer or not. What matters is that you use it well, making it play well with CPU caches, etc. Or is there something specifically horrible about OxCaml's heap allocator?
In a conventional GCed language, you need to minimise heap allocations to avoid putting too much pressure on the garbage collector. The OxCaml extensions allows values to be passed 'locally' (that is, on the callstack) as an alternative to heap allocation. When the function returns, the values are automatically deallocated (and the type system guarantees safety).
This means that I can pass in a buffer, parse it, do my business logic, and then return, without ever allocating anything into the global heap. However, if I do need to allocate into it (for example, a complex structure), then it's still available.
It's kind of Rust in reverse: OxCaml has a GC by default, but you can write very high-performance code that effectively never uses a GC. There's also emerging support for data-race-free parallelisation as well.
The webserver I'm putting together also uses io_uring, which provides zero-copy buffers from kernel to userspace. The support for one-shot effect handlers in OCaml allows me to directly resume a blocked fiber straight from the io_uring loop, and then this httpz parser operates directly on that buffer. Shared memory all the way with almost no syscalls!
I think there are several advantages of stack allocation:
* freeing stack allocated memory is O(1) with a small constant factor: simply set the stack pointer to a new location. In a generational garbage collector, like OCaml, minor garbage collection is O(amount of retained memory) with a larger constant factor.
* judiciously stack allocating memory can improve data locality.
* unboxed data takes up less space, again improving locality.
Overall, I think this about improving constant factors---which makes a big difference in practice!
Unboxed records are fine, but stack-allocated lists make me nervous. What happens when someone gives you 8 megs of headers, and you run out of stack?
This code seems to put a 32k limit on it, but it's a manual check and error return. What about code that forgets to manually add that limit, or sets it too high? How do you decide when to bump that limit, since 32k is an artificial constraint?
By default in oxcaml, "stack" / local allocations happen in a separate stack on the heap (which the runtime allocates for you). If you allocate enough to exceed that capacity, it will resize it dynamically for you.
So stack-local arena. Neat.