(author here) I'm just adding data-race free parallelism support to this right now to switch my website over to using it! For those familiar with OCaml syntax, the OxCaml parse function is fun:
val parse : buffer -> len:int -> #(status * request * header list) @ local
This takes in a buffer and returns an unboxed tuple on the stack, so there's no GC activity involved beyond stack management for each HTTP request.
Doesn't (honest question) the operating system kernel prevent data races in memory accesses at the level of system calls like brk? I wonder at what level the operating system handles such things?
"Zero heap allocations: Parser results are stack-allocated using OxCaml unboxed records and local lists" - honest question, why?
On almost any platform on which you want to run a HTTP server - including bare metal - it usually doesn't matter if you keep state near the stack pointer or not. What matters is that you use it well, making it play well with CPU caches, etc. Or is there something specifically horrible about OxCaml's heap allocator?
In a conventional GCed language, you need to minimise heap allocations to avoid putting too much pressure on the garbage collector. The OxCaml extensions allows values to be passed 'locally' (that is, on the callstack) as an alternative to heap allocation. When the function returns, the values are automatically deallocated (and the type system guarantees safety).
This means that I can pass in a buffer, parse it, do my business logic, and then return, without ever allocating anything into the global heap. However, if I do need to allocate into it (for example, a complex structure), then it's still available.
It's kind of Rust in reverse: OxCaml has a GC by default, but you can write very high-performance code that effectively never uses a GC. There's also emerging support for data-race-free parallelisation as well.
The webserver I'm putting together also uses io_uring, which provides zero-copy buffers from kernel to userspace. The support for one-shot effect handlers in OCaml allows me to directly resume a blocked fiber straight from the io_uring loop, and then this httpz parser operates directly on that buffer. Shared memory all the way with almost no syscalls!
I think there are several advantages of stack allocation:
* freeing stack allocated memory is O(1) with a small constant factor: simply set the stack pointer to a new location. In a generational garbage collector, like OCaml, minor garbage collection is O(amount of retained memory) with a larger constant factor.
* judiciously stack allocating memory can improve data locality.
* unboxed data takes up less space, again improving locality.
Overall, I think this about improving constant factors---which makes a big difference in practice!
Unboxed records are fine, but stack-allocated lists make me nervous. What happens when someone gives you 8 megs of headers, and you run out of stack?
This code seems to put a 32k limit on it, but it's a manual check and error return. What about code that forgets to manually add that limit, or sets it too high? How do you decide when to bump that limit, since 32k is an artificial constraint?
By default in oxcaml, "stack" / local allocations happen in a separate stack on the heap (which the runtime allocates for you). If you allocate enough to exceed that capacity, it will resize it dynamically for you.
The OxCaml work is great. I don't use OCaml much but I have been following along with OxCaml as they are doing fascinating work that leverages a lot of research that interests me.
> Local lists (@ local): Header list grows on the stack, not heap
Does this mean unbounded stack growth? I'd much rather heap allocation if that's the case, as at least that can be recovered from in the case of allocation failure (assuming your OS, language, and stdlib allow for it).
This particular iteration is unbounded, but the next step is to pass in a GADT argument to specify which headers the application wants, so only those are parsed into a heterogenous tuple.
If you were looking at the parse link in the author's comment, you were looking at a spec (called a module interface in OCaml/OxCaml, similar to an interface in Java). The parse implementation is at https://github.com/avsm/httpz/blob/240051dd5f00281b09984a14a...
That said, I would be happy if all I needed to type in was a spec.
(author here) I'm just adding data-race free parallelism support to this right now to switch my website over to using it! For those familiar with OCaml syntax, the OxCaml parse function is fun:
This takes in a buffer and returns an unboxed tuple on the stack, so there's no GC activity involved beyond stack management for each HTTP request.
https://github.com/avsm/httpz/blob/main/lib/httpz.mli#L154
Doesn't (honest question) the operating system kernel prevent data races in memory accesses at the level of system calls like brk? I wonder at what level the operating system handles such things?
I mean, aren't system calls thread-safe?
1 reply →
Interesting parser, fun to read.
Oh I got the joke! (I'm pretty sure it was intended)
Yes a parser is a fun to read ;)
"Zero heap allocations: Parser results are stack-allocated using OxCaml unboxed records and local lists" - honest question, why?
On almost any platform on which you want to run a HTTP server - including bare metal - it usually doesn't matter if you keep state near the stack pointer or not. What matters is that you use it well, making it play well with CPU caches, etc. Or is there something specifically horrible about OxCaml's heap allocator?
In a conventional GCed language, you need to minimise heap allocations to avoid putting too much pressure on the garbage collector. The OxCaml extensions allows values to be passed 'locally' (that is, on the callstack) as an alternative to heap allocation. When the function returns, the values are automatically deallocated (and the type system guarantees safety).
This means that I can pass in a buffer, parse it, do my business logic, and then return, without ever allocating anything into the global heap. However, if I do need to allocate into it (for example, a complex structure), then it's still available.
It's kind of Rust in reverse: OxCaml has a GC by default, but you can write very high-performance code that effectively never uses a GC. There's also emerging support for data-race-free parallelisation as well.
The webserver I'm putting together also uses io_uring, which provides zero-copy buffers from kernel to userspace. The support for one-shot effect handlers in OCaml allows me to directly resume a blocked fiber straight from the io_uring loop, and then this httpz parser operates directly on that buffer. Shared memory all the way with almost no syscalls!
I think there are several advantages of stack allocation:
* freeing stack allocated memory is O(1) with a small constant factor: simply set the stack pointer to a new location. In a generational garbage collector, like OCaml, minor garbage collection is O(amount of retained memory) with a larger constant factor.
* judiciously stack allocating memory can improve data locality.
* unboxed data takes up less space, again improving locality.
Overall, I think this about improving constant factors---which makes a big difference in practice!
Unboxed records are fine, but stack-allocated lists make me nervous. What happens when someone gives you 8 megs of headers, and you run out of stack?
This code seems to put a 32k limit on it, but it's a manual check and error return. What about code that forgets to manually add that limit, or sets it too high? How do you decide when to bump that limit, since 32k is an artificial constraint?
By default in oxcaml, "stack" / local allocations happen in a separate stack on the heap (which the runtime allocates for you). If you allocate enough to exceed that capacity, it will resize it dynamically for you.
1 reply →
The OxCaml work is great. I don't use OCaml much but I have been following along with OxCaml as they are doing fascinating work that leverages a lot of research that interests me.
> Local lists (@ local): Header list grows on the stack, not heap
Does this mean unbounded stack growth? I'd much rather heap allocation if that's the case, as at least that can be recovered from in the case of allocation failure (assuming your OS, language, and stdlib allow for it).
This particular iteration is unbounded, but the next step is to pass in a GADT argument to specify which headers the application wants, so only those are parsed into a heterogenous tuple.
That sounds like a rather elegant solution to it.
ocaml looks more like a spec than actual code.
If you were looking at the parse link in the author's comment, you were looking at a spec (called a module interface in OCaml/OxCaml, similar to an interface in Java). The parse implementation is at https://github.com/avsm/httpz/blob/240051dd5f00281b09984a14a...
That said, I would be happy if all I needed to type in was a spec.
I'm excited to see what comes of OxCaml the next few years.