
Comment by solatic

14 hours ago

Headline is wrong. I/O wasn't the bottleneck, syscalls were the bottleneck.

Stupid question: why can't we get a syscall to load an entire directory into an array of file descriptors (minus an array of paths to ignore), instead of calling open() on every individual file in that directory? Seems like the simplest solution, no?

It's not the syscalls. There were only 300,000 syscalls made. Entering and exiting the kernel takes 150 cycles on my (rather beefy) Ryzen machine, or about 50ns per call.

Even if you assume it takes 1us per mode switch, which would be insane, you'd be looking at 0.3s out of the 17s for syscall overhead; at the measured ~50ns it's more like 0.015s.

It's not obvious to me where the overhead is, but random seeks are still expensive, even on SSDs.

  • Didn't test, but my guess is it's not “syscalls” per se but which ones: “open,” “stat,” etc. are the slow part; “read” would be fine. And something like “openat” might mitigate it.

io_uring supports submitting openat requests, which sounds like what you want. Open the dirfd, extract all the names via readdir and then submit openat SQEs all at once. Admittedly I have not used the io_uring API myself, so I can't speak to edge cases in doing so, but it's "on the happy path" as it were.

https://man7.org/linux/man-pages/man3/io_uring_prep_open.3.h...

https://man7.org/linux/man-pages/man2/readdir.2.html

Note that the prep open man page is a section (3) page, i.e. a liburing helper rather than a raw syscall. You could of course construct the SQEs yourself.
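
For illustration, a rough liburing sketch of that flow: readdir the directory, queue one openat SQE per regular file, and submit in batches. This assumes liburing is installed (link with -luring); I haven't benchmarked it, and most error handling is trimmed.

    /* Sketch: batch-open every regular file in a directory via io_uring openat SQEs. */
    #include <dirent.h>
    #include <fcntl.h>
    #include <liburing.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define QUEUE_DEPTH 256

    /* Submit everything queued so far and print one line per completion. */
    static void drain(struct io_uring *ring, unsigned inflight)
    {
        io_uring_submit(ring);
        for (unsigned i = 0; i < inflight; i++) {
            struct io_uring_cqe *cqe;
            io_uring_wait_cqe(ring, &cqe);
            char *name = io_uring_cqe_get_data(cqe);
            /* cqe->res is the new fd on success, -errno on failure */
            printf("%s -> %d\n", name, cqe->res);
            free(name);
            io_uring_cqe_seen(ring, cqe);
        }
    }

    int main(int argc, char **argv)
    {
        const char *dirpath = argc > 1 ? argv[1] : ".";
        int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
        DIR *dir = fdopendir(dup(dfd));   /* dup so closedir() keeps dfd usable */
        struct io_uring ring;
        io_uring_queue_init(QUEUE_DEPTH, &ring, 0);

        unsigned inflight = 0;
        struct dirent *de;
        while ((de = readdir(dir)) != NULL) {
            /* skip ".", "..", subdirs; some filesystems report DT_UNKNOWN, ignored here */
            if (de->d_type != DT_REG)
                continue;
            char *name = strdup(de->d_name);   /* path must stay valid until submitted */
            struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
            io_uring_prep_openat(sqe, dfd, name, O_RDONLY, 0);
            io_uring_sqe_set_data(sqe, name);
            if (++inflight == QUEUE_DEPTH) {   /* ring full: flush this batch */
                drain(&ring, inflight);
                inflight = 0;
            }
        }
        drain(&ring, inflight);

        io_uring_queue_exit(&ring);
        closedir(dir);
        close(dfd);
        return 0;
    }

On completion, cqe->res holds the new file descriptor, so you end up with roughly the "array of file descriptors" the parent asked about, without one blocking open() per file.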

  • You have a limit of 1k simultaneous open files per process - not sure what overhead exists in the kernel that made them impose this, but I guess it exists for a reason. You might run into trouble if you open too many files at once (either the kernel kills your process, or you run into some internal kernel bottleneck that makes the whole endeavor not so worthwhile).
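
If that 1k figure is the usual default soft RLIMIT_NOFILE of 1024, it isn't hard-coded: a process can read it and raise the soft limit up to the hard limit without privileges. A minimal sketch, assuming that's the limit being hit:

    /* Sketch: inspect and raise the per-process open-file limit (RLIMIT_NOFILE). */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) { perror("getrlimit"); return 1; }
        printf("soft %llu, hard %llu\n",
               (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);

        rl.rlim_cur = rl.rlim_max;            /* bump soft limit to the hard cap */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            perror("setrlimit");
        return 0;
    }

Raising the hard limit itself needs privileges (CAP_SYS_RESOURCE), so batching and closing fds as you go is still the safer design.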

Not sure, I'd like that too

You could use io_uring, but IMO that API is annoying and I remember hitting limitations. One thing you could do with io_uring is use openat (the op, not the syscall) with the dir fd (which you get from the syscall), so you can asynchronously open and read files; however, you couldn't open directories that way for some reason. There's a chance I'm remembering wrong.

>why can't we get a syscall to load an entire directory into an array of file descriptors (minus an array of paths to ignore), instead of calling open() on every individual file in that directory?

You mean like a range of file descriptors you could use if you want to save files in that directory?

What comes closest is scandir [1], which gives you a malloc'd array of directory entries, and can be used to avoid lstat syscalls for each file (via d_type, where the filesystem fills it in).

Otherwise you can open a dir and pass its fd to openat together with a relative path to a file, to reduce the kernel overhead of resolving absolute paths for each file.

[1] https://man7.org/linux/man-pages/man3/scandir.3.html
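
For illustration, a minimal sketch combining the two ideas: scandir(3) to enumerate entries (checking d_type so a per-file lstat can be skipped where the filesystem fills it in), then openat(2) relative to the directory fd so the kernel doesn't re-resolve the directory path for every file. Untested, error handling trimmed.

    /* Sketch: scandir(3) to list a directory, openat(2) to open each regular file. */
    #include <dirent.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *dirpath = argc > 1 ? argv[1] : ".";
        int dfd = open(dirpath, O_RDONLY | O_DIRECTORY);

        struct dirent **entries;
        int n = scandir(dirpath, &entries, NULL, alphasort);
        if (n < 0) { perror("scandir"); return 1; }

        for (int i = 0; i < n; i++) {
            if (entries[i]->d_type == DT_REG) {      /* regular files only */
                int fd = openat(dfd, entries[i]->d_name, O_RDONLY);
                if (fd >= 0) {
                    printf("opened %s as fd %d\n", entries[i]->d_name, fd);
                    close(fd);
                }
            }
            free(entries[i]);
        }
        free(entries);
        close(dfd);
        return 0;
    }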