Gathering Linux Syscall Numbers in a C Table

5 days ago (t-cadet.github.io)

> I was expecting a unified interface across all architectures, with perhaps one or two architecture-specific syscalls to access architecture-specific capabilities; but Linux syscalls are more like Swiss cheese.

There's lots of historical weirdness, mostly from cases where the kernel went "oops, we need 64-bit time_t or off_t or whatever" and added, for example, getdents64 to old platforms, while new platforms never got the broken 32-bit version. There are some more interesting cases, though: until fairly recently (about a decade ago in the mainline kernel), 32-bit x86 (and maybe other platforms?) had no individual syscalls for the socket operations; they were all multiplexed through socketcall.
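
To make the multiplexing concrete, here is a minimal sketch of creating a socket without the libc socket() wrapper. The helper name raw_socket is made up for illustration; the sub-call number 1 is SYS_SOCKET from <linux/net.h>, and the code assumes a Linux libc that provides syscall(2).

```c
#define _GNU_SOURCE
#include <sys/socket.h>   /* AF_UNIX, SOCK_STREAM */
#include <sys/syscall.h>  /* SYS_socket, SYS_socketcall (where they exist) */
#include <unistd.h>       /* syscall(), close() */

/* Hypothetical helper: create a socket through a raw syscall.
 * Returns a file descriptor, or -1 with errno set (syscall(2) convention). */
static long raw_socket(int domain, int type, int protocol)
{
#if defined(SYS_socket)
    /* Architectures (or kernel header versions) with a dedicated syscall. */
    return syscall(SYS_socket, domain, type, protocol);
#elif defined(SYS_socketcall)
    /* Multiplexed form: a sub-call number plus a pointer to the arguments. */
    unsigned long args[3] = {
        (unsigned long)domain, (unsigned long)type, (unsigned long)protocol
    };
    return syscall(SYS_socketcall, 1 /* SYS_SOCKET in <linux/net.h> */, args);
#else
#error "no known way to create a socket on this architecture"
#endif
}
```

Calling raw_socket(AF_UNIX, SOCK_STREAM, 0) should hand back an ordinary file descriptor that can be passed to close(). Which branch gets compiled depends entirely on the headers for the target architecture.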

> In an ideal world, there would be a header-only C library provided by the Linux kernel; we would include that file and be done with it. As it turns out, there is no such file, and interfacing with syscalls is complicated.

That's because Linux is the exception: on UNIX, the public API is the C library, as later standardized by POSIX.

The whole point of creating C and rewriting UNIX V4 in C was to move away from this kind of platform detail.

Also, UNIX can be seen as C's runtime, in a way. Traditionally the C compiler came from the platform vendor; you didn't pick and choose among C compilers and standard libraries. That was left to non-UNIX platforms.

  • All true, but note that BSD introduced, and both Linux/glibc and Linux/musl support, a syscall(2) wrapper routine that takes a syscall number and a list of arguments (usually as longs) and performs the syscall magic. The syscall numbers are defined as macros beginning with SYS_. The Linux kernel headers export syscall numbers as macros with the prefix __NR_, but to match the BSD interface, Linux libc headers usually translate or otherwise define them with a SYS_ prefix. Using the macros is much better than hard-coding numbers, because the number for the same syscall often varies by architecture.

    See https://man7.org/linux/man-pages/man2/syscall.2.html
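
    As a rough sketch of what that looks like in practice (the table, lookup_syscall, and raw_gettid are names invented for this example, not anything from libc or the article), the SYS_ macros can be used both to call syscall(2) and to build a small name-to-number table:

    ```c
    #define _GNU_SOURCE
    #include <string.h>       /* strcmp() */
    #include <sys/syscall.h>  /* SYS_* macros */
    #include <unistd.h>       /* syscall() */

    /* Hypothetical table entry: a syscall name and its number on the
     * architecture this file is compiled for. */
    struct syscall_entry {
        const char *name;
        long number;
    };

    /* The numbers come from the SYS_ macros, so they are whatever the
     * target architecture's headers say; nothing is hard-coded. */
    static const struct syscall_entry syscall_table[] = {
        { "read",   SYS_read   },
        { "write",  SYS_write  },
        { "close",  SYS_close  },
        { "gettid", SYS_gettid },
    };

    /* Look up a syscall number by name; returns -1 if it is not in the table. */
    static long lookup_syscall(const char *name)
    {
        size_t n = sizeof syscall_table / sizeof syscall_table[0];
        for (size_t i = 0; i < n; i++) {
            if (strcmp(syscall_table[i].name, name) == 0)
                return syscall_table[i].number;
        }
        return -1;
    }

    /* Invoke a syscall by number through the generic wrapper. */
    static long raw_gettid(void)
    {
        return syscall(SYS_gettid);
    }
    ```

    Here lookup_syscall("write") yields whatever SYS_write expands to on the build target, and raw_gettid() returns the calling thread's ID.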

    • Except with BSDs you are on your own if you go down that route, because there are no stability guarantees.

      It is more of an implementation detail for the rest of the C APIs than anything else.

  • Even if one would want to use Linux only through libc, that is not always possible.

    Linux has evolved beyond POSIX and many newer syscalls, which can enhance performance in certain scenarios, are not available as libc functions.

    They can be invoked through the generic syscall wrapper that glibc provides alongside its standard functions, through custom wrappers, or through special-purpose libraries where those exist.
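
    futex is one example of a syscall with no wrapper function in glibc. A custom wrapper might look roughly like the sketch below; futex_call, futex_wake, and futex_wait are hypothetical names, and the code assumes an architecture whose headers define SYS_futex.

    ```c
    #define _GNU_SOURCE
    #include <errno.h>        /* errno, EAGAIN (for callers checking failures) */
    #include <linux/futex.h>  /* FUTEX_WAIT, FUTEX_WAKE */
    #include <stddef.h>       /* NULL */
    #include <stdint.h>       /* uint32_t */
    #include <sys/syscall.h>  /* SYS_futex */
    #include <unistd.h>       /* syscall() */

    /* Hypothetical generic wrapper: forwards to the futex syscall with no
     * timeout and no second address. Returns -1 with errno set on failure. */
    static long futex_call(uint32_t *uaddr, int op, uint32_t val)
    {
        return syscall(SYS_futex, uaddr, (long)op, (long)val, NULL, NULL, 0L);
    }

    /* Wake up to `count` waiters blocked on *uaddr; returns how many woke. */
    static long futex_wake(uint32_t *uaddr, uint32_t count)
    {
        return futex_call(uaddr, FUTEX_WAKE, count);
    }

    /* Block while *uaddr still holds `expected`. If the value already
     * differs, the kernel fails the call immediately with EAGAIN. */
    static long futex_wait(uint32_t *uaddr, uint32_t expected)
    {
        return futex_call(uaddr, FUTEX_WAIT, expected);
    }
    ```

    With no thread waiting on a word, futex_wake returns 0. futex_wait on a word whose value does not match the expected one returns -1 with errno set to EAGAIN instead of blocking.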

    • That isn't a valid reason, given the existence of Solaris, HP-UX, DG/UX, Tru64, NeXTSTEP, and so many other UNIXes that grew beyond AT&T UNIX System V.

      All of them provide C APIs to their additional features not covered by POSIX.

      What happens on Linux is that, because of the way syscalls are exposed, there is a certain laziness about covering everything in glibc or in replacements like musl.

> In an ideal world, there would be a header-only C library provided by the Linux kernel; we would include that file and be done with it. As it turns out, there is no such file, and interfacing with syscalls is complicated.

Isn't that nolibc.h?

  • Nolibc, which is incorporated in the Linux kernel sources (under "tools"), contains generic wrappers for the Linux syscalls and a set of simplified implementations for the most frequently needed libc functions.

    It is useful for very small executables or for some embedded applications.

    It is not useful for someone who wants to use the Linux syscalls directly from a programming language other than C, bypassing glibc or other libc implementations, except as a model for writing generic syscall wrappers.

    It also does not satisfy the requirement of the parent article, because it does not contain a table of the syscalls.

    Nolibc implements its functions like this:

      long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
      {
          return my_syscall3(__NR_ioctl, fd, cmd, arg);
      }

    where the syscall numbers like "__NR_ioctl" are collected from the Linux kernel headers, because nolibc is compiled in the kernel source tree.
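
    For a feel of what a three-argument wrapper in that style has to do, here is a rough sketch. It is not nolibc's actual code: sketch_syscall3 and sketch_write are invented names, the inline assembly covers only x86-64, and other architectures fall back to libc's syscall(2). Like a raw kernel entry, it returns a negative errno value on failure rather than setting errno.

    ```c
    #define _GNU_SOURCE
    #include <errno.h>        /* errno, used only in the fallback path */
    #include <stddef.h>       /* size_t */
    #include <sys/syscall.h>  /* SYS_write */
    #include <unistd.h>       /* syscall() */

    /* Hypothetical three-argument raw syscall.
     * Returns the kernel's result: >= 0 on success, -errno on failure. */
    static long sketch_syscall3(long nr, long a1, long a2, long a3)
    {
    #if defined(__x86_64__) && !defined(__ILP32__)
        long ret;
        /* x86-64 convention: number in rax, arguments in rdi, rsi, rdx.
         * The syscall instruction clobbers rcx and r11. */
        __asm__ volatile (
            "syscall"
            : "=a"(ret)
            : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
            : "rcx", "r11", "memory");
        return ret;
    #else
        /* Fallback: let libc do it, then convert back to the -errno form. */
        long ret = syscall(nr, a1, a2, a3);
        return ret == -1 ? -(long)errno : ret;
    #endif
    }

    /* A typed wrapper built on top, in the same spirit as sys_ioctl above. */
    static long sketch_write(int fd, const void *buf, size_t count)
    {
        return sketch_syscall3(SYS_write, fd, (long)buf, (long)count);
    }
    ```

    Writing two bytes to a valid descriptor returns 2, while writing to descriptor -1 returns -EBADF directly, with no errno involved on the x86-64 path.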

I've been thinking about doing this for a little side project for some time. Looking forward to the eventual conclusion :)