Comment by jeroenhd
14 days ago
You have to choose something, and I'm glad they didn't go with the idiotic Go approach ("every path is a valid UTF-8 string" or we just garble the path at the standard library level"). You can usually abstract away platform weirdness at the implementation level, but programming on non-Unix environments it's more like programming against cygwin.
A standard library for files and paths that lacks things like ACLs and locks is weirdly Unixy for a supposedly modern language. Most systems support ACLs now, though Windows uses them a lot more. On the other hand, the lack of file descriptors/handles is weird from all points of view.
Had Windows been an uncommon target, I would've understood this design, but Windows is still the most common PC operating system in the world by a great margin. Not even considering things like "multile filesystem roots" (drive letters) "that happen to not exist on Linux", or "case insensitive paths (Windows/macOS/some Linux systems)" is a mistake for a supposedly generic language, in my opinion.
As far as I can tell from Microsoft's documentation, WinAPI access for ACLs was added in Windows 10, which Rust 1.0 predates. And std::fs attempts to provide both minimalist and cross-platform APIs, which in practice means (for better or worse) it's the lowest common denominator between Windows and Unix, with the objective being that higher-level libraries can leverage it as a building block. From the documentation for std::fs:
"This module contains basic methods to manipulate the contents of the local filesystem. All methods in this module represent cross-platform filesystem operations. Extra platform-specific functionality can be found in the extension traits of std::os::$platform."
Following its recommendation, if we look at std::os::windows::fs we see an extension trait for setting Windows-specific flags for WinAPI-specific flags, like dwDesiredAccess, dwShareMode, dwFlagsAndAttributes. I'm not a Windows dev but AFAICT we want an API to set lpSecurityAttributes. I don't see an option for that in std::os::windows::fs, likely complicated by the fact that it's a pointer, so acquiring a valid value for that parameter is more involved than just constructing a bitfield like for the aforementioned parameters. But if you think this should be simple, then please propose adding it to std::os::windows::fs; the Rust stdlib adds new APIs all the time in response to demand. (In the meantime, comprehensive Windows support is generally provided by the de-facto standard winapi crate, which provides access to the raw syscall).
> WinAPI access for ACLs was added in Windows 10
I'm not sure which docs you mean but that's not true. The NT kernel has used ACLs long before rust was invented. But it's indeed true that rust adds platform-specific methods based on demand. The trouble with ACLs is it means either creating a large API surface in the standard library to handle them or else presenting a simple interface but having to manage raw pointers (likely using a wrapper type but even then it can't be made totally safe).
> the de-facto standard winapi crate, which provides access to the raw syscall
Since the official Microsoft `windows-sys` crate was released many years ago, the winapi crate has been effectively unmaintained (it accepts security patches but that's it).
> I'm not sure which docs you mean
I was looking at these: https://learn.microsoft.com/en-us/windows/security/identity-...
> the winapi crate has been effectively unmaintained
Shows how much of a Windows dev I am. :P
1 reply →
As far as I can tell even NFS got ACL support before the first Rust release. NFSv4.1 in 2010 vs Rust in 2012.
SetFileSecurityA is listed as Windows XP+ (https://learn.microsoft.com/en-us/windows/win32/api/winbase/...) but Microsoft has deprecated all pre-XP documentation.
According to https://www.geoffchappell.com/studies/windows/win32/advapi32..., the function was available first in advapi32 version 3.10, which was included in Windows NT 3.10 (14th July 1993): https://www.geoffchappell.com/studies/windows/win32/advapi32...
lpSecurityAttributes just refers to a SecurityAttributes struct (Rust bindings here: https://microsoft.github.io/windows-docs-rs/doc/windows/Win3...) Annoying pointers for sure, but nothing a Rust API can't work around with standard language features.
And sure, Rust could add the entire windows crate to the standard library, but my point is that this isn't just Windows functionality: getfacl/setfacl has been with us for decades but I don't know any standard library that tries to include any kind of ACLs.
You misunderstand the documentation. Microsoft doesn't provide online documentation for versions of Windows that are no longer supported. Functions like SetFileSecurity have existed since Windows NT 3.1 back in 1993.
But the documentation I'm using claims that it applies to Windows 10, which stopped being supported last year.
1 reply →
> I'm glad they didn't go with the idiotic Go approach ("every path is a valid UTF-8 string" or we just garble the path at the standard library level")
Can you expound a bit on this? I haven't been able to find any articles related to this kind of problem. It's also a bit surprising, given that Go specifically did not make the same choice as Rust to make strings be Unicode / UTF-8 (Go strings are just arrays of bytes, with one minor exception related to iteration using the range syntax).
Go's docs put it like this: Path names are UTF-8-encoded, unrooted, slash-separated sequences of path elements, like “x/y/z”. If you operate on a path that's a non-UTF-8 string, then Go will do... something to make the string work with UTF-8 when passed back to standard file methods, but it likely won't end up operating on the same file.
Rust has OsStr to represent strings like paths, with a lossy/fallible conversion step instead.
Go's approach is fine for 99% of cases, and you're pretty screwed if your application falls for the 1% issue. Go has a lot of those decisions, often to simplify the standard library for most use cases most people usually run into (like their awful, lossy, incomplete conversion between Unix and Windows when it comes to permissions/read-only flags/etc.).
> Path names are UTF-8-encoded, unrooted, slash-separated sequences of path elements, like “x/y/z”
This is only for the "io/fs" package and its generic filesystem abstractions. The "os" package, which always operates on the real filesystem, doesn't actually specify how paths are encoded, nor does its associated helper package "path/filepath".
In practice, non-UTF-8 already wasn't an issue on Unix-like systems, where file paths are natively just byte sequences. You do need to be aware of this possibility to avoid mangling the paths yourself, though. The real problem was Windows, where paths are actually WTF-16, i.e. UTF-16 with unpaired surrogates. Go has addressed this issue by accepting WTF-8 paths since Go 1.21: https://github.com/golang/go/issues/32334#issuecomment-15500...
The `os` package, that is the main way everyone I've seen opens and reads files in Go, doesn't specify any restriction on its path syntax (except that it uses `string`, of course). I've tried using it on Linux with a file name that would be invalid UTF-8 and it works without any issues.
I for one hadn't even heard of the io/fs package that has the problems that you mention, and I don't remember ever seeing it used in an example. I've looked in a code base I help maintain, and the only uses I could find are related to some function type definitions that are used by filepath.WalkDir and filepath.Walk - and those functions explicitly document the fact that they don't use `io/fs` style paths when calling these functions - they don't even respect the path separator format:
Where fs.WalkDirFunc is defined like this:
> Go strings are just arrays of bytes,
https://go.dev/ref/spec#String_types: “A string value is a (possibly empty) sequence of bytes”
https://pkg.go.dev/strings@go1.26.2: “Package strings implements simple functions to manipulate UTF-8 encoded strings.”
So, yes, Go strings are just arrays of bytes in the language, but in the standard library, they’re supposed to be UTF-8 (the documentation isn’t immediately clear on how it handles non-UTF-8 strings).
I think this may be why the OP thinks the Go approach is “every path is a valid UTF-8 string”