Comment by SAI_Peregrinus

1 month ago

Paths are hard because they usually look like printable text, but don't have to be text. POSIX filenames are octet strings not containing 0x2F or 0x00. They aren't required to contain any "printable" characters, or even be valid text in any particular encoding. Most of the Rust stdlib you're thinking of is for handling text strings, but paths aren't text strings. Python also has the same split between Pathlib paths & all other strings.
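To make that concrete, here is a minimal Rust sketch, assuming a Unix target (it relies on `std::os::unix::ffi::OsStrExt`). The helper `is_valid_posix_filename` is a hypothetical name for illustration, not a stdlib function, and it only checks the byte rule described above.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: byte-level access to OsStr
use std::path::Path;

/// Hypothetical helper: true if `name` is a legal single filename component
/// under the rule above (non-empty, no b'/' (0x2F), no NUL (0x00)).
fn is_valid_posix_filename(name: &[u8]) -> bool {
    !name.is_empty() && !name.iter().any(|&b| b == b'/' || b == 0)
}

fn main() {
    // 0xFF can never appear in well-formed UTF-8, yet this is a legal filename.
    let raw: &[u8] = b"example\xff.txt";
    assert!(is_valid_posix_filename(raw));

    let path = Path::new(OsStr::from_bytes(raw));
    // The text-oriented view fails: there is no &str for this path.
    assert!(path.to_str().is_none());
    // The lossy view succeeds but substitutes U+FFFD, so it no longer
    // round-trips to the real filename.
    println!("{}", path.to_string_lossy());
}
```

The point is that `to_str()` returning `None` is a normal outcome for a valid path, not an error condition.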

Yeah, the issue is that there are no utilities for manipulating OsStrings and Paths directly: nothing for splitting, regex matching, or formatting them.
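As a rough illustration of what you end up hand-rolling, here is a sketch of splitting an `OsStr` on a delimiter byte without going through `str`. It assumes a Unix target, and `split_os_str` is a made-up helper name, not something the stdlib provides.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: as_bytes / from_bytes

/// Hypothetical helper: split an OsStr on one delimiter byte,
/// never converting to str, so invalid UTF-8 passes through untouched.
fn split_os_str(s: &OsStr, delim: u8) -> Vec<&OsStr> {
    s.as_bytes()
        .split(|&b| b == delim)
        .map(|part| OsStr::from_bytes(part))
        .collect()
}

fn main() {
    // A PATH-like value whose first entry is not valid UTF-8.
    let value = OsStr::from_bytes(b"/opt/\xff/bin:/usr/bin");
    for part in split_os_str(value, b':') {
        println!("{}", part.to_string_lossy());
    }
}
```

This works, but it is platform-specific and you have to rebuild every string operation you need on top of raw bytes.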

For instance, the popular `fd` utility can't actually see files whose names contain malformed UTF-8, so you can hide files from system administrators who naively rely on those tools just by adding invalid UTF-8 to the name.

    touch $'example\xff.txt'
    fd 'example.*txt'            # not found
    fd -F $'example\xff.txt'     # fails on the non-UTF-8 argument

The existing Rust libraries for manipulating OsString push people towards ignoring or rejecting non-UTF-8 filenames and paths.
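For comparison, a byte-level search that does see such files might look like the sketch below. It assumes a Unix target, and `name_matches` and `find_in_dir` are hypothetical helpers. It is only a rough prefix/suffix stand-in for the `example.*txt` pattern, not a regex engine and not how `fd` works.

```rust
use std::ffi::{OsStr, OsString};
use std::fs;
use std::io;
use std::os::unix::ffi::OsStrExt; // Unix-only: as_bytes
use std::path::Path;

/// Hypothetical helper: rough byte-level equivalent of `prefix.*suffix`.
/// The length check stops prefix and suffix from overlapping.
fn name_matches(name: &OsStr, prefix: &[u8], suffix: &[u8]) -> bool {
    let bytes = name.as_bytes();
    bytes.len() >= prefix.len() + suffix.len()
        && bytes.starts_with(prefix)
        && bytes.ends_with(suffix)
}

/// Hypothetical helper: list matching entries of one directory (non-recursive).
fn find_in_dir(dir: &Path, prefix: &[u8], suffix: &[u8]) -> io::Result<Vec<OsString>> {
    let mut hits = Vec::new();
    for entry in fs::read_dir(dir)? {
        let name = entry?.file_name(); // OsString, no UTF-8 requirement
        if name_matches(&name, prefix, suffix) {
            hits.push(name);
        }
    }
    Ok(hits)
}

fn main() -> io::Result<()> {
    for name in find_in_dir(Path::new("."), b"example", b"txt")? {
        // display() is lossy, but the match itself used the real bytes.
        println!("{}", Path::new(&name).display());
    }
    Ok(())
}
```

Only the final printing step is lossy here; the comparison never rejects a name for being invalid UTF-8.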