Comment by anyfoo

4 years ago

Strings following certain rules are entirely valid representations of paths, just as sequences of path components in your chosen language/framework are. Likewise, the bit sequences that make up those component sequences in memory are an entirely valid representation of them.

Yes, paths have structure, but saying "a path is not a string" is equivalent to saying "C source code is not a string". Both are strings, and both are also something else, represented by strings according to rules. Different internal representations have different advantages and disadvantages. I fully agree that for operations such as "adding components" an internal sequence/list representation is better, but strings, for example, cross arbitrary IPC or even ABI boundaries much more easily. (And you wouldn't bat an eye at FQDNs like "www.google.com" being passed as a string instead of as ["www","google","com"], because the string representation works pretty well.)
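To make the "two equally valid representations" point concrete, here is a minimal Python sketch that converts a POSIX-style path string into a list of components and back, and appends a component in list form. The helpers `to_components` and `from_components` are hypothetical, and the parser deliberately simplifies: it treats `/` as the only separator, drops empty components from repeated or trailing slashes, and treats POSIX's implementation-defined leading `//` like `/`.

```python
from __future__ import annotations


def to_components(path: str) -> tuple[bool, list[str]]:
    """Parse a POSIX-style path string into (is_absolute, components).

    Simplified sketch: '/' is the only separator, empty components from
    repeated or trailing slashes are dropped, and POSIX's
    implementation-defined leading '//' is treated like '/'.
    """
    if "\0" in path:
        raise ValueError("POSIX paths cannot contain NUL bytes")
    is_absolute = path.startswith("/")
    components = [c for c in path.split("/") if c]
    return is_absolute, components


def from_components(is_absolute: bool, components: list[str]) -> str:
    """Serialize (is_absolute, components) back into a path string."""
    for c in components:
        if c == "" or "/" in c or "\0" in c:
            raise ValueError(f"invalid path component: {c!r}")
    body = "/".join(components)
    if is_absolute:
        return "/" + body
    return body or "."  # an empty relative path is spelled "."


# Two representations of the same path.
print(to_components("/usr//local/bin/"))  # (True, ['usr', 'local', 'bin'])
print(from_components(True, ["usr", "local", "bin"]))  # /usr/local/bin

# Adding a component is a list append; no slash bookkeeping needed.
is_abs, parts = to_components("/etc/")
print(from_components(is_abs, parts + ["foo"]))  # /etc/foo
```

Appending to the list sidesteps the trailing-slash bookkeeping a naive string concatenation would need, while the string form is what you would actually write to a pipe or socket.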

C source code and paths are both representable by strings, true, but the fact that they're not actually strings still matters, because most people don't realize it. For paths, that leads to a lot of mishandled edge cases (for source code, it leads to a lot of inefficient and weak tooling, which isn't quite as bad).

Because neither is a string, their native representation shouldn't be one: it should be something structured, serialized into a string only when necessary (IPC, FFI, serdes). This would save people a lot of time and effort.
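Python's `pathlib.PurePosixPath` is one existing example of this structured-native approach: components are joined with the `/` operator and exposed via `.parts`, and the value is turned into `str` (or `bytes` via `os.fsencode`) only at a boundary. A minimal sketch; the `config_path` helper and the `myapp`/`config.toml` names are hypothetical.

```python
import os
from pathlib import PurePosixPath


def config_path(base: PurePosixPath, app: str) -> PurePosixPath:
    """Hypothetical helper: build a config-file path structurally."""
    # The '/' operator appends components; no manual slash handling needed.
    return base / app / "config.toml"


p = config_path(PurePosixPath("/etc/"), "myapp")
print(p.parts)   # ('/', 'etc', 'myapp', 'config.toml')
print(p.parent)  # /etc/myapp
print(p.name)    # config.toml

# Serialize only at a boundary (IPC, FFI, argv, a config file, ...).
wire = str(p)            # '/etc/myapp/config.toml'
raw = os.fsencode(wire)  # bytes, e.g. for a raw syscall or C API

# Parsing on the other side recovers the same structured value.
print(PurePosixPath(wire) == p)  # True
```

`PurePosixPath` collapses repeated slashes and single `.` components but deliberately keeps `..`, since folding it away lexically can be wrong when symlinks are involved.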

  • It really depends. Do you usually keep hostnames as strings? URLs? JPEGs? Why or why not?

    Sure, a browser will hopefully parse that URL quickly and break it up, and an image viewer will do the same with a JPEG. But will anything that's only interested in opening/displaying that URL or JPEG through a library or external program do the same?

    POSIX paths are actually remarkably simple in structure[1]. The only caveat is equality and normalization: without normalization, a path a might be equal to a path b while their representations differ, e.g. "/etc/foo" and "/etc/bar/../foo". But this is the same whether you have a string or a list of strings: to check for equality, you need to normalize in whichever representation you choose.

    [1] Almost shocking myself, even Haskell defines its primary FilePath type literally as "String".
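The normalization point holds in either representation. A minimal Python sketch: `posixpath.normpath` normalizes the string form, and a hypothetical `normalize_components` helper performs the same lexical folding on a component list; both make "/etc/foo" and "/etc/bar/../foo" compare equal. This is purely lexical, so it assumes `/etc/bar` is not a symlink; resolving against the real filesystem needs something like `os.path.realpath`.

```python
from __future__ import annotations

import posixpath


def normalize_components(is_absolute: bool, components: list[str]) -> list[str]:
    """Lexically normalize a path given as a component list.

    Drops empty and '.' components and folds 'x/..' pairs, mirroring what
    posixpath.normpath does on strings. Purely lexical: it assumes no
    symlinks, so 'bar/..' is treated as a no-op.
    """
    out: list[str] = []
    for c in components:
        if c in ("", "."):
            continue
        if c == "..":
            if out and out[-1] != "..":
                out.pop()          # 'x/..' cancels out
            elif not is_absolute:
                out.append("..")   # relative path climbing above its start
            # absolute path: '..' at the root stays at the root
            continue
        out.append(c)
    return out


a, b = "/etc/foo", "/etc/bar/../foo"

# Raw representations differ, whether as strings or as component lists.
print(a == b)                        # False
print(a.split("/") == b.split("/"))  # False

# After normalization, both representations agree.
print(posixpath.normpath(a) == posixpath.normpath(b))  # True
print(normalize_components(True, b.split("/")))        # ['etc', 'foo']
```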