Comment by jerf

4 years ago

"Path is not a string - it's a sequence of path components and should be treated as such by our APIs."

For maximum correctness, you want to turn it into a file handle as soon as possible, and do all operations through the variants of the file functions whose names end in "at", such as openat: https://linux.die.net/man/2/openat
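A minimal Python sketch of the "at" style, assuming a POSIX system. Python's `os` functions take a `dir_fd` argument that CPython maps onto `openat()`, `unlinkat()` and related calls where the platform provides them (`os.supports_dir_fd` lists which functions support it). The helper names are hypothetical. The directory is resolved to a handle once, and later operations name entries relative to that handle, so renaming the directory in the meantime doesn't redirect them.

```python
import os


def open_dir(path: str) -> int:
    """Resolve `path` to a directory handle once, up front."""
    return os.open(path, os.O_RDONLY | os.O_DIRECTORY)


def write_entry(dir_fd: int, name: str, data: bytes) -> None:
    # openat(dir_fd, name, ...): `name` is looked up inside the already-open
    # directory, not re-resolved from the original path string.
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644, dir_fd=dir_fd)
    with os.fdopen(fd, "wb") as f:  # the file object now owns fd and closes it
        f.write(data)


def read_entry(dir_fd: int, name: str) -> bytes:
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:
        return f.read()


def remove_entry(dir_fd: int, name: str) -> None:
    os.unlink(name, dir_fd=dir_fd)  # unlinkat(dir_fd, name, 0)
```

For example, after `dfd = open_dir(...)`, renaming that directory and then calling `write_entry(dfd, "b.txt", ...)` still writes into the same (now moved) directory.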

The downside of this approach is that you still technically have to carry the path around with you if you ever want to present it back to the user. Once you have a directory handle, you can get back to the root directory easily enough by following parent links and noting which directories you pass through, but the result may not be what the user "thinks" the path is (for example, if they reached the directory through a symlink), and they want to see their path, not a canonicalized one. And they're mostly right. It's also not easy to correctly track changes to their intended path from this basis.
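A rough sketch of recovering a path from a directory handle by following parent links, in the style of a classic `getcwd()` implementation. The function name is hypothetical and it assumes a POSIX system and Python 3.7+ (for `os.scandir` on a file descriptor). It illustrates the point above: the path it rebuilds is the canonical one, not necessarily the one the user typed.

```python
import os


def path_from_dir_fd(dir_fd: int) -> str:
    """Rebuild an absolute path for an open directory by walking ".." links.

    At each level, open the parent and search it for the entry whose
    (st_dev, st_ino) matches the current directory. Symlinks in the user's
    original path are not recoverable: the result is the canonical path.
    """
    parts = []
    cur = os.dup(dir_fd)  # work on a copy so the caller's handle stays open
    try:
        while True:
            cur_st = os.fstat(cur)
            parent = os.open("..", os.O_RDONLY | os.O_DIRECTORY, dir_fd=cur)
            os.close(cur)
            cur = parent
            par_st = os.fstat(cur)
            if (par_st.st_dev, par_st.st_ino) == (cur_st.st_dev, cur_st.st_ino):
                break  # ".." of the root directory is the root itself
            name = None
            with os.scandir(cur) as entries:
                for entry in entries:
                    try:
                        # lstat-style lookup relative to the parent; unlike
                        # entry.inode(), this sees through mount points.
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue
                    if (st.st_dev, st.st_ino) == (cur_st.st_dev, cur_st.st_ino):
                        name = entry.name
                        break
            if name is None:
                raise FileNotFoundError("directory is no longer linked into its parent")
            parts.append(name)
    finally:
        os.close(cur)
    return "/" + "/".join(reversed(parts))
```

Opening `base/link` (a symlink to `base/real`) and passing that handle in yields a path ending in `real`, not the `link` the user typed.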

Basically, I don't know of a really solid, 100% correct way to handle this with any reasonable degree of effort.

> For maximum correctness, you want to turn it into a file handle as soon as possible

That's not right. You want to resolve a file/folder path to a file/folder at the exact point it makes sense.

It's a problem if you're using a path when you wanted the file. The file can be switched/modified out from underneath you between one use of the path and the next.
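A small Python illustration of resolving once and then working on the file itself, assuming a POSIX system (the function name is hypothetical). The racy pattern would be `os.path.isfile(path)` followed by `open(path)`, where the name can be re-pointed between the two calls; here the check and the read go through the same open handle.

```python
import os
import stat


def read_regular_file(path: str) -> bytes:
    """Resolve `path` once, then check and read through the same handle."""
    # O_NONBLOCK keeps open() from hanging if `path` names a FIFO;
    # in practice it has no effect on reads from regular files.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # The check applies to the file we actually opened, not to
        # whatever the name happens to point at right now.
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise ValueError(f"{path!r} is not a regular file")
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```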

It's also a problem if you've got the file when you only wanted a reference. Now you can't simply switch/modify the file independently of the reference. E.g., maybe you want config file changes to take effect immediately and transparently.
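A sketch of the config example on a POSIX system, with hypothetical class names: holding only the path means each read re-resolves the name and sees a file that was atomically replaced with `os.replace`, while holding an open file keeps reading the old contents.

```python
import os


class ConfigByPath:
    """Keeps only a reference (the path) and resolves it on every read."""

    def __init__(self, path: str) -> None:
        self.path = path

    def read(self) -> str:
        with open(self.path, encoding="utf-8") as f:
            return f.read()


class ConfigByHandle:
    """Resolves the path once and keeps the open file."""

    def __init__(self, path: str) -> None:
        self.file = open(path, encoding="utf-8")

    def read(self) -> str:
        self.file.seek(0)
        return self.file.read()

    def close(self) -> None:
        self.file.close()


def replace_atomically(path: str, text: str) -> None:
    # Write a sibling file, then rename it over the old name. Readers that
    # resolve the name afterwards get the new file; open handles keep the old one.
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)
```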

You can also have the hybrid case, e.g., where you want the folder directly, but have a relative path to a file that is resolved late.
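The hybrid case can be sketched with a directory handle plus a file name that is looked up on each access (POSIX only; the class name is hypothetical). Moving the directory doesn't break the reference, and a file swapped in under the same name is picked up on the next read.

```python
import os


class FileInDir:
    """Holds the directory as a handle; resolves the file name late."""

    def __init__(self, dir_path: str, name: str) -> None:
        self.dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
        self.name = name

    def read(self) -> bytes:
        # openat(dir_fd, name): the name is looked up now, inside the held directory.
        fd = os.open(self.name, os.O_RDONLY, dir_fd=self.dir_fd)
        with os.fdopen(fd, "rb") as f:
            return f.read()

    def close(self) -> None:
        os.close(self.dir_fd)
```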

If you're unsure, I'd err on the side of late resolution.

"you want to turn it into a file handle as soon as possible"

But no sooner.

For example, I've run into problems where I'm configuring server program A to talk to file location B... but I don't have access to file location B. The client-side library for talking to the server tries to convert location B into a file handle and then freaks out because I can't access it, even though I don't want to access it. I want that program to serve it.

If it was using simple "path" objects that didn't confirm that I have access to the path, everything would be hunky-dory. But because it tried to convert it into a file handle unnecessarily, I get blocked.
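In Python terms, a "simple path object" could be `pathlib.PurePosixPath`, which is manipulated purely lexically and never touches the filesystem. A sketch of a client that validates and forwards a path it cannot itself access; `make_server_request` and the request shape are hypothetical.

```python
from pathlib import PurePosixPath


def make_server_request(data_dir: str) -> dict:
    """Build a hypothetical request telling a server which directory to serve.

    Only lexical checks happen here; the client never stats or opens the path,
    because the server, not the client, is the one that needs access to it.
    """
    p = PurePosixPath(data_dir)
    if not p.is_absolute():
        raise ValueError("server paths must be absolute")
    if ".." in p.parts:
        raise ValueError("'..' components are not allowed")
    return {"serve": str(p)}
```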

Another inconvenience with this approach is that you can keep thousands of paths in memory with no problem, but thousands of open FDs may exceed the per-process limit on open files (RLIMIT_NOFILE, whose soft limit often defaults to 1024 on Linux).
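To see the limit concretely on a Unix system, Python's `resource` module exposes `RLIMIT_NOFILE`. A sketch (function name hypothetical) that temporarily lowers the soft limit, opens handles until the kernel refuses with `EMFILE`, then closes them and restores the old limit. Paths held as strings are subject to no such cap.

```python
import errno
import os
import resource


def count_openable_fds(soft_limit: int) -> int:
    """Open /dev/null repeatedly under a temporary soft limit until EMFILE.

    Returns how many extra descriptors fit; descriptors the process already
    has open (stdin, stdout, stderr, ...) count against the same limit.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))
    fds = []
    try:
        while True:
            try:
                fds.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as e:
                if e.errno == errno.EMFILE:
                    return len(fds)
                raise
    finally:
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
```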

This goes for most instances of user input. Timestamps are the other common one people get wrong. I've even seen programs that pass around timestamps as strings in multiple formats and as integers (Unix time).
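The same "convert at the boundary" idea applied to timestamps, as a sketch: accept the formats a program actually receives and normalize them once to a timezone-aware `datetime` in UTC. The function name is hypothetical, and treating numbers as Unix seconds and naive strings as UTC are assumptions.

```python
from datetime import datetime, timezone
from typing import Union


def to_utc_datetime(value: Union[int, float, str, datetime]) -> datetime:
    """Normalize user-supplied timestamps to one representation at input time."""
    if isinstance(value, datetime):
        dt = value
    elif isinstance(value, (int, float)):
        # Assumption: numbers are Unix time in seconds.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    elif isinstance(value, str):
        # ISO 8601, e.g. "2020-05-01T12:00:00+02:00".
        dt = datetime.fromisoformat(value)
    else:
        raise TypeError(f"unsupported timestamp type: {type(value).__name__}")
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return dt.astimezone(timezone.utc)
```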

> For maximum correctness, you want to turn it into a file handle as soon as possible

This is why I get stressed out when I see paths turned into special objects encoding separators and such.

It tells me the path is living for way too long compared to the file handle.

I only want to see path-specific objects if we're modifying the path, and even then I want that to happen as late as possible.

Why not just hold onto both? The user's representation and the file handle. Only ever "display" the representation, while you do all operations on the handle. (Not trying to be sarcastic, just curious).
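A minimal sketch of that suggestion (names hypothetical, POSIX-only): keep the user's string purely for display and route every operation through the directory handle.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class UserDir:
    display: str  # exactly what the user typed; only ever shown, never re-resolved
    fd: int       # the resolved directory; all operations go through this

    @classmethod
    def from_user_path(cls, user_path: str) -> "UserDir":
        return cls(user_path, os.open(user_path, os.O_RDONLY | os.O_DIRECTORY))

    def listdir(self) -> List[str]:
        return sorted(os.listdir(self.fd))  # listing via the handle, not the string

    def describe(self, name: str) -> str:
        size = os.stat(name, dir_fd=self.fd).st_size  # fstatat on the handle
        return f"{os.path.join(self.display, name)}: {size} bytes"

    def close(self) -> None:
        os.close(self.fd)
```

Note that `display` is not updated if the directory is later moved, which is the path-tracking problem raised at the top of the thread.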