Comment by jerf

4 years ago

"Path is not a string - it's a sequence of path components and should be treated as such by our APIs."

For maximum correctness, you want to turn it into a file handle as soon as possible, and do all operations through the variants of the file functions whose names end in "at", such as openat: https://linux.die.net/man/2/openat
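A minimal sketch of the handle-first style, assuming Linux or another POSIX system where Python's `os.open` accepts `dir_fd` (it then goes through openat(2)). The helper names `read_relative` and `demo_openat` and the file name `note.txt` are made up for illustration.

```python
import os
import tempfile


def read_relative(dir_fd: int, name: str) -> bytes:
    """Read `name` relative to an already-open directory descriptor."""
    # With dir_fd set, the lookup starts at the open directory rather than
    # at whatever the directory's path string resolves to right now.
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as f:
        return f.read()


def demo_openat() -> bytes:
    """Create a scratch directory, then read a file in it via its handle."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "note.txt"), "wb") as f:
            f.write(b"hello")
        # Resolve the directory path exactly once, up front.
        dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
        try:
            return read_relative(dir_fd, "note.txt")
        finally:
            os.close(dir_fd)
```

Whether `dir_fd` is available can be checked at runtime with `os.open in os.supports_dir_fd`.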

The downside of this approach is that you still technically have to carry the path around with you if you ever want to present it back to the user. Once you have a directory handle, you can get back to the root directory easily enough by following parent links and seeing what directories you end up in, but that may not be what the user "thinks" the path is, and they want to see their path, not a canonicalized one. And they're mostly right. It's also not easy to correctly track changes to their intended path from this basis.

Basically, I don't know of a really solid, 100% correct way to handle this with any reasonable degree of effort.

12 comments


jmull 4 years ago

> For maximum correctness, you want to turn it into a file handle as soon as possible

That's not right. You want to resolve a file/folder path to a file/folder at the exact point it makes sense.

It's a problem if you're using a path when you wanted the file. The file can be switched/modified out from underneath you.

It's also a problem if you've got the file when you only wanted a reference. Now you can't simply switch/modify the file independent of the reference. E.g., maybe you want config file changes to take effect immediately and transparently.

You can also have the hybrid case, e.g., where you want the folder directly, but have a relative path to a file that is resolved late.

If you're unsure, I'd err on the side of late resolution.
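The difference between resolving early and resolving late can be seen in a small experiment. This is only a sketch, assuming POSIX rename semantics (on Windows the replace step would likely fail while the file is open); the function name and the `app.conf` file are hypothetical.

```python
import os
import tempfile


def demo_early_vs_late():
    """Return (what an early-opened handle sees, what a late open sees)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.conf")
        with open(path, "w") as f:
            f.write("mode=old")

        # Early resolution: the handle is bound to the file as it exists now.
        handle = open(path)

        # Someone swaps in a new config file by renaming over the old name.
        new_path = path + ".new"
        with open(new_path, "w") as f:
            f.write("mode=new")
        os.replace(new_path, path)

        with handle:
            early = handle.read()   # still the original file
        with open(path) as f:       # late resolution: look the name up again
            late = f.read()
        return early, late
```

Under those assumptions the early handle keeps reading the old contents while the late open picks up the replacement, which is the behavior you want for "pick up config changes" and the one you don't want for "operate on exactly this file".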

Pxtl 4 years ago

"you want to turn it into a file handle as soon as possible"

But no sooner.

For example, I've run into problems where I'm configuring server program A to talk to file location B... but I don't have access to file location B. The client-side library for talking to the server tries to convert location B into a file handle and then freaks out because I can't access it, when I don't want to access it. I want that program to serve it.

If it were using simple "path" objects that didn't confirm that I have access to the path, everything would be hunky-dory. But because it tried to convert it into a file handle unnecessarily, I get blocked.
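As one illustration of a "path" object that never touches the filesystem, Python's `pathlib.PurePosixPath` only parses and manipulates components. The function name and the example location below are hypothetical; nothing here checks that the path exists or is accessible.

```python
from pathlib import PurePosixPath


def describe_remote_location(raw: str) -> dict:
    """Split a path into components without any filesystem access."""
    p = PurePosixPath(raw)
    return {
        "parts": p.parts,                          # the path as a sequence of components
        "parent": str(p.parent),
        "name": p.name,
        "sibling": str(p.with_name("archive")),    # derived purely lexically
    }
```

For example, `describe_remote_location("/srv/exports/location_b")` works the same whether or not the calling machine can see `/srv/exports`.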

tmerr 4 years ago

Another inconvenience with this approach is that you can keep thousands of paths in memory no problem. But thousands of FDs may cause you to exceed per-process limits.
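The per-process limit can be inspected before deciding how many handles to keep open. A rough sketch, assuming a Unix-like system where Python's `resource` module is available; the `reserve` headroom of 64 descriptors is an arbitrary illustrative value, not a recommendation.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_in_fd_budget(n_handles: int, reserve: int = 64) -> bool:
    """Check whether n_handles plus some headroom fits under the soft limit."""
    soft, _hard = fd_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    # Leave room for stdio, sockets, and descriptors opened by libraries.
    return n_handles + reserve <= soft
```

Plain path strings have no such budget, which is the asymmetry being pointed out here.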

globular-toast 4 years ago

This goes for most instances of user input. Timestamps are the other common one people get wrong. I've even seen programs that pass around timestamps as strings in multiple formats and as integers (Unix time).

aqfamnzc 4 years ago

As a programming noob, I'm wondering what would be a better way to pass or return a Unix time value than as an integer?
- mleonhard 4 years ago
  
  If you need to keep the UTC offset with it, then use an ISO 8601 [0] string: "2021-11-11T15:32:35-07:00".
  Otherwise, use an integer Unix timestamp, the number of seconds since 1970-01-01T00:00:00Z: 1636673555. Use a 64-bit integer to avoid the 2038 problem [1]; an unsigned 32-bit integer only postpones the overflow to 2106. JSON itself sets no integer limit, but JavaScript's Number.MAX_SAFE_INTEGER is 2^53 - 1 [2], so if you're using HTTP JSON RPC, you'll have to check for overflow.
  [0] https://en.wikipedia.org/wiki/ISO_8601
  [1] https://en.wikipedia.org/wiki/Year_2038_problem
  [2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
  
- globular-toast 4 years ago
  
  Depends on the language but most high-level languages have a timestamp or datetime abstraction which you should be using.
- joe_guy 4 years ago
  
  If it's being serialized, consider fully qualified ISO 8601.
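The replies above can be combined into one small sketch: keep a datetime abstraction inside the program, and convert to an ISO 8601 string or an integer only at the boundary. This assumes Python 3.7+ for `datetime.fromisoformat`; the helper names and the two range constants are illustrative.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1              # signed 32-bit limit behind the 2038 problem
JS_MAX_SAFE_INTEGER = 2**53 - 1    # largest integer a JavaScript number holds exactly


def iso_to_unix(text: str) -> int:
    """Parse an ISO 8601 string that carries a UTC offset into Unix seconds."""
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        # Without an offset the instant is ambiguous, so refuse to guess.
        raise ValueError("timestamp has no UTC offset")
    return int(dt.timestamp())


def unix_to_iso_utc(seconds: int) -> str:
    """Render Unix seconds as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def fits_int32(seconds: int) -> bool:
    """True if the value survives storage in a signed 32-bit field."""
    return -(2**31) <= seconds <= INT32_MAX


def fits_js_number(seconds: int) -> bool:
    """True if the value round-trips exactly through a JavaScript number."""
    return -JS_MAX_SAFE_INTEGER <= seconds <= JS_MAX_SAFE_INTEGER
```

With these helpers, `unix_to_iso_utc(1636673555)` gives `"2021-11-11T23:32:35+00:00"`, and `fits_int32(2**31)` is false, which is the 2038 overflow in miniature.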

BoorishBears 4 years ago

> For maximum correctness, you want to turn it into a file handle as soon as possible

This is why I get stressed out when I see paths turned into special objects encoding separators and such.

It tells me the path is living for way too long compared to the file handle.

I only want to see path-specific objects if we're modifying the path, and even then I want that to happen as late as possible.

aspaceman 4 years ago

Why not just hold onto both? The user's representation and the file handle. Only ever "display" the representation, while you do all operations on the handle. (Not trying to be sarcastic, just curious).
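A sketch of what "hold onto both" could look like, assuming a POSIX system where `os.listdir` accepts a directory descriptor. The `OpenDir` class and its method names are hypothetical, and this does not address the harder problem raised at the top of the thread of keeping the display path accurate if the directory is later moved.

```python
import os
from dataclasses import dataclass


@dataclass
class OpenDir:
    display_path: str   # exactly what the user supplied; used only for messages
    fd: int             # directory descriptor; used for every actual operation

    @classmethod
    def from_user_path(cls, user_path: str) -> "OpenDir":
        # Resolve once; keep the user's spelling untouched for display.
        fd = os.open(user_path, os.O_RDONLY | os.O_DIRECTORY)
        return cls(display_path=user_path, fd=fd)

    def listing(self) -> list:
        """List entries through the handle, not through the path string."""
        return sorted(os.listdir(self.fd))

    def close(self) -> None:
        os.close(self.fd)
```

Error messages would format `display_path`, while reads, listings, and relative opens would all go through `fd`.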

cerved 4 years ago

Doesn't this lock the file?