Comment by eadmund
3 months ago
Yeah, first-wins is definitely surprising. Off the top of my head, it feels like one would have to go out of one's way to write a parser that does that: storing an extra bit of state for each configuration item, checking that state before setting the item, and flipping it afterwards, rather than just applying each configuration item as it is encountered.
Is there a good reason for this design? I can’t think of one, again off the top of my head, but of course I could be missing something.
It probably makes a bit more sense when you consider that SSH frequently does a "try to match this host against a list of configured host-patterns" operation. In that case, "first match" is the obvious thing to do.
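For illustration, a contrived ~/.ssh/config (hostnames made up) where first-match is exactly what you want: the specific block has to come before the catch-all, because ssh uses the first value obtained for each parameter:

    # specific host first; its values win
    Host bastion.example.com
        User admin
        Port 2222

    # catch-all last
    Host *
        User git
        Port 22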
sshd internally has a config structure. Initially it's all initialized to -1 for flags/numbers/timeouts and to NULL for strings. When an option is parsed, the code checks whether the corresponding field is still -1/NULL (depending on the type) before saving the parsed value.
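A minimal sketch of that sentinel scheme in C (field and function names made up, not OpenSSH's actual ServerOptions):

    #include <stdio.h>
    #include <string.h>

    struct config {
        int   port;     /* -1 == not yet set */
        char *ciphers;  /* NULL == not yet set */
    };

    /* First-wins: only store a value if the field is still unset. */
    static void set_port(struct config *c, int v) {
        if (c->port == -1)
            c->port = v;
    }

    static void set_ciphers(struct config *c, const char *v) {
        if (c->ciphers == NULL)
            c->ciphers = strdup(v);
    }

    int main(void) {
        struct config c = { -1, NULL };
        set_port(&c, 2222);  /* first occurrence: stored */
        set_port(&c, 22);    /* later occurrence: ignored */
        printf("port=%d\n", c.port);  /* prints 2222 */
        return 0;
    }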
Another program with first-wins I've seen used dict/map for its config so the check was even simpler: "if optname not in config: config[optname] = parsed_value".
iptables uses first-wins.
https://serverfault.com/questions/367085/iptables-first-matc...
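For example (addresses made up), the first rule that matches a packet decides its fate, and later rules are never consulted:

    # the DROP matches first, so the broader ACCEPT below never applies
    iptables -A INPUT -s 203.0.113.5 -j DROP
    iptables -A INPUT -s 203.0.113.0/24 -j ACCEPT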
It's actually the simplest scheme. Reparse from the top whenever you need to query a setting. When you see one, exit. No need to even bother to store an intermediate representation. No idea if this matches the actual ssh implementation, but that's the way many historical parsers worked. The idea of cooking your text file on disk (into precious RAM!) is fairly modern.
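A sketch of that store-nothing scheme in C (not claiming this is ssh's code; error handling elided):

    #include <stdio.h>
    #include <string.h>

    /* Re-scan the file on every query; first-wins falls out of
     * returning at the first keyword match. */
    static int query_setting(const char *path, const char *key,
                             char *out, size_t outlen) {
        FILE *f = fopen(path, "r");
        char line[1024];
        if (f == NULL)
            return -1;
        while (fgets(line, sizeof line, f) != NULL) {
            size_t klen = strlen(key);
            if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
                snprintf(out, outlen, "%s", line + klen + 1);
                fclose(f);
                return 0;   /* first match: stop, ignore the rest */
            }
        }
        fclose(f);
        return -1;          /* not set anywhere */
    }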
Nope, the actual ssh implementation parses all the config files once, at startup, using buffered file I/O and getline(). That means that on systems with a modern libc, the whole config file (if it's small enough, less than 4 KiB IIRC?) gets read into RAM and the getline() results are then served from that buffer.
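In outline, something like this (a sketch, not the actual servconf.c; process_option() is a stand-in for the keyword dispatch):

    #include <stdio.h>
    #include <stdlib.h>

    static void process_option(const char *line) {
        (void)line;  /* keyword dispatch; the first-wins check lives here */
    }

    /* One pass at startup; stdio buffers the underlying reads, so
     * getline() is served from an in-memory buffer. */
    static void parse_config(const char *path) {
        FILE *f = fopen(path, "r");
        char *line = NULL;
        size_t cap = 0;
        if (f == NULL)
            return;
        while (getline(&line, &cap, f) != -1)
            process_option(line);
        free(line);
        fclose(f);
    }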
The scheme you propose is insane, and if it was ever used (can you actually back that up? The disk I/O would kill your performance for anything remotely loaded), it was rightfully abandoned for a much faster and simpler scheme.
> getline() results are served from that buffer.
So... it doesn't parse them once! It just adds its own[1] buffering layer and implements... exactly the algorithm I described? Not seeing where you're getting the "Nope" here, except by focusing on the one historical note about RAM that I put in parentheses.
[1] Somewhat needless given the OS has already done this. It only saves the syscall overhead.
Loading up your parsing code and reopening the file every time a setting is queried sounds to me like it would increase the average memory use of most programs.
The ssh config format has almost no context, and the code is static and always "loaded up". I can all but guarantee this isn't correct. Modern hackers tend to wildly overestimate the complexity of ancient tasks like parsing.
You don't care about average memory use; you care about peak memory use.