
Comment by HappMacDonald

6 days ago

> Well the question then becomes "how do you identify the quoting that needs to happen on the line" and tactics common in Rust enabled by features available in Rust will still lead a person away from this pattern of error.
>
> One tool I'd have probably reached for (long before having heard of this particular corner case to avoid) would have been whitespace trimming, and CR counts as whitespace. Plus folk outside of C are also more likely to aim a regex at a line they want to parse, and anyone who's been writing regex for more than 5 minutes gets into the habit of adding `\s*` adjacent to beginning of line and end of line markers (and outside of capture groups) which in this case achieves the same end.

Then you're describing a different format entirely, if you're doing generic whitespace trimming without any consideration for how the format defines "whitespace". The Git config format explicitly defines ignorable whitespace as spaces and horizontal tabs, and says that these whitespace characters are trimmed from values, which means nothing else gets trimmed from values. If you try to write a parser for this using a regular expression and `\s*`, then you'd better look up what `\s` means to your regex engine, because it almost certainly includes more than just SP and HT.
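To make the difference concrete, here is a minimal Rust sketch. The helper names `trim_sp_ht` and `trim_generic` are mine, purely for illustration; they are not taken from Git or from any real crate. The first trims only SP and HT, the second uses `str::trim`, which strips everything with the Unicode White_Space property, CR included.

```rust
/// Trim only the characters the Git config format treats as ignorable
/// whitespace: space (SP) and horizontal tab (HT).
/// Hypothetical helper, for illustration only.
fn trim_sp_ht(s: &str) -> &str {
    s.trim_matches(|c: char| c == ' ' || c == '\t')
}

/// Generic trimming: `str::trim` removes every leading and trailing
/// character with the Unicode White_Space property, which includes CR.
/// Hypothetical helper, for illustration only.
fn trim_generic(s: &str) -> &str {
    s.trim()
}
```

Given the input `" \tvalue\r \t"`, `trim_sp_ht` returns `"value\r"`, while `trim_generic` returns `"value"` and silently drops a CR that was part of the value.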

I can't think of any features in Rust that will lead someone away from this pattern of error, which is failing to realize that round-tripping the serialized output back through the deserializer can move the line boundaries. It's really easy to think "if I have a bunch of single-line strings and I join them with newlines I now have multiline text, and I can split that back up into individual lines and get back what I started with". This is doubly true if you start with a parser that splits on newline characters and then change it after the fact to use `BufRead::lines()` in response to someone telling you it doesn't work on Windows.
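Here's a small sketch of that trap, assuming one of the "single-line" strings happens to end in a CR. The function names are hypothetical and this is not how any particular parser is written; it only shows that `BufRead::lines()` strips a trailing `"\r\n"` as well as `"\n"`, so join-then-split is not the identity.

```rust
use std::io::{BufRead, Cursor};

/// Join single-line strings with '\n', then split the text back into lines
/// with `BufRead::lines()`, which strips a trailing "\n" or "\r\n".
/// Hypothetical helper, for illustration only.
fn round_trip_with_lines(values: &[&str]) -> Vec<String> {
    let joined = values.join("\n");
    Cursor::new(joined)
        .lines()
        .map(|line| line.expect("input is valid UTF-8"))
        .collect()
}

/// The same round trip, but splitting on '\n' only, so a CR that was part
/// of a value stays attached to that value.
/// Hypothetical helper, for illustration only.
fn round_trip_with_split(values: &[&str]) -> Vec<String> {
    let joined = values.join("\n");
    joined.split('\n').map(String::from).collect()
}
```

Feeding `["key = a\r", "other = b"]` through `round_trip_with_lines` gives back `["key = a", "other = b"]`, because the CR at the end of the first value was absorbed into the line ending. `round_trip_with_split` returns the input unchanged.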

I've been writing regular expressions for at least 8 years, and I'm not sure I've ever written `\s*`.