
Comment by layer8

7 days ago

You can get similar vulnerabilities with Unicode normalization, with mismatched code pages/character encodings, or, as the article points out, with a case-insensitive file system. That's not to say that control characters should be allowed in file names, but there's an inherent risk whenever byte sequences are being decoded or normalized into something else.
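A minimal Python illustration of the three mechanisms mentioned above. The filenames are made up for the example. Two names can be different byte sequences on disk but compare equal after NFC normalization or case folding, and the same bytes can decode to different names under different code pages (cp1252 is just one example of a legacy encoding).

```python
import unicodedata

# Illustrative names: "é" as one precomposed code point (U+00E9)
# vs. "e" followed by a combining acute accent (U+0301).
nfc_name = "caf\u00e9.txt"
nfd_name = "cafe\u0301.txt"

def same_after_nfc(a: str, b: str) -> bool:
    """True if the two names are equal once both are NFC-normalized."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def same_after_casefold(a: str, b: str) -> bool:
    """True if the two names are equal under Unicode case folding."""
    return a.casefold() == b.casefold()

# The same UTF-8 bytes decoded with a mismatched legacy code page.
raw = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'
as_utf8 = raw.decode("utf-8")       # 'café'
as_cp1252 = raw.decode("cp1252")    # 'cafÃ©' (mojibake)

if __name__ == "__main__":
    print(nfc_name == nfd_name)                  # different strings
    print(same_after_nfc(nfc_name, nfd_name))    # but equal after NFC
    print(same_after_casefold("README.md", "readme.MD"))
    print(as_utf8, as_cp1252)
```

Any layer that compares names after normalizing or case folding, while another layer compares raw bytes, can be tricked into treating two "different" files as one, or vice versa.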

Not to the same degree, though, and the arguments for the status quo are especially weak. There are reasonable arguments for and against case-insensitive filenames. Character encoding issues are dwindling: since there is no mechanism for indicating the encoding of an individual filename, most systems simply use UTF-8 for all of them, and using UTF-8 consistently supports filenames in arbitrary languages.

Control characters in filenames have no obviously valuable use case; they appear to be allowed only because "it's always been allowed". That is not a strong argument for them. Some systems do not allow them, with no obvious ill effects.
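For what it's worth, rejecting them is cheap. Windows, for example, already disallows characters 1 through 31 in filenames. Below is a rough sketch of such a check in Python; `has_control_chars` and `validate_filename` are hypothetical helpers, and the policy of rejecting everything in Unicode category "Cc" (C0 controls, DEL and the C1 range) is an assumption, not what any particular system does.

```python
import unicodedata

def has_control_chars(name: str) -> bool:
    """True if the name contains any Unicode control character (category Cc).

    Cc covers U+0000-U+001F, U+007F (DEL) and U+0080-U+009F, so this
    catches newlines, tabs and terminal escape sequences (ESC = U+001B).
    """
    return any(unicodedata.category(ch) == "Cc" for ch in name)

def validate_filename(name: str) -> str:
    """Hypothetical check: reject names containing control characters."""
    if has_control_chars(name):
        raise ValueError(f"control character in filename: {name!r}")
    return name

if __name__ == "__main__":
    for candidate in ["report.txt", "caf\u00e9.txt", "evil\nname.txt", "\x1b[31mred"]:
        try:
            validate_filename(candidate)
            print("ok:      ", repr(candidate))
        except ValueError as exc:
            print("rejected:", exc)
```

Note that this leaves ordinary non-ASCII names like `café.txt` untouched, so it does not conflict with using UTF-8 filenames in arbitrary languages.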