← Back to context

Comment by p_l

4 days ago

> Could someone show a legit reason to use 1000-character filenames?

1023 byte names can mean less than 250 characters due to use of unicode and utf-8. Add to it unicode normalization which might "expand" some characters into two or more combining characters, deliberate use of combining characters, emoji, rare characters, and you might end up with many "characters" taking more than 4 bytes. A single "country flag" character will be usually 8 bytes, usually most emoji will be at least 4 bytes, skin tone modifiers will add 4 bytes, etc.

this ' ' takes 27 bytes in my terminal, '󠁧󠁢󠁳󠁣󠁴󠁿' takes 28, another combo I found is 35 bytes.

And that's on top of just getting a long title using let's say one of CJK or other less common scripts - an early manuscript of somewhat successful Japanese novel has a non-normalized filename of 119 byte, and it's nowhere close to actually long titles, something that someone might reasonably have on disk. A random find on the internet easily points to a book title that takes over 300 bytes in non-normalized utf8.

P.S. proper title of "Robinson Crusoe" if used as filename takes at least 395 bytes...

hah. Apparently HN eradicated the carefully pasted complex unicode emojis.

The first was "man+woman kissing" with skin tone modifier, then there was few flags