Comment by TacticalCoder
4 years ago
Define "space". Is the Hangul filler we talked about yesterday a spacing character? Is the zero-width non-breaking space a spacing character? What about the typographic spacing characters?
You should better be very afraid of using spaces in filenames.
You should do everything you can to support them but you have to know you'll invariably encounter countless cases where you'll have this or that tool that won't work properly with them.
I still live in a world where I cannot name a song from the french group L'impératrice with an eacute in the filename or my car's media system will display garbage (it's running QNX and I don't know which filesystem).
FWIW, and it should be food for thought, every single Git repository in the world contains a pre-commit hook sample (disabled by default but it's there) that enforces that every committed file in the repo is named using a subset of ASCII characters.
Every Git repository in the world has that example: let that sink in.
> FWIW, and it should be food for thought, every single Git repository in the world contains a pre-commit hook sample (disabled by default but it's there) that enforces that every committed file in the repo is named using a subset of ASCII characters.
I use Git for documents too, not only code. Why shouldn't I use my native language?
Tab completion don’t work well for languages that require IME. That is one reason why I don’t.
That's actually a good point. On the other hand, not all languages use IMEs. Mine just uses the AltGr modifier key, but is otherwise just a standard QWERTY layout without any features.
IME == Input method editor?
https://en.wikipedia.org/wiki/Input_method
1 reply →
Tab completion works just fine for me with a Japanese IME.
non-ascii characters cause annoying hard to fix problems. If you're willing to deal with that - kudos. Personally I don't find it worthwhile
I haven't had problems yet. Spaces, punctuation, and quotes are the main offenders, most of the time.
> I still live in a world where I cannot name a song from the french group L'impératrice with an eacute in the filename or my car's media system will display garbage (it's running QNX and I don't know which filesystem).
I have an Android phone and I tell MusicBrainz Picard to save all files with ASCII-only names and Windows-compatible names for the ones that get sent over to the phone. Basically for this reason. Sometimes it's players on Android itself, but even more frequently, whatever bluetooth radio I'm connected to freaking out with non-ASCII characters.
What do you mean, display garbage?
L'imp?ratrice? L'imp�ratrice? L'impératrice? L'imp‚ratrice? L'impÚratrice?
You get all those space characters working and then some jerk comes along and uploads a file like this: ŗ̶̧̢͓̳͍͙͔̳̻̥͉̭͓̫̟͍̞̭͉͓͉̮̹͍͚̳̹̬͉͚̰͈̘̐̊̾̈̀̒͒̀͛̓̋̔͊̏͘̚ę̴̨̛̣͙̤̟̬̩̟͙͖̥̹̱̱̊͑͗̇̇͛̆̈́̃͋̓̀̔̍̍̌̐͊̎̓̅̀̕ͅģ̴̹̜̘͍̱̑͐̉̌̐̄̊͛̎́̐̌̅̈́͂͑̈́̋̔͂̊̊̒̒̔͛͆̚͘̕͠e̶̙͕̫̳̘͐̾́̑͆̓͂̿͊̊̍͛͐̌̆͗̌̅̅̔͊̂͛͗̅̕͝͝͝͝x̵̢̧̦̫͖̝̥̹͓̬͖̤̩͚̝̫̋̃̅̈́̆͋̌͑́̎̈́̊̾͒̀̒̎̓͛͊̿̓͊̀̍͐̆̚͝͝-̴̨̮̯͖͖̠̜̲̪͕̘͈͖̮̈́̓̐̃́̅̄̏́̍̉̐͌́̔̓̄͋͗̐̕͜͝ţ̴̢̧̖̗͖̞̮̫̦̼̝̺̼̱̳͓͉̜̟̤̲͖̻͙́̌̈̌̈͆̾̄͊̿̏̓͗̈́̕͜ͅh̶̢̧̨̥̭̼̟̣͖̯̗̤̖̙͉͕̙͎̰̠̝̖͈̻͙̪̮̘̯̻̼͕͓̖̣͈̽́͊̎͐͌̆̍̎̏̿͐̒́͋͑̍̿̎͆̑͆̄͂̀͐̄͑̀͗̿̽̎̾̊̕͝͝͝͝͝ͅi̴͚͈͍̫̮̝̣͖͉͓̯̠̙̭̟̖̘̾̓̄̈́̒̏̽̆̉̿͛̀́̃̋̒̈́͋̂̇̈́͛̕͜͠͠͝ͅs̶͇̖̳̞͉̱̞͓̖͔͔͍̗͇̖̮̹̅͊̔͋͊̈́̎̐̆̋̒̀̍̕͜ͅ.̴̧͎͇̰͉̼̱̰̦̟̑̋̏͌̍͊͑̄̀͌́̆̓͛̒̆̾̉͐̄̂̈́͆̒̃͗̐̂̎̈́̈͛̿́͛̾̚͘͜͝͝ͅȩ̷̡̲̪̱̪̥̳͍̼̰̘̗̹͙͙͓̣̟̩̥̥̖̠̪̮̹̞̥̻͎͖͍̯̂͑̏̑̆̍͋̎͛̅̑̑̏̎̓̀̓̒̈́͊͌̀̈́̒̌͐͂͛̊̍̐͂́̔̌̾͐̈́̋̇̏̚͜͝͝͝͠ͅx̶̧̛͚̗̜̪͍͖̘̙͎͚͇͙̬̱̟̭͓̺̙͍̖̱͚̣̘̪̭͔͔̮͎̬̪̤̹̟͔̩͍̬͕͔̩͐̈́̒̂͛̂̈̀̿̍̔̓̓̀̃̍͆̈́̍̓̌͐̈́̾̇̎̑͌͒̄̆̿̍͆̅͗͆͘͠͝͝ͅͅͅe̷̢̡̡̨̧̛͕͚̬̮̞̥̼͍͔̝̟̝̯͈̟̥͖̱̹̣̩̼̩̅̌͌̑̎̐̀̽̏́͐̋̏̎̎͛͌̀̊͊͒̑͌̎̎̑͊̌̉͆̾̚͘̚͜͠͠͠͝͝ͅͅͅ
For anyone who is curious (and acolytes of Zalgo): "In Unicode, character rendering does not use a simple character cell model where each glyph fits into a box with given height. Combining marks may be rendered above, below, or inside a base character. So you can easily construct a character sequence, consisting of a base character and “combining above” marks, of any length, to reach any desired visual height, assuming that the rendering software conforms to the Unicode rendering model."
[https://stackoverflow.com/questions/6579844/how-does-zalgo-t...]
Hah, lucky for me Chrome on Ubuntu didn't implement the spec correctly. ;)
This legitimately made me laugh out loud in my office.
The characters reach up off the screen as I reply to this. They overlay the comment above you. Amazing. How?
It's usually called Zalgo text, and it's what you get when you start stacking all kinds of Unicode diacritics on poor unsuspecting characters.
https://en.wikipedia.org/wiki/Zalgo_text
This is the best generator I found: https://lingojam.com/GlitchTextGenerator
1 reply →
Interestingly I get different behaviour per browser/OS. Firefox/Linux clips it to the bounding box of the parent element, Firefox/Mac and Safari/Mac clip it to the line height, and only Chrome/Mac lets it extended further.
2 replies →
Until now, I haven't actually thought of what would happen if zalgotext occurred anywhere other than a web browser. Looking forward to the five minutes of fun with the file manager and whatnot.
768 characters is too long for macOS it seems. (References online say HFS+ has a limit of 255 UTF-16 characters. Didn't find anything for APFS immediately... edit: same for APFS)
Please don't Zalgo on HN. It's enough to speak its name.
It would be one thing if it was making other comments difficult to read or causing browser issues, but I appreciated the demonstration that both would presumably be possible on certain browsers
Honest question - what the heck are those characters?
It corrupted text or "Zalgo" text, it relies on diacritics.
See this answer on stackoverflow:
https://stackoverflow.com/questions/1732348/regex-match-open...
2 replies →
Zalgo text: https://zalgo.org/
It was a great joke for a couple weeks two internets ago.
1 reply →
In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents).
https://en.wikipedia.org/wiki/Combining_character
1 reply →
Combining diacritic marks.
Glad you didn’t choose a sequence that crashes my browser.
regex this, bravo