Comment by xdennis
4 years ago
Looks like I'm in the minority. I always use spaces and non-ASCII characters in filenames.
In many languages it's a requirement. For example, in Romanian, there are 8 words that collide with „fata“ if you remove the diacritics (fata, fată, fața, față, făta, făță, fâța, fâță).
Given that we have to use diacritics, spaces don't seem like a big deal.
> Given that we have to use diacritics, spaces don't seem like a big deal.
There is one big difference: CLI utilities don't usually care about diacritics (though encoding issues can throw a wrench in that), but they care a lot about spaces. So putting spaces in filenames requires properly quoting or escaping parameters, whereas diacritics do not. That makes one-off shell snippets and scripts a lot more annoying (though TBH I tend to shy away from those anyway, these days).
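To make the quoting point concrete, here is a minimal shell sketch (the filename is a made-up example):

```shell
# Create a filename containing a space (made-up example name).
touch "my notes.txt"

# Unquoted, the shell splits the value on the space and passes
# two separate arguments ("my" and "notes.txt"), so this fails:
#   f=my\ notes.txt; ls -l $f
f="my notes.txt"

# Quoted, the whole name is passed as a single argument:
ls -l "$f"

rm -- "my notes.txt"
```

A filename containing only diacritics (say `față.txt`) needs none of this; word splitting only happens on whitespace (and other `IFS` characters), so the quoting burden is specific to spaces.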
So do I. I have a language, and I'm not afraid to use it. My computer should speak it just as well as I do.
There's a server at work whose name contains a non-ASCII character. I've run into compatibility issues lots of times where I can't connect. I prefer to just use English with ASCII and be happy.
Server names are different. They are by and large machine-facing identifiers, whereas filenames are split between being machine-facing, human-facing, or both. That makes their support of Unicode a much more critical (and appealing) proposition.
We have a few words in Czech that depend on diacritics to be unique as well - though not as bad as this example - but people just manage without. Hell, I don't even bother installing the Czech keyboard; if I REALLY need it (like in names), I just google for words that have the character and copy it.
So how did you deal with it in the 80s/90s?
Not sure about Romanian, but for many other languages people essentially came up with transliteration schemes (multiple, incompatible, ambiguous) to squeeze the language into ASCII.
The resulting text was understandable to the "computer people" but not to the general population, who did not use the networks back then. It was perhaps somewhat comparable to US parents some time ago encountering the "SMS slang" used by their teenagers.
As you would assume: use ASCII and deduce from context. Many people still do that.
That has led to phantom diacritics: reading letters in unfamiliar words/names based on what you assume they are. For example, some pronounce Chirica as Chirică because they assume someone forgot to type the breve in ă.
I call it the habanero trap. There is no ñ in "habanero", yet a lot of people say "habanyero", probably by analogy with "jalapeño".
Back in the day there were dozens of character sets that served as alternatives to US-ASCII. Having once worked on an email client, I needed to bake in a bunch of translation tables to convert text sent in those encodings into UTF-8.
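For what it's worth, that kind of conversion is a one-liner nowadays with `iconv`; a sketch, using ISO-8859-2 (Latin-2, which covers Romanian and Czech) as an example legacy charset:

```shell
# Byte 0xE3 is "ă" (a with breve) in ISO-8859-2.
printf '\xE3' > legacy.txt

# Re-encode the legacy file as UTF-8 ("ă" becomes the two bytes C4 83).
iconv -f ISO-8859-2 -t UTF-8 legacy.txt

rm legacy.txt
```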
Hmmm, I thought I was fluent in Romanian (born there and lived there for 26 years), but I only know 5 of those 8 words...
According to Google Translate the first two are "girl" and the rest are "face". =)
Google Translate is a horrible tool for "translating" single words or lists of unrelated words.
Use a proper dictionary for that. The very nature of statistical models makes proper translation without context impossible for these systems, especially when uncommon words and diacritics are involved.
* fata - the girl
* fată - girl
* fața - the face
* față - face
* făta - was giving birth
* făță - a small fish, or a child who won't sit still
* fâța - was fussing
* fâță - variant of făță
As you might infer from the first 4, Romanian uses a postfix definite article, and for singular feminine words you can't tell the definite from the indefinite form if you use only ASCII.
That doesn't seem unusual. Only the first 5 are very common.
>In many languages it's a requirement. For example, in Romanian, there are 8 words that collide with „fata“ if you remove the diacritics
That is what context is for.