Comment by gxonatano
3 days ago
My name, Ĝonatano, contains a ĝ, which is an uncommon letter outside of my language, Esperanto. But when I go to set my username to "ĝonatano," I'm often told that usernames "may only contain letters or underscores," as if ĝ weren't a letter. (You can see that I've approximated it in my HN username, but I don't need to do that on web services that correctly understand that letters exist outside of ASCII and Latin-1.)
To be fair, Esperanto is, as far as I can tell, not very widely used. Searching for the letter ĝ mostly returns Esperanto results. Using that letter in a place where others may need to communicate or type it would be a severe burden on almost anyone else you interact with, outside of Esperanto communities.
I'm sure there are plenty of people who share your frustration with accented letters, ñ, umlauts, etc, though. I'd hope that most systems can handle those letters, although I wouldn't hold out hope that Ĝ/ĝ would be high on the priority list.
> as far as I can tell, not very widely used
Well, it's the most widely spoken international language, spoken in over a hundred countries, by an estimated 2-5M people. There's a rich literature (probably 30-50K books), vibrant music scene, and support in open source software (Linux, Firefox, Google products) is usually pretty good.
But the issue is not how widely Esperanto, or any other language, is spoken. If you assume that languages should only be supported according to their number of speakers, you leave no room for useful languages, bridge languages, auxiliary languages, or growing languages. Even if Esperanto had only 100 speakers, it'd be worthwhile to support, if it's easy to learn, and easy for non-speakers to understand.
It's not a "severe burden" to consider non-ASCII letters as letters. Unicode is pretty straightforward to work with, and if you want to support more than just English, it's a necessity. There's no need to have a "priority list" of letters you consider more or less important than others. That attitude comes across as very Anglocentric.
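For what it's worth, here is a minimal Python sketch of the difference between an ASCII-only "letters or underscores" check and a Unicode-aware one. The function names are made up for illustration, and this is only one way a service might do it, not a claim about how any particular site validates usernames.

```python
import re
import unicodedata

# The kind of check that rejects "ĝonatano": ASCII letters and underscores only.
ASCII_USERNAME = re.compile(r"[A-Za-z_]+")


def is_valid_ascii(name: str) -> bool:
    """Hypothetical ASCII-only rule: A-Z, a-z and underscore."""
    return ASCII_USERNAME.fullmatch(name) is not None


def is_valid_unicode(name: str) -> bool:
    """Hypothetical Unicode-aware rule: any letter, or underscore.

    NFC normalization first, so that "g" followed by a combining
    circumflex (U+0302) is treated the same as the precomposed "ĝ".
    """
    name = unicodedata.normalize("NFC", name)
    return bool(name) and all(ch == "_" or ch.isalpha() for ch in name)


if __name__ == "__main__":
    for candidate in ["gxonatano", "ĝonatano", "g\u0302onatano", "user name"]:
        print(repr(candidate), is_valid_ascii(candidate), is_valid_unicode(candidate))
```

The normalization step matters: without it, the decomposed spelling fails the check, because a combining mark on its own is not classified as a letter.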
What is the definition of an "international language" that makes Esperanto the most widely spoken one? Isn't Arabic an international language, for instance?
> It's not a "severe burden" to consider non-ASCII letters as letters. [...] That attitude comes across as very Anglocentric.
Maybe I didn't communicate my thoughts clearly - the reason I call it a "severe burden" is because people won't know how to type it or how to pronounce it. I doubt many people have the ability to type the letter, and would have to copy-paste it. Even on Mac, where most diacritical characters are an opt+key away, the "ˆ" does not apply to the letter "g", resulting in "ˆg". "ĝ" would need to be treated the same as, for example, "¯\_(ツ)_/¯" - where users generally google it and then copy-paste it. Sure, there are ways to allow for easier retrieval (ex. I have "@shrug" set up to make the shrug), but most people will very rarely encounter "ĝ" or similar, and won't have a shortcut set up.
You also can't put in Cyrillic or CJK characters. It's a user name, not a human name, you should be fine just using the 26 ASCII letters for it. Basically anything that is a computer-centric string should be only ASCII and nothing else, because supporting all of human writing is a never-ending task.
It's also a dangerous one. For example, there are a number of variants of "a" that are different characters in Unicode but are often indistinguishable in most fonts and/or at small font sizes: https://util.unicode.org/UnicodeJsps/confusables.jsp?a=abcde....
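To make the confusables point concrete, here is a rough Python sketch that flags strings mixing letters from more than one script. It uses the first word of each character's Unicode name as a crude stand-in for its script, which is my own simplification. It is not the Unicode confusables data behind the link above, and the function names are hypothetical.

```python
import unicodedata


def rough_scripts(text: str) -> set[str]:
    """Return a crude set of script labels for the letters in `text`.

    Assumption: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", "GREEK", ...) approximates the script.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(text: str) -> bool:
    """True if the letters in `text` appear to come from several scripts."""
    return len(rough_scripts(text)) > 1


if __name__ == "__main__":
    latin = "paypal"
    spoofed = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A
    print(latin == spoofed)             # the strings differ despite looking alike
    print(rough_scripts(spoofed))
    print(looks_mixed_script("ĝonatano"), looks_mixed_script(spoofed))
```

A check like this would accept "ĝonatano" (all Latin) while flagging a Latin/Cyrillic mix. It would also flag legitimately mixed text, such as Japanese combining kana and kanji, so treat it as an illustration of the problem rather than a policy.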
When I do a Ctrl+F search for “Gonatano” one of the search results is the actual name as typed with the circumflex. I think that is kind of a handy feature of the browser I’m using but at the same time it is sort of weird since it technically is not the same name without the circumflex, right?
Also not all database systems would think the non-circumflex version is equivalent to the circumflex version. Does anyone have thoughts or ideas about how or why they should be treated equivalently?
I also recognize this can get kind of political. There was a push in California recently to let people have accented letters in their name. Apparently it is legally not allowed. And yet some people claim their California birth certificate does contain accented letters.
Postgres has a module called unaccent[0] that removes diacritics for filtering. I expect your browser is doing something similar. While not appropriate when looking for exact matches, when doing user-input based searches, this should probably be the norm, as the user may be unaware of the accents or how to input them correctly on their keyboards.
Dove deep on this years ago when implementing a filter for wines and wine regions.
[0] https://www.postgresql.org/docs/current/unaccent.html
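For anyone wanting the same kind of accent-insensitive matching outside the database, a common approximation in Python is to decompose the text (NFD) and drop the combining marks. To be clear, this is a different technique from the Postgres unaccent module, and I'm not claiming it is what any browser does. The helper names are made up.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text: str) -> str:
    """Fold a string for loose matching: no diacritics, case-insensitive."""
    return strip_diacritics(text).casefold()


def contains_loosely(haystack: str, needle: str) -> bool:
    """Substring search that ignores diacritics and case."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    print(strip_diacritics("Ĝonatano"))
    print(strip_diacritics("Côtes du Rhône"))
    print(contains_loosely("My name, Ĝonatano, contains a ĝ", "gonatano"))
```

This only handles letters that decompose into a base letter plus combining marks. Letters like "ø" or "ł" have no such decomposition and pass through unchanged, so a real search feature would likely need an explicit mapping on top. Keep the original string for storage and exact matches, and use the folded form only for search.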
> but at the same time it is sort of weird since it technically is not the same name without the circumflex, right?
Assuming you have a "standard" keyboard, it's not weird at all for your browser to match the diacritic when you type the non-diacritic character since presumably the diacritic would be difficult to type. Firefox's search feature even has a [_] Match Diacritics checkbox which you can enable or disable.
This is absolutely the desired default behaviour for ctrl+F in a browser. e.g. I frequently read French, and don't normally want to have to put in accents in my search term when I'm searching text for a word containing an accent.
Firefox has a "Match Diacritics" checkbox right next to the "Match Case" box when you ctrl+F so you can configure as desired.
Are you a native speaker of Esperanto?
that would be so nice