Comment by reaperducer

4 years ago

only allow ASCII, maybe dashes, and up to twelve characters. Problem solved

...and only hire people from the exact same background as you, who will never have unusual characters or accents in their name. And also make sure not to have any users who aren't exactly like you, and conform to this very narrow requirement. Surely, excluding 90% of the world won't hurt revenue in any way.

Snarky, but I'll take it.

Use strict schema for the hardware interface, networking, physical stuff the user never sees. Microservice names don't need to be non-Latin. Database replicas, infrastructures, etc. And you're not going to piss off employees by giving them ASCII ldap/email addresses.

Use utf8mb4 or similar for storing names. Don't state "first" or "last". I've been through this rodeo too many times. You're not surprising anyone.

UTF-8 strings aren’t reproducible anyways. User ID should be strictly for identification, be alphanumeric random string if necessary.

You can use an "ASCII-fied" version of the name, only ~27% of mine can be typed in ASCII letters that look similar but the rest is just phonetically or visually close-enough letters. This is something people did for decades and nowadays even government IDs have an ASCII-fied (well, Latin-fied) version of the name.