Comment by necovek

3 months ago

You bring up numbers, but you ignore strings, another fundamental data type in all programming languages.

Without this trove of data, you can't do something as simple as length(str) or uppercase(str), even in a CLI where you just want to line text up.
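To make that concrete, here is a rough Python sketch of why lining text up needs character data. The helper name display_width is made up for illustration, and the width rule (skip combining marks, count East Asian wide/fullwidth characters as two columns) is a simplification, not what any particular terminal actually does.

```python
import unicodedata

def display_width(text: str) -> int:
    """Approximate terminal columns: a simplified rule, not a full wcwidth."""
    width = 0
    for ch in text:
        # Combining marks attach to the previous character and take no column.
        if unicodedata.combining(ch):
            continue
        # East Asian Wide ("W") and Fullwidth ("F") characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

accented = "cafe\u0301"  # "e" followed by a combining acute accent
print(len(accented), display_width(accented))  # 5 code points, 4 columns
print(len("日本語"), display_width("日本語"))    # 3 code points, 6 columns
print("straße".upper())                        # STRASSE: uppercasing changes the length
```

Both the code point count and the column count come out of per-character property tables, and uppercasing "ß" shows that even case mapping is not one character in, one character out.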

So yes, a big chunk of this database represents rarely useful data like you mention. But the majority of it is still generally useful.

I may be wrong, but a cursory look at the data gave me the impression that most of it is not actually related to commonplace string manipulation. Other than that, we probably agree.

  • The big one that's often ignored is collation tables: while there's a default in ISO 14651 IIRC, each region-language combo might have its own overrides (imagine "ss" being sorted as a separate letter in German rather than falling after "sr" and before "st", so the order would be sa..., sb..., sr..., st..., ssa..., ssb..., etc.); and then Austrian German might have a different phonebook ordering.
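
A toy Python sketch of that kind of override, using the imagined "ss" rule from the comment above. The alphabet, the weights, and the collation_key helper are all hypothetical; real tailorings have multiple comparison levels and live in locale data, not in a twenty-line function.

```python
import string

# Hypothetical tailored alphabet: "ss" is its own letter, placed right after "s".
ALPHABET = list(string.ascii_lowercase)
ALPHABET.insert(ALPHABET.index("s") + 1, "ss")
WEIGHT = {letter: i for i, letter in enumerate(ALPHABET)}

def collation_key(word: str) -> list[int]:
    """Map a word to a list of weights, treating "ss" as a single letter."""
    lowered = word.lower()
    key = []
    i = 0
    while i < len(lowered):
        if lowered.startswith("ss", i):
            key.append(WEIGHT["ss"])
            i += 2
        else:
            # Characters outside the toy alphabet all sort last.
            key.append(WEIGHT.get(lowered[i], len(ALPHABET)))
            i += 1
    return key

words = ["ssb", "st", "sa", "ssa", "sr", "sb"]
print(sorted(words))                     # ['sa', 'sb', 'sr', 'ssa', 'ssb', 'st']
print(sorted(words, key=collation_key))  # ['sa', 'sb', 'sr', 'st', 'ssa', 'ssb']
```

The plain sort compares code points, so "ssa" lands between "sr" and "st"; the tailored key puts every "ss..." word after "st", which is the ordering described above. Each language or region that wants a different rule needs its own table of this kind.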