Comment by djoldman
18 days ago
> _canonicalize_table = str.maketrans(
>     "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
>     "abcdefghijklmnopqrstuvwxyz--",
> )
> ...
> value = name.translate(_canonicalize_table)
> while "--" in value:
>     value = value.replace("--", "-")
translate can be wildly fast compared to some commonly used regexes or replacements.
I would expect, however, that a regex replacement would be much faster than your N^2 while loop.
That loop isn't N²: if there are long sequences of dashes, every iteration cuts the lengths of those sequences roughly in half. So the loop has at most about lg(N) iterations, for an O(N*lg(N)) total runtime.
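A quick way to check the halving claim is to count passes on artificially long runs of dashes. This is only a sketch: `collapse_dashes` is a made-up helper that wraps the quoted loop and adds a counter, not anything from the original code.

```python
import math

def collapse_dashes(value):
    """Collapse runs of '-' like the quoted loop; also return the number of passes."""
    passes = 0
    while "--" in value:
        # Each pass replaces non-overlapping "--" pairs, so a run of n dashes
        # shrinks to ceil(n / 2) dashes.
        value = value.replace("--", "-")
        passes += 1
    return value, passes

for n in (2, 3, 8, 1000):
    collapsed, passes = collapse_dashes("a" + "-" * n + "b")
    # The pass count tracks ceil(log2(n)) rather than n.
    print(n, collapsed, passes, math.ceil(math.log2(n)))
```

Each pass is a linear scan of the string, which is where the N*lg(N) bound comes from.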
It would be, if it was a common situation.
This loop handles cases like `eggtools._spam` → `eggtools-spam`, which is probably rare (I guess it’s for packages that export namespaced modules, and you probably don’t want to export _private modules; sorry in advance for non-pythonic terminology). Having more than two separator characters in a row is even more unusual.
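To make the comparison concrete, here is a rough sketch that puts the quoted translate-plus-loop approach next to a regex version and checks that they agree on a few ASCII names. The function names are hypothetical, the elided `...` part of the quote is not reproduced, and the regex has the same shape as the normalization given in PEP 503; whether it is what the project used before is an assumption.

```python
import re

# Table copied from the quoted snippet: lowercase A-Z, map '_' and '.' to '-'.
_canonicalize_table = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ_.",
    "abcdefghijklmnopqrstuvwxyz--",
)

def canonicalize_translate(name):
    # Hypothetical wrapper around the quoted lines.
    value = name.translate(_canonicalize_table)
    while "--" in value:
        value = value.replace("--", "-")
    return value

def canonicalize_regex(name):
    # Regex-based alternative, shown only for comparison.
    return re.sub(r"[-_.]+", "-", name).lower()

for name in ("eggtools._spam", "Foo.Bar_baz", "a-._-b", "plain"):
    # For ASCII input the two approaches should produce the same result.
    assert canonicalize_translate(name) == canonicalize_regex(name)
    print(name, "->", canonicalize_translate(name))
```

For a name with no adjacent separators the `while` condition is false on the first check, so the common case costs one `translate` call and one substring search.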
I am curious, why not `.lower().translate(str.maketrans('_.', '--'))`?
.lower() has to handle Unicode, right? I imagine the giant tables slow it down a bit.
It's so annoying how so many languages lack a basic "ASCII lowercase" and "ASCII uppercase" function. All the Unicode logic is not only unnecessary but actively unwanted when you, e.g., want to change the case of a hex-encoded string or do normalization on some machine-generated ASCII-only output.
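In Python specifically, an ASCII-only lowercase can be approximated with a translation table on `str`, or by working on `bytes`, whose `.lower()` only touches ASCII letters. A minimal sketch, where `ASCII_LOWER` and `ascii_lower` are illustrative names rather than standard library functions:

```python
import string

# Table that lowercases only A-Z; every other code point passes through unchanged.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

def ascii_lower(text):
    return text.translate(ASCII_LOWER)

kelvin = "\u212a"  # KELVIN SIGN, renders like a capital K

print(ascii_lower("DEADBEEF"))        # deadbeef
print(kelvin.lower() == "k")          # True: Unicode-aware lower() folds it to ASCII 'k'
print(ascii_lower(kelvin) == kelvin)  # True: the ASCII-only table leaves it alone
print(len("\u0130".lower()))          # 2: dotted capital I lowercases to 'i' + combining dot
print(b"DEAD\xc3\x84".lower())        # b'dead\xc3\x84': bytes.lower() skips non-ASCII bytes
```

The Kelvin sign and dotted-I cases are the kind of surprise that makes the Unicode-aware version unwanted for hex strings or other machine-generated ASCII.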