Comment by heavenlyblue

4 years ago

> Strings are immutable, so in theory it could easily be the case that any two equal Strings must share their machine address, even if you got them from user input.

Hey, and now you have two problems: String hashing and finding all strings which are equal to each other in memory.

Well, no, the whole point of this discussion is that solving the second problem means the first problem never comes up.

And this isn't exactly some exotic approach; how often do you think people write Hashes in Ruby where the keys they use are all symbols? It's so common that there's dedicated syntax for it.

  • It's as old as Lisp, but there's a reason symbols exist separately from strings - they're used differently. Strings are frequently transformed, symbols almost never are. Strings are frequently taken from end-user input, symbols very rarely. Strings sometimes are very large, symbol names are almost universally very short.

    The problem is, interning is an expensive operation. It means adding to an ever-growing database of strings, but first checking whether the string is already there. You don't want to do that every time you change case or flip a letter in a string, or use it to access a hash table. I'm not saying it can't be done, but I honestly have no idea how to implement sane, generic, automatic interning of strings. I feel more comfortable having a symbol type, and control over turning strings into symbols.

    • I definitely agree that uninterned strings are important. All I'm really trying to say down here is that there are many cases where you have a hash table which uses strings as keys (as an implementation detail), when (conceptually) it wants to be using symbols.

      (And on a less fundamental level, the particular Java String class is less string-like and more symbol-like than most string types, and this appears to have been done intentionally.)
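
To make the trade-off in this subthread concrete, here is a minimal Java sketch of interning as described above: look the string up in a table first, and add it only if no equal string is already there. The class `InternSketch`, the `TABLE` map, and the helpers `internManually` and `tableSize` are hypothetical names made up for illustration; `String.intern()` is the real JDK method that does the same job against the JVM's own string pool.

```java
import java.util.HashMap;
import java.util.Map;

public class InternSketch {
    // Hypothetical intern table: each distinct string value maps to its one canonical instance.
    private static final Map<String, String> TABLE = new HashMap<>();

    // Check whether an equal string is already stored; add this one only if not.
    // Note that the lookup itself hashes the string, so interning is not free.
    static String internManually(String s) {
        String existing = TABLE.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    static int tableSize() {
        return TABLE.size();
    }

    public static void main(String[] args) {
        // Two equal strings created at run time, the way user input would arrive.
        String a = new String("colour");
        String b = new String("colour");

        System.out.println(a == b);                                 // false: two distinct objects
        System.out.println(a.equals(b));                            // true: same contents
        System.out.println(internManually(a) == internManually(b)); // true: one canonical instance
        System.out.println(a.intern() == b.intern());               // true: JVM string pool

        // Every transformed string would need its own lookup and, if new, its own table entry.
        internManually(a.toUpperCase());
        System.out.println(tableSize());                            // 2: "colour" and "COLOUR"
    }
}
```

Once both sides are interned, comparing them is a single reference check, which is the symbol-like behavior discussed above. The cost is that every new or transformed string pays for a table lookup and may grow the table, which is why interning is usually left as an explicit, opt-in step.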