Comment by kazinator

1 day ago

It's perfectly fine to have mutable strings in a hash table; just document that the behavior becomes unspecified if keys are mutated while they are in the table.

Make sure the behavior is safe: it won't crash or be exploitable by a remote attacker.

It works especially well in a language that doesn't emphasize mutation; i.e. you don't reach for string mutation as your go-to tool for manipulation.

Explicit "freeze" stuff is an awful thing to foist onto the programmer.

> just document that the behavior becomes unspecified if keys are mutated while they are in the table.

> Make sure the behavior is safe: it won't crash or be exploitable by a remote attacker.

There is no such thing as unspecified but safe behaviour. Developers who can't predict what will happen will make invalid assumptions which will lead to security vulnerabilities when they are violated.

  • You can predict unspecified behavior: it gives a range of possibilities, and that range does not include failures like termination or data corruption.

    The order of evaluation of function arguments in C is unspecified, so every time any function whatsoever is called which has two or more arguments, there is unspecified behavior.

    Same in Scheme!

    A security flaw can be caused by a bug that is built on nothing but 100% specified constructs.

    The construct with unspecified behavior won't in and of itself cause a security problem. The programmer believing that a particular behavior will occur, whereas a different one occurs, can cause a bug.

    The unspecified behaviors of a hash table in the face of modified keys can be spelled out in some detail.

    Example requirements:

    "If a key present in a hash table is modified to an unequal value, it is unspecified whether the entry can be found using the new key; in any case, the entry cannot be found using the old key. If a key present in a hash table is modified to be equal to another key also present in the same hash table, it is unspecified which entry is found using that key. Modification of a key doesn't prevent that key's entry from being visited during a traversal of the hash."

    • > The order of evaluation of function arguments in C is unspecified, so every time any function whatsoever is called which has two or more arguments, there is unspecified behavior.

      Yes, and that's bad! Subsequent languages like Java learned from this mistake.

      > A security flaw can be caused by a bug that is built on nothing but 100% specified constructs.

      Of course. But it's less common.

      > The programmer believing that a particular behavior will occur, whereas a different one occurs, can cause a bug.

      And unspecified behaviour is a major cause of this! Something like Hyrum's Law applies; programmers often believe that a thing will behave the way it did when they tested it.

      > The unspecified behaviors of a hash table in the face of modified keys can be spelled out in some detail.

      That is to say, specified :P. The more you narrow the scope of what is unspecified, the better, yes; and narrowing it to nothing at all is best.

In general, Ruby does allow mutable values in hash tables, with basically those semantics: https://docs.ruby-lang.org/en/3.4/Hash.html#class-Hash-label...
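A minimal sketch of what that looks like in practice, assuming a recent CRuby and using an Array as the mutable key (the variable names are illustrative only). The lookup with the mutated key right after the mutation is exactly the part not to rely on, so it is shown without an expected value.

```ruby
key = [1, 2]
table = {}
table[key] = :entry

key << 3                    # mutate the key while it is in the table

# The index still holds the hash computed for [1, 2].
stale = table[[1, 2]]       # => nil: the stored key is no longer eql? to [1, 2]
fresh = table[key]          # usually a miss until the table is rehashed

# Traversal still visits the entry, now under the mutated key.
visited = table.to_a        # => [[[1, 2, 3], :entry]]

# Hash#rehash rebuilds the index from the keys' current hash values.
table.rehash
after_rehash = table[key]   # => :entry
```

Nothing crashes and nothing is corrupted here; the entry is just hard to reach until `rehash` is called, which is roughly the "unspecified but safe" shape described earlier in the thread.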

The copy-and-freeze behavior is a special case that applies only to strings, presumably because the alternative was too much of a footgun since programmers usually think of strings in terms of value semantics.
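For illustration, a small sketch of the string special case, assuming the documented `Hash#[]=` behavior for unfrozen String keys. `String.new` is used only to guarantee a mutable string regardless of frozen-literal settings, and the variable names are made up for the example.

```ruby
name = String.new("foo")      # explicitly mutable
table = {}
table[name] = 1               # the table stores a frozen copy, not `name` itself

stored = table.keys.first
is_frozen = stored.frozen?        # => true
same_object = stored.equal?(name) # => false

name << "bar"                 # mutating the original leaves the table alone
by_old_value = table["foo"]   # => 1
by_new_value = table[name]    # => nil, since name is now "foobar"
```

So a string key behaves like a value from the table's point of view, while an Array key in the same position would stay shared with the caller.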

I don't think anyone likes the explicit .freeze calls everywhere. I think the case for frozen strings in Ruby rests primarily on performance rather than correctness (which is why it wasn't obvious earlier in the language's history that it was the right call), and the reason it's hard to make them the default is compatibility.

  • > since programmers usually think of strings in terms of value semantics.

    Can you blame them, when you go out of your way to immerse strings in the stateful OOP paradigm, with idioms like "foo".upcase!

    If you give programmers mainly a functional library for string manipulations that returns new values, then that's what they will use.
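The contrast drawn in the last reply, between the mutating and the value-returning flavor of the same string operation, can be sketched as follows. This is only an illustration with made-up variable names; `String.new` is there to guarantee a mutable string.

```ruby
original = String.new("foo")
other_ref = original            # a second reference to the same object

shouted = original.upcase       # functional style: returns a new String
unchanged = original.dup        # snapshot: still "foo" at this point

original.upcase!                # stateful style: mutates in place
seen_by_other = other_ref       # => "FOO", every reference sees the change
```

With `upcase`, a string that is also in use elsewhere (as a hash key, say) is never disturbed; with `upcase!`, anything holding the same object observes the change.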