Comment by meisel

3 days ago

How does this work under the hood? Does Ruby keep a giant map of all strings in the application to check new strings against to see if it can dedupe? Does it keep a reference count to each unique string that requires a set lookup to update on each string instance’s deallocation? Set lookups in a giant set can be pretty expensive!

Even if it didn't dedupe strings, mutable string literals mean that Ruby has to create a new string every time it encounters a literal at runtime. If you have a string literal in a method, a new string is created every time you call the method. If you have one inside a loop, a new string is created on every iteration. You get the idea.

With immutable string literals, the same string object can be reused.
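To make the allocation difference visible, here is a minimal sketch, assuming CRuby. It uses `String.new` to stand in for a mutable literal and an explicit `.freeze` for a frozen one, so it behaves the same with or without the `# frozen_string_literal: true` magic comment. The method names are made up for illustration.

```ruby
# Stand-in for a mutable literal: a fresh String is allocated on every call.
def mutable_greeting
  String.new("hello")
end

# Frozen literal: the same frozen object can be handed back on every call.
def frozen_greeting
  "hello".freeze
end

puts mutable_greeting.equal?(mutable_greeting)  # false: two separate objects
puts frozen_greeting.equal?(frozen_greeting)    # true: one shared object
```

`equal?` compares object identity rather than contents, which is why it is used here instead of `==`.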

  • Here’s a more concrete example:

    You make an arrow function that takes an object as input, and calls another with a string and a field from the object, for instance to populate a lookup table. You probably don’t want someone changing map keys out from under you, because you’ll break resize. So copies are being made to ensure this?

The literals would be identified at parse time.

    fooLit = "foo"
    fooVar = "f".concat("o").concat("o")

This would have fooLit be frozen at parse time. In this situation "foo", "f", and "o" would all be frozen strings, and fooLit and fooVar would be two different strings, since fooVar was created at runtime.
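A rough sketch of that distinction, assuming CRuby. The methods `foo_lit` and `foo_var` are hypothetical stand-ins for the two variables above; the literal is frozen explicitly, and the runtime string starts from `String.new` so that `concat` works even when literals are frozen.

```ruby
# The literal: frozen, known at parse time.
def foo_lit
  "foo".freeze
end

# Built at runtime from smaller pieces; the result is a new, mutable String.
def foo_var
  String.new("f").concat("o").concat("o")
end

puts foo_lit == foo_var       # true: same contents
puts foo_lit.equal?(foo_var)  # false: two distinct objects
puts foo_lit.frozen?          # true
puts foo_var.frozen?          # false
```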

Creating a string that happens to be present in the frozen strings wouldn't create a new one.

  • Got it, so this could not be extended to non-literal strings

    • You can freeze strings that are created at runtime.

          irb(main):001> str = "f".concat("o").concat("o")
          => "foo"
          irb(main):002> str.frozen?
          => false
          irb(main):003> str.freeze
          => "foo"
          irb(main):004> str.frozen?
          => true
          irb(main):005> str = str.concat("bar")
          (irb):5:in 'String#concat': can't modify frozen #<Class:#<String:0x000000015807ec58>>: "foo" (FrozenError)
           from (irb):5:in '<main>'
           from <internal:kernel>:168:in 'Kernel#loop'
           from /opt/homebrew/Cellar/ruby/3.4.4/lib/ruby/gems/3.4.0/gems/irb-1.14.3/exe/irb:9:in '<top (required)>'
           from /opt/homebrew/opt/ruby/bin/irb:25:in 'Kernel#load'
           from /opt/homebrew/opt/ruby/bin/irb:25:in '<main>'

> How does this work under the hood? Does Ruby keep a giant map of all strings in the application to check new strings against to see if it can dedupe?

1. Strings have a flag (FL_FREEZE) that is set when the string is frozen. It is checked whenever a string would be mutated, to prevent the mutation.

2. There is an interned string table for frozen strings.
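The interned table can be observed from plain Ruby. A small sketch, assuming CRuby 2.5 or later, where `String#-@` (unary minus) is documented to return a frozen, possibly pre-existing copy of the string. The helper `build_foo` is hypothetical.

```ruby
# Builds "foo" at runtime, so each call yields a new mutable String.
def build_foo
  String.new("f").concat("o").concat("o")
end

a = build_foo
b = build_foo
puts a.equal?(b)                    # false: separate runtime objects

# Unary minus looks the contents up in the interned table.
interned_a = -a
interned_b = -b
puts interned_a.frozen?             # true
puts interned_a.equal?(interned_b)  # true: both resolve to one interned object
```

Note that the lookup only happens when `-@` is called explicitly; building `a` and `b` did not consult the table.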

> Does it keep a reference count to each unique string that requires a set lookup to update on each string instance’s deallocation?

This I am less sure about. I poked around in the implementation for a bit, but I am not confident in this answer. It appears to me that it just deletes the entry, but that cannot be right, so I suspect I'm missing something. I only dig around in Ruby internals once or twice a year :)

The way it works in Python is that string literals are stored in the constants tuple (co_consts) of their enclosing code object, so at runtime the VM just returns the value at that index.

Though since Ruby already has symbols which act as immutable interned strings, frozen literals might just piggyback on that, with frozen strings being symbols under the hood.