Comment by meisel
3 days ago
How does this work under the hood? Does Ruby keep a giant map of all strings in the application to check new strings against to see if it can dedupe? Does it keep a reference count to each unique string that requires a set lookup to update on each string instance’s deallocation? Set lookups in a giant set can be pretty expensive!
Even if it didn't dedupe strings, mutable string literals means that it has to create a new string every time it encounters a literal in run time. If you have a literal string in a method, every time you call the method a new string is created. If you have one inside a loop, every iteration a new string is created. You get the idea.
With immutable strings literals, string literals can be reused.
Here’s a more concrete example:
You make an arrow function that takes an object as input, and calls another with a string and a field from the object, for instance to populate a lookup table. You probably don’t want someone changing map keys out from under you, because you’ll break resize. So copies are being made to ensure this?
The literals would be identified at parse time.
This would have fooLit be frozen at parse time. In this situation there would be "foo", "f", and "o" as frozen strings; and fooLit and fooVar would be two different strings since fooVar was created at runtime.
Creating a string that happens to be present in the frozen strings wouldn't create a new one.
Got it, so this could not be extended to non-literal strings
You can freeze strings that are created at runtime.
> How does this work under the hood? Does Ruby keep a giant map of all strings in the application to check new strings against to see if it can dedupe?
1. Strings have a flag (FL_FREEZE) that are set when the string is frozen. This is checked whenever a string would be mutated, to prevent it.
2. There is an interned string table for frozen strings.
> Does it keep a reference count to each unique string that requires a set lookup to update on each string instance’s deallocation?
This I am less sure about, I poked around in the implementation for a bit, but I am not sure of this answer. It appears to me that it just deletes it, but that cannot be right, I suspect I'm missing something, I only dig around in Ruby internals once or twice a year :)
There's no need for ref counting, since Ruby has a mark & sweep GC.
The interned string table uses weak references. Any string added to the interned string tables has the `FL_FSTR` flag set to it, and when a string a freed, if it has that flag the GC knowns to remove it from the interned string table.
The keyword to know to search for this in the VM is `fstring`, that's what interned strings are called internally:
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
Ah, the value of FL_FSTR was what I was missing, I had followed this code into rb_gc_free_fstring without realizing what FL_FSTR meant. Thank you!
The way it works in Python is that string literals are stored in a constant slot of their parent object, so at runtime the VM just returns the value at that index.
Though since Ruby already has symbols which act as immutable interned strings, frozen literals might just piggyback on that, with frozen strings being symbols under the hood.