Comment by steveklabnik
3 days ago
> How does this work under the hood? Does Ruby keep a giant map of all strings in the application to check new strings against to see if it can dedupe?
1. Strings have a flag (FL_FREEZE) that are set when the string is frozen. This is checked whenever a string would be mutated, to prevent it.
2. There is an interned string table for frozen strings.
> Does it keep a reference count to each unique string that requires a set lookup to update on each string instance’s deallocation?
This I am less sure about, I poked around in the implementation for a bit, but I am not sure of this answer. It appears to me that it just deletes it, but that cannot be right, I suspect I'm missing something, I only dig around in Ruby internals once or twice a year :)
There's no need for ref counting, since Ruby has a mark & sweep GC.
The interned string table uses weak references. Any string added to the interned string tables has the `FL_FSTR` flag set to it, and when a string a freed, if it has that flag the GC knowns to remove it from the interned string table.
The keyword to know to search for this in the VM is `fstring`, that's what interned strings are called internally:
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
- https://github.com/ruby/ruby/blob/b146eae3b5e9154d3fb692e8fe...
Ah, the value of FL_FSTR was what I was missing, I had followed this code into rb_gc_free_fstring without realizing what FL_FSTR meant. Thank you!