Comment by Calavar

3 hours ago

Sure, maybe it's tricky to coerce an LLM into spitting out a near-verbatim copy of prior data, but that's orthogonal to whether the data needed to reconstruct a near-verbatim copy exists in the model weights.

Especially since the paper reports recall of 96% (measured with a block longest-common-substring approach), the effort required for extraction is utterly irrelevant.
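To make the metric concrete: a longest-common-substring recall roughly asks what fraction of the original text appears as a single verbatim run in the model's output. This is my own sketch of that idea, not the paper's exact definition (the function names and the normalization by original length are my assumptions):

```python
def longest_common_substring(a: str, b: str) -> int:
    # Classic DP: cur[j] is the length of the common suffix of
    # a[:i] and b[:j]; track the maximum over all (i, j).
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def lcs_recall(original: str, reproduced: str) -> float:
    # Fraction of the original covered by the single longest
    # verbatim run shared with the reproduction (assumed metric).
    if not original:
        return 0.0
    return longest_common_substring(original, reproduced) / len(original)

# "abcd" is the longest shared run: 4 of 6 characters recovered.
print(lcs_recall("abcdef", "xxabcdyy"))
```

Under a definition like this, 96% recall means almost the entire source text comes back as one contiguous verbatim block, which is why extraction difficulty is beside the point.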