Comment by ceroxylon

3 months ago

The author claims that they tried to avoid that: "[...] we had to choose them carefully and experiment to ensure that these documents were not already in the LLM training data (full disclosure: we can’t know for sure, but we took every reasonable precaution)."

blharr 3 months ago

Even if that specific document wasn't in the training data, many similar documents written by others around the same time could have been.