Comment by ceroxylon

1 day ago

The author claims that they tried to avoid that: "[. . .] we had to choose them carefully and experiment to ensure that these documents were not already in the LLM training data (full disclosure: we can’t know for sure, but we took every reasonable precaution)."

Even if that specific document wasn't in the training data, many similar documents written by others at the time could have been.