Comment by chilmers
6 hours ago
They tested for this. From the paper:
“We find little evidence of steganography in our NLAs. Meaning-preserving transformations, like shuffling bullet points, paraphrasing, or translating the explanation to French, cause only small drops in FVE, and this gap does not widen over training.”