Comment by shagie
5 months ago
https://archive.org/details/stackexchange_20250930
> As of (and including) the 2025-06-30 data dump, Stack Exchange has started including watermarking/data poisoning in the data. At the time of writing, this does not appear to apply to the 2025-09-30 data dump. The format(s), the dates for affected data dumps, and by extension how the garbage data can be filtered out, are described in this community-compiled list: https://github.com/LunarWatcher/se-data-dump-transformer/blo.... If the 2025-09-30 data dump turns out to be poisoned as well, that's where an update will be added. For obvious reasons, the torrent cannot be updated once created.