As far as I can tell they trawled a big archive for sensitive information, (unsurprisingly) found some, and then didn't try to contact anyone affected before telling the world "hey, there are login credentials to be found in here".
I use this before submission and recommend others do too. If ai was in charge of arXiv Id have it integrated as an optional part of the submission process.
Paper LaTeX files often contain surprising details. When a paper lacks code, looking at latex source has become a part of my reproduction workflow. The comments often reveal non-trivial insights. Often, they reveal a simpler version of the methodology section (which for poor "novelty" purposes is purposely obscured via mathematical jargon).
I sort of understand the reasoning on why Arxiv prefers tex to pdf[1], even though I feel it's a bit much to make it mandatory to submit the original tex file if they detect a submitted pdf was produced from one. But I've never understood what the added value is in hosting the source publicly.
Though I have to admit, when I was still in academia, whenever I saw a beautiful figure or formatting in a preprint, I'd often try to take some inspiration from the source for my own work, occasionally learning a new neat trick or package.
A huge value in having authors upload the original source, is it divorces the content from the presentation (mostly). That the original sources were available was sufficient for a large majority of the corpus to be automatically rendered into HTML for easier reading on many devices: https://info.arxiv.org/about/accessible_HTML.html. I don't think it would have been as simple if they had to convert PDFs.
As far as I can tell they trawled a big archive for sensitive information, (unsurprisingly) found some, and then didn't try to contact anyone affected before telling the world "hey, there are login credentials to be found in here".
Don't forget giving it a fancy name in the hope that it'll go viral!
I am getting so tired of every vulnerability getting a cutesy pet name trying to pretend being the new Heartbleed / Spectre / Meltdown...
Beats having to remember and communicate CVE numbers
It's not like every datapoint comes with the email of the corresponding author.
Google has a great aid to reduce the attack surface: https://github.com/google-research/arxiv-latex-cleaner
I use this before submission and recommend others do too. If ai was in charge of arXiv Id have it integrated as an optional part of the submission process.
Paper LaTeX files often contain surprising details. When a paper lacks code, looking at latex source has become a part of my reproduction workflow. The comments often reveal non-trivial insights. Often, they reveal a simpler version of the methodology section (which for poor "novelty" purposes is purposely obscured via mathematical jargon).
I sort of understand the reasoning on why Arxiv prefers tex to pdf[1], even though I feel it's a bit much to make it mandatory to submit the original tex file if they detect a submitted pdf was produced from one. But I've never understood what the added value is in hosting the source publicly.
Though I have to admit, when I was still in academia, whenever I saw a beautiful figure or formatting in a preprint, I'd often try to take some inspiration from the source for my own work, occasionally learning a new neat trick or package.
1: https://info.arxiv.org/help/faq/whytex.html
A huge value in having authors upload the original source, is it divorces the content from the presentation (mostly). That the original sources were available was sufficient for a large majority of the corpus to be automatically rendered into HTML for easier reading on many devices: https://info.arxiv.org/about/accessible_HTML.html. I don't think it would have been as simple if they had to convert PDFs.