Comment by simonw

1 year ago

Google: you are a web company. Please learn to publish your research papers as web pages.

I really wish that browsers had developed first-class support for offline web page bundles. There's no way to share a page that is guaranteed to be self-contained and not hit the network, especially if you want to use javascript. It's particularly frustrating since browsers supported offline mode as far back as the 90s; it just needed to be combined with support for loading from zipped folders.

That simple change would've largely solved the academic paper problem decades ago. It's bizarre that it still isn't a feature.

  • Mail clients kinda do that (or at least they can, if asked to). Also, why would academic papers need JS anyway? CSS and images, I can get, but beyond that there's no need for anything fancier.

  • One option here is to inline all assets - images etc - as base64 data URIs. The HTML page ends up huge but it will at least be self-contained.

    • Yes, but it's not guaranteed to be self-contained. I wouldn't want to open a random HTML file knowing that it could phone home, or that the content might break one day without me realizing. There's a practical and psychological aspect to sharing `steves_paper_2014.html` versus `steves_paper_2014.offlinesitebundle`. The latter feels safe and immutable.
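
A minimal Python sketch of the inlining idea from this sub-thread, assuming the page's images are local files referenced from `<img src="...">`. The names `to_data_uri` and `inline_images` are hypothetical, and the regex is a deliberate simplification rather than a real HTML parser. It does not touch CSS, fonts, or scripts, so, as the reply above points out, the result is not guaranteed to stay off the network.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches the src attribute of <img> tags. A regex is a simplification here;
# a robust tool would use a proper HTML parser.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data URI with the given MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img> sources with base64 data URIs (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        src = match.group(2)
        # Leave remote and already-inlined images untouched.
        if src.startswith(("http://", "https://", "//", "data:")):
            return match.group(0)
        path = base_dir / src
        # Leave the tag untouched if the file cannot be found.
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        return match.group(1) + to_data_uri(path.read_bytes(), mime) + match.group(3)

    return IMG_SRC.sub(replace, html)
```

Usage would look something like `inline_images(Path("paper.html").read_text(), Path("."))`, where `paper.html` is a placeholder filename. Base64 output is about a third larger than the original bytes, which is why the page "ends up huge".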


I expected to see some eldritch CSS monstrosity, but no, it's just a PDF. A well-formatted one, at that.

What’s your issue there?

  • Reading two-column PDFs on a mobile phone sucks.

    Plus I can't use web tools, like "Read this page" in Mobile Safari.

    And copying and pasting is harder.

    And I can't link to individual sections.

    I'm honestly baffled by people who prefer PDFs for this kind of information. Are they printing them out on paper and going at them with a highlighter or something?

    • Just my personal take, but when I have to read something carefully, I find it easier to do on paper.

      For example, I recently wrote an article about taking random samples using SQL. Even though I was writing it for my blog, which is HTML, I proofread the article by rendering it as a PDF doc, printing it out, and reviewing it with a blue pen in hand.

      What surprised me is that I also found it easier to review the article on the screen when it was in PDF format. TeX just does a way better job of putting words on a page than does a web browser.

      Actually, if you want to do the comparison yourself, I'll put both versions online:

      HTML: https://blog.moertel.com/posts/2024-08-23-sampling-with-sql....

      PDF: https://blog.moertel.com/images/public_html/blog/pix-2024060...

      I don't think either version is hard to read, but if I had my choice, I'd read the PDF version. But maybe that's just me.

      Let me know which you prefer.


    • Personally, it's sending it to GoodReader on a 13" iPad.

      I don't know that I'd go so far as to say I 'prefer' this, but there are a lot of PDFs out there, this works fine, and it's a nice change of pace given how much time I spend in front of a monitor / laptop screen.

    • Indeed. That’s the easiest way to show your students/professor/coworkers which are the crucial bits.

That’s not a blog post. This is an academic preprint; I imagine the format is as prescribed.

Conference papers use templates. It's not like Google can choose.

  • They can choose to publish it in both HTML and PDF.

    • Translating LaTeX to HTML is not a straightforward process, unfortunately. Many people have tried to implement automated translation systems, but nothing has really worked out yet.

      I think it's unfair to expect the research team to invest additional hours in learning how to make good websites, so to solve your problem would require hiring additional talent whose only job is to translate academic PDFs into accessible web pages. I don't think that's a bad idea, and certainly Google has the funds to do something like that, but I don't imagine they'd find it to be a good use of money. Accessibility is an afterthought for most major companies these days.

    • If you replace arxiv.org in an arXiv link with ar5iv.org, it will auto-translate the paper to HTML if possible.
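
The host swap described in the last reply can be sketched in a few lines of Python. The function name `to_ar5iv` and the example paper ID are hypothetical, and whether a given paper actually renders as HTML depends on the service, not on this code.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url: str) -> str:
    """Swap the arxiv.org host for ar5iv.org, leaving other URLs unchanged."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Only rewrite links that actually point at arxiv.org.
    if host not in ("arxiv.org", "www.arxiv.org"):
        return url
    return urlunsplit(parts._replace(netloc="ar5iv.org"))


# Placeholder ID, used purely for illustration.
print(to_ar5iv("https://arxiv.org/abs/0000.00000"))
```

Parsing the URL instead of doing a plain string replace avoids rewriting links that merely mention arxiv.org somewhere in their path or query string.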