Comment by jillesvangurp
1 year ago
I think his point is that Google is a web company. And a mobile phone company. And they publish a lot of stuff in a format that's basically optimized for print and kind of useless for anything else.
I did my PhD more than 20 years ago and it was annoying then to be working with all these postscript and pdf documents. It's still annoying. These days people publish content in PDF form on websites and mostly not in printed media. People might print these or not. Twenty years ago, I definitely did. But it's weird how we stick with this. And PDFs are of course very unstructured and hard to make sense of programmatically as well.
I bet a lot of modern day scientists don't actually print the articles they read anymore and instead read them on screen or maybe on some ipad or e-reader. Print has become an edge case. Reading a pdf on a small e-reader is not ideal. Anything with columns is kind of awkward to deal with. There's a reason why most websites don't use columns: it kind of sucks as a UX. The optimal form to deliver text is in a responsive form that can adapt to any screen size where you can change the font size as well. A lot of scientific paper layouts are optimized to conserve a resource that is no longer relevant: paper real estate. Tiny fonts, multiple columns, etc.
Anyway, I like Simon's solution and how it kind of works. It's kind of funny how some of these LLMs can be so lazy. The thing with the references being omitted is hilarious. I see the same with chat gpt where it goes out of its way to never do exactly as you asked and instead just give you bits and pieces of what you ask for until you beg it to just please FFing do as you're told?! I guess they are trying to save some tokens or GPU time.
No comments yet
Contribute on Hacker News ↗