
Comment by bbor

3 days ago

1. It'd be for the scientific community (broadly construed). Converting media that is currently completely un-indexed into plaintext and offering a suite of search features for finding content within it would be a game-changer, IMO! If you've ever done a lit review for any field other than ML, I'm guessing you know how reliant many fields are on relatively old books and articles (read: PDFs at best, paper-only at worst) that you can basically only encounter via a) citation chains, b) following an author, or c) encyclopedias/textbooks.

2. I really don't see how this could ever lead to any kind of legal issue. You're not hosting any of the content itself, just offering a search feature for it. Goodreads doesn't need legal permission to index popular books, for example.

In general I get the sense that your comment is written from the perspective of an entrepreneur/startup mindset. I'm sure that's brought you meaning and maybe even some wealth, but it's not a universal one! Some of us are more interested in making something to advance humanity than something likely to make a profit, even if we might look silly in the process.

> I really don't see how this could ever lead to any kind of legal issue. You're not hosting any of the content itself, just offering a search feature for it.

You don't need to host copyrighted material. It's all about intent. The Pirate Bay is (imo correctly, even if I disagree with other aspects of copyright law and its enforcement) seen as a place where people go to find ways to not pay authors for their content. They never hosted a copyrighted byte, but they're banned in some form (DNS, IP, domain seizures) in many countries. Proxies of TPB are blocked too, so merely acting like an ISP for such a site is already enough. Meanwhile, nobody is ordering blocks of Comcast's IP addresses for providing access to websites with copyrighted material, because Comcast doesn't have a somewhat-provable intent to facilitate copyright infringement.

When I read the OP, I imagine this would link from the search results directly to Anna's Archive and Sci-Hub, but I think you'd have to spin it as a general-purpose search page and ideally not even mention that AA was one of the sources, much less have links.

(Don't get me wrong: everyone wants this except the lobby of journals that presently own the rights)

It would be a real shame if an anonymous third party that's definitely not the website operator made a Firefox add-on that illegitimately inserts these links into the search results page, though.

  • > When I read the OP, I imagine this would link from the search results directly to Anna's Archive and Sci-Hub

    You could just give users ISBNs or link to the book's metadata on openlibrary[0], both of which AA's native search already does.

    [0] https://openlibrary.org/

    • Exactly.

      1. The ISBN in cleartext

      2. An isbn://123123123 link

      3. A link to the book on a legal library borrowing service

      4. A link to buy the book on Amazon
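
For what it's worth, the four options listed above could be generated from a single ISBN. This is only a rough sketch: the function names are made up, the `isbn://` form is just the scheme suggested in this thread (not a registered one), and the Open Library and Amazon URL patterns are assumptions that should be checked before relying on them. The only part that is standard is the ISBN-13 check-digit rule (digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10).

```python
from urllib.parse import quote


def normalize_isbn(isbn: str) -> str:
    """Strip the hyphens and spaces commonly used when printing ISBNs."""
    return isbn.replace("-", "").replace(" ", "")


def is_valid_isbn13(isbn: str) -> bool:
    """Check length, digits, and the ISBN-13 check digit."""
    clean = normalize_isbn(isbn)
    if len(clean) != 13 or not (clean.isascii() and clean.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... across all 13 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(clean))
    return total % 10 == 0


def result_links(isbn: str) -> dict[str, str]:
    """Build the four link styles for one search result (hypothetical helper)."""
    if not is_valid_isbn13(isbn):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    clean = normalize_isbn(isbn)
    return {
        "isbn": clean,                                         # 1. cleartext
        "isbn_uri": f"isbn://{clean}",                         # 2. scheme from the thread
        "library": f"https://openlibrary.org/isbn/{clean}",    # 3. assumed URL pattern
        "store": f"https://www.amazon.com/s?k={quote(clean)}", # 4. assumed search URL
    }


if __name__ == "__main__":
    for label, value in result_links("978-0-306-40615-7").items():
        print(f"{label}: {value}")
```

None of these links point at the full text itself, which is the point of the suggestion: the search page only hands out identifiers and links to legal sources.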

Yeah, but how does the search work? Does it show a portion of the text? If it does, isn't that portion also a part of the book?