Comment by bbor
3 days ago
1. It'd be for the scientific community (broadly-construed). Converting media that is currently completely un-indexed into plaintext and offering a suite of search features for finding content within it would be a game-changer, IMO! If you've ever done a lit review for any field other than ML, I'm guessing you know how reliant many fields are on relatively-old books and articles (read: PDFs at best, paper-only at worst) that you can basically only encounter via a) citation chains, b) following an author, or c) encyclopedias/textbooks.
2. I really don't see how this could ever lead to any kind of legal issue. You're not hosting any of the content itself, just offering a search feature for it. Goodreads doesn't need legal permission to index popular books, for example.
In general I get the sense that your comment is written from the perspective of an entrepreneur/startup mindset. I'm sure that's brought you meaning and maybe even some wealth, but it's not a universal one! Some of us are more interested in making something to advance humanity than something likely to make a profit, even if we might look silly in the process.
> I really don't see how this could ever lead to any kind of legal issue. You're not hosting any of the content itself, just offering a search feature for it.
You don't need to host copyrighted material. It's all about intent. The Pirate Bay is (imo correctly, even if I disagree with other aspects of copyright law and its enforcement) seen as a place where people go to find ways to not pay authors for their content. They never hosted a copyrighted byte, but they're banned in some form (DNS, IP, domain seizures) in many countries. Proxies of TPB are banned too, so merely acting like an ISP for such a site is already enough. Meanwhile, nobody is ordering blocks of Comcast's IP addresses for providing access to websites with copyrighted material, because Comcast has no somewhat-provable intent to facilitate copyright infringement.
When I read the OP, I imagine this would link from the search results directly to Anna's archive and sci-hub, but I think you'd have to spin it as a general-purpose search page and ideally not even mention that AA is one of the sources, much less link to it.
(Don't get me wrong: everyone wants this except the lobby of journals that presently own the rights)
It would be a real shame if an anonymous third party that's definitely not the website operator made a Firefox add-on that illegitimately inserts these links into the search results page, though.
> When I read the OP, I imagine this would link from the search results directly to Anna's archive and sci-hub
You could just give users ISBNs or link to the book's metadata on openlibrary[0], both of which AA's native search already does.
[0] https://openlibrary.org/
Exactly.
1. The ISBN in cleartext
2. An isbn://123123123 link
3. A link to the book on a legal library borrowing service
4. A link to buy the book on Amazon
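For what it's worth, here is a rough sketch of how a result row could carry only those four pointers and nothing else. Everything in it is an assumption for illustration: the function names are made up, the Open Library `/isbn/` path and the Amazon search URL are just one plausible choice of targets, and the `isbn://` form is copied from the list above (the registered URN form would be `urn:isbn:`).

```python
from urllib.parse import quote


def normalize_isbn(isbn: str) -> str:
    """Strip the hyphens and spaces that ISBNs are usually printed with."""
    return isbn.replace("-", "").replace(" ", "")


def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = normalize_isbn(isbn)
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0


def result_links(isbn: str) -> dict[str, str]:
    """Build the four pointers from the list above for one search hit.

    Hypothetical helper: the URL patterns are assumptions, not something
    the thread specifies.
    """
    digits = normalize_isbn(isbn)
    if not is_valid_isbn13(digits):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    return {
        "isbn": digits,                                       # 1. cleartext ISBN
        "isbn_link": f"isbn://{digits}",                      # 2. link form as written in the thread
        "library": f"https://openlibrary.org/isbn/{digits}",  # 3. assumed library target
        "buy": f"https://www.amazon.com/s?k={quote(digits)}", # 4. assumed store search
    }


if __name__ == "__main__":
    for name, value in result_links("978-0-306-40615-7").items():
        print(f"{name}: {value}")
```

The point of the design is that the row never contains a link to the file itself, only identifiers and pointers to places that are unambiguously legal.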
Yeah, but how does the search work? Does it show a portion of the text? If it does, isn't that portion also a part of the book?