
Comment by serial_dev

3 days ago

The main barriers for me would be:

1. Why? Who would use that? What’s the problem with the other search engines? How will it be paid for?

2. Potential legal issues.

The technical barriers are at least challenging and interesting.

Providing a service that needs significant upfront investment, with no product or service vision, that I'd likely be sued over a couple of times a year, probably losing, with who knows what kind of punishment… I'll have to pass, unfortunately.

1. It'd be for the scientific community (broadly construed). Converting media that is currently completely un-indexed into plaintext and offering a suite of search features for finding content within it would be a game-changer, IMO! If you've ever done a lit review in any field other than ML, I'm guessing you know how reliant many fields are on relatively old books and articles (read: PDFs at best, paper-only at worst) that you can basically only encounter via a) citation chains, b) following an author, or c) encyclopedias/textbooks.

2. I really don't see how this could ever lead to any kind of legal issue. You're not hosting any of the content itself, just offering a search feature for it. Goodreads doesn't need legal permission to index popular books, for example.

In general I get the sense that your comment is written from the perspective of an entrepreneur/startup mindset. I'm sure that's brought you meaning and maybe even some wealth, but it's not a universal one! Some of us are more interested in making something to advance humanity than something likely to make a profit, even if we might look silly in the process.

  • > I really don't see how this could ever lead to any kind of legal issue. You're not hosting any of the content itself, just offering a search feature for it.

    You don't need to host copyrighted material. It's all about intent. The Pirate Bay is (imo correctly, even if I disagree with other aspects of copyright law and its enforcement) seen as a place where people go to find ways to not pay authors for their content. They never hosted a copyrighted byte, but they're banned in some form (DNS, IP, domain seizures) in many countries. Proxies of TPB are banned too, so merely acting like an ISP for such a site is already enough. Meanwhile, nobody orders blocks of Comcast's IP addresses for providing access to websites with copyrighted material, because Comcast has no somewhat-provable intent to facilitate copyright infringement.

    When I read the OP, I imagine this would link from the search results directly to Anna's Archive and Sci-Hub, but I think you'd have to spin it as a general-purpose search page and ideally not even mention that AA was one of the sources, much less link to it.

    (Don't get me wrong: everyone wants this except the lobby of journals that presently own the rights)

    It would be a real shame, though, if an anonymous third party that's definitely not the website operator made a Firefox add-on that illegitimately inserts these links into the search results page.

    • > When I read the OP, I imagine this would link from the search results directly to Anna's archive and sci-hub

      You could just give users ISBNs or link to the book's metadata on openlibrary[0], both of which AA's native search already does.

      [0] https://openlibrary.org/


  • Yeah, but how does the search work? Does it show a portion of the text? If so, isn't that portion also part of the book?

But he did not mention anything about creating a "service".

It could be his own copy for personal use.

What if computers keep getting faster and storage keeps getting cheaper? What if "large" amounts of data keep becoming more manageable?

The data might seem large today, but it might not seem large or unmanageable in the future.

It would be incredible for LLMs: searching it, using it as training data, etc. It would probably have to be done in Russia or some other country that doesn't respect international copyright, though.

> 1. Why? Who would use that?

Rather, who would use a traditional search engine instead of a book search engine, when the results from the latter would be of much higher quality?

People who need or want the highest quality information available will pay for it. I'd easily pay for it.