Comment by upboundspiral

9 days ago

I feel like a government funded search engine would resolve a lot of the issues with the monetized web.

The purpose of a search engine is to display links to web pages, not the entire content. As such, it can be argued it falls under fair use. It provides value to the people searching for content and those providing it.

However, we left such a crucially important public utility in the hands of private companies that changed their algorithms many times in order to maximize their profits rather than the public good.

I think there needs to be real competition, and I am increasingly becoming certain that the government should be part of that competition. Both "private" companies and "public" government are biased, but they are biased in different ways, and I think there is real value to be created in this clash. It makes it easier for individuals to pick and choose the best option for themselves, and for independent third options to be developed.

The current cycle of knowledge generation is academia doing foundational research -> private companies expanding this research and monetizing it -> nothing. If the last step were expanded to the government providing a barebones but usable service to commoditize it, years after private companies have been able to reap immense profits, then the capabilities of the entire society would be increased. If the last step is prevented, then the ruling companies turn to rent-seeking and resting on their laurels, and turn from innovating to extracting.

> However, we left such a crucially important public utility in the hands of private companies that changed their algorithms many times in order to maximize their profits rather than the public good.

No one "left" a crucially important public utility in the hands of private companies. Private companies developed the search engine themselves in the late 90s in the course of doing for-profit business; and because some of them ended up being successful (most notably Google), most people using the internet today take the availability of search engines for granted.

  • Rather famously in at least the case of Google and others, with government funding:

    "Google’s true origin partly lies in CIA and NSA research grants for mass surveillance" (January 28, 2025)

    The intelligence community hoped that the nation’s leading computer scientists could take non-classified information and user data, combine it with what would become known as the internet, and begin to create for-profit, commercial enterprises to suit the needs of both the intelligence community and the public. They hoped to direct the supercomputing revolution from the start in order to make sense of what millions of human beings did inside this digital information network. That collaboration has made a comprehensive public-private mass surveillance state possible today.

    The Massive Digital Data Systems (MDDS) ... program's stated aim was to provide more than a dozen grants of several million dollars each to advance this research concept. The grants were to be directed largely through the NSF so that the most promising, successful efforts could be captured as intellectual property and form the basis of companies attracting investments from Silicon Valley. This type of public-to-private innovation system helped launch powerful science and technology companies like Qualcomm, Symantec, Netscape, and others.

    <https://qz.com/1145669/googles-true-origin-partly-lies-in-ci...>

    The Internet itself (particularly its precursor, ARPANET) was also government funded, as was development of the World Wide Web (CERN). Oracle, the database company, grew out of the CIA's Project Oracle.

    CIA Reading Room Project Oracle

    <https://www.cia.gov/readingroom/document/cia-rdp80-01794r000...>

    "Oracle's coziness with government goes back to its founding / Firm's growth sustained as niche established with federal, state agencies" (2002)

    <https://www.sfgate.com/bayarea/article/oracle-s-coziness-wit...>

    Surveillance has been baked in since their founding.

The government having the power to curate access to information seems bad. You could try to separate it as an independent agency, but as the current US administration is showing, that’s not really a thing.

  • The idea is that the government is biased towards hiding certain information and private companies are biased towards hiding a different set.

    While unlikely, the ideal would be for the government to provide a foundational open search infrastructure that would allow people to build on it and expand it to fit their needs, in a way that is hard to do when a private company eschews competition and hides its techniques.

    Perhaps it would be better for there to be a sanctioned crawler funded by the government, which then sells the unfiltered information to third parties like Google. This would ensure IP rights are protected while ensuring open access to information.

  • And in a world where running a Google-like search engine is just one of the many jobs the US federal government has, why shouldn't how the government runs that search engine be a national-level political question decided by elections, just like the management of everything else the US federal government does? Regardless of how the government curated access to information, a huge chunk of the US electorate would be mad about how it was doing so, reflecting very real polarization among the population.

I'm feeling it. Addressing the other reply: zero moderation or curation, and zero shielding from the crawler, if what you've posted is on a public network. Yes, users will be able to access anything they can think of. And the government will know. I think you don't have to worry about them censoring content; they'll be perfectly happy to know who's searching for CSAM or bomb-making materials. And if people have an issue with what the government does with this information (for example, charging people who search for things the Tangerine-in-Chief doesn't want you to see), you stop it at the point of prosecution, not data access. (This does only work in a society with a functioning democracy... but free information access is also what enables that. As Americans, with our red-hot American blood, do we dare?)

We can start by forcing sites to treat crawlers equally. Google's main moat is less physical infrastructure or the algorithms, and more that sites allow only Google to scrape and index them.

Sites can charge money for access or disallow all scrapers, but they should not be allowed to selectively admit only Google.

  • It's not like only allowing Google actually means that only Google is allowed forever. Crawlers are free to make agreements with sites for easier crawling access, or to pretend to be a regular user to bypass whatever block the site is attempting.
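
For concreteness, the selective allow-listing discussed above is commonly expressed in a site's robots.txt. Here is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt contents, the ExampleBot name, and the example.com URL are illustrative assumptions, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that admits only Googlebot and turns away
# every other crawler. Illustrative only, not copied from a real site.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up name standing in for any non-Google crawler.
results = {
    agent: can_crawl(ROBOTS_TXT, agent, "https://example.com/page")
    for agent in ("Googlebot", "ExampleBot")
}
print(results)
```

Note that robots.txt is only advisory: it states the site's policy, and any actual enforcement happens through separate blocking measures, which is why a crawler posing as a regular user can sometimes get through.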

The same should apply to LLMs. If you're going to train on the sum total of all of humanity's creative work, from the beginning of history into perpetuity, and train on the sum total of all current intellectual property, the result should exist for the public's education, research and benefit.

It would also be in the spirit of the fair use doctrine's first and fourth considerations:

> 1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

> 2. the nature of the copyrighted work;

> 3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

> 4. the effect of the use upon the potential market for or value of the copyrighted work.

If that doesn't happen, increasing amounts of information and human creativity will be siloed and never made publicly accessible in a way that lets it be consumed and reproduced as slop.