Google Declaring War on the Web

1 hour ago (tante.cc)

I feel like AI has gotten to the point where the message is: If you want to make something (art/code/music/writing) you can do it for your own enjoyment, but you aren't allowed to make money from it anymore; only the large corporations can make money from content. If you do release something creative, it'll just be fed back into the machine to be copied over and over.

  • Needs to be inverted.

    Tax excess tech profits that derive from the efforts of others and use the proceeds to fund living artists.

    Vaguely analogous to levies on blank cassettes that went to offset piracy. Give the money directly to actual artists, not labels/publishers, though.

    • You’re describing a social revolution. Otherwise there is no way that leaders whose power over us corrupts them would want to put that into law.

      The cassette reference was a tax on consumers to send money upward. What you’re describing is the complete inverse.

  • I imagine it'll take a functional legal body to do this, i.e. maybe Europe, but I think there should be a legally binding set of metadata you can attach to images to specify that they must not be used for training (with real penalties if companies are caught).

Out of my countless www experiments, the website I made for myself turned out to be the most enjoyable. Technically it is a blog with links, quotes, categories, tags and search. Sometimes I download all the pages it links to (tens of thousands).

Google dropped it from the index long ago. I had a fun discussion with some Google folk where they kept arguing my website was designed wrong and that some pages had too many links.

Basically, if you write an article about the largest banana companies you have to choose which to link to!

The 10 best movies article is better than the best 100. If you make a list of all the movies you've seen your page gradually turns into something really bad. Others will be punished for linking to it but only if you add the nth entry.

As the website is just for me, it is clearly their loss, not mine. No way I'm going to consider linking to only a subset of patents or research papers.

I don't understand the endgame here. Websites let Google crawl their content in exchange for traffic. If Google cuts that out completely, what incentive do websites have to not block the Google crawlers?

I understand that Google is feeling an existential threat from other AI products that provide answers directly. But they must also understand their symbiotic relationship with the web.

  • The end game is the consumer no longer leaving Google and the web becoming synonymous with Google for them. Why shop on some random website when you can have Gemini buy it for you? Why look for information on Wikipedia when… you get the idea.

    I think the coming years will be pivotal for the web. Facebook attempted a similar strategy back when their apps got traction, but they ultimately failed. Let’s hope Google fails too.

  • > If Google cuts that out completely, what incentive do websites have to not block the Google crawlers?

    Completely, yes, that destroys the incentive. But they can reduce it 80% or 90% or so, to the point that it's just barely worthwhile to allow their crawlers.
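For reference, the blocking option discussed above comes down to a few lines of robots.txt. The sketch below uses Python's standard `urllib.robotparser` to check what a given rule set permits. The rule set itself is an illustrative assumption, not a recommendation: `Googlebot` is Google's search crawler token and `Google-Extended` is, as far as I know, the token Google publishes for opting out of some AI training uses. `SomeOtherBot` and the example.com URL are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Google's crawler tokens, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Google-Extended", "SomeOtherBot"):
        print(agent, allowed(agent, "https://example.com/post"))
```

Keep in mind that robots.txt is advisory: it only states what the site owner wants and relies on crawlers choosing to honor it. Blocking `Googlebot` outright also gives up whatever search traffic is left, which is exactly the trade-off being argued about here.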

We handed the job of getting traffic to our websites over to Google long ago. Mostly because Google was so good at it that the alternatives became significantly less useful.

Now that Google is focusing on becoming 'self contained', so to speak, we should find a better way to drive traffic to websites. Ideally one that's not under the control of a single corporation.

Anyone miss StumbleUpon?

  • It feels strange there’s no decentralised search.

    I know this is likely to do with the nature of the problem, but that hasn’t stopped us from getting some wildly-unsuitable decentralised nonsense in the past.

    • I don't see how being decentralized helps search. It makes it quite a bit harder, if the fediverse is any indication.

  • Does a move like this give more power / value to websites like reddit? A link aggregator that is organized is much more useful for finding new websites.

As a website owner I have seen major upticks in viewership myself, but it really hits hard when you see an AI summary that is wrong and your site's there. The whole AI-for-everything push unfortunately will downskill the world, I fear, and nothing can be done about it.

  • > downskill the world

    I feel this. I asked a developer today a question about how our product is programmed to handle something, and he just sent me a summary from the internal AI assistant they've started using.

    He used to provide really good, thoughtful answers, but now it's just copy/paste from the AI.

    • > He used to provide really good, thoughtful answers

      This hits hard. There’s a senior engineer at my job who is known for well written proposals. Today he shared a doc that had the typical AI formatting, was hard to read, and clearly not his style.

      On the other hand, if others use AI to summarize stuff, does it matter anymore?

    • I have a co-worker who does this now. He's very smart, very capable, very experienced and it's clear that he's just a frontend for Claude now. It's tragic.

These kinds of declarations rarely make sense to me because they don't seem to model the issues in the way that I see them. I have dual roles: one as a person who writes a blog (a "content producer" in our present parlance) and as a user. As a user, I want my browser user agent to act on my behalf to display web pages, and I want my search agent to extract information from numerous sources and synthesize them with appropriate sourcing.

One could argue that my content production being a hobby lets me be pretty blasé about being intermediated by a platform. That is somewhat true. If I relied upon this as a living, I would probably also conclude that actions that harm my way of living are a war on "the web", though realistically any neutral party observing must conclude that if it is a war, it's one on my kind of participation in the web - content creation for the purpose of revenue / notoriety / some other reward.

As a user, I don't actually care very much for each website and its creator. The information contained therein is useful to me, but the heterogeneity of these sites is mostly an obstacle to the information. I am much happier when my search and summarization agents are able to accurately synthesize what these websites say, in so much as such a synthesis allows me to model reality more accurately.

So I could be convinced that this change from Google makes it less likely for accurate content to be created and that I'll be misled more often. But this is a tool, and my world-model will frequently be tested by reality. If the search-and-synthesis machine fails to produce useful outcomes, I will know. And I'll have to adjust the way I treat knowledge I obtain through it so that I don't get catastrophic outcomes. But that's the same already. I don't really know that Google's search results are not planted ones calibrated to change my opinion. And I don't know that they don't collude with the Internet Archive (with whom they have a pre-existing relationship) to make it look like their constructed consensus is real.

As a user, I have to make a lot of decisions already, and having to painstakingly read search results to synthesize them myself is far less useful than using an agent. So if there is a war on the web, then I am glad to join it, on the side against the web.

I would feel more sad about this if the web wasn’t so rotten to begin with. On average, any random site is just trying to throw ads at you and harass you to subscribe and such.

  • That rot was the direct result of the ad economy that made Google all of its money. Now maybe if they hadn't done it then somebody else would have, but they did do it, and poisoned the well we all drink from.

I guess the extra insult is that the summaries still suck. I feel like every time I google a technical question, I get something wrong which references a youtube video watched by 30 people about an unrelated subject.

> De-googlifying your mental apparatus becomes more urgent today. Find other search engines, don’t use the Chrome browser. Or wake up in a slopified AOL kind of environment where your access to information is limited to what Google’s synthetic text extruders deem relevant.

Everything is probably re-traceable fairly easily because Google Analytics is on nearly every web page.

But I understand maintaining your own source of archives, videos, documents, etc.

Sounds like a good vibe coding project actually.. to try and keep it all organized offline.

I'm not even sure this is bad anymore. The web is so overrun with SEO crap that it could probably use the cleansing that comes with Google's abandonment, Usenet-style.

Kind of curious how it would pan out, if there was a government enforced meta tag one could add to signal what the data could be used for - for example “no-ai”.

That would allow people to still let Google access their site, but restrict its usage. Similar for open source projects on GitHub, etc.

  • The tech giants already violated existing copyright laws when scraping for AI content and faced very few consequences. So far the government has shown an inability to enforce anything.

    • So far, yeah. The courts shrugged and said it was allowed under current law.

      So the solution to that would be “change the law”.

  • > government enforced

    The thing everyone needs to ask before advocating for something "government enforced" is "what would happen if this was in the hands of a hostile government?"

    And then remember that (a) just because it's not hostile to you today, doesn't mean it won't be tomorrow, and (b) one man's "hostile" is another man's "utopia."

    • Well, when I said “I’m curious” it was true. I’m actually curious.

      So how do you think a meta noai tag would be used by a hostile government?

      It would be something the website owner set.
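To make the idea concrete, here is a minimal sketch of how a crawler could read such a tag, using Python's standard `html.parser`. Everything policy-related is an assumption: `noai` is treated here as a hypothetical value of the `robots` meta tag, not a recognized or legally enforced standard, and `may_train_on` is an invented helper name.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "index, noai".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def may_train_on(html: str) -> bool:
    """Hypothetical policy check: False if the page carries a 'noai' directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noai" not in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="index, noai"></head></html>'
    print(may_train_on(page))
```

The parsing is the easy part. As the replies point out, the open question is enforcement: the tag only matters if ignoring it carries a penalty.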

I'm confused how the strategy works in the long run. If fewer people are incentivized to build websites on novel topics, there will be less content in general and less training data... plus AI overview results see fewer ad conversions and therefore less ad revenue. What's the long game? I get that the paradigm is changing but this seems like it's not going to help them maintain their dominance.

  • Ah, that's where you're wrong. There is no long term. Investors want results now. "Later" is for the greater fools.

  • What if there is no long game? Just people at Google optimising for their current KPIs.

To me it seems either ...

A) Google will do a good job of this, people will find their summaries more useful, and the web will evolve into a more closed system that better serves its users

or ...

B) Their gated AI community will suck, and people will start using a different search engine that better serves its users.

My money isn't on A), but they do have a lot of clout so I wouldn't rule it out.

It looks like Google has taken a note out of Facebook's "lose trust" playbook.

Facebook had a huge opportunity in the post-AI world: real humans.

Instead of focusing on connections, they've been optimizing their properties for doomscrolling.

Google, similarly, has lost the plot on what made them trustworthy in the first place: navigating to citable content.

Both companies started on this trend well before AI, but this might be the final nail in their respective coffins[0].

[0] Yes, they'll likely still be profitable for a long time, but the Bell Labs-esque downfall has begun (imo).

If Google stops driving traffic to websites, won't those websites stop allowing Google to crawl their pages? The pendulum might be in motion, but it seems like there should still be some natural equilibrium that it's heading to.

  • There won't be "websites" anymore, it will all just be Google. Other behemoths that generate original content (that aren't AI) like sports, news, entertainment will either be big enough to sign individual deals on pain of litigation or just force-scraped (as is happening now) by bots that are indistinguishable from human users.

  • We got to that point a while ago. Many of the major social media sites are essentially uncrawlable.

    Communities have moved from public forums to private Discords. Most of the major social media sites are unviewable without an account.

I thought this was going to be about having to use your corporate-approved phone to scan reCAPTCHA QR codes. I was just able to opt out of my first one, but obviously won’t be able to forever.

Nobody is stopping you from publishing on the net.

Nobody is stopping you from blocking bot traffic.

You don't need search engines - you can just link between sites or have webrings. Like we used to, pre-2000.

Nobody is stopping you from not using ads on the net.

Nobody can force you to use non-essential cookies (and thus: a cookie-banner).

Imagine there was a war going on, and no-one was showing up.
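For anyone who never saw one, a webring as mentioned above is just a circular list of sites where each member links to its previous and next neighbor, so no search engine is involved. A minimal sketch, with made-up member URLs and an invented helper name `neighbors`:

```python
# Hypothetical ring members; any ordered list of site URLs would do.
RING = [
    "https://alice.example.org",
    "https://bob.example.org",
    "https://carol.example.org",
]


def neighbors(ring: list[str], site: str) -> tuple[str, str]:
    """Return the (previous, next) members for site, wrapping around the ring."""
    i = ring.index(site)  # raises ValueError if site is not a member
    return ring[(i - 1) % len(ring)], ring[(i + 1) % len(ring)]


if __name__ == "__main__":
    prev_site, next_site = neighbors(RING, "https://alice.example.org")
    print("previous:", prev_site)
    print("next:", next_site)
```

Each member only needs to render those two links somewhere on the page. The list itself can live in a shared file that the members agree on.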

Glad I haven’t used anything Google for more than a decade. For internet searches, you can host a SearXNG instance and use that. Other services are self-hostable too, and some are even far better than Google's.

Google declared war on blogs and other content a long time ago, when it used our websites to harvest data to target readers with ads across the entire internet. We used to have (for twenty years!) a medical technology website for MDs. How can we compete with short unrelated YouTube videos or other spam content that serve Google ads targeting doctors? How do you think the entire creative blogosphere of the early 2000s collapsed into nothingness?

Well, they are kind of desperate after missing both cloud and AI.

I would blame trash like Discord more though. Alternative search engines are available, but the crappy little web chat hides info inside.

  • > I would blame trash like Discord more though. Alternative search engines are available, but the crappy little web chat hides info inside.

    Well, we had the same problem with IRC. There's value to be had in not everything being discoverable in 5 seconds with a google search.

I don't know if it's Google A/B testing something, but the summaries below the usual search result entries (the non-AI ones) are unbelievably bad today. Sometimes the link is a Reddit or SO post, but the summary is from a reply/answer with no votes that contradicts the highest-voted ones.

It's a conspiracy theory, but it feels like Google is actively making the usual search worse so everyone will use AI overview more.

  • Don’t worry when I track down most AI answers it is usually just some Redditor’s comment, which is quite scary when you think about it and Redditors in general.

    • But I want Redditors' comments. It's almost my only use case for Google now. What I'm complaining about is that Google search can't even summarize the right Reddit comments.

It is not just about replacing search results with text blurbs generated on Alphabet's premises either. They're making it so that unless you have an Android-certified (or Apple) smartphone you will not be a human being; you will be assumed to be a bot and blocked by their captchas.

  • Passkeys are a big part of this future, too. The spec has device attestation built in, so if passkeys gain traction, they could lock it down so only approved software is allowed to log in to services. If that happens, it means your ability to log in to services will be mediated by one of 3 US big tech companies. "For security," of course.

  • Honestly the bigger problem for me. I use SearXNG, but DDG is acceptable, or people like Kagi.

    But if reCAPTCHA won't consider me human unless I have a certified phone, having search alternatives doesn't matter -- the websites themselves are just gonna block me.

    • You may use an alternative search engine, but 90% won’t. If people accept the new way of searching, meaning, no longer visiting websites, there will no longer be any websites that could show you captchas.

The AI answers provide tons of source links.

At the end of the day, is it really all that different to provide a list of links, versus an answer or overview of a few paragraphs with links to lots of different higher-quality sources?

I follow those source links all the time. Not just to "check sources" but because they provide a ton more detail. And the links are usually much better than what I'll get with regular keyword search results.

> It’s about monopolizing access to information.

Not as long as there are competitors like OpenAI and Anthropic. In fact, LLMs have provided Google with stronger competition than it's ever had before. ChatGPT and Claude are doing what Bing was never able to.

  • > I follow those source links all the time.

    The vast majority of people don’t.

    We’ve gone from Only links to the source -> Mostly links to the source, with a short summary picked almost verbatim from the source -> AI summary that mangles several sources’ information together and gets top billing -> Only the AI summary with some footnotes linking to the source.

    Google has been fairly slowly turning up the temperature of the pot and we’re only a few degrees away from a full boil. Let’s not pretend or be naive enough to think that’s not what’s happening.

  • Ask any publisher and you will get a resounding "yes, it is very different." On average they're able to attribute about a 33% decrease (globally) in traffic to google's (or others') AI answers. [1]

    You're right that there are competitors, but those competitors are doing the same thing: hoovering up content and then not giving anything back for it. There are deals in place for some of the largest publishers [2] [3], but that leaves a ton of content out in the cold. That's going to decrease the amount of content that's out there, which will decrease the quality of AI search. I don't know where that ends, but given how leveraged the economy is in AI it seems like a good idea for somebody to figure it out.

    [1] https://pressgazette.co.uk/media-audience-and-business-data/...

    [2] https://futureweek.com/a-complete-list-of-publishers-strikin...

    [3] https://digiday.com/media/a-timeline-of-the-major-deals-betw...

  • > The AI answers provide tons of source links.

    A lot of the time, the answer itself is good, but the links are spam blogs and Tiktok videos. I don't think there's a real connection between how the text is generated and what "references" are picked for it. I just searched for a math history topic and the reference was a literal TikTok video that's an advertisement for a sketchy mobile calculator app?

    So yeah, these references are boosting web content, but it has nothing to do with the high-quality sources used to train the LLMs in the first place.

  • Most people don't look at the sources even though the sources often contradict the statements.

    I've stopped using Google and find I'm not missing anything

This war was already declared a decade ago. By many interests. And victory followed.

I think though a big part of this was YouTube replaced blogs. It's a generational thing.

The cool thing: Google is much like Meta, in that the kids see it as something boomers use. My daughter is 12, and whenever I say “google it” she says “that’s very, very funny Dad, you are fun guy.” It’ll take some time until boomers are off Google as well (my usage of Google is probably at 30% of where it used to be), but their days of “this is where you go to ‘search’” are numbered.

Welcome to the third-party internet. Unless every micro-decision you make while browsing can be stripped down, packaged into neat data points, and sold, you're not welcome here.

I've got a half-thought-out idea that maybe we need something like AMP back. I hated AMP. I'm glad it's dead. But you could use it to define things, and you were at least advised that they would be shown in the Google UI and carousel. I feel like we need a guarantee from the LLMs that if we provide some kind of metadata in our source material, they'll honor it. Like showing our advertisers, so we still get some revenue from them showing our content on their LLM site.

Totally vibed version of this:

```
{
  "version": "https://agent-source.org/v1",
  "canonical_url": "https://ninjasandrobots.com/the-cone",
  "title": "The Real Reason Nobody Moved the Cone",
  "source_name": "Ninjas and Robots",
  "author": "Nathan Kontny",
  "summary": "An essay about embarrassment, public action, and why obvious fixes go undone.",
  "preferred_citation": "Ninjas and Robots",
  "source_card": {
    "headline": "The Real Reason Nobody Moved the Cone",
    "description": "People avoid obvious public actions not because they are lazy, but because being seen trying is embarrassing.",
    "image": "https://ninjasandrobots.com/images/cone-card.jpg",
    "cta": "Read the full essay"
  },
  "allowed_excerpt": {
    "max_chars": 500,
    "preferred_excerpt": "People often avoid obvious public action because embarrassment feels more immediate than danger."
  },
  "commercial_terms": {
    "ads_allowed": true,
    "sponsor_card_url": "https://ninjasandrobots.com/.well-known/sponsor-card.json",
    "licensing_contact": "hello@ninjasandrobots.com"
  }
}
```

But something to get our original source honored better in the LLM. Maybe if one of the LLMs do this, we'd give it more loyalty? Maybe the government needs to compel this kind of behavior? No idea. It does suck though our content is just turned into AI's own tokens and we're left with a tiny "source" link if we're lucky.

  • Given that these platforms are increasingly intermediating experiences between websites/companies/etc and end-users, I suspect we’ll soon see a strong push back in that direction to adopt more things like schema markup to get more control back in some sense. Things are only going to get worse though.
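As a rough illustration of what "honoring" the descriptor sketched above could mean on the consuming side, here is a minimal Python sketch that reads the excerpt limit and preferred citation from it. The descriptor format is the commenter's hypothetical proposal, not an existing standard, and `build_citation` is an invented helper name. Only field names that appear in that JSON are used.

```python
import json

# Trimmed copy of the descriptor above (a hypothetical format, not a standard).
DESCRIPTOR = """
{
  "canonical_url": "https://ninjasandrobots.com/the-cone",
  "source_name": "Ninjas and Robots",
  "preferred_citation": "Ninjas and Robots",
  "allowed_excerpt": {
    "max_chars": 500,
    "preferred_excerpt": "People often avoid obvious public action because embarrassment feels more immediate than danger."
  },
  "commercial_terms": {"ads_allowed": true}
}
"""


def build_citation(descriptor_json: str, fallback_text: str = "") -> dict:
    """Build a citation card that respects the publisher's stated excerpt limit."""
    d = json.loads(descriptor_json)
    rules = d.get("allowed_excerpt", {})
    max_chars = int(rules.get("max_chars", 0))  # no limit stated -> no excerpt
    excerpt = (rules.get("preferred_excerpt") or fallback_text)[:max_chars]
    return {
        "excerpt": excerpt,
        "citation": d.get("preferred_citation") or d.get("source_name", ""),
        "url": d["canonical_url"],
        "ads_allowed": bool(d.get("commercial_terms", {}).get("ads_allowed", False)),
    }


if __name__ == "__main__":
    print(build_citation(DESCRIPTOR))
```

Defaulting to an empty excerpt when no limit is stated is a design choice made for this sketch: absent explicit permission, quote nothing. Whether any LLM vendor would agree to that default is the real question.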

If it's so bad, people won't use it. If it's good, why be against it?

You don't write posts to reach the largest number of people; you do it because you're passionate, and ultimately you get people following you.

If the average Joe doesn't go to your website, what's the big deal?

I think this feature will be very useful to fight back on the optimized SEO hell that we currently have.

It is not a war on the web, but on how it was traditionally used (and abused). And that "traditional" way was shaped by google too.

You want a cookie, so I seat you at a table, hand you a napkin, serve you a bag of cookies and hope that you find the cookie you want, all while you hear my music, watch my ads, and get pushed more of the food and other services that I sell. And sometimes that is the experience you are looking for. But many people just want a cookie.

That is what a conversational and maybe agentic interface can give you. Does someone have a blueberry cookie? Then it gives it to you, and sometimes also gives pointers to restaurants that offer a more complete experience (while others may try to scam you). It is a shortcut, but it also doesn't hide the traditional way of getting there from you.

They are not saints, but neither is everyone on the other side. And the new way of accessing the relevant information you want, in a form you can use, has its own value.

  • Google isn't a search company, and hasn't been ever since they bought DoubleClick. Their core business is advertising.

    They're trying to pivot into AI because they have gobs of "evidence" that the vast majority of people have been typing natural language questions into Google instead of looking for specific terms

  • Google pre 2010 was perfectly functional. No realtime search suggestions, advanced search parameters that were actually working, possibility of doing an exact string search if needed.

    The technology for indexing the web was already mature enough by then.

    I agree that much of the downward spiral was caused by google itself, tho.