
Comment by keeda

13 hours ago

Google's advantage is not just its index and algorithms: it has built a self-reinforcing flywheel that data-mines human attention at massive scale to improve its search results.

This comment (https://news.ycombinator.com/item?id=46709957) points out that Google got its start via PageRank, which essentially ranked sites based on links created by humans. As such, its primary heuristic was what humans thought was good content. Turns out, this is still how they operate.

Basically, as people search and navigate the results, Google harvests their clicks, hovers, dwell-time and other browsing behavior -- i.e. tracking what they pay attention to -- to extract critical signals to "learn" which pages the users actually found useful for the given query. This helps it rank results better and improve search overall, which keeps people coming back, which in turn gives them more queries and data, which improves their results... a never-ending flywheel.
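To make the loop above concrete, here is a toy sketch of click feedback driving ranking. It is not Google's actual system: the class name, the smoothing priors, and the use of a plain click-through rate as the only signal are all assumptions made for illustration.

```python
class ClickFeedbackRanker:
    """Toy re-ranker: orders URLs for a query by smoothed click-through rate.

    Hypothetical illustration only. A real engine would combine many more
    signals (dwell time, hovers, link structure, and so on).
    """

    def __init__(self, prior_clicks=1.0, prior_impressions=10.0):
        # Priors keep rarely shown pages from getting extreme scores.
        self.prior_clicks = prior_clicks
        self.prior_impressions = prior_impressions
        self.impressions = {}  # (query, url) -> times shown
        self.clicks = {}       # (query, url) -> times clicked

    def record(self, query, url, clicked):
        """Log one impression of `url` for `query`, and whether it was clicked."""
        key = (query, url)
        self.impressions[key] = self.impressions.get(key, 0) + 1
        if clicked:
            self.clicks[key] = self.clicks.get(key, 0) + 1

    def score(self, query, url):
        """Smoothed click-through rate for this (query, url) pair."""
        key = (query, url)
        clicks = self.clicks.get(key, 0) + self.prior_clicks
        shown = self.impressions.get(key, 0) + self.prior_impressions
        return clicks / shown

    def rank(self, query, urls):
        """Return `urls` ordered from highest to lowest score."""
        return sorted(urls, key=lambda u: self.score(query, u), reverse=True)
```

Every call to `record` changes what `rank` returns next time, so more traffic means more data and, in principle, better ordering. That is the flywheel in miniature.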

And competitors have no hope of matching this, because if you look at the infrastructure Google has built to harvest this data, it is so much bigger than the massive index! They harvest data through Chrome, ad tracking, Android, Google Analytics, cookies (for which they built Gmail!), YouTube, Maps, and so much more. So to compete with Google Search, you don't need just a massive index, you also need the extensive web infra footprint to harvest user interactions at massive scale, meaning the most popular and widely deployed browser, mobile OS, ad footprint, analytics, email provider, maps...

This also explains why Google spends so many billions on "traffic acquisition costs" (i.e. payments for being the Search default) every year: that is a direct driver of both 1) ad revenue and 2) maintaining its search quality.

This wasn't really a secret, but it turned out to be a major point in the recent antitrust trial, which is why the proposed remedies (as TFA mentions) include the sharing of the search index and "interaction data."

We all knew "if you're not paying for it, you're the product" but the fascinating thing with Google is:

- They charge advertisers to monetize our attention;

- They harvest our attention to better rank results;

- They provide better results, which keeps us coming back, and giving them even more of our attention!

Attention is all you need, indeed.

> "learn" which pages the users actually found useful for the given query

But due to their business model I'm not sure they are ranking "usefulness" as much as you think.

Useful results ultimately don't benefit Google because Google makes no money on them. Google makes money on ads - either ads on the search results page, ads on the destination pages or (indirectly) from steering users to pages which have Google Analytics.

It's likely the actual algorithm balances usefulness to the user with usefulness to Google. You don't want to serve up exclusively spam/slop, as users might bounce, but you also don't want to serve up the best result first, because the user will prefer it over the ad on the search results page. So it has to be a mix of both: you'll eventually get a good result, after many attempts (during which you've been exposed to ads).
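The trade-off speculated about above could be sketched as a weighted blend. This is purely hypothetical: the field names `usefulness` and `ad_value`, the 0 to 1 scales, and the linear form are assumptions for illustration, not anything known about Google's ranking.

```python
def blended_score(usefulness, ad_value, ad_weight=0.3):
    """Mix user-facing usefulness with value to the search provider.

    Hypothetical: both inputs are assumed to be on a 0..1 scale, and
    `ad_weight` sets how much the provider's own interest counts.
    """
    if not 0.0 <= ad_weight <= 1.0:
        raise ValueError("ad_weight must be between 0 and 1")
    return (1.0 - ad_weight) * usefulness + ad_weight * ad_value


def rank_blended(results, ad_weight=0.3):
    """Order result dicts (keys: url, usefulness, ad_value) by blended score."""
    return sorted(
        results,
        key=lambda r: blended_score(r["usefulness"], r["ad_value"], ad_weight),
        reverse=True,
    )
```

With `ad_weight=0` the most useful page always wins. As the weight rises, a mediocre but ad-heavy page can overtake it, which is the behavior the comment suspects.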

Google does enjoy the myth that they are unable to combat spam/slop while in reality they do profit off it.

  • That is also the thesis of this piece: https://www.wheresyoured.at/the-men-who-killed-google/

    It is plausible, but I'd guess Google would not risk that. I'm sure Google has pulled other shenanigans to get more clicks, like stuffing more and more ads, and making ads look like results (something even I personally have fallen for once), but I think they're too smart to mess with their sacred cash cow.