Comment by giancarlostoro

7 months ago

This then raises the question for me: without an LLM, what is the approach to building a search engine? Google search used to be razor sharp, then it degraded in the late 2000s and early 2010s, and now it's meh. They filter out so much content for a billion different reasons, and the results are just not what they used to be. I've found better results from some LLMs like Grok (surprisingly), but I can't understand why Google, once a razor-exact search engine, cannot find verbatim or near-verbatim quotes of content I remember seeing on the internet.

My understanding was that every few months Google was forced to adjust their algorithms because the search results would get flooded by people using black hat SEO techniques. At least that's the excuse I heard for why it got so much worse over time.

Not sure if that's related to it ignoring quotes and operators though. I'd imagine that to be a cost saving measure (and very rarely used, considering it keeps accusing me of being a robot when I do...)

From what I understand, that good old Google from the 2000s was built entirely without any kind of machine learning. Just a keyword index and PageRank. Everything they added since then seems to have made it worse (though it did also degrade "organically" from the SEO spam).
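
To make "just a keyword index and PageRank" concrete, here is a minimal sketch of that idea, not Google's actual implementation. The function names (`build_index`, `pagerank`, `search`), the damping factor of 0.85, and the fixed iteration count are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_index(pages):
    """Build an inverted index: lowercase word -> set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def search(query, index, rank):
    """Return pages containing every query word, highest PageRank first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda p: rank.get(p, 0.0), reverse=True)
```

Matching is purely lexical here: a page either contains every query word or it does not show up at all, and link structure alone decides the ordering.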

  • Google certainly had to update their algorithms to cope with SEO, but that's not why their results have become so poor in the last five years or so. They made a conscious decision to prioritize profit over search quality. This came out in internal emails that were published as part of discovery for one of the antitrust suits.

    To reiterate: Google search results are shit because shit ad-laden results make them more money in the short term.

    That's it. And it's sad that so many people continue to give them the benefit of the doubt when there is no doubt.

  • The majority of the public internet shifted to "SEO optimized" garbage while the real user-generated content shifted to walled gardens like Instagram, Facebook, and Reddit (somewhat open). More recently, even user-generated content is poisoned by wannabe influencers shilling some snake oil or scam.

  • This is my take as well. When websites were few, directories were awesome. When websites multiplied, Google was awesome. When websites became SEO trash, social networks were awesome. Now that social networks have become trash, I'm hoping the Fediverse becomes the next awesome.

    I don't see AI in any form becoming the next awesome.

    • I wish the Fediverse all the best too. I'd take this one step further: communities have gone through a similar transition, from forums to mostly Discord now, and I wish they would move to something federated like Matrix (yes, I know it has issues, but trust me, sacrifices must be made).

      What are your thoughts on things like Bluesky, Nostr, and Matrix?

      Bluesky does seem centralized at its current stage, but its idea of a PDS (?) makes it fundamentally hack-proof, in the sense that if the server you are on gets hacked, your account is still safe. Or at least that's the plan; I'm not sure about its current implementation.

      I also agree that AI won't be the next awesome. Maybe for coding, sure, but not in general. But even in coding, I feel like it's good enough, further progress from here will be hard to come by, and it's just not worth it. Honestly, that's just me.

  • This is correct. Marketing and advertising people manipulated pages to gain higher rankings once they figured out the algorithm behind it, forcing Google to change the algorithm. Originally, prior to the flood of <meta> garbage and hidden <div>s, it was very good at linking content together. Now, it’s a weighted database.

  • This has always been the explanation, but I've always wondered if it wasn't so much battling SEO as balancing the appearance of battling SEO while not killing some factor related to their revenue.

  • That raises the question: if you could recreate their engine from the 2000s with high-quality search results, would investors even fund you? Lol

When I encounter the "cannot find verbatim quote I remember" problem and then later find what I was looking for in some other way, I usually discover that I misremembered and the actual quote was different. I do prefer getting zero results in that case, though.

I wish there was an old-fashioned n-gram + PageRank search engine for those of us who don't mind the issues the older Google had. I've thought about making my own a few times.
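
For anyone curious what the n-gram half of that might look like, here is a rough sketch of a word n-gram index that only returns verbatim phrase matches, and returns nothing when there is no exact match. The class name `NgramIndex`, the whitespace tokenizer, and the default of bigrams are all assumptions for illustration; a real engine would need proper tokenization and a far more compact posting format.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return all contiguous runs of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def contains(tokens, phrase):
    """True if phrase occurs as a contiguous run inside tokens."""
    width = len(phrase)
    return any(tokens[i:i + width] == phrase
               for i in range(len(tokens) - width + 1))


class NgramIndex:
    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of documents containing it
        self.docs = {}                    # document id -> token list

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self.docs[doc_id] = tokens
        for gram in ngrams(tokens, self.n):
            self.postings[gram].add(doc_id)

    def phrase_search(self, phrase):
        """Return sorted ids of documents containing the phrase verbatim."""
        tokens = tokenize(phrase)
        if not tokens:
            return []
        if len(tokens) < self.n:
            # Phrase is shorter than one n-gram: fall back to scanning everything.
            candidates = set(self.docs)
        else:
            grams = ngrams(tokens, self.n)
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams))
        # Sharing every n-gram does not guarantee a contiguous match, so verify.
        return sorted(d for d in candidates if contains(self.docs[d], tokens))
```

The n-gram lookup only narrows the candidate set; the final `contains` check is what enforces the exact-quote behavior.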

The internet itself has changed over time, and a lot of content has just disappeared. It shouldn't appear in search because it's just not there anymore; it'd be a 404.

  • A search engine that kept dead entries but maybe put them in a “missing” tab or something would’ve been monstrously useful for me in so many situations. There have been numerous times I’ve remembered looking at something N years ago only for all but the faintest traces of it to have disappeared from the internet. With a “missing” tab I’d at least have former URLs, page titles, etc. to work with (archive.org, etc.).
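
One way the “missing” tab could work, as a rough sketch only: instead of deleting an entry when a recrawl fails, the index keeps the last known URL, title, and fetch date and just flags the entry as dead. The `Entry` and `Catalog` names, the title-only matching, and treating anything other than HTTP 200 as "gone" are simplifying assumptions, not how any existing search engine is known to behave.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    url: str
    title: str
    last_seen: str      # ISO date of the last successful fetch
    alive: bool = True


class Catalog:
    def __init__(self):
        self.entries = {}  # url -> Entry

    def record_crawl(self, url, title, date, status_code):
        """Update the catalog with the result of one crawl attempt."""
        if status_code == 200:
            self.entries[url] = Entry(url, title, date, True)
        elif url in self.entries:
            # Keep the stored title and last_seen date; only flag it as missing.
            self.entries[url].alive = False

    def search(self, term):
        """Return (live, missing) lists of entries whose title contains term."""
        term = term.lower()
        hits = [e for e in self.entries.values() if term in e.title.lower()]
        live = [e for e in hits if e.alive]
        missing = [e for e in hits if not e.alive]
        return live, missing
```

The `missing` list is what would feed the separate tab: the old URL and `last_seen` date are exactly the details you would need to go looking on archive.org.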