Comment by haizhung
2 years ago
What always confuses me about the "search has gotten so bad" mentality is that it is often based on anecdotal evidence at best, and anecdotal recollection at worst.
Like, sure, I have the impression that search got worse over the last few years, but... has it really? How could you tell?
And, honestly, this should be a verifiable claim; you can just try the top N search terms from Google trends or whatever and see how they perform. It should be easy to make a benchmark, and yet no one (who complains about this issue) ever bothers to make one.
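A minimal version of such a benchmark could look something like this sketch. Everything here is hypothetical stand-in data: real results would have to come from each engine's API or scraped result pages, and the scam-domain list and sample results are invented for illustration.

```python
# Hypothetical sketch of a search-quality benchmark.
# Real results would come from each engine's API or scraped pages;
# here canned sample data stands in to show the scoring idea.

SCAM_DOMAINS = {"free-downloads.example", "best-vpn-reviews.example"}

# canned results per (engine, query): domains of the top-N results, in order
SAMPLE_RESULTS = {
    ("engine_a", "download youtube video"): [
        "github.com", "free-downloads.example", "wikipedia.org"],
    ("engine_b", "download youtube video"): [
        "free-downloads.example", "best-vpn-reviews.example", "github.com"],
}

def scam_rate(results):
    """Fraction of the top-N results that are known scam/spam domains."""
    return sum(d in SCAM_DOMAINS for d in results) / len(results)

def benchmark(sample):
    """Average scam rate per engine across all queries; lower is better."""
    scores = {}
    for (engine, query), results in sample.items():
        scores.setdefault(engine, []).append(scam_rate(results))
    return {engine: sum(s) / len(s) for engine, s in scores.items()}

print(benchmark(SAMPLE_RESULTS))
```

With real queries pulled from Google Trends and a curated blocklist, the same scoring loop would give exactly the kind of repeatable comparison the complaints never come with.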
Dan at least started to provide actual evidence and criteria by which he would score results, but even he only looked at 5 examples. Which really is too small a sample to support any general claims.
So I am left to wonder why there are so many posts about the sentiment that search got worse without anyone ever verifying that claim.
I think the point he's trying to make is that the search results page from the mainstream search engines is a minefield of scams that a regular person would have difficulty navigating safely.
If he was looking at relevance, yours would be a solid point, but since most of the emphasis is on harm, a smaller sample works. Like "we found used needles in 3 out of 5 playgrounds" doesn't typically garner requests for p-values and error bars.
I think this is a good illustration of my frustration with this discussion: I don't think search has gotten bad, I think the web has gotten bad. It's weird to even conceptualize it as a big graph of useful hypertext documents. That's just wikipedia. The broader web is this much noisier and dubious thing now.
That's bad for google though! Their model is very much predicated on the web having a lot of signal that they can find within the noise. But if it just ... doesn't actually have much signal, then what?
The web has gotten bad because of what big search engines have encouraged. If they stopped incentivizing publishing complete garbage (by ruthlessly delisting low quality sites regardless of their ad quantity, etc) then maybe we'd see a resurgence of good content.
But there's still plenty of signal. It isn't as if there are no working YouTube downloaders, or no factually correct explanations of how transistors work. It's just that search engines don't know how to (or don't care enough to) disambiguate these good results from the mountains of spam and malware.
On the one hand, I'm not sure the data corroborates that. If this is a web problem and not a search engine problem, then I'd expect every search engine to have the same pattern of scam results.
I'd also argue that finding relevant results among a sea of irrelevant results is the primary function of a search engine. This was as true in 1998 as it is today. In fact, it was Google's "killer feature", unlike Altavista and the likes it showed you far more relevant results.
> I think the point he's trying to make that the search results page from the mainstream search engines are a minefield of scams that a regular person would have difficulty navigating safely.
Yes, and he makes the point well. It also means if you are part of the 0.49% of people who use Firefox on Android, he isn't talking about your experience. I find Firefox mobile remaining at 0.49% utterly inexplicable, which I guess just goes to show how out of touch with the mainstream I (and I assume most other people here) are.
It's not just ad blockers. My first attempt at a tyre width query got relevant results, mostly because "tyre grip" looked so bad as a search term so I used "traction" instead. In the mean time, friends of my age (60's) can't get an internet search for public toilets to return results they can understand. When I try to help them, their eyes glaze over in a short while and they wave me away in frustration. These mind games with google hold no interest for them.
I am regularly bitten by one thing he mentions: finding old results is hard, and getting harder. That makes it really difficult to check historical trends ("am I wrong about what it was like back then?").
I agree we can say "this is a minefield of scams" without doing a comparison.
There still is a question about when it got bad. I think Dan mentions 2016 as a point of comparison, and there were plenty of scams back then, so you might wonder whether there were ever days when a query wouldn't return many scams.
If you go back far enough, then there wasn't the same kind of SEO, and Internet scams were much smaller/less organized, but that's a long time ago.
I think automation tools for scams are the major change. In the distant past it was humans doing this; now I'm guessing there are a few larger businesses, and likely nation states, with a point-and-click interface that removes 99% of the past work.
I don't think this is a fair criticism.
1) The step where you evaluate "how they perform" is necessarily subjective.
2) You could design a study and recruit participants, but that isn't something a blogger is going to do.
3) He does link to polls where people agree with the idea that the results have gotten worse. Yeah, there are sampling problems with a poll, but it's better than nothing.
In this case especially, the writer is answering the question: "Whose results are best according to my tastes?"
> What always confuses me about the "search has gotten so bad" mentality is that it is often based on anecdotal evidence at best, and anecdotal recollection at worst.
I can't speak for anybody else, just trying to find stuff online, not writing a treatise about it or writing my own engine to outcompete Google. It's been asked many times here over the years and the answer was always explanations, never solutions.
Shittification does not happen overnight, but over many years. It started with Google deciding that some search terms weren't so popular: "did you mean...?" (forcing a second click to do what you intended to do in the first place) and went downhill when qualifiers to override that crap got ignored.
For me enough was enough when I realized that a simple query with three words, chosen carefully to point to the desired page, gave thousands of results, none of them relevant. YMMV.
Dan approached the problem from a qualitative perspective. Perhaps if more people took this approach over quantitative maximalism we would actually have products that don’t drive us fucking insane.
All that matters is the overwhelming sentiment that search has gotten worse, not the same fucking spreadsheet that got us here in the first place!
To do this you would need to have a comprehensive definition of "quality", and that's anything but easy, and it will be at least partly subjective. It's also hard to include omissions in your definition of "quality" (and again, what should or should not be omitted is subjective as well).
For example, let's say I search for "Gaza"; on one extreme some engines might focus only on recent events, whereas others may ignore recent events and include only general information. Is one higher "quality" than the other? Not really – it depends what you're looking for, innit?
All you can really do is make a subjective list of things you find important and rate things according to that, and this is basically just the same thing as an anecdotal account but with extra steps.
Some things are easily quantifiable, but very few – such as the number of ads per search. Back in the day Google had at most one, and it was visibly distinct from the rest of the links.
Otherwise, yeah, maybe search didn't degrade but the internet got more spammy. Or maybe users just got wiser and can see through the smoke screen better. Who knows...
Doesn't change the fact that today one has to know how to filter through pages of generic results made by low-effort content farms (results of dubious validity that at best simply waste your time), or through clones of other websites (e.g. Stack Overflow clones).
Search engines can choose to help with that (kagi certainly puts in the effort and I love it for that), or they can ignore the problem and milk you for ad clicks.
Anecdotal evidence is good enough for me.
> Dan at least started to provide actual evidence and criteria by which he would score results, but even he only looked at 5 examples. Which really is a small sample size to make any general claims.
US NIST, in their annual TREC evaluation of search systems in the scientific/academic world, use sets of 25 or 50 queries (confusingly called "topics" in the jargon).
For each, a mandated data collection is searched by retired intelligence analysts to find (almost) all relevant results, which are represented by document ID for general search and by a regular expression matching the relevant answer for question answering (when that was evaluated, 1998-2006).
Such an approach is expensive but has the advantage of being reusable.
So you're confused why other people aren't doing research for you, and when they do provide some evidence, you dismiss it because it's not a large-scale scientific inquiry into search quality? Get a frickin' grip.
Every time I encounter an egregiously poor result in DDG I document it with images. I have a directory of them spanning the last few years. I encounter so many now, while when I first began using DDG just a couple of years prior it was less of an issue (and I fully switched at the time). So yeah, I don't have before/after comparisons, but it's a little more solid than the 'I feel the results are worse' being characterized here.
There are particular search parameters whose behavior DDG changed, including exclusion and double quoting, which are now, according to even their own docs, more a hint of the direction results should go than any explicit/literal command (ime these virtually never work, which was a motivation for documenting failures; they actually removed them from their docs temporarily at one point earlier this year).
Yes, to get an accurate comparison we would need results from the same queries run 10 years ago.
I still remember often having to go to page 3 or beyond of Google searches to find things, even really early on.
I think it has never been good, got a bit better before SEO farms took all the gain out. That's my feeling with nothing to back it.
> So I am left to wonder why there are so many posts about the sentiment that search got worse without anyone ever verifying that claim.
I suspect it has gotten worse, so posts complaining about it resonate. But, it is not really a huge problem, and anyway it isn’t as if there’s much I can do about it, so I’m not going to bother collecting statistically valid data.
I think this is generally true about a lot of things. We should be OK with admitting that we aren’t all that data-driven and lots of our beliefs are based on anecdotes bouncing around in conversations. Lots of things are not really very important. And IMO we should better signal that our preferences and opinions aren’t facts; far too many people mix up the two from what I’ve seen.
When it comes to human psychology, what we believe tends to matter more than what actually is for predicting our future actions. If people think search sucks then it's likely they'll use less of it in the future, and it opens up companies like Google for disruption.
Internet Archive remembers. https://web.archive.org/web/*/google.com/search/%2A
Find a query of interest, see for yourself (and take a snapshot of the present state for posterity).
The api enables more powerful queries, https://web.archive.org/cdx/search/cdx?url=google.co.jp*&pag...
Also try other search engines and languages.
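For anyone who wants to script this, a rough sketch of building a query against the CDX server mentioned above (the parameter names are from the public CDX API; actually fetching and parsing the JSON rows is left out):

```python
from urllib.parse import urlencode

# Sketch: building a CDX API query for archived Google result pages.
# Parameter names follow the public Wayback CDX server API; the actual
# HTTP fetch (which returns JSON rows of captures) is omitted here.
params = {
    "url": "google.com/search*",  # wildcard: any capture under /search
    "output": "json",             # rows like [urlkey, timestamp, original, ...]
    "from": "2016",               # captures from 2016 onward
    "limit": "10",
}
query_url = "https://web.archive.org/cdx/search/cdx?" + urlencode(params)
print(query_url)
```

Fetching that URL (e.g. with `urllib.request`) returns a list of snapshots you can then pull up one by one to compare old result pages against today's.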
Even without looking at the subjective quality of search results, the sheer user hostility of the design of the Google search results page is an obvious, objective instance of how search has enshittified.
That is, in the early days, Google used to highlight that "search position couldn't be gamed/bought" as one of their primary differentiators, ads were clearly displayed with a distinct yellow background, and there weren't that many ads. Nowadays, when I do any remotely commercial search the entire first page and a half at least on mobile is ads, and the only thing that differentiates ads from organic results is a tiny piece of "Sponsored" text.
> has it really? How could you tell?
Yes it has and for a certain class of queries it's not even open for debate, because Google themselves have stated they deliberately made it worse. And they really did, it's very noticeable.
This class of queries is for anything related to any perspective deemed "non authoritative". Try to find information that contradicts the US Government on medical questions, for example, and even when you know what page you're looking for you won't be able to find it except via the most specific forcing, e.g. exact quoted substrings.
Likewise, try finding stories that are mostly covered by Breitbart on Google and you won't be able to. They suppress conservative news sites to stop them ranking.
15 years ago Google wasn't doing that. It would usually return what you were looking for regardless of topic. There are now many topics (which ones, specifically, is a secret) on which the result quality is deliberately trashed, because they'd rather show you the wrong results in an attempt to change your mind about something than the results you actually asked for.
Probably for the same reason that there are so many more posts about anything that make claims than that explore evidence systematically, especially when the people making the posts stand to gain nothing by spending their time that way.
I encounter claims that "protobuf is faster than json" pretty regularly but it seems like nobody has actually benchmarked this. Typical protobuf decoder benchmarks say that protobuf decodes ~5x slower than json, and I don't think it's ~5x smaller for the same document, but I'm also not dedicating my weekend to convincing other people about this.
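The JSON half of that claim at least is cheap to measure with the stdlib. A rough sketch (the sample document is invented; a fair protobuf comparison would additionally need a compiled schema and an equivalent message, which is omitted here):

```python
import json
import timeit

# Sketch: timing only the JSON side of the protobuf-vs-JSON claim,
# using the stdlib decoder on a small invented document.
doc = json.dumps({"id": 123, "name": "widget", "tags": ["a", "b", "c"],
                  "price": 9.99, "in_stock": True})

n = 10_000
seconds = timeit.timeit(lambda: json.loads(doc), number=n)
print(f"{seconds / n * 1e6:.2f} us per decode of a {len(doc)}-byte document")
```

Swapping in `google.protobuf` parsing of an equivalent generated message would complete the comparison, and the ratio will vary a lot by language and library, which is exactly the point below.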
The problem with benchmarking that claim is there's no one true "json decoder" that everyone uses. You choose one based on your language -- JSON.stringify if you're using JS, serde_json if you're using Rust, etc.
So what people are actually saying is, a typical protobuf implementation decodes faster than a typical JSON implementation for a typical serialized object -- and that's true in my experience.
Tying this back into the thread topic of search engine results, I googled "protobuf json benchmark" and the first result is this Golang benchmark which seems relevant. https://shijuvar.medium.com/benchmarking-protocol-buffers-js... Results for specific languages like "rust protobuf json benchmark" also look nice and relevant, but I'm not gonna click on all these links to verify.
In my experience programming searches tend to get much better results than other types of searches, so I think the article's claim still holds.
I agree. You wouldn't use encoding/json or serde_json if you had to deserialize a lot of JSON and you cared about latency, throughput, or power costs. A typical protobuf decoder would be better.