Tell HN: Google doesn't work anymore for exact matches

4 years ago

For a while now I have felt that Google's results have deteriorated. It takes a lot of tricks to find what I am looking for. Today an interesting case came up that frustrated me a lot and seems worth telling HN.

First, I was looking for a song and searched for: "here were the dreams are born" (I know I mistyped). One of the first results I found was this interesting story (Google results https://imgur.com/a/gUq4XVZ):

https://mechahuggermr.tripod.com/id66.html

I took the following sentence from this story and used it in the readme of an internal project:

"David, we have been expecting you - this is what you have been searching for - this place, David, is where dreams are born"

Some people wanted to know where this quote came from and could not find it on Google.

I also tested and cannot enter any combination of parameters into Google to find this page. I tried quotation marks, literal search and no hyphen. Nothing, it is impossible to find it.

Does anyone know what is going on here? Can someone do a magic call and find this page on Google?

Has Google's AI/BERT Enhanced Search reached a point where indexed pages can not be found?

All results were tested with a Brazilian connection and replicated in a Private Session on a US VPN.

I agree with your general point that the search quality has gone down; quotes don't even always work anymore to get exact results.

Looking into your suggested example: That turned out to be interesting and unexpected.

So, the exact string you put here was "David, we have been expecting you - this is what you have been searching for - this place, David, is where dreams are born", which is what you get when you copy the text from the website. It's correct that searching for it verbatim on Google doesn't work.

The actual DOM of the snippet looks like this:

    “David, we have been <br>expecting you - this is what you have
                           been searching for - this place, <br>David, is where dreams are born.”

If you take any snippet of text that doesn't do a line-break, it seems exact searches do work, like "expecting you - this is what you have been searching for - this place" or "deep and melodious when it spoke".

If you do take a snippet that does a line-break, then it cannot find anything, like "David, we have been expecting you" or "this place, David, is where "

It seems that Google has unlearned how to treat different types of whitespace, especially when the author/software has introduced manual line-breaks via the <br/> HTML tag.

I'm sure they have at one point introduced some "quality filter" that gives higher score based on how well the markup is made by the websites, for one reason or another, and eventually it got so "improved" or established that even if it's the only relevant hit for a human, the computer simply ignores the result for low scoring, since the markup is not 100% correct.
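To illustrate the whitespace point, here is a minimal sketch of the kind of normalization one would expect before phrase matching: turn `<br>` into a space and collapse runs of whitespace. This is only an illustration on the snippet quoted above; the `normalize` helper is made up for this example and says nothing about how Google's indexing actually works.

```python
import re

def normalize(html_fragment: str) -> str:
    """Replace <br> tags with a space, then collapse runs of whitespace."""
    without_breaks = re.sub(r"<br\s*/?>", " ", html_fragment, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", without_breaks).strip()

# The snippet as it appears in the page source, including the manual
# <br> tags and the indentation of the second source line.
raw = (
    "\u201cDavid, we have been <br>expecting you - this is what you have\n"
    "                       been searching for - this place, <br>David, is where dreams are born.\u201d"
)

text = normalize(raw)
phrase = "David, we have been expecting you"

found_raw = phrase in raw          # the <br> sits in the middle of the phrase
found_normalized = phrase in text  # after normalization the phrase is contiguous
```

With this normalization the phrases that cross a `<br>` become ordinary substrings of the text, which is what a reader copying from the rendered page would expect.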

  • For me, 49 minutes after parent comment was posted, searching Google for the quoted phrase "expecting you - this is what you have been searching for - this place" yields exactly two results: the linked story, and _this HN page_.

  • To a company as comfortable as Google, the fact that users want to find relevant information, as opposed to watch ads all day or buy products, is an inconvenience.

    There is incentive to keep users on-site as long as possible; "our engagement metrics are rising".

    There is incentive to shovel users, kicking and screaming, to product pages or advertisements.

    Google, Youtube, Amazon and other giants have little incentive to improve search beyond "good enough that our users feel like we're trying to answer their query."

    • I have a feeling that perfect search is the holy grail that no business is interested in finding. How can it be monetized otherwise?

      Maybe 'peak good enough' is where we are at, and the sooner a critical mass of people realise that Google doesn't have a monopoly on this, the better?

  • What's depressing is that this problem keeps getting worse and worse! It used to be a problem you'd encounter very occasionally, but I now experience this crap every few days, sometimes even for things that seem very obvious.

    • I see a silver lining: When Google no longer has economic incentive to deliver quality search results, new search engines can finally take a market share. For a year I've been using DuckDuckGo and Startpage, and they're both great, but sometimes there's something they just don't give me, and I've felt the need to Google it. Now that Google's results are deteriorating, I find myself doing that less. I've been able to quit Gmail, but YouTube and Search have been hard.

      "Searching" has for almost two decades been synonymous with "Going to Google and feeling lucky." -- the fact that searching requires effort (and possibly more than one search engine) feels frustrating and refreshing at the same time.


  • Can someone confirm if it's also broken then for bits of text that are wrapped in inline elements? I don't have a suitable example to try to search for off hand, but for example:

        <div>
          this is the <span class="bold">best</span> day of my life
        </div>

  • Perhaps "don't attribute to cleverness something that can be explained by incompetence" applies here.
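On the inline-element question above: I can't confirm what Google does, but here is a small sketch of how a plain text extractor treats a snippet like that one. It uses Python's standard `html.parser` and simply joins the text nodes; the `TextExtractor` class and `visible_text` helper are names invented for this example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the text nodes as-is, then collapse whitespace to single spaces.
    return " ".join("".join(parser.chunks).split())

snippet = '<div>\n  this is the <span class="bold">best</span> day of my life\n</div>'
extracted = visible_text(snippet)
```

Because the `<span>` contributes no characters of its own, the extracted text contains the whole sentence as one contiguous phrase. A `<br>` is different: it contributes nothing either, so an extractor has to insert whitespace for it explicitly.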

I have an Intel Realsense camera, which sometimes reports the error "Failed to recconect" (there being a typo in the drivers) [1] - that's a pretty unique error, so in combination with the product name that should be a very easy keyword search, right? Especially if I throw in some quotation marks, to make it clear I want a literal match?

Yet when I search for realsense "failed to recconect" Google, in its infinite wisdom, returns pages that contain neither realsense nor recconect [2]. They offer me a supreme court opinion, a review of a car dealership, and a facebook church service.

Correcting the spelling of a query is one thing - but also completely ignoring other keywords? Even when there are pages available that match the query? [1]

[1] https://github.com/IntelRealSense/librealsense/blob/5ff27fca...

[2] https://imgur.com/a/okYV5V2

  • That is insane. I've had similarly idiotic "decisions" made by Google when searching for error codes --- things which are also similarly unique, and their uniqueness is the key to finding relevant results. Instead, searching for e.g. 1234 gives 1235, 1236, 1233, and even 12345, but no 1234. I used to use quoting but even that doesn't work now. Absolutely useless.

  • It corrected the typo, but it shows you the option to put it back.

    Click the: Search instead for realsense "failed to recconect"

    Then you get some very plausible looking results (idk how well they'll solve your problem).

    • Quotes and other search flags and instructions should, as a rule, be interpreted literally. Anyone using them wants them to work.

      The only time it may annoy someone is if they're copying and pasting ... Quotes, as text, from some website? A plurality of quotes on Facebook are images of text.

      It's just in Google's interest to keep you faffing around on their search page longer. There's no other reason - this stuff used to work!


  • I don't suppose you're using an Android phone and tested this in the search bar widget? I was able to reproduce your results that way, but it did provide me with a "search instead" option that did lead to the relevant GitHub page. I then tried again to reproduce the test in mobile Chrome and immediately got the right result without being prompted to search instead for what I'd actually searched for.

I was looking for a specific person recently and searched: <name of person> Canada

I guess they were pretty obscure so Google in all their wisdom displayed the results for Canada, with the entire name struck through. Fantastic. Defaulting to the most generic term in a query to the point of absolute uselessness.

  • I’ve always found “lemming” ridiculous, especially in all software that copied Google despite not being generalist. “We’ve seen you are searching for ‘Phillips screw 24x17’, I won’t tell you that we don’t have any but here are results for ‘Screwdrivers’, just in case you want to use a screwdriver instead of a screw. Also here are a few Phillips TVs, in case this might help you fix your car.”

    • Product search on websites for traditional brick and mortar stores is the worst for this. I guess they weren't born with the challenge of "if customers can't find the product they want, you will die" that online-only businesses have, but still, it's not like online shopping is a new thing. And people might like to know if the store even has what they need before heading out!

  • Yep, same thing happening to me... three keywords to define what I need, and it randomly chooses to ignore one of them and show me irrelevant results.

    The only worse search is probably on aliexpress, where you search for "red led", get a bunch of red LEDs, then you sort by the number of orders, and the top results are for other random "red" stuff (it seems as if it searches every keyword separately, but I haven't verified it).

    • I can confirm, it must be the worst possible one (but to be honest it is not like Amazon's one is much better).

      Of course not what a search should do, but I find the aliexpress search results a good source for fun and learning, through their "random" results I discovered many things I didn't know existed.

  • Just today I wondered how many people have first name Mickey and middle name Mouse. I couldn't find anything - it just goes to generic Disney websites.

I created a Google custom search engine for the text of about 40,000 out-of-copyright books: https://www.locserendipity.com/Google.html and noticed that exact text matches don’t always show up. A RegEx search of the same repository yields many more matches for specific phrases than the Google custom search does. About a year and a half ago, the quality of results went down a lot. I agree something has changed for the worse.
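For anyone curious what a regex-based exact-phrase search over a text collection can look like, here is a minimal sketch. It is not the search used on the site above; the `phrase_pattern` helper, the file names and the sample texts are all made up for illustration. The one design choice is to allow any run of whitespace (including line breaks) between the words of the phrase.

```python
import re

def phrase_pattern(phrase: str) -> re.Pattern:
    """Build a case-insensitive regex that matches the phrase word for word,
    allowing any run of whitespace (spaces, newlines) between the words."""
    words = phrase.split()
    return re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)

# Hypothetical stand-in for a repository of book texts.
books = {
    "book_a.txt": "It was the best of times,\nit was the worst of times.",
    "book_b.txt": "The best  of times are behind us.",
    "book_c.txt": "Nothing relevant here.",
}

pattern = phrase_pattern("best of times")
matches = sorted(name for name, text in books.items() if pattern.search(text))

# A phrase that crosses a line break in the source text still matches.
crosses_newline = phrase_pattern("times, it was").search(books["book_a.txt"]) is not None
```

A search like this scans every document, so it finds every occurrence but does not scale the way an inverted index does.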

I recall when YouTube used to refine the "best answer" to a question based on distributed popularity. Ask it, "How do I screw in a lightbulb?" and the video at the top would be one which had received numerous views and likes, crowdsourcing its usefulness in answering the question. Then the algorithm was changed to prioritize more recent videos. This gave rise to the churn of the novel and new over the time tested and approved. This was better for ads, but worse for users. One cannot expect the ad driven model NOT to have an impact on usability.

This (shitty NLP) has been bad for a while, but I did notice it get worse recently in a way that feels crippling to me. I don't have a functional search engine anymore.

  • Does anyone have insight into _why_ google search has deteriorated so rapidly over the last ~6-12 months? Optimizing for NLP or websites learning SEO don't seem like they would have this big of an impact. Everyone seems to agree [0] [1] [2] that this is a problem yet it keeps getting worse

    https://news.ycombinator.com/item?id=29414562

    • The voice recognition has gone to shit as well, to the point where it may as well be editorializing. Apparently I'm not allowed to begin a sentence with the word "our" because no matter what pronunciation I use it becomes "how". I just don't get it. I learned my "computer voice" talking to garbage voice command systems in the early 2000s that insisted on crystal-clear speech and had absolutely no issues with Apple or Google voice typing until probably 2018. Since then it's been a steady decline into near-unusability. I dare anyone to successfully get Google to voice-type the word "o'clock".

    • The best explanation I’ve seen is that Google only cares about the quality of the results when using completely natural language sentences (which I guess is how most non-technical people try to use it?) rather than the specialized search engine syntax with keywords, quotes, +/-, etc. we learned to use in 90s/00s.

      It may optimize for the common case, but unlike the old system, leaves you completely helpless when it fails to “Just Work.”

    • My guess is bad regression testing based on subjective qualifiers at best, or incentives for poor results that promote ad revenue at worst.

    • It's probably become more profitable for them to have the search be shit now that their monopoly is so secure

Curiously, searching directly on the site with that quote produces "No results found," and then shows an inexact match with just that quote underneath. This is clearly a real bug on Google's side.

https://imgur.com/a/2XFogU5

  • I may have figured it out. The site is committing hijinks with the text. They're manually wrapping text with `<br>`'s and then manually wrapping the source with spaces. Here's the HTML of the lines in question:

        <DIV>The voice was deep and melodious when it spoke. &#8220;David, we have been <BR>expecting you - this is what you have
                                   been searching for - this place, <BR>David, is where dreams are born.&#8221; It was at this moment David realized <BR>the
                                   being was speaking to him with its own voice, not by thought. David <BR>stood unmoving. He realized he had never dreamed before
                                   or even had ever <BR>slept.
    

    If you search for same-line sentence fragments you'll find the page: https://www.google.com/search?q=%22The+voice+was+deep+and+me.... Not an excuse: this is a case Google should handle.

    For posterity: https://imgur.com/a/DAUpLit

It's not just you. I have been having similar problems and it just seems to keep getting worse. The other issue I've been having is the precedence of product pages over everything else. Now it seems no matter what I search for I'm bombarded with links to products and low quality product focused blogspam. Also I've noticed search results will change based on what I have recently searched. The same search can yield different results at different times. Google screwed up the one thing it was actually good at. Now Google Search just feels like every other subpar Google knockoff product.

I totally agree with the consensus here that Google search result quality has declined dramatically, although I think it's been happening for at least 5 years rather than 6-12 months. Two alternatives have helped me replace Google.

DuckDuckGo has been mentioned by many others but I'll add my voice as well. It does a much better job at respecting quotes than Google (though not always!). It's very rare these days that I strike out on a search with DDG and have any luck with Google, though it does happen on rare occasions.

The other site is SymbolHound. This is great for programming queries since it actually searches for symbols verbatim! I found out about it when trying to debug a complicated makefile at a new job, while not being very good at makefiles. Ever tried searching Google for some unfamiliar syntax from a makefile? Hahaha, it doesn't even try to give matching results; it's totally useless.

I miss the old days, but these two alternatives, plus falling back to Google, or at least StartPage with `!sp`, still work pretty well.

Google results are low quality unless you enable verbatim. Tools, all results, verbatim.

It blows my mind this isn't the default. I can only assume they've adopted the opinion of search engines before them that they could benefit from showing lower quality results to keep the users on their site longer.

  • Is this something accessible through the Settings, bottom right? I cannot find it.

    • It doesn't seem to appear in a mobile browser, but I do see it if I put the mobile browser in desktop mode.

    • no, not in Settings. The 'Tools' option only appears at the top right of the screen after you conduct a google search in the normal fashion. So search for a term, then click Tools at top to switch to the verbatim option.


As of now, searching the quote brings up this thread. I feel like Google now prioritizes certain websites (like HN) and essentially skips things like tripod websites.

  • Hasn't Google more or less prioritized "authority" since PageRank?

    Of course, the exact heuristics to weight authority are in a continuous flux.

I really want a search engine that works more like a database. I want to see why it returned the results it did, like EXPLAIN does for SQL queries. I want fine-grained control of the actual query, and the ability to sort the results by different facets.

I think there’s 100% a market for a new search engine that is geared toward the power user, the researcher.

Google hasn’t innovated their actual search engine in ways that actually benefit the user in years. The only changes they have done have been driven by a profit motive.

  • Yes. Search engines, web browsers, etc. should be improved for such a purpose.

    I would also want a feature like EXPLAIN, and to use the other features of SQL such as ORDER BY, etc. (Even on Hacker News, to be able to change the sort order; I would prefer to list the comments in order by date/time.)

    Often, I would want to use SQL to search things that are not available as SQL, or even as CSV (which could be imported into a SQL database; SQLite command shell has this capability built-in).

  • I won't say this entirely solves the problem for researchers, but several posts here have pointed out that Duck Duck Go gives superior results in this case. I use DDG routinely, and only rarely do I need to go to Google for a search for some esoteric technical things (although I'm not certain Google actually does better).
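To make the EXPLAIN / ORDER BY wish from this subthread concrete, here is a small sketch using Python's built-in `sqlite3` module against a toy index. The `pages` table, its columns, and the example.com URLs are invented for the example; no real search engine exposes its index like this, which is rather the point of the complaint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, title TEXT, published TEXT)")
conn.execute("CREATE INDEX idx_pages_published ON pages (published)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("https://example.com/a", "Dreams are born", "2003-05-01"),
        ("https://example.com/b", "Where dreams end", "2019-11-20"),
        ("https://example.com/c", "Unrelated page", "2010-01-15"),
    ],
)

# The user controls both the filter and the sort order explicitly.
query = "SELECT url, published FROM pages WHERE title LIKE ? ORDER BY published DESC"
params = ("%dreams%",)

# EXPLAIN QUERY PLAN reports how SQLite intends to execute the query
# (which table it scans, whether it uses an index for the ordering).
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
for row in plan:
    print(row)

results = conn.execute(query, params).fetchall()
```

The exact plan text depends on the SQLite version, so it is printed rather than assumed. SQLite's `LIKE` is case-insensitive for ASCII by default, which is why "Dreams" and "dreams" both match here.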

I recently came across a video of a police officer in a high speed pursuit shooting out of his car window one-handed into oncoming traffic. I remembered the title of the video I found. I remembered it happened in New Mexico and that the police officer was moved to detective since this was his 3rd gun fight in 2 years out of the department's total of 4. I was trying to pull it up to show my fiancé how nuts this cop's behavior is.

No amount of searching for this video on google could find this video. 20 minutes. I was infuriated. My brain started going to dark places. Is this censorship?

No. Google sucks now. Somehow. They killed the golden goose.

I fired up Bing. Typed the title of the video and found 100 copies instantly.

I now use Bing. And it’s not bad. Generally I can find what I’m looking for. It’s like a slightly worse version of what Google used to be. But it’s an infinitely better version of what google is now.

I also use Brave search engine but that’s worse than bing still.

If you’re looking for interesting party conversation, try opening with “I recently switched my search engine to Bing.” The conversation goes to fun places and most people will agree that Google has gotten much worse.

Am I crazy or is there room for a good early 2000s-style link aggregator in today's content discovery landscape? It seems like the primary reason the likes of delicious failed is that they sold out for money. I wonder if an open source federated solution could work.

  • I would love an AltaVista rebirth. I have yet to see a search engine that allowed you to target searches so specifically. I also loved the 'near' keyword for dealing with names - "john near smith" would match "john smith", "smith john", "smith, john" etc.

    Search engines have been 'simplified' to the point they're useless if you actually want to _search_ and not 'discover' :-p
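The 'near' behaviour described above is easy to sketch. This is only an illustration of proximity matching, not AltaVista's actual implementation; the `near` function and its `max_gap` window of two word positions are assumptions chosen for the example.

```python
import re

def near(text: str, first: str, second: str, max_gap: int = 2) -> bool:
    """Return True if `first` and `second` occur within `max_gap` word
    positions of each other, in either order. Punctuation is ignored."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) <= max_gap
        for i in first_positions
        for j in second_positions
    )

# The three name variants from the comment above.
variants = ["John Smith", "Smith John", "Smith, John"]
all_match = all(near(v, "john", "smith") for v in variants)

# The two words are present but far apart, so this is not a match.
far_apart = near("John went to the store with Mr Smith", "john", "smith")
```

Order-independence plus a small window is what makes this useful for names, where the same person can be written in several ways.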

In this particular case, where the result is negative, we humans can easily recognize that something is wrong. But in other cases, where the exact match is also the best match, it’s not easy to dig through all the pages to know something is amiss. Of course, I understand that a search engine like Google was not designed to produce exact matches for long texts like this one. But no one can deny that exact matching is extremely important for finding rare or obscure stuff. Could a search engine built especially for rare, obscure stuff (in contrast to popular search) have a niche market?

"you.com" finds a lot of things the Googopoly doesn't!! It's kinda limited, but it sometimes helps when neither DDG nor Bing does. Not shilling for them, but I tried it and it's not that bad! For the last couple of years, the Googopoly has scrambled YouTube search results. The only way for me to find these is you.com.

It used to be possible to search for obscure model numbers and product codes and get only 5 results, all of which were what you were after.

Now it returns a bunch of spam and rubbish, with the model number nowhere to be found.

Sometimes you do find matches... and the results are just computer generated spam for God knows what purpose.

A friend was telling me about a model named Marie Kim.

Google has broken any search with the name Kim by treating it as Kim Kardashian, no matter how you wildcard the search.

I feel truly sorry for anyone trying to get any traction with Kim as a name.

As others have noted, DDG magically gets this right.

Guess that's what happens when a search company becomes an ad company.

Because Google is... annoying, and silly, try verbatim search: Tools > Verbatim, after you get search results.

  • Verbatim helps with Google silently altering your query (essentially an alternative to the now-required quoting everything) but it doesn't solve the massive spam issue that has infected Google.

    Google, as a company, feels a lot like IBM at the end of its glory days. Google won't suddenly disappear but much like IBM they will slowly shrink in relevance forever.

Since Google has source code going back to their founding, it would be nice if they literally offered older versions of their search, just with updated crawl data.

Then you could use "Google 2008" and get decent results. (Google still only needs to crawl the web once for all versions.)

Google finds it for: "David, is where dreams are born.”"

And: "The voice was deep and melodious when it spoke."

And most other things. Examine the raw HTML for that area and you might give them a pass when searching for an exact phrase that doesn't actually exist in the document itself.

  • I don’t. Google dates to 1996. Stripping whitespace, line breaks, etc. should be part of basic parsing. Consider someone typing in a poem or song lyrics: a few extra <br> tags should be expected, especially back then.

  • You are probably right. In the HTML there are some "br" line returns between each line of the citation. It can find the citation from parts of each of these lines but not from the whole citation.

A few days ago I looked for an issue about the Maven daemon (not reusing the daemon when using Java 8), and all first page results were about Gradle, all of them. This time DuckDuckGo did a lot better. I'm now doubting the reliability of Google results in general.

I was just thinking about this earlier today when I was searching for something specific and getting very irrelevant results. It used to be that the first page of results was very high quality and somewhat relevant at the bottom and page two+ was pretty much don’t even bother. Now it’s like if you are lucky the first 3-4 results might be relevant, middle is some spam with similar keywords that appear to match your search but it’ll have a funky domain e.g. .gg and either nonsense or doesn’t work due to malicious crap and being blocked. Anything after the middle is pretty much garbage.

  • > and page two+ was pretty much don’t even bother

    On the other hand, I used to trawl through all of the pages Google would let me go through, and regularly found what I was looking for back there, mixed in with the usual SEO spam. Now the first page is awful, and the rest of the pages are entirely SEO spam.

I was trying to search for a specific phrase related to antenna gain (technical term) and kept getting results for "antenna gets". Even when putting the phrase in quotation marks.

For a few months now google search has been basically useless. The only thing that you get nowadays is just SEO spam and mainstream sites. At this point even bing/ddg is better.

To see how badly Google results have deteriorated, especially for politicized topics, search for "what countries are using ivermectin" with Google and then do it again with http://Yandex.com . All the Yandex results answer the question. All the Google results are either broken or are pages talking about why ivermectin should not be used.

I've had several frustrating experiences recently with search engines, Google included. The most notable: I have tried to find a video clip of Merkel joking about the German city of Bielefeld not existing. There are plenty of articles, but the clip, which I know exists, doesn't come up. Not on YT, not on Google, not on Bing. Maybe I've lost my Google-fu somewhere along the way, I don't know anymore.

I agree, but I think there are three reasons for the bad results:

- Google tweaks. They target mainstream users more and more, and thus neglect the power users with their knowledge of Google hacks.
- Content drain. More and more content is created in walled gardens, like Discord, Facebook, etc.
- SEO. These optimized pages pollute the results.

I stopped using Google entirely. I honestly feel violated every time it strips out words that I asked it to search for on the very first page. NO, I said search for this, do not do something ELSE you piece of shit.

I actually use DuckDuckGo exclusively now, not because it got better (it did a tiny bit), but because Google got so absolutely horrible that DDG is now actually better! I have the habit of trying Google if I can't find something with DuckDuckGo, but honestly I don't even know why I bother because not once has it helped since this degradation started.

I do wonder why though. I got the feeling that maybe they just gave up. Maybe they don't have to care anymore being a de facto monopoly and having so many other projects. It's hard not to think that spammers run the internet now... Ad networks run everything and then content is just generated shit spammed into results and feeds.

</rant>

  • So many bot-authored and SEO-tweaked garbage listicles and advertiser-funded “reviews” and poorly-written “TIL”/“learn from me” blogs bloated by advertising. In a few ways the web is better now than it was a few decades ago, and in very many ways it is much worse. Advertising has basically leeched almost all the value out of the web.

    • I’ve transitioned to a list of manually bookmarked sites.

      Search gets unreliable fast for rare topics. We live in an advertising based economy of poor incentives. If good paid search engines existed I would use them.


  • The only problem I had with DDG is search results not displaying a create/publish date; that info is so useful to me for filtering relevant things.

    • That's true but I've found in recent years that the date given is wrong. e.g. you see something listed as 2021, but it's actually from 2016 and old.

I've been looking up exact variable names from various API documentation and getting no relevant results, despite these variable names appearing in multiple Stack Overflow posts, the API documentation where I copied the text from, and of course GitHub.

> Can someone do a magic call and find this page on Google?

I just pasted this unquoted into Google and found the site:

> David, we have been expecting you - this is what you have been searching for - this place, David, is where dreams are born

Why would you quote this?

Web indexing and search is a constant battle between space and time, so it does not really surprise me that results for any given input may not be stable over time. Generalizing from single examples, however, is illogical.

Besides DDG, Yandex (the Russian search engine) also found the verbatim result. It reminds me of the time of meta-search engines, which compared and combined the results of different search engines to produce a better one. Perhaps it’s time for such ideas again.
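As a rough idea of how a meta-search engine could combine result lists, here is a sketch of reciprocal rank fusion, one common way to merge rankings. It is swapped in here purely as an illustration; I don't know what the old meta-search engines actually did. The engine lists and page names are hypothetical, and `k=60` is the constant commonly used with this method.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked result lists. Each result earns 1 / (k + rank)
    from every list it appears in; higher total score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            scores[url] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties alphabetically for stability.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from three engines for the same query.
engine_a = ["page1", "page2", "page3"]
engine_b = ["page2", "page4", "page1"]
engine_c = ["page2", "page1"]

merged = reciprocal_rank_fusion([engine_a, engine_b, engine_c])
```

A page that several engines agree on rises to the top, while a page that only one engine found still makes the merged list, which is exactly the case in this thread where one engine misses the page entirely.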

google hardly works for anything

  • It’s still not bad at getting Wikipedia links.

    • Not true. If you google some Wikipedia entry in some language let's say French, you get the result in English. Which is weird and annoying.

      I'm fluent in several languages. Whenever I search for a sentence in one language, Google makes a translation, sometimes erroneous, and only gives me answers in English (or whatever language is set as default). So annoying.