← Back to context

Comment by endisneigh

2 years ago

I reckon these days search is pretty difficult and everyone knows how to game it. I recommend using a search engine that lets you effectively change which sites are shown. You can do this with Kagi, or with Google's Programmable Search Engines - I'm sure there are more too.

In particular I block Youtube, not because they aren't sometimes correct, but because I don't want videos polluting the regular results - it just takes too long to get info from videos.

An ability to upvote results for a given query seems tantalizing but I bet it would be gamed too. The DIY approach seems to be the only tractable one.

In my case I only only results from domains I believe are correct. The whitelist approach does have downsides. Usually I'll vet new potential domains through social means like Reddit and this site, rather than identifying them through the search results. I believe there's an inherent tradeoff between discoverability and the gameability of the results.

Though I do sympathize with folks who reminisce about 2008 Google Search results, there were probably orders of magnitude less content out there and a complete ignorance to how valuable your place is on your business and thus no SEO.

I also personally disagree that yt-dlp is the "correct" result for the average user when they search Youtube Download. I highly doubt the average user would know or care to use the command line. A website front end would be more actionable for them.

> In particular I block Youtube, not because they aren't sometimes correct, but because I don't want videos polluting the regular results - it just takes too long to get info from videos.

Funnily enough, lately I've been prioritizing YT videos more when searching. So many sites now are just regurgitated SEO farms with minimal quality, and easy to see why: it's minimal effort to produce and cheap to host. But making a video takes time and effort, so has a much higher barrier to use as a click farm.

More than once when traditional search failed me, I went to YT and found some video from 2009 clearly and eloquently explaining what I'm looking for in detail, and without any distractions because the person authoring the video clearly didn't specialize in the media format or show interest in experimenting.

I've found it to also be a better source when looking a product to buy. Want to know which fan to get? Turns out there's a channel from a dedicated guy who keeps finding ways to test different fans and their utility and with multiple videos demonstrating his approach and findings. The mainstream channels aren't all that useful, but there's a ton of "old web" style videos (some even recent) passionately providing details for almost anything you'd think to search. And they're a gold mine.

  • > But making a video takes time and effort, so has a much higher barrier to use as a click farm.

    > The mainstream channels aren't all that useful, but there's a ton of "old web" style videos (some even recent) passionately providing details for almost anything you'd think to search. And they're a gold mine.

    This won't be the case for long. YT is already starting to be polluted with spam and AI generated content, which will get more and more common. The same thing that happened to the web in text form, will happen to videos.

    I think the only solutions are using allowlists for specific domains, and ironically enough more AI to filter specific results. Or just straight up LLMs instead of web search, assuming they're not trained on spam data themselves.

    • Yeah. I was recently looking for videos comparing two smartphones and among top ranked videos there were videos that just show the phones side by side and the video consists of showing specs side by side and videos that just have LLM-generated text, added to the video with TTS.

    • One critical difference is the date attached to youtube videos. It's easy to verify that a video was made before this tech was available, but you can't do that with websites, or search engine result pages.

      It does limit utility for more modern needs, unfortunately.

    • Note that the problem of filtering bad data out of learning material isn’t inherently easier than filtering same out of search results.

  • Would a browser feature that skipped to the relevant parts of the video based on closed captioning and understanding search intent be useful? It seems like this would be a good way for Google to fight to stay relevant in UX vs having the chat bots just quickly spitting out a readable answer. Hunting through ad laden webpages is annoying. Seeking to the relevant section of the video is a solvable problem, especially for videos above some viewership threshold.

    • > Seeking to the relevant section of the video is a solvable problem

      ...and it has already been solved, though partially: SponsorBlock allows people to add a "Highlight" section to a video, which denotes the part of the video which the user most likely wanted to see (sans the "what's up guys", "like and subscribe", etc.)

      Of course, it's not perfect: it relies upon humans doing the work, though some may see that as a positive over something more computerized.

    • Didn’t Google try this already? It seems useful to me, at least. IMO the next frontier of search is not better hypertext, it’s podcasts, audio, and video.

  • Do you have some tips for finding concise videos that answer the question you are asking? I am finding more and more obvious LLM bullshit in results, so I am willing to try some other tactics. But I am not ready to spend the minutes watching videos to see if it is actually relevant or a waste of time, always artificially long to increase ad revenue.

    • For me, it really depends on the type of video. For fixing cars, I'm usually looking for something specific enough that there isn't a lot of chaff. It was probably recorded and edited on a phone just to splice the clips together. Probably the default thumbnail that youtube extracted from the video.

      For product videos, if Project Farm did it, look there first. Otherwise, I look for someone has a lot of videos for competing products with basically the same format, not over 10 minutes.

      Tech videos are the hardest, I often still prefer text. Maybe look for links to the docs in the description? I still get duds though.

      1 reply →

    • Wish I did, but here you're at the algorithm's mercy, unfortunately. One possibility is subbing/accruing watch time on channels that you find provide you the right value, so that the algorithm might recommend similar channels on other subject matters.

  • That's curious, I generally hate video due to inability to glance over content, and the few attempts I made to actually find useful information I searched for resulted in... spammy extra low effort video content that did not answer my questions.

    • Depends on what you’re looking for. A blog post about how to play Search and Destroy by The Stooges is not as useful as a video of James Williamson himself showing you the riffs!

      3 replies →

> Though I do sympathize with folks who reminisce about 2008 Google Search results, there were probably orders of magnitude less content out there and a complete ignorance to how valuable your place is on your business and thus no SEO.

That was a decade after Google was created and people certainly understood SEO and Google was constantly updating its algorithm to punish people who were trying to game the algorithm.

The wikipedia page on "link farming" for example references it happening as early as 1999 and targeting SEO on inktomi:

https://en.wikipedia.org/wiki/Link_farm

I remember some internal presentations at Amazon around ~2004 about how boosting Google SEO on Amazon web pages increased traffic and revenue (and Amazon was honestly a bit behind-the-curve due to a kind of NIH syndrome).

  • At the time it seemed like Google was winning, though. SEO seems to have gotten really good, or maybe Google just gave up.

I have a hard time believing it's so difficult for a search engine to distinguish between a credible, respected website that has been around a while with some generated garbage that exists to be a search result. We humans can tell them apart, so in principle, computers can too.

  • Yes, this should be table stakes for a classifier - a company with the resource of Google can definitely solve that problem if they weren't themselves in the business of spam (advertising) and benefited from spam sites (as they often include Google ads/analytics).

    • > table stakes

      Always “table stakes”. Do you think in buzzwords also? I’ve always wondered this. Or do you think normal words and then translate it into this bandwagoning / membership proving garbage ?

      1 reply →

  • I guess this brings up the question of how good are humans at doing this across a wide number of domains on average?

    The other question I have is how long do these garbage results stay up for a particular query on average?

Google's PSE is neat but there isn't a good way to manage switching between them. They could easily add a little dropdown to let you select which one to use as part of the public link UI they provide for each one individually. Giggle[1] gives me this ability and I run it locally (alongside Kagi) for more specific things to target domain lists I've been building over the years.

1. https://github.com/dan-lovelace/giggle

I'm a big fan of the non commercial site search engines because of the gaming aspect. If you're not generating revenue from the clicks the game mostly goes away.

I'm not saying people aren't entitled to make some money, but it clearly incentivizes user hostile behavior.

Maybe make it an option because legitimate sites like journalism also use this model.

  • Subscription model like Kagi seems to work pretty well against gaming the results.

    Their only remaining incentive is to be good enough that people keep paying for the service.

    • It works not because they're somehow smarter or have more resources than Google at detecting spam/SEO, it's because unlike Google (and other ad-supported search engines), they make money from result quality and have an interest in blocking spam.

      Google on the other hand makes money off ads (whether on the search results page itself or on the spam sites), so spam sites are at best considered neutral and at worst considered beneficial (since they can embed Google ads/analytics, and make the ads on the search results page look relatively good compared to the spam).

      Black-hat SEO has been around since the early days of search engines and they managed to keep it at bay just fine. What changed isn't that there was some sudden breakthrough in malicious SEO, it's that it was more profitable to keep the spammers around than to fight them, and with the entire tech industry settling on advertising/"engagement" as its business model, the risk of competition was nil because competitors with the same business model would end up making the same decision.

      The same reason is behind the neutering of advanced search features. These have nothing to do with the supposed war on spam/SEO, so why were they removed? Oh yeah because you'd spend less time on the search results page and are less likely to click on an ad/sponsored result, so it's against Google's interests and was removed too.

      4 replies →

> it just takes too long to get info from videos.

I can’t wait until video transcripts get fed into LLMs just to eliminate the whole “This video is sponsored by something-completely-unrelated, more about them later. What’s up Youtube, remember to like, share, subscribe… 5 entire minutes pass on similar drivelthe actual thing you want, but stretched out to an agonizing length

  • You need SponsorBlock.

    Usually people leave a "highlight" marker which tells you where you're supposed to jump to. Along with the regular "This video was brought to you by <insert>VPN".