Google is currently the only search engine allowed to crawl Reddit, which sometimes yields good original user content of actual { non-blogspam, non-SEOed-to-hell, non-AI } value.
All search engines include Reddit results by default, and you can usually refine a query by adding an operator like site:reddit.com, which works the same in Google as in other search engines.
e.g. https://duckduckgo.com/?t=ffab&q=filter+coffee+site%3Areddit...
Edit: maybe you are thinking about the AI deal, which is exclusive to Google. That's not the same thing as search engine indexing: https://www.cbsnews.com/news/google-reddit-60-million-deal-a...
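To make the "works the same in every engine" point concrete, here is a minimal Python sketch that builds the kind of link shown above. It assumes only that both engines take the query in a `q` parameter; `search_url` and `SEARCH_BASES` are hypothetical names made up for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical table of base URLs; both engines read the query from "q".
SEARCH_BASES = {
    "google": "https://www.google.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def search_url(engine: str, query: str, site: Optional[str] = None) -> str:
    """Build a search URL, optionally narrowing the query with the site: operator."""
    if site:
        # The operator is just part of the query text; the engine interprets it.
        query = f"{query} site:{site}"
    # urlencode turns spaces into "+" and ":" into "%3A", as in the DDG link above.
    return SEARCH_BASES[engine] + "?" + urlencode({"q": query})


print(search_url("duckduckgo", "filter coffee", site="reddit.com"))
# → https://duckduckgo.com/?q=filter+coffee+site%3Areddit.com
print(search_url("google", "filter coffee", site="reddit.com"))
# → https://www.google.com/search?q=filter+coffee+site%3Areddit.com
```

The operator travels inside the query string itself, which is why the same syntax works across engines; whether an engine actually has fresh Reddit pages to match is a separate question.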
I couldn’t find a better document, but https://www.tomsguide.com/computing/search-engines/google-is... describes how non-Google search engines cannot get new results from Reddit.
> “Edit: maybe you are thinking about the AI deal which is exclusive to Google. That's not the same thing as search engine indexing.”
@mastazi that’s what I’m talking about, and I think your AI vs. indexing distinction is incorrect. I wasn’t sure, so I did a quick N=1 check by searching for the precise, unique title of a random, popular, one-week-old Reddit post:
- Google: found it instantly, as the top result
- DDG: didn't find it, with or without site:reddit.com
Looks like the sibling comment from @cpressland (thanks!) is correct: as of today, and until other search engines sign licensing agreements with Reddit, “non-Google search engines cannot get new results from Reddit”. See https://www.reddit.com/robots.txt, which links to https://support.reddithelp.com/hc/en-us/articles/26410290525..., section “Reddit may license public content for commercial or non-commercial use”.
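For anyone unfamiliar with how a robots.txt can single out one crawler, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt text in it is a hypothetical example written for illustration, not Reddit's actual file, and the `allowed` helper is a made-up name. Note that robots.txt is advisory: it states what a site permits, it does not technically stop a crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (NOT Reddit's real one): one named crawler is
# allowed everywhere, every other user agent is disallowed everywhere.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = HYPOTHETICAL_ROBOTS_TXT) -> bool:
    """Return whether user_agent may fetch url under the given robots.txt rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("Googlebot", "https://www.reddit.com/r/Proxmox/"))    # → True
print(allowed("DuckDuckBot", "https://www.reddit.com/r/Proxmox/"))  # → False
```

The named group matches first for Googlebot; any other agent falls through to the `*` group and is refused.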
ChatGPT search isn't allowed to index Reddit.
> Google is currently the only search engine allowed to crawl Reddit
Stop using Reddit. Reddit is already on the path toward nothing but SEO spam and malware. Use Hacker News or the Fediverse.
This partnership has created a problem, though:
My Google is set to German, and apparently Reddit has auto-translated its content into German.
If I do a Google search for "ssd zfs pool", the 4th result is "SSD-Pool eine schlechte Idee? : r/Proxmox" ("SSD pool a bad idea?"), which links to `https://www.reddit.com/r/Proxmox/comments/12a5abh/ssd_pool_a...`.
The Reddit page is in German, but as you may have noticed, the URL has `?tl=de` appended, while the path still contains the English slug `ssd_pool_a_bad_idea`. If I remove the `?tl=de`, I get the original English version.
This means that what Google crawled, and what it has in its index, was already in German: Reddit translated the original page into German and then made it available for Google to index.
For me, this means I now get a lot of AI-translated Reddit content, even though I'd much rather have the English original, which I assume won't contain translation errors.
I mean, the translation is very good; you probably wouldn't notice that it is one. But still...
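Removing the parameter by hand, as described above, is easy to automate. A minimal Python sketch, assuming (from this one observation only) that the translation is selected purely by the `tl` query parameter; `strip_translation_param` is a hypothetical helper and the post URL is made up, since the real one in this thread is truncated.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_translation_param(url: str, param: str = "tl") -> str:
    """Return url with the given query parameter (default "tl") removed."""
    parts = urlsplit(url)
    # Keep every other query pair, in its original order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    # An empty query string makes urlunsplit drop the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical post URL, for illustration only.
print(strip_translation_param("https://www.reddit.com/r/example/comments/abc123/some_post/?tl=de"))
# → https://www.reddit.com/r/example/comments/abc123/some_post/
```

Other query parameters survive untouched, so a link like `...?tl=de&sort=new` becomes `...?sort=new`.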
https://www.reddit.com/r/Proxmox/comments/12a5abh/ssd_pool_a...
Add this to your uBlock Origin filters:
Funny, that is indeed true; I didn't know that.