Comment by zipy124

8 hours ago

I think in today's world the harder problem is evading SEO spam. A search engine is in a constant war with adversarial players, who need you to see their content for revenue, rather than the actual answer.

This necessitates a constant game of cat and mouse, where you adjust your quality metric so SEO shops can't figure it out and capitalise on it.

I feel at this point you'd almost be better off hand-curating a set of domains and only crawling those.

  • Not sure if this was intentional, but everything old is new again; back to OH Yahoo? Or Craigslist?

    • Not quite, in that you can curate domains but crawl all the URLs on those domains.

      I think SEO spam + AI slop is likely to lead us back to human curation.
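
To make the "curate domains, crawl every URL on them" idea concrete, here is a minimal sketch of a crawl-frontier filter. Everything in it is an assumption for illustration: the allowlist entries are placeholder domains, and `is_crawlable` and `filter_frontier` are hypothetical helper names, not part of any real crawler.

```python
from urllib.parse import urlsplit

# Hypothetical hand-curated allowlist; these domains are placeholders.
CURATED_DOMAINS = {"example.org", "docs.example.com"}


def is_crawlable(url, allowed=CURATED_DOMAINS):
    """Return True if the URL's host is a curated domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Match the domain itself or any subdomain, but not lookalikes
    # such as "notexample.org".
    return any(host == d or host.endswith("." + d) for d in allowed)


def filter_frontier(urls, allowed=CURATED_DOMAINS):
    """Keep only the discovered URLs that live on curated domains."""
    return [u for u in urls if is_crawlable(u, allowed)]
```

The curation happens once per domain, while the crawler stays free to follow any link that remains on an approved host. It does nothing about spam hosted on an approved domain, so it narrows the problem rather than solving it.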

I wonder how hard it is when mice are not paying the cat to serve ads.

  • It sure helps, though there's still a lot of adversarial content you need to deal with, so it's not a solved problem even if you remove the conflict of interest.

There are more kinds of search engines than just internet search engines. At this point I’m almost certain that the non-internet search engines of the world are much larger than internet search engines.

Edit: And I’m getting downvoted for this. If it’s because I am tangential to the original comment then that’s fair. If it’s because you think I’m wrong, I have worked on the two largest internet search engines in the world and one non-internet search engine that dwarfed both in size (although different in complexity).