Comment by ChuckMcM

3 days ago

That's the correct answer: IBM wanted the crawler mostly to feed Watson. Building a full search engine (crawler, indexer, ranker, API, web application) for the English language was a hell of an accomplishment, but by the time Blekko was acquired, Google was paying out tens of billions of dollars to people to send them, and only them, their search queries. For a service that nominally has to live on advertising revenue, getting humans to use it was the only way to be net profitable, and you can't spend billions buying traffic and hope to make it back on advertising as the #3 search engine in the English-speaking markets.

There are other ways than advertising to monetize search (look at Kagi, for example). Blekko missed that window, though (too early; Google needed to get as crappy as it is today to make the value of a spam-free search engine desirable).

Blekko was gone by the time I learned about it. Recently (past few years) I emailed someone who worked on Blekko to get his opinion on a search engine concept I still have yet to start. His advice was to not bother competing with Google (obviously) LOL!

I don’t know if anyone’s embarked on a P2P search engine, but that’s essentially my concept. Anyhoo, thanks for the inspiration!

  • Peer-to-peer would be tough: you really need a 10G network connection to some tier 1 provider and about 2,500 machines to distribute the crawling/serving load (that is, if you want to do a full-stack search engine). And while you can run that infrastructure for on the order of $100K/month (not counting depreciation), which is about $3.3K/day, that means you need roughly $5K/day in revenue from that cluster. At $10 RPM ($10 revenue per thousand queries), you're looking at a minimum of 500,000 'real' search queries during 'English time' (roughly 7AM to 11PM GMT, about 16 hours). That's 31,250 queries per hour, or ~9 queries per second (average).

    And that just pays to keep the lights on at the colocation center. If you're also paying off the development costs (30-50 developers over 2-3 years) and the cost of an office somewhere, you'll want at least double that revenue or you'll go broke before you break even.

    Ideally you are the 'go-to' place for people looking to buy something, as those queries make money. People researching Douglas Fairbanks for a high school essay consume queries but don't generate ad revenue.

    It isn't for the faint of heart.

    • When you don't know what you don't know...wow.

      I know "search is hard" in the general sense, but context is lacking (there aren't a lot of details online from ex-search teams). It's always been apparent to me that you must have some other high-grossing product if you want to get into search or video, if only to pay for the servers.

      Thank you for providing your context!

  • Darknet Lantern is a decentralized searchable directory. It's probably not going to take off, but it could inspire something else. Servers spider other servers running the same software and synchronize their data.

    • Yup, directory services are a lot easier to do peer-to-peer. Pinboard.in is a good shared directory (sort of Yahoo! without the editorial). They can yield excellent quality when you're searching for something that someone has 'indexed' with them, but poor recall when it comes to the set of all possible answers.

      Doing it peer to peer without editorial allows sites to 'get into' the index easily which has its own plusses and minuses.
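For anyone who wants to rerun the back-of-the-envelope numbers from the first reply with their own inputs, here is a minimal Python sketch. It only reproduces the thread's arithmetic and is not anyone's actual planning model: the helper name `breakeven_qps` is hypothetical, the 16-hour window stands in for the 7AM to 11PM GMT 'English time', and it assumes every query earns the same flat RPM and that traffic is spread evenly over that window.

```python
from typing import Tuple


def breakeven_qps(daily_revenue_usd: float, rpm_usd: float,
                  active_hours: float) -> Tuple[float, float, float]:
    """Queries needed to earn daily_revenue_usd at rpm_usd per 1,000 queries.

    Hypothetical helper: assumes all revenue comes from queries at a flat RPM
    and that traffic is spread evenly over active_hours each day.
    Returns (queries per day, queries per hour, queries per second).
    """
    queries_per_day = daily_revenue_usd / rpm_usd * 1000
    queries_per_hour = queries_per_day / active_hours
    queries_per_second = queries_per_hour / 3600
    return queries_per_day, queries_per_hour, queries_per_second


if __name__ == "__main__":
    RPM_USD = 10        # $10 revenue per thousand queries (thread's figure)
    ACTIVE_HOURS = 16   # roughly 7AM to 11PM GMT ("English time")
    # $5K/day keeps the colo lights on; doubling it covers developers and office.
    for label, daily in [("lights on", 5_000), ("with dev/office (2x)", 10_000)]:
        per_day, per_hour, per_sec = breakeven_qps(daily, RPM_USD, ACTIVE_HOURS)
        print(f"{label}: {per_day:,.0f}/day, {per_hour:,.0f}/hour, {per_sec:.1f}/s")
```

For the $5K/day case this gives 500,000 queries/day, 31,250/hour and about 8.7 queries/second; the doubled target needs about 17.4/s. These are averages, and since real traffic peaks during the day, serving capacity would likely need headroom above them. Lowering `RPM_USD` is a crude way to model the point about informational queries that earn nothing.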

Not my Q, but thanks for the interesting history.

Also (for other readers), I'm a huge fan of Kagi. Highly recommended.