Show HN: My recommendation engine for Hacker News

3 years ago (hn-recommend.julienc.me)

Hi! I’m Julien and I built a recommendation engine for Hacker News.

I feel like this website is a gold mine. Every day, I find some very interesting stories about a topic. And sometimes, I want to find other stories covering that same topic but I can’t.

Hacker News has years of awesome discussions and resources. Unfortunately, I don't think HN Algolia is helpful for searching these old threads. As a student, I want to learn a lot from this website.

This is why I created HN Recommend. Input a sentence or the URL of an article, and get the most popular and similar posts from Hacker News.

As for the technical details, I computed embeddings for over 100,000 articles from HN and indexed them using Faiss. I wrote a blog post with a deeper explanation.
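For readers new to Faiss: a flat index such as `IndexFlatL2` performs an exact, brute-force search. It compares the query against every stored vector and reports squared Euclidean distances. Below is a dependency-free sketch of that idea, not the project's code. The tiny 3-dimensional vectors and story titles are made up. Real `text-embedding-ada-002` vectors have 1,536 dimensions. The `search` helper is a hypothetical name.

```python
from __future__ import annotations

import heapq
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the value a flat L2 Faiss index reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(index: list[list[float]], query: Sequence[float], k: int) -> list[tuple[float, int]]:
    """Exact k-nearest-neighbor search: compare the query with every stored vector.

    Returns (squared_distance, position) pairs, closest first.
    """
    scored = ((squared_l2(vec, query), i) for i, vec in enumerate(index))
    return heapq.nsmallest(k, scored)


# Toy "embeddings" for three stories (hypothetical values).
stories = ["Memory allocation", "Garbage collection", "Startup fundraising"]
index = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.00, 0.10, 0.90],
]
query = [0.88, 0.12, 0.02]

for dist, i in search(index, query, k=2):
    print(f"{stories[i]}: {dist:.4f}")
```

Faiss does the same comparison with optimized, vectorized code. Its approximate index types trade some accuracy for speed on larger corpora.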

Source code: https://github.com/julien040/hn-recommendation-api

Article: https://julienc.me/articles/Extract_embeddings_Hacker_News_a...

Project: https://hn-recommend.julienc.me

58 comments

julien040

samwho 3 years ago

Aww, thank you for using my Memory Allocation post as the placeholder text. <3

FieryTransition 3 years ago

I often wish I could sort Hacker News into two categories. Actual software/tech/STEM and everything else. I think both are interesting, but often, the niche tech stuff gets drowned out fast. So this is great for that :-)

julien040 3 years ago

I just released a new update, thanks to everyone's feedback. Now, you can sort results by relevancy, age, or score using the select.

moritzwarhier 3 years ago

This is a joy to use, and it also fits very nicely with the other highly ranked post by the Nielsen Group. Kudos!

febed 3 years ago
Which other post? There is so much churn on HN that it's hard to know which post you are referring to.
- SkyArrow 3 years ago
  
  Likely https://news.ycombinator.com/item?id=36394569
  

dpe82 3 years ago

This is great. I often come across some HN post on a topic I am interested in and then want to go look at other posts in the same topic cluster to expand my exposure. This looks awesome for that.

I don't know if it would be useful or would even work, but is it possible to let the user adjust the vector distance threshold and then apply the other sorting parameters to the results? E.g., if I want to go broader but then sort by high score or something, so I see popular posts within an expanded (but still relevant) cluster?
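One way to read this request is to keep every result whose distance to the query is under a user-chosen threshold, then re-sort only that subset by another field. Here is a minimal sketch. It assumes each result carries `distance` and `score` fields. Those field names, the `broaden_then_sort` helper, and the sample data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Result:
    title: str
    distance: float  # squared L2 distance to the query; smaller = more similar
    score: int       # HN points


def broaden_then_sort(results: list[Result], max_distance: float) -> list[Result]:
    """Keep results within max_distance, then order the survivors by score (highest first)."""
    relevant = [r for r in results if r.distance <= max_distance]
    return sorted(relevant, key=lambda r: r.score, reverse=True)


results = [
    Result("Close match, few points", 0.20, 15),
    Result("Looser match, very popular", 0.35, 900),
    Result("Off-topic but viral", 0.60, 2500),
]

# A tight threshold keeps only the closest story. A broader one admits the popular,
# still-relevant story and ranks it first, while the off-topic story stays excluded.
print([r.title for r in broaden_then_sort(results, max_distance=0.25)])
print([r.title for r in broaden_then_sort(results, max_distance=0.40)])
```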

lettergram 3 years ago

Check out https://askhn.ai
The content is ranked by how people discuss the topics and who discusses them.
If you only compute embeddings on posts, you might miss relevant content. When people who know about AMD discuss Intel and consider that content relevant to AMD, the content will still be ranked as relevant.
julien040 3 years ago
I thought about an algorithm with weights adjustable by the user. Right now, the API returns a field with the distance between the post and the query (the squared Euclidean distance). The interface uses it to rank results by relevance.
Perhaps I can compute a ranking value for each story, where each field has a weight, and sort the results by it. For example, the value could be 0.2 × score + 0.1 × comments + 1/distance - timestamp/10^9. The stories with the highest value would be shown first, and the weights (0.2, 0.1, 10^9) could be adjusted by the user, as some might prefer recency while others prefer popularity.
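The weighted ranking described here can be sketched as a direct transcription of the example formula. `rank_value`, the `Story` fields, and the sample stories are illustrative names. The small `eps` term is an added assumption to avoid dividing by zero on an exact match. It is not part of the formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Story:
    title: str
    score: int        # HN points
    comments: int     # number of comments
    distance: float   # squared L2 distance to the query (smaller = more similar)
    timestamp: int    # Unix time in seconds


def rank_value(s: Story, w_score: float = 0.2, w_comments: float = 0.1,
               time_divisor: float = 1e9, eps: float = 1e-6) -> float:
    """w_score*score + w_comments*comments + 1/distance - timestamp/time_divisor.

    eps (hypothetical) guards against a zero distance; it is not part of the formula.
    """
    return (w_score * s.score
            + w_comments * s.comments
            + 1.0 / (s.distance + eps)
            - s.timestamp / time_divisor)


stories = [
    Story("A", score=100, comments=50, distance=0.5, timestamp=1_600_000_000),
    Story("B", score=10, comments=5, distance=0.1, timestamp=1_680_000_000),
]

# Default weights: popularity dominates, so the well-upvoted story A comes first.
print([s.title for s in sorted(stories, key=rank_value, reverse=True)])

# A user who only cares about relevance zeroes the popularity weights: B comes first.
print([s.title for s in sorted(stories,
                               key=lambda s: rank_value(s, w_score=0.0, w_comments=0.0),
                               reverse=True)])
```

Because the timestamp term is subtracted, newer posts (larger timestamps) score slightly lower under this formula. Favoring recency would mean flipping the sign of that term.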
- juliusgeo 3 years ago
  
  It might be useful to pose this problem in terms of a precision vs. recall curve.
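As a rough illustration of that framing, sweeping the distance threshold trades precision against recall. Precision is the share of returned stories that are on-topic. Recall is the share of on-topic stories that get returned. The sketch below assumes hand-labeled relevance judgments for a single query, which the project does not have. The labels, distances, and the `precision_recall` helper are all hypothetical.

```python
from __future__ import annotations


def precision_recall(distances: list[float], relevant: list[bool], threshold: float) -> tuple[float, float]:
    """Precision and recall if every result with distance <= threshold is returned."""
    returned = [rel for dist, rel in zip(distances, relevant) if dist <= threshold]
    hits = sum(returned)  # returned results that are actually on-topic
    total_relevant = sum(relevant)
    # Convention: returning nothing counts as precision 1.0 (no wrong answers given).
    precision = hits / len(returned) if returned else 1.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


# Hypothetical labeled results for one query, sorted by distance.
distances = [0.10, 0.20, 0.30, 0.40, 0.50]
relevant = [True, True, False, True, False]

for threshold in (0.15, 0.35, 0.55):
    p, r = precision_recall(distances, relevant, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Plotting these pairs over many thresholds gives the curve. A user-facing "broaden the search" slider is effectively choosing a point on it.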

benzible 3 years ago

Hmm I tried searching "elixir" and found nothing related to the language. HN Algolia gives me exactly what I want. On what basis do you say it's "not helpful"?

julien040 3 years ago
Yes, the search doesn't work very well for a single word. Try entering a URL about Elixir, like this: https://hn-recommend.julienc.me/?q=https%3A%2F%2Fnews.ycombi...
I may have used the wrong term. HN Algolia is effective for finding a particular story. However, I can't use it to find related posts on the same topic that don't contain the same words.
- Solvency 3 years ago
  
  Out of curiosity, related to the word vectorization algorithm: why doesn't a single word perform as well? What's the cause/rationale?
  

danvayn 3 years ago

Hey Julien. I love the product, but the search doesn't seem to work that well for me. For example, I looked up Tailwind and got plenty of results, but none of them actually involved Tailwind.

Maybe a tagging solution is the way? If you determine a set of popular keywords for a topic and filter around those, you can offer more relevant results. With some sort of public tagging system, you could also have SEO-friendly pages around tags and get people browsing stuff they wouldn't normally search for.
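Here is a minimal sketch of the tagging idea. It assumes a hand-picked keyword list per tag and simple whole-word matching on titles. The tag names, keywords, titles, and helper functions are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical tag vocabulary: each tag maps to keywords that signal it.
TAGS = {
    "tailwind": {"tailwind", "tailwindcss"},
    "elixir": {"elixir", "phoenix", "liveview"},
}


def tags_for(title: str) -> set[str]:
    """Return every tag whose keywords appear as whole words in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return {tag for tag, keywords in TAGS.items() if words & keywords}


def filter_by_tag(titles: list[str], tag: str) -> list[str]:
    """Keep only the titles carrying the requested tag."""
    return [t for t in titles if tag in tags_for(t)]


titles = [
    "Why I switched to Tailwind CSS",
    "Phoenix LiveView 1.0 released",
    "A history of CSS frameworks",
]
print(filter_by_tag(titles, "tailwind"))
```

Keyword matching is crude ("Phoenix" can also be a city), but it behaves predictably on single-word queries, which is where embedding similarity tends to be weakest.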

julien040 3 years ago

At first, the website concept focused on getting posts similar to a URL. Querying with text didn't yield relevant results.
Your solution appears better suited for this use case. Thank you.

wseqyrku 3 years ago

What I really need for HN (and any other news feed, for that matter) is something like Google Discover, i.e. a content-based recommendation system with some sort of feedback mechanism.

So I would get information relevant to me (I can skip, visit, like, dislike), whether or not it's popular. That last point is important because the HN home page doesn't give you that, and most posts could get lost in oblivion just because the first few folks didn't find them interesting.

akomtu 3 years ago

HN needs a simple feature: a weekly digest view that shows the top 30 most commented posts (it should completely ignore flags and votes).

ColinWright 3 years ago
You mean like the one that's emailed to me every week?
https://hackernewsletter.com
- balder1991 3 years ago
  
  Thanks, I was considering something like this, as I used IFTTT to send me weekly top threads from certain subreddits, but now with Reddit going south…

dxbydt 3 years ago

Pls sort by recency. Otherwise you see 13-year-old articles, most of them obsolete or irrelevant to the current situation.

julien040 3 years ago
By sorting by recency, I was worried I would get less relevant results. Perhaps I should add a threshold to exclude posts that are too old.
- julien040 3 years ago
  
  You can now sort by recency. I hope this helps.
  
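The age threshold mentioned above can be sketched as a cutoff applied before the usual relevance ordering. Old stories disappear, but recency doesn't override similarity. The `Hit` fields, the `recent_by_relevance` helper, and the three-year cutoff are hypothetical. `now` is passed in explicitly to keep the example reproducible.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Hit:
    title: str
    distance: float     # squared L2 distance; smaller = more relevant
    posted: datetime    # timezone-aware posting time


def recent_by_relevance(hits: list[Hit], max_age: timedelta, now: datetime) -> list[Hit]:
    """Drop hits older than max_age, then keep the usual relevance (distance) order."""
    cutoff = now - max_age
    return sorted((h for h in hits if h.posted >= cutoff), key=lambda h: h.distance)


now = datetime(2023, 6, 20, tzinfo=timezone.utc)
hits = [
    Hit("2010 take on the topic", 0.10, datetime(2010, 3, 1, tzinfo=timezone.utc)),
    Hit("2022 follow-up", 0.30, datetime(2022, 9, 1, tzinfo=timezone.utc)),
    Hit("2021 overview", 0.20, datetime(2021, 1, 15, tzinfo=timezone.utc)),
]

# The 2010 story is the closest match but falls outside the three-year window.
print([h.title for h in recent_by_relevance(hits, timedelta(days=3 * 365), now)])
```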

RileyJames 3 years ago

Love it.

This response is very reactive-heavy, whereas it's Elixir I'm more interested in.

But well done on the execution. It does exactly what it states.

I’ve bookmarked.

I often search HN for additional articles and discussions based on something I’ve just read. Next time I’ll use this tool.

fewald_net 3 years ago

Great project. I learned about the Faiss library. Out of curiosity, did you also try it with Doc2Vec?

julien040 3 years ago
I didn't try Doc2Vec. I wanted a hosted solution because I wouldn't have been able to compute all this locally (more than 100,000 posts).
If you tried it, did you get good results with it? I may use it in future projects.
- fewald_net 3 years ago
  
  Yes, I am using it on a not so small dataset (roughly 1 million docs) and the output is a fairly efficient model. I am using gensim with pre-trained word vectors. New docs can be inferred via .infer_vector().
  Overall my approach is less automated than what I have seen in your codebase so it’s likely a bigger investment. I am happy to share more.
  
- jimmySixDOF 3 years ago
  
  The blog post linked on GitHub was a nice walkthrough of your method. I'm curious what you think the hit rate was for extracting usable text for embeddings from TFA links. 100K is a good-sized corpus, but I wonder how many got skipped due to paywalls, 404 links, or other problems.
  

sogen 3 years ago

A comment about search results: "design system" is related to design, while "system design" relates to computing.

It seems the search treats the two inputs as the same.

Also, search doesn't seem to work when using just 1 word.

julien040 3 years ago
Yes, it's an issue. Sadly, I can't fix it. I'm using the closed-source "text-embedding-ada-002" model from OpenAI.
From what I've seen, the longer the input, the more accurate the results. Perhaps you can try something longer, like "What is a design system for UI?"
- sogen 3 years ago
  
  Yes, adding context helps.
  Thanks!

sukki07 3 years ago

This is amazing, thank you for this. It makes finding stuff a lot easier.

swyx 3 years ago

I like the idea of this, but I won't remember it because my muscle memory is tuned to news.ycombinator.com. Perhaps I could suggest a Chrome extension instead of a website?

julien040 3 years ago
Thank you for suggesting this.
The API is already made and can be found at https://github.com/julien040/hn-recommendation-api. I don't think it would be too difficult to build a Chrome extension that fetches it.
- TechBro8615 3 years ago
  
  An iOS share widget would be cool too. Since you support putting the input text in the URL, then maybe someone can make a Workflow for it and share it here.
  

2h 3 years ago

This URL fails

https://hn-recommend.julienc.me/?q=Go

julien040 3 years ago

Oops, on the API side, there is a check to ensure the text is long enough (5 characters), but I forgot to add this check client-side. Thank you for pointing out the issue.
Try this https://hn-recommend.julienc.me/?q=Golang if you want stories related to Go.
Edit: add link
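Here is a sketch of the client-side check described above. Queries shorter than the API's 5-character minimum are rejected before sending, and the query is URL-encoded into the `?q=` parameter. Only the 5-character limit and the URL format come from this thread. `MIN_QUERY_LENGTH`, `build_search_url`, trimming whitespace, and treating the limit as inclusive are assumptions. The site's actual client code may differ.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

BASE_URL = "https://hn-recommend.julienc.me/"
MIN_QUERY_LENGTH = 5  # the API's minimum, per the thread; assumed to be inclusive


def build_search_url(query: str) -> str:
    """Validate the query length, then return a shareable search URL."""
    query = query.strip()  # assumption: surrounding whitespace should not count
    if len(query) < MIN_QUERY_LENGTH:
        raise ValueError(
            f"query must be at least {MIN_QUERY_LENGTH} characters, got {len(query)}"
        )
    # quote_via=quote encodes spaces as %20 (not "+"); with urlencode's default
    # safe="", ":" and "/" are encoded too, so a pasted article URL survives intact.
    return BASE_URL + "?" + urlencode({"q": query}, quote_via=quote)


print(build_search_url("Golang"))
print(build_search_url("paul graham"))
```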

rjrobben 3 years ago

I didn't expect embeddings to have such a simple yet useful application, thanks!

passion__desire 3 years ago

One feature I would like a recommender system to have is the explicit ability to jump in and out of filter bubbles or research rabbit holes. Another example would be putting yourself in the shoes of others, e.g. what content game developers generally like: apart from general gamedev content, what do they like, where do they take inspiration from, etc.

I remember there was a project built on Instagram that let a person view Instagram as it looked to a particular celebrity.

julien040 3 years ago

I'm a bit torn on this feature. On one hand, I would like to have it; it would be awesome to see the recommendations of people in different jobs. On the other hand, I'm a bit concerned about privacy. The system must ensure that each group is big enough to avoid leaking someone's recommendations. I don't want anyone to know exactly what I'm liking and what I'm watching.
If I recall correctly, myCANAL (the French Netflix) used to have a similar feature. You could access the recommendations of the channel's personalities, but they were curated manually.
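The "each group is big enough" safeguard resembles a k-anonymity-style minimum group size: a group's aggregated favorites are published only when at least k distinct users contribute. Here is a minimal sketch. `K_MIN`, `group_recommendations`, the group labels, and the like data are all hypothetical. A real system would need more protection than a size threshold alone.

```python
from __future__ import annotations

from collections import Counter, defaultdict

K_MIN = 3  # hypothetical minimum number of distinct users per published group


def group_recommendations(likes: list[tuple[str, str, str]], top_n: int = 3) -> dict[str, list[str]]:
    """likes: (user_id, group, story_id) triples.

    Returns the most-liked stories per group, omitting groups with fewer than K_MIN users.
    """
    users_by_group: dict[str, set[str]] = defaultdict(set)
    counts_by_group: dict[str, Counter] = defaultdict(Counter)
    for user, group, story in likes:
        users_by_group[group].add(user)
        counts_by_group[group][story] += 1
    return {
        group: [story for story, _ in counts_by_group[group].most_common(top_n)]
        for group in counts_by_group
        if len(users_by_group[group]) >= K_MIN
    }


likes = [
    ("u1", "gamedev", "s1"), ("u2", "gamedev", "s1"), ("u3", "gamedev", "s2"),
    ("u4", "ux", "s3"), ("u5", "ux", "s3"),  # only two users: this group is suppressed
]
print(group_recommendations(likes))
```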

4hEn 3 years ago

I searched for a URL I know was posted, and it doesn't show up. It shows unrelated articles.

julien040 3 years ago
The data is a few weeks old. Do you know when the URL was published?
- 4hEn 3 years ago
  
  It's 10 years old.
  This search query https://hn-recommend.julienc.me/?q=paul%20graham returns articles that are missing both words of the query
  

rounakdatta 3 years ago

Nit:

> Resources to learn about distributed systems

I thought Murat Buffalo's blog would come up at the top. It's gold, and I'm confident it was shared on HN as well (maybe a year or two back).

Otherwise neat and useful!

balder1991 3 years ago

The layout is currently buggy on Firefox.

julien040 3 years ago
Hi, are you talking about a problem like this one? https://cln.sh/MFG3DPZn+
- balder1991 3 years ago
  
  Yeah, when there’s no thumbnail.
  

lfkdev 3 years ago

A time filter is needed