Comment by dpe82

2 years ago

This is great. I often come across some HN post on a topic I am interested in and then want to go look at other posts in the same topic cluster to expand my exposure. This looks awesome for that.

I don't know if it would be useful or even work, but would it be possible to let the user adjust the vector distance threshold and then apply the other sorting parameters to the results? E.g., if I want to go broader but then sort by high score, so that I see popular posts within an expanded (but still relevant) cluster.
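To make the request concrete, here is a minimal sketch of "filter by distance threshold, then sort by score". The `Story` fields and the `broaden_then_sort` helper are hypothetical names for illustration, not the site's actual API, and the distances are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    score: int       # HN points
    distance: float  # vector distance between the story and the query


def broaden_then_sort(stories, max_distance):
    """Keep stories within max_distance, then order them by score, highest first."""
    in_cluster = [s for s in stories if s.distance <= max_distance]
    return sorted(in_cluster, key=lambda s: s.score, reverse=True)


# Made-up example data.
stories = [
    Story("A", score=120, distance=0.30),
    Story("B", score=450, distance=0.55),
    Story("C", score=900, distance=1.40),
]

tight = broaden_then_sort(stories, max_distance=0.4)  # narrow cluster
broad = broaden_then_sort(stories, max_distance=0.6)  # expanded cluster
```

Raising `max_distance` lets the more popular but slightly less similar story "B" into the results, while the far-away "C" stays excluded despite its high score.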

3 comments

lettergram 2 years ago

Check out https://askhn.ai

The content is ranked by how people discuss the topics and who discusses them.

If you only compute embeddings on posts, you might miss relevant content. When people who are knowledgeable about AMD discuss Intel and consider that content relevant to AMD, the content will be ranked.

julien040 2 years ago

I thought about an algorithm with weights adjustable by the user. Currently, the API returns a field with the distance between the post and the query (the squared Euclidean distance). The interface uses it to rank results by relevance.

Perhaps I can compute a score for each story, where each field has a weight, and rank the results by this score. For example, the score could be 0.2 x score + 0.1 x comments + 1/distance - timestamp/10^9. The stories with the highest score would be shown first, and the weights (0.2, 0.1, 10^9) could be adjusted by the user, as some might prefer recency while others prefer popularity.
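A rough sketch of that weighted score, with the formula taken as written above. The `Story` and `Weights` types, the field names, and the `min_distance` guard against division by zero are assumptions for illustration, not the actual API. Note that, since Unix timestamps grow over time, subtracting the timestamp term lowers the score of newer stories; someone who prefers recency would probably want that sign flipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    title: str
    score: int       # HN points
    comments: int    # number of comments
    distance: float  # squared Euclidean distance to the query
    timestamp: int   # Unix time in seconds


@dataclass(frozen=True)
class Weights:
    score: float = 0.2
    comments: float = 0.1
    time_divisor: float = 1e9


def rank_value(story, w=Weights(), min_distance=1e-9):
    """0.2 x score + 0.1 x comments + 1/distance - timestamp/10^9, with adjustable weights."""
    # min_distance is an added guard (assumption) so an exact match does not divide by zero.
    closeness = 1.0 / max(story.distance, min_distance)
    return (
        w.score * story.score
        + w.comments * story.comments
        + closeness
        - story.timestamp / w.time_divisor
    )


def rank(stories, w=Weights()):
    """Return stories ordered by rank_value, highest first."""
    return sorted(stories, key=lambda s: rank_value(s, w), reverse=True)


# Made-up example data.
popular = Story("popular", score=100, comments=50, distance=0.5, timestamp=1_700_000_000)
close = Story("close", score=10, comments=2, distance=0.1, timestamp=1_700_000_000)

default_order = rank([close, popular])
# With the popularity weights zeroed out, closeness dominates.
similarity_order = rank([popular, close], Weights(score=0.0, comments=0.0))
```

A user who cares mostly about similarity can lower the `score` and `comments` weights, as in `similarity_order`, while the defaults favour popular stories.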

juliusgeo 2 years ago

It might be useful to pose this problem in terms of a precision vs. recall curve.
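One way to read this suggestion: sweep the distance threshold and record precision and recall at each value. The sketch below assumes you have hand-labelled relevance judgements for a query, which the thread does not mention; the function names and data are hypothetical.

```python
def precision_recall_at(distances, relevant, threshold):
    """Precision and recall when every story with distance <= threshold is returned."""
    retrieved = [rel for d, rel in zip(distances, relevant) if d <= threshold]
    true_positives = sum(retrieved)
    total_relevant = sum(relevant)
    # Convention (assumption): precision is 1.0 when nothing is retrieved.
    precision = true_positives / len(retrieved) if retrieved else 1.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


def pr_curve(distances, relevant, thresholds):
    """List of (threshold, precision, recall) tuples, one per threshold."""
    return [(t, *precision_recall_at(distances, relevant, t)) for t in thresholds]


# Made-up distances and relevance labels for a single query.
distances = [0.2, 0.4, 0.6, 0.9]
relevant = [True, True, False, True]

curve = pr_curve(distances, relevant, thresholds=[0.5, 1.0])
```

On this toy data, the tighter threshold returns only relevant stories but misses one, and the looser threshold finds all relevant stories at the cost of one irrelevant result. A user-adjustable threshold is a way of choosing a point on that curve.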