Comment by vessenes
4 days ago
OK, I just added books until you told me I had too many. Fun idea! I have a couple of suggestions:
* UI - once someone clicks "Add" you really should remove that item from the suggested list - it's very confusing to still see it.
* Beam search / diversification -- Your system threw like 100 books at me of which I'd read 95 and heard of 2 of the other 3, so it worked for me as a predictor of what I'd read, but not so well for discovery.
I'd be interested in recommendations that pushed me into a new area, or gave me a surprising read. This is easier to do if you have a fairly complete list of what someone's read, I know. But off the top of my head, I'm imagining finding my eigenfriends, then finding books that are either controversial (very wide rating differences amongst my fellow readers) or possibly ghettoized, that is, some portion of similar readers also read subject X or Y, but not all of them.
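To make the "controversial" idea concrete, here is a minimal sketch that ranks books by how widely ratings vary among readers already judged similar to me. The data, the `controversial_books` name, and the `min_ratings` cutoff are all made up for illustration; the site may do nothing of the sort.

```python
from statistics import pstdev

# Hypothetical ratings (1-5) from readers already judged similar to me.
neighbor_ratings = {
    "Book A": [5, 5, 4, 5],   # everyone agrees
    "Book B": [1, 5, 2, 5],   # love it or hate it
    "Book C": [3, 3, 3],      # uniformly lukewarm
    "Book D": [5],            # too few ratings to say anything
}

def controversial_books(ratings_by_book, min_ratings=3):
    """Rank books by the spread of ratings among similar readers.

    Books with fewer than `min_ratings` ratings are skipped, since a
    standard deviation over one or two ratings is not meaningful.
    """
    scored = [
        (pstdev(ratings), title)
        for title, ratings in ratings_by_book.items()
        if len(ratings) >= min_ratings
    ]
    # Largest spread first.
    return [title for _spread, title in sorted(scored, reverse=True)]

print(controversial_books(neighbor_ratings))
```

With this toy data "Book B" comes out on top, which is the kind of split-opinion title I'd want surfaced.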
Anyway, thanks, this is fun! Hook up a VLM and let people take pictures of their bookshelf next.
(From the site) >If you visit the "intersect" page, you can input multiple books and find the set of users that have read all of those books. This can be useful for finding longer tail books that weren't popular enough to meet the threshold. For instance, if you like reading about the collapse of the Soviet Union, you could put in "Lenin's Tomb" and "Secondhand Time", and see what other books the resultant users have read.
This is how filmaffinity works, which is the best recommendation system I've tried. They have a group of several dozen 'soulmates', which are users with the most similar set of films seen and ratings given; recommendations are other stuff they also liked, and you get direct access to their lists.
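For what it's worth, a rough sketch of a "soulmates" scheme might look like the following. This is my guess at the general shape, not filmaffinity's actual algorithm; the similarity formula, the names, and the toy ratings are all assumptions.

```python
def similarity(mine, theirs, scale=4.0):
    """Sum of per-title agreement over titles both users rated (1-5 scale).

    Identical ratings contribute 1.0, opposite extremes contribute 0.0,
    so seeing many of the same titles and agreeing on them both help.
    """
    shared = mine.keys() & theirs.keys()
    return sum(1.0 - abs(mine[t] - theirs[t]) / scale for t in shared)

def soulmate_recommendations(me, others, k=2, liked=4):
    """Pick the k most similar users, then list what they liked that I lack."""
    ranked = sorted(others, key=lambda name: similarity(me, others[name]),
                    reverse=True)
    soulmates = ranked[:k]
    recs = []
    for name in soulmates:
        for title, rating in others[name].items():
            if rating >= liked and title not in me and title not in recs:
                recs.append(title)
    return soulmates, recs

# Hypothetical users and ratings.
me = {"Stalker": 5, "Solaris": 4, "Alien": 3}
others = {
    "ana": {"Stalker": 5, "Solaris": 4, "Mirror": 5},
    "bo": {"Alien": 5, "Heat": 4},
    "cy": {"Stalker": 1, "Solaris": 1, "Cars": 5},
}
print(soulmate_recommendations(me, others))
```

Returning the soulmates themselves, not just the recommendations, is what gives you the "direct access to their lists" part.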
>then finding books that are either controversial or possibly ghettoized
Naively, I’d say the surprises are going to be better if you filter for more dissimilar friends, rather than for more controversial books among your friends. As in “find me a person that’s like me only in some ways, tell me what they love”. Long term this method is much better at exposing you to new ideas rather than just finding your clique’s holy wars.
The "Intersect" page was useless for me. I added 15 books, but got no matching user. I entered a cycle of removing-searching, and at 10 books I had 2 users: one had read 41353 books, and the other 85363, with no ratings...
To be useful, the "Intersect" page should:
- find near matches when there is no exact match with every book,
- ignore fake users (can any human read 80k books in many languages?),
- do not ignore users' votes (my input was books I liked, I expected to find users that rated them highly).
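The three points above could be sketched roughly like this. All thresholds (`min_overlap`, `max_books`, `min_rating`) and the data layout are assumptions of mine, not anything the site exposes.

```python
def near_matches(query, users, min_overlap=0.6, max_books=5000, min_rating=None):
    """Return users who read at least `min_overlap` of the query books.

    users maps a user name to {book: rating or None}. Users with more than
    `max_books` books are treated as fake and skipped. If `min_rating` is
    set, a book only counts when the user rated it at least that high.
    """
    results = []
    for name, ratings in users.items():
        if len(ratings) > max_books:  # crude fake-user filter
            continue
        hits = [
            book for book in query
            if book in ratings
            and (min_rating is None or (ratings[book] or 0) >= min_rating)
        ]
        overlap = len(hits) / len(query)
        if overlap >= min_overlap:
            results.append((overlap, name))
    # Best overlap first, ties broken alphabetically.
    return [name for _overlap, name in sorted(results, key=lambda r: (-r[0], r[1]))]

# Hypothetical users.
query = ["a", "b", "c", "d", "e"]
bot = {str(i): None for i in range(6000)}
bot.update({book: None for book in query})
users = {
    "exact": {"a": 5, "b": 5, "c": 5, "d": 5, "e": 5},
    "near": {"a": 4, "b": 5, "c": 4, "z": 3},
    "far": {"a": 5},
    "hater": {"a": 1, "b": 1, "c": 2, "d": 1},
    "bot": bot,
}
print(near_matches(query, users))
print(near_matches(query, users, min_rating=4))
```

Leaving `min_rating` optional keeps both behaviours available: "has read these" and "has read and liked these".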
With the "Recommend" page I had the same problem as the GP, and all the recommendations were useless. To fix that, I think some features are needed:
- do not list books by authors from my list (I don't need recommendations for them),
- add a button for marking a suggested book as "disliked" (at the bare minimum, it should remove it from the suggestions, and ideally it should influence the suggestions as much as a "liked" book),
- do not suggest several books by the same author,
- add a button to hide a suggestion or show more suggestions (there were dozens of books I'd read but wouldn't rate high).
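Most of these are cheap post-filters on whatever the recommender already returns. A minimal sketch, assuming candidates arrive as ranked `(title, author)` pairs (the function name and data shape are hypothetical):

```python
def filter_recommendations(candidates, my_authors, disliked, per_author=1):
    """Post-filter a ranked list of (title, author) suggestions.

    Drops books by authors already in my list, drops anything marked
    disliked, and keeps at most `per_author` books for each author.
    The input order (i.e. the ranking) is preserved.
    """
    kept_per_author = {}
    kept = []
    for title, author in candidates:
        if author in my_authors or title in disliked:
            continue
        if kept_per_author.get(author, 0) >= per_author:
            continue
        kept_per_author[author] = kept_per_author.get(author, 0) + 1
        kept.append((title, author))
    return kept

# Hypothetical ranked suggestions.
candidates = [
    ("Title 1", "Author A"),  # author already in my list
    ("Title 2", "Author B"),
    ("Title 3", "Author B"),  # second book by the same author
    ("Title 4", "Author C"),  # marked as disliked
    ("Title 5", "Author D"),
]
print(filter_recommendations(candidates, {"Author A"}, {"Title 4"}))
```

Having a dislike actually influence the ranking, rather than just hiding one book, would need changes in the recommender itself and is not covered here.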
What do you think the probability is that someone else has read the same 15 books you have? It’s very unlikely unless they are all staples of a genre, part of the same series, or just extremely popular in general. 3-5 books is as many as I would use on that page. I have found interesting accounts of medievalists, people who work at think tanks, etc. with it.
Fake users I would agree should be filtered, but I don’t think filtering out users who gave it a bad review is necessarily the intended behavior. If I put in 3 semi-obscure Russian history books, I am presumably looking for someone who is an expert in Russian history to see what else they read. In that case I don’t care whether they liked one of the books or not. Approximate matches would require something like LSH, or cosine similarity of the average input book embedding against the average embedding of every user's read books, which I think wouldn’t work well for retrieving anyone with a moderately long interaction history.
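A toy version of the average-embedding approach, to show what I mean. The 2-d "embeddings" are invented (first axis is roughly history, second is roughly cooking), and every name other than the two books mentioned earlier is a placeholder.

```python
import math

def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_users(query_books, user_books, embeddings):
    """Score each user by cosine(mean query embedding, mean history embedding)."""
    q = mean_vector([embeddings[b] for b in query_books])
    scores = [
        (user, cosine(q, mean_vector([embeddings[b] for b in books])))
        for user, books in user_books.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Invented toy embeddings: [history-ness, cooking-ness].
embeddings = {
    "Lenin's Tomb": [1.0, 0.0],
    "Secondhand Time": [0.9, 0.1],
    "history_3": [0.95, 0.05],
    "cooking_1": [0.0, 1.0],
    "cooking_2": [0.1, 0.9],
    "cooking_3": [0.0, 1.0],
    "cooking_4": [0.1, 0.9],
}
user_books = {
    "specialist": ["Lenin's Tomb", "Secondhand Time", "history_3"],
    "omnivore": ["Lenin's Tomb", "Secondhand Time",
                 "cooking_1", "cooking_2", "cooking_3", "cooking_4"],
    "cook": ["cooking_1", "cooking_2"],
}
ranking = rank_users(["Lenin's Tomb", "Secondhand Time"], user_books, embeddings)
print(ranking)
```

Note that "omnivore" has read both query books yet scores well below "specialist", because the rest of their history drags the average away. Real embeddings and real histories would behave differently in detail, but that dilution is the problem I'm pointing at.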