Comment by PaulHoule

5 hours ago

My YOShInOn reader basically looks like this. It takes a few thousand up/down judgements to make good content-based recs [1]; a reader that does collaborative filtering probably learns faster.

[1] Train a BERT+SVM classifier to predict my judgements, create 20 k-means clusters to get some diversity, take the top N from each cluster, and blend in a certain fraction of randoms to keep it honest.
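
A rough sketch of just the selection step in [1], not the actual YOShInOn code. It assumes the classifier scores and the cluster labels have already been computed elsewhere; the function name `select_recs` and the parameters `top_n`, `random_frac` and `seed` are made up for illustration.

```python
import random
from collections import defaultdict


def select_recs(scores, cluster_ids, top_n=2, random_frac=0.2, seed=None):
    """Return indices of items to show.

    scores[i]      -- classifier score for item i (higher = more likely liked)
    cluster_ids[i] -- cluster label for item i (e.g. from k-means)
    random_frac    -- approximate share of the final list that is random
    """
    if not 0 <= random_frac < 1:
        raise ValueError("random_frac must be in [0, 1)")
    rng = random.Random(seed)

    # Group item indices by cluster.
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        by_cluster[cid].append(idx)

    # Take the top_n highest-scoring items from each cluster.
    picked = []
    for cid in sorted(by_cluster):
        ranked = sorted(by_cluster[cid], key=lambda i: scores[i], reverse=True)
        picked.extend(ranked[:top_n])

    # Blend in random items that were not already picked.
    chosen = set(picked)
    leftovers = [i for i in range(len(scores)) if i not in chosen]
    n_random = round(len(picked) * random_frac / (1 - random_frac))
    n_random = min(n_random, len(leftovers))
    picked.extend(rng.sample(leftovers, n_random))
    return picked


# Toy data: 8 items in 2 clusters (hypothetical scores).
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6, 0.4]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
recs = select_recs(scores, cluster_ids, top_n=2, random_frac=0.2, seed=0)
```

The random picks are drawn only from items the classifier did not select, so they also give feedback on things the model currently ranks low.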

The clusters are unsupervised and identify big interest areas such as programming, sports, climate change, advanced manufacturing, and anime, without putting labels on the clusters -- the clusters do change from run to run, but so what. If I really wanted a stable classification I would probably start with clusters, give them names, merge/split a little, and make a training set to train a supervised classifier on those classes.
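
A minimal sketch of the "name the clusters, merge a little, make a training set" idea, under the assumption that each item already has a cluster id from one clustering run. The helper `build_training_set` and the example names are hypothetical; splitting a cluster is not covered here and would need re-clustering or hand labeling.

```python
def build_training_set(items, cluster_ids, cluster_names):
    """Turn unsupervised clusters into (item, label) pairs.

    cluster_names maps cluster id -> hand-chosen label.
    Mapping two ids to the same label merges those clusters;
    ids missing from the mapping are dropped from the training set.
    """
    return [
        (item, cluster_names[cid])
        for item, cid in zip(items, cluster_ids)
        if cid in cluster_names
    ]


# Toy example: clusters 0 and 2 are merged, cluster 3 is left unnamed.
items = ["rust async post", "match report", "python typing post", "misc"]
cluster_ids = [0, 1, 2, 3]
cluster_names = {0: "programming", 1: "sports", 2: "programming"}
training_set = build_training_set(items, cluster_ids, cluster_names)
```

The resulting pairs could then be fed to any supervised classifier, which keeps the classes stable even when the clusters shift between runs.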