Comment by frenchmajesty

4 months ago

OP here. I agree! I should've called out why I did _not_ follow that approach, since many others have made the same comment.

The main reason is that I needed the classification to be ongoing. My system pulled in thousands of tweets per day, and they all needed to be classified as they came in for some downstream tasks.

Thus, I couldn't embed all tweets, then cluster, then ...

2 comments

bungalowmunch 4 months ago

Do the labels need to be static once the system has started? If not, it would be interesting to relabel embedding clusters once each hits a certain critical mass of tweets, or to do so somewhat continuously.
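
To make the relabeling idea concrete, here is a rough sketch of how it could look. It is not the article's implementation: `OnlineClusters`, `min_similarity`, `critical_mass`, and the `relabel` callback are all hypothetical names, embeddings are plain lists of floats, and the doubling of the relabel threshold is just one assumed way to approximate "somewhat continuously".

```python
import math
from dataclasses import dataclass


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class Cluster:
    label: str
    centroid: list
    relabel_at: int  # size at which the label is next regenerated
    count: int = 1


class OnlineClusters:
    """Assigns each incoming embedding to the nearest cluster as it arrives."""

    def __init__(self, min_similarity=0.8, critical_mass=50, relabel=None):
        self.min_similarity = min_similarity  # assumed cutoff, tune per model
        self.critical_mass = critical_mass    # assumed first relabel point
        # Hypothetical hook: in practice this might ask an LLM for a new label.
        self.relabel = relabel or (lambda cluster: cluster.label)
        self.clusters = []

    def add(self, embedding):
        best = max(
            self.clusters,
            key=lambda c: cosine(c.centroid, embedding),
            default=None,
        )
        if best is None or cosine(best.centroid, embedding) < self.min_similarity:
            # Nothing close enough: start a new cluster with a placeholder label.
            best = Cluster(
                label=f"cluster-{len(self.clusters)}",
                centroid=list(embedding),
                relabel_at=self.critical_mass,
            )
            self.clusters.append(best)
            return best

        # Update the centroid as a running mean of its members.
        best.count += 1
        best.centroid = [
            c + (x - c) / best.count for c, x in zip(best.centroid, embedding)
        ]
        if best.count >= best.relabel_at:
            best.label = self.relabel(best)
            best.relabel_at *= 2  # relabel again once the cluster doubles
        return best
```

Each tweet still gets a label the moment it arrives, so downstream tasks are not blocked. The catch is that tweets stored under the old label would need to be looked up by cluster rather than by label text.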

pietz 4 months ago

Makes sense, I appreciate the comment. Well-written article. Subscribed.