Comment by frenchmajesty

4 months ago

OP here. I agree! I should've called out why I did _not_ follow that approach, since many others have commented the same.

The main reason is that I needed the classification to be ongoing. My system pulled in thousands of tweets per day, and they all needed to be classified as they came in for some downstream tasks.

Thus, I couldn't embed all tweets, then cluster, then ...

Do the labels need to be static once the system has started? If not, it would be interesting to relabel embedding clusters once each hits a certain critical mass of tweets, or to do so somewhat continuously.
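
A rough sketch of what that could look like, in Python. Everything here is an assumption for illustration, not OP's actual system: `StreamingClusterer`, `label_fn`, the cosine `threshold`, and `critical_mass` are hypothetical names, embeddings are plain lists of floats from whatever model you use, and `label_fn` stands in for whatever produces a label (an LLM call, for example). Each tweet is assigned to the nearest centroid as it arrives, and a cluster gets relabeled every time its size reaches a multiple of `critical_mass`.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Cluster:
    centroid: List[float]
    texts: List[str] = field(default_factory=list)
    label: Optional[str] = None  # stays None until critical mass is reached


class StreamingClusterer:
    """Hypothetical online clusterer: assign on arrival, relabel at critical mass."""

    def __init__(self, label_fn: Callable[[List[str]], str],
                 threshold: float = 0.8, critical_mass: int = 50):
        self.label_fn = label_fn            # assumed labeler, e.g. an LLM call
        self.threshold = threshold          # min similarity to join a cluster
        self.critical_mass = critical_mass  # relabel every N tweets per cluster
        self.clusters: List[Cluster] = []

    def add(self, text: str, embedding: Sequence[float]) -> Cluster:
        # Find the most similar existing cluster, if any.
        best = max(self.clusters,
                   key=lambda c: cosine(c.centroid, embedding),
                   default=None)
        if best is None or cosine(best.centroid, embedding) < self.threshold:
            # Nothing close enough: start a new cluster at this point.
            best = Cluster(centroid=list(embedding))
            self.clusters.append(best)
        else:
            # Update the centroid as a running mean of its members.
            n = len(best.texts)
            best.centroid = [(c * n + e) / (n + 1)
                             for c, e in zip(best.centroid, embedding)]
        best.texts.append(text)
        # Relabel whenever the cluster size hits a multiple of critical_mass.
        if len(best.texts) % self.critical_mass == 0:
            best.label = self.label_fn(best.texts)
        return best
```

The catch is that a tweet's label can change after downstream tasks have already consumed it, so those tasks would need to key on a cluster id rather than the label text, or tolerate late relabels.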