Comment by yorwba
3 days ago
If you use k-means to cluster your data into 100 clusters, it will do so, irrespective of whether it is meaningful to do so. Perfectly objective, but what does that objectivity buy you? If your pet theory is that there are 100 groups, you'll actually be less likely to get results that disagree with that than if you ask an LLM how many groups there are.
But the real question is not whether you agree with the results, but whether they're useful. If you apply an objective method to data it is unsuitable for, it's garbage in, objective garbage out. Whether the method is suitable is not always something you can decide a priori; sometimes you need to check.
And if trying it out shows that LLM-provided clusters are more useful than other methods, you should swallow your pride and accept that, even if you disagree on philosophical grounds. (Or it might show that the LLM has no idea what it's doing! Then you can feel good about yourself.)
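To make the k-means point concrete, here is a minimal sketch in pure Python. It is a plain Lloyd's-algorithm loop, not any particular library's implementation; the function name `kmeans`, the parameters, and the uniform-noise data are all made up for illustration. The data has no group structure at all, yet asking for 100 clusters still returns 100 centroids and a label for every point.

```python
import random

def kmeans(points, k, iterations=10, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: (x - centroids[j][0]) ** 2 + (y - centroids[j][1]) ** 2,
            )
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical data: uniform noise in the unit square, with no real groups.
data_rng = random.Random(42)
noise = [(data_rng.random(), data_rng.random()) for _ in range(500)]

centroids, labels = kmeans(noise, k=100)
print(len(centroids), "centroids returned;", len(set(labels)), "labels in use")
```

The algorithm never reports "there are no groups here". Deciding whether the partition means anything is a separate check you have to run yourself.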
This is a very interesting conversation. It correlates well with the responses I got from the colleague during the meeting. Would you ask ChatGPT to do a t-test for you and blindly accept its results as well, regardless of whether the math behind it was sound or not? The reason we use math and statistics in experimental research is that we want objective results, not simply results that correlate with our expectations (those we can get from watching YouTube or reading blogs). The objectivity of k-means buys me the trust that whatever clusters I get have been obtained with a well-known and well-understood method, on which my expectations have absolutely no influence. Also, I know that the next person will get similar results, which also gives me trust in their results. So we can all have a shared, independent, objective understanding of a piece of data.
I wonder: if well-educated and technically literate people like him and you are willing to accept arbitrary results from a language model as a replacement for objective math, then what should we expect from the general public?