Comment by costco
3 years ago
I think people are sort of confused about what this tool is supposed to be, which I will concede is partially my fault. The results of this tool are not, by themselves, indicative of having an alternative account. It generates the 20 most similar users for every single user on the site, regardless of whether they have an alt or not (there's obviously no way for me to know that for every single user). In your case, further investigation would reveal that none of those accounts are yours.
It is a fun tool, I can assure you. It is just that people have found a use case you hadn't foreseen yourself.
I think your tool must have internal embeddings for each user. Also, it most probably uses cosine similarity for the search.
Thus, I would like to suggest a feature: recognize simple arithmetic operations over users' embeddings, such as "thesz - 2 * patio11". It would make things even more fun: this way we could find users who are like me and very much unlike patio11. Even simple additions and subtractions would suffice.
(the idea is taken from the properties of word2vec embeddings)
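A minimal sketch of how such an expression might be evaluated, assuming the tool really does keep one embedding vector per user and ranks by cosine similarity (neither is confirmed here). The toy vectors, the names `EMBEDDINGS`, `parse_expression`, `combine` and `most_similar`, and the tiny expression grammar are all hypothetical, and the parser assumes usernames contain only letters, digits and underscores.

```python
import math
import re

# Hypothetical toy embeddings; real ones would come from the tool's own model.
EMBEDDINGS = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.9, 0.1, 0.0],
    "dave": [0.1, 0.9, 0.0],
}

# One term: optional sign, optional "<number> *" coefficient, then a username.
TERM = re.compile(r"([+-]?)\s*(?:(\d+(?:\.\d+)?)\s*\*\s*)?([A-Za-z_]\w*)")

def parse_expression(expr):
    """Turn 'thesz - 2 * patio11' into [(1.0, 'thesz'), (-2.0, 'patio11')]."""
    terms = []
    for sign, coef, name in TERM.findall(expr):
        weight = float(coef) if coef else 1.0
        terms.append((-weight if sign == "-" else weight, name))
    return terms

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def combine(terms, embeddings):
    """Weighted sum of the embeddings named in terms."""
    dim = len(next(iter(embeddings.values())))
    out = [0.0] * dim
    for weight, name in terms:
        for i, x in enumerate(embeddings[name]):
            out[i] += weight * x
    return out

def most_similar(expr, embeddings, k=20):
    """Rank users by cosine similarity to the combined query vector."""
    terms = parse_expression(expr)
    query = combine(terms, embeddings)
    named = {name for _, name in terms}  # skip users mentioned in the query
    scored = [(cosine(query, vec), name)
              for name, vec in embeddings.items() if name not in named]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]
```

With the toy data, `most_similar("alice - 2 * bob", EMBEDDINGS, k=2)` ranks carol ahead of dave, since carol points mostly along alice's direction and away from bob's.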
Your tool is thought-provoking. What I discovered with it made me think about my use of language and what other languages (body, imagery, etc.) I use differently because of who I am. Which made me think about my favorite underrated superhero, Cypher [1]: would his innate ability to understand languages make him the best detective ever?
[1] https://en.wikipedia.org/wiki/Cypher_(Marvel_Comics)
Thank you!
Really cool idea. I'd need to upgrade the VPS so that all the vectors would fit in memory, but it probably wouldn't be too hard (right now I'm just storing a map of username string -> array of 20 username strings because my VPS only has 512 MB of RAM). I'll think about whether I can do this in a more resource-efficient way.
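One resource-conservative option, sketched under assumptions rather than taken from the actual tool: keep the vectors on disk as packed float32 rows and hold only a username -> row index map in RAM, reading rows on demand. `DIM`, `write_vectors` and `read_vector` are hypothetical names, and the real embedding size is not stated anywhere above.

```python
import struct

DIM = 4  # hypothetical embedding size; the real dimension is unknown

def write_vectors(path, vectors):
    """Write vectors as packed little-endian float32 rows.

    Returns a small in-memory index mapping username -> row number.
    """
    index = {}
    with open(path, "wb") as f:
        for row, (name, vec) in enumerate(vectors.items()):
            f.write(struct.pack(f"<{DIM}f", *vec))
            index[name] = row
    return index

def read_vector(path, index, name):
    """Seek to one user's row and read only that vector from disk."""
    row_bytes = DIM * 4  # 4 bytes per float32
    with open(path, "rb") as f:
        f.seek(index[name] * row_bytes)
        return list(struct.unpack(f"<{DIM}f", f.read(row_bytes)))
```

A full similarity search would still have to stream every row from disk once per query, so this trades memory for query latency.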