
Comment by soulofmischief

1 day ago

Yeah, it's been on my mind since the original transformers paper, because persona and identity management has been a long-time interest of mine. Both offensive and defensive tooling, such as fingerprinting and anti-fingerprinting.

If you're interested in building an open source persona management suite to distribute as freedom software and level the playing field against State agents who are already building and improving such tools, I would love to find a partner to help with such a project. Even if you don't code, there are other duties besides coding involved with successfully promoting such a project and developing a community around it.

Yes, I work in cyber and I've always thought about the ability to fingerprint people by their content. Not really semantically (that just wasn't possible until LLMs came along), but more in terms of interests on social media.

But semantic analysis adds a whole new level with so much entropy that it's bound to be unique. And LLMs are just ideal for pattern recognition. There's not much we can do about that as humans; trying to fool it won't work. It really needs an artificial sanitiser, one that really builds a persona and aligns to it deeply (like little colloquialisms from the purported origin of the persona).

And also things like comment posting hours. I have identified several accounts from people who said they were chatting with me and I could prove they were doing something completely different at that same time. Us humans aren't consistent enough for that. Especially if you have multiple sockpuppets.
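To make the posting-hours point concrete, here is a minimal sketch of how two accounts' activity patterns could be compared. It assumes you already have Unix timestamps for each account's comments; the function names `hour_histogram` and `cosine` are made up for illustration, and a real tool would need far more data and care (time zones, weekdays, sample size) before drawing any conclusion.

```python
import math
from collections import Counter
from datetime import datetime, timezone


def hour_histogram(timestamps):
    """Return a 24-bin normalized histogram of UTC posting hours."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [counts.get(hour, 0) / total for hour in range(24)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A high `cosine(hour_histogram(a), hour_histogram(b))` only says the two accounts are active at similar hours, which is weak evidence on its own but adds up when combined with other signals.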

I don't think I could help much with that, though. I'm neither a developer nor a promoter; I'm too much of an introvert for that. But it sounds really interesting.

But yeah, I'm sure that within 5 years, if you are still writing comments yourself, it won't matter whether they know your phone number or email address: you will be uniquely identified by just what you write. I wouldn't be surprised if the darker forces in society have this capability already.

  • There was a thread not too long ago where someone did stylometric analysis on HN, and quite a few users had true positive matches (though there were plenty of false positives).

    They later pulled the dataset, but antirez recreated it.

    https://antirez.com/hnstyle

    But your account is too young to be in the dataset. You could ask an LLM to recrunch the numbers with a newer dataset, though.

    • > but your account is too young to be in the dataset.

      This is on purpose, yes. I've taken to rotating my accounts everywhere on a semi-regular basis as a feeble, low-effort mitigation that I'm sure will not hold once this gets into full-blown deployment.

      I know the HN community frowns on that, but it's not like I rotate every day. Probably should for it to be effective, though.

      Thanks for the link, I will try that out! I missed that happening.
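
For anyone curious what the stylometric comparison discussed in this thread might look like at its simplest, here is a rough sketch. It is not necessarily how the linked analysis works: it just compares relative frequencies of a few common function words, and the word list `FUNCTION_WORDS` and the names `style_vector` and `style_similarity` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Small illustrative list; real stylometry typically uses hundreds of words.
FUNCTION_WORDS = [
    "the", "of", "and", "to", "a", "in", "that", "is",
    "it", "for", "but", "not", "with", "as", "i", "you",
]


def style_vector(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return [0.0] * len(FUNCTION_WORDS)
    counts = Counter(words)
    return [counts[word] / total for word in FUNCTION_WORDS]


def style_similarity(text_a, text_b):
    """Cosine similarity between two texts' function-word profiles."""
    a, b = style_vector(text_a), style_vector(text_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With only a handful of words and short texts, this will produce many false positives, which matches what was reported in the thread; it is meant to show the shape of the idea, not a usable detector.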