Comment by ted_bunny

19 days ago

Has anyone analysed JE's writing style and looked for matches in archived 4chan posts or content from similar platforms? Same with Ghislaine, there should be enough data to identify them atp right? I don't buy the MaxwellHill claims for various reasons but it doesn't mean there's nothing to find.

There was a post on here about a stylometry project that analyzed HN users' comment histories. The tool helped find accounts that had an extremely similar writing style to a given account. The site was soon removed due to privacy concerns, but many users with multiple accounts attested to its accuracy.

https://news.ycombinator.com/item?id=33755016

It turns out stylometry is actually a pretty well-developed field. It makes me wanna write an AI browser assistant that can take my comments and stylize them randomly to make it harder to use these sorts of forensics against me

  • >It makes me wanna write an AI browser assistant that can take my comments and stylize them randomly to make it harder to use these sorts of forensics against me

    The old trick years ago was to translate from English to a different language and back (possibly repeating). I'd be curious how helpful it is against stylometry detection?

    • > The old trick years ago was to translate from English to a different language and back (possibly repeating). I'd be curious how helpful it is against stylometry detection?

      If you want to be grouped with foreigners who don't know English, it might work well, although word choices may still be distinctive enough to differentiate even when translated.


  • On the one hand, it's a shame this tool was removed because it's very interesting, but on the other hand, the main use case would likely be abuse and (cyber)stalking.

    That said, it's best to assume that the various government agencies have tools like this, and better. If you're trying to hide your identity online, don't just change usernames or go through VPNs/proxies/Tor, but change your writing style too.

    (Also, I'm convinced most VPNs / proxies / Tor nodes / public access points are honeypots.)

  • A while back the government claimed it had used stylometry to identify Satoshi Nakamoto.

  • I remember using one of these tools and it falsely identified some other account as being mine. Of course, I only have just this account.

Stylometry is extremely effective even with simple n-gram analysis. There's a demo of this that can easily pick out who you are on HN from just a few paragraphs of your own writing.

https://news.ycombinator.com/item?id=33755016

You can also unironically spot most types of AI writing this way. The approaches based on training another transformer to spot "AI generated" content are wrong.

  • > You can also unironically spot most types of AI writing this way.

    I have no idea if specialized tools can reliably detect AI writing but, as someone whose writing on forums like HN has been accused a couple of times of being AI, I can say that humans aren't very good at it. So far, my limited experience with being falsely accused suggests it's partly just a bias against decent writers with a good vocabulary who sometimes write longer posts.

    As for the reliability of specialized tools in detecting AI writing, I'm skeptical at a conceptual level because an LLM can be reinforcement trained with feedback from such a tool (RLTF instead of RLHF). While they may be somewhat reliable at the moment, it seems unlikely they'll stay that way.

    Unfortunately, since there are already companies marketing 'AI detectors' to academic institutions, they won't stop marketing them as their reliability continues to get worse. Which will probably result in an increasing shit show of false accusations against students.

    • > I can say that humans aren't very good at it

      You're assuming the accusations of posts being written by AI come from humans (who I agree are not good at making this determination). However, computers analyzing massive datasets are likely to be much better at it, and this can also be a Werewolf/Mafia/Killers-type situation where AI frequently accuses posters it believes are human of being AI, to diminish the severity of accusations and blend in better.


    • Well, humans might be great at detecting AI (few false negatives) but might falsely accuse humans more often (higher false positive rate). You might be among a set of humans being falsely accused a lot, but that's just proof that "heuristic stylometry" is consistent, it doesn't really say anything about the size of that set.

    • Thing is, people are on the lookout for obvious AI and I'm sure they have been successful a few times. But this is like confirmation bias, they will never know whether they saw / read something AI generated if they didn't clock it in the first place.

      I'm on Reddit too much and a few times there were memes or whatever that were later on pointed out to be AI. And those are the ones that had tells; more and more (as price goes down and effort/expenditure increases) it will become hard or impossible to tell.

      And I have mixed feelings. I don't mind so much for memes, there's little difference between low-effort image editing and low-effort image generation IMO. There's the "advice" / "story" posts which for a long time now have been more of a creative writing effort than true stories, it's a race to the bottom already and AI will only accelerate it. But sometimes it's entertaining.

      But "fake news" is the dangerous one, and I'm disappointed that combating this seemed to be a passing fad now that the big tech companies and their leaders / shareholders have bent the knee to regimes that are very interested in spreading disinformation/propaganda to push their agenda under people's skins subtly. I'm surprised it's not more egregious tbh, but maybe it's because my internet bubbles are aligned with my own opinions/morals/etc at the moment.

  • Hacker News is one of the best places for this, because people write relatively long posts and generally try to have novel ideas. On 4chan, most posts are very short memey quips, so everybody's style is closer to each other's than it is to their normal writing style.

  • Funnily this also implies that laundering your writing through an AI is a good way to defeat stylometry. You add in a strong enough signal, and hopefully smooth out the rest.
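
For readers curious what the simple n-gram analysis mentioned in this subthread might look like, here is a minimal Python sketch. It is not the removed HN tool's method (the thread doesn't describe that); it assumes character trigrams compared by cosine similarity, and the function names `char_ngrams`, `cosine_similarity`, and `rank_candidates` are made up for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(unknown: str, known: dict[str, str], n: int = 3) -> list[tuple[str, float]]:
    """Rank known authors by n-gram similarity to the unknown text, best match first."""
    target = char_ngrams(unknown, n)
    scores = [
        (author, cosine_similarity(target, char_ngrams(text, n)))
        for author, text in known.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus; real use would need far more text per author.
    corpus = {
        "account_a": "tbh i dunno if it kinda works lol",
        "account_b": "Nevertheless, the methodology remains unproven.",
    }
    for author, score in rank_candidates("tbh i dunno, it kinda works", corpus):
        print(f"{author}: {score:.3f}")
```

With only a sentence or two per author, scores like these are mostly noise, which fits the point above about short 4chan posts. Practical systems would presumably use far more text and more features than this sketch does.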

People always claimed this as a data leak vector but I've always been sceptical. Writing style and vocabulary alone are probably shared among too many people to narrow it down much. (How many people that you know could have written this reply?) The counterargument is that he had a very specific style in his mail, so maybe this is a special case.

  • If you have a large enough set to test against and a specific person you are looking for, this is totally doable currently.

    • Not just a test set, but enough of a set to search through and compare against. Several pages of in-depth writing isn't anywhere near sufficient, even when limiting the search space to ~10k people.

  • this is a well-studied field (stylometry). when combining writing styles, vocabulary, posting times, etc. you absolutely can narrow it down to specific people.

    even when people deliberately try to feign some aspects (e.g. switching writing styles for different pseudonyms), they will almost always slip up and revert to their most comfortable style over time. which is great, because if they aren't also regularly changing pseudonyms (which are also subject to limited stylometry, so pseudonym creation should be somewhat randomized in name, location, etc.), you only need to catch them slipping once to get the whole history of that pseudonym (and potentially others, once that one is confirmed).

    • Stylometry is okay if you're trying to deanonymize a large enough sample text. A reddit account would be doable. But individual 4chan posts? You barely have enough content within the text limit.
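
The disagreement in this subthread about pool size comes down to base rates, which a little arithmetic can illustrate. The sketch below assumes exactly one true author in the candidate pool and a matcher with fixed hit and false-alarm rates; the function names and the 90% / 1% figures in the usage example are invented for illustration, and only the ~10k pool size comes from the comment above.

```python
def expected_false_matches(pool_size: int, false_positive_rate: float) -> float:
    """Expected number of non-authors flagged, assuming exactly one true author in the pool."""
    return (pool_size - 1) * false_positive_rate


def match_precision(pool_size: int, true_positive_rate: float, false_positive_rate: float) -> float:
    """Probability that a flagged account is the true author (uniform prior over the pool)."""
    true_hits = true_positive_rate
    false_hits = expected_false_matches(pool_size, false_positive_rate)
    total = true_hits + false_hits
    return true_hits / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical rates: the matcher flags the real author 90% of the time
    # and wrongly flags any given non-author 1% of the time.
    pool = 10_000
    print(expected_false_matches(pool, 0.01))
    print(match_precision(pool, 0.90, 0.01))
```

Under these made-up rates, false matches outnumber the one true match by roughly a hundred to one, so a hit on its own says little. Combining independent signals (posting times, vocabulary, and so on) helps only to the extent that it drives the false-positive rate down.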

The writing style is rather interesting. Epstein seems borderline dyslexic, but almost none of the emails I've seen are written in a coherent way, regardless of the sender.

Either people on that level rarely write anything on their own and have completely forgotten how to construct proper sentences, or maybe that's just how they communicate. A sort of language internal to the group.

  • I had a boss who was too impatient to hear a full sentence most of the time, and respected absolutely no one. She typed like this.

  • I haven't looked at the files, nor followed the technical analysis much, but in case you missed it, some of that incoherency may be a processing glitch discussed a couple of days ago.

    https://news.ycombinator.com/item?id=46868759

    • Yeah, I saw that and no, that's not what I mean. Some of the conversations read like incoherent ramblings, completely devoid of context, with answers that seem unrelated. Even when we have a "full" thread of conversation, it's really hard to parse the messages and make sense of them. It sometimes reads like maybe they have their own language.

      Some people posted conversations and comments, but I don't feel like they actually grasp what's being discussed; they just latch on to key words.

> I don't buy the MaxwellHill claims for various reasons

Why not? Clear motive, matching timeline, mentions of that reddit account in the released FBI documents of her case

  • I was there for the original thread making the connection so I got a very fresh look at the profile. The user was consistently referring to being in dental school. A lot of posting, and not in ways that would influence opinions. Maybe a cover for more secretive mod actions, but it'd be a wastefully excessive cover.

    Other mods knew them personally and were still in contact. The user claims they heard of the rumor and decided not to reactivate for the lulz.

    I am not familiar with the mod side of reddit - couldn't fellow mods audit her mod action logs to find more juicy details we would have heard about by now?

    If Maxwell is indeed a spy and doing what she is claimed to do, it is highly unlikely that she'd put her last name and a reference to her specific family's property in her username. This would be a glaringly arrogant choice for someone who had been groomed from an early age for spycraft, and who had any degree of oversight.

    If she were part of a spy network, they would be highly remiss not to commandeer the account at the time of her arrest to avoid suspicion unless they were completely incompetent.

    I am mostly familiar with cold war espionage so it just doesn't sound like the general MO to me. Unless Opsec or whatever has badly decayed since then. That's not impossible.

    The mentions of the account in the files are from anonymous tips, some of which are highly absurd. They vetted a lot of tips, and I saw no information in the new releases indicating they thought it held water. We've seen the subpoena and IP tracking for the Epstein prison guard whistleblower, but no such thing on this topic.

I'm pretty sure Epstein tried to meet with moot at least once: https://www.jmail.world/search?q=chris+poole

  • He met with moot ("he is sensitive, be gentile", search on jmail), and within a few days the /pol/ board got created, starting a culture war in the US, leading to Trump getting elected president. Absolutely nuts.

    • A few thoughts. In context it's not nuts at all:

      - moot was fundraising for his VC backed startup during the years the emails are in, and he was likely connected via mutuals in USV or other firms. These meetings were clearly around him trying to solicit investment in his canv.as project.

      - /pol/ was /new/ being returned; the ethos of the board had already existed for a long time and the decision to undo the deletion of /new/ was entirely unsurprising for denizens at the time, and was consistent with a concerted push moot was making for more transparency in the enforcement of rules on the site and fairness towards users who followed the rules. /pol/ didn't start a culture war at this time any more than /new/ had previously - it just existed as a relatively content-unmoderated platform for people to discuss earnestly what would get them banned elsewhere.


    • Given the "nature" of 4chan (only a few hundred posts and a few thousand comments at a time, the vast majority of it shitposts and spam), it just can't do that. The imageboard format and limits basically prevent any scaling and mainstream success. If you follow any of the general threads in pol or sp for a while, you'll spot the same few people all the time, it's a tiny community of active users.


    • I always wondered how much of a cultural influence 4chan actually had (has?). So much of the mindset and vernacular that was popular there 10+ years ago is now completely mainstream.


    • Just to substantiate this a bit: I remember a gleeful consensus in certain circles being that /pol/ and /r/the_donald had "memed Trump into the White House". It's much more complicated than that, but there's certainly an element of truth there.


    • I don’t agree with this analysis.

      The reason I don’t agree is that moot banned any Gamergate discussion and those people then went to 8chan, a site which moot had no control over.

      And it was Gamergate that put some fuel on the fire which (IMHO) increased support for Trump. The 8chan site grew a great deal from it, then continued from that first initial “win”.


    • What is the theory here? That Epstein suggested the idea of breeding extreme counterculture on 4chan?