
Comment by costco

3 years ago

I'm not that smart - my site is basically just doing some calculations on word frequencies. You can read https://news.ycombinator.com/item?id=33755898 for more information.
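For readers wondering what "calculations on word frequencies" could look like, here is a minimal sketch. It is an assumption rather than the site's actual code: the comment does not say which metric is used, so cosine similarity over raw word counts is chosen purely for illustration, and `word_counts` and `cosine_similarity` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count word tokens (letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    first = word_counts("I think this is definitely the simplest approach")
    second = word_counts("I think the simplest approach is definitely best")
    print(cosine_similarity(first, second))
```

A real system would probably restrict the vocabulary to common function words and normalise for text length, but the core comparison can be this small.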

As you mention on the site, you don't do punctuation. But I'm guessing there are some pretty good fingerprints like:

two spaces after a period

Whether someone uses an em-dash/single hyphen/double hyphens (which may correspond to house style they're used to)

Whether they use semi-colons

Consistent substitutions like loose for lose, break for brake, etc. (presumably harder to detect)

Use of accents
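Most of the fingerprints listed above can be counted with a few regular expressions. The following is a rough sketch under my own assumptions: the feature names and the `punctuation_features` function are hypothetical, and the word substitutions (loose/lose) are left out because they need context to detect.

```python
import re
import unicodedata


def _has_accent(ch: str) -> bool:
    """True if the character decomposes into a base letter plus a combining mark."""
    return any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch))


def punctuation_features(text: str) -> dict:
    """Count a handful of punctuation habits in a piece of text."""
    return {
        # a period followed by exactly two spaces and then more text
        "double_space_after_period": len(re.findall(r"\.  (?=\S)", text)),
        # U+2014 is the em-dash character
        "em_dash": text.count("\u2014"),
        # exactly two hyphens, not part of a longer run
        "double_hyphen": len(re.findall(r"(?<!-)--(?!-)", text)),
        # a lone hyphen with whitespace on both sides, used as a dash
        "spaced_single_hyphen": len(re.findall(r"(?<=\s)-(?=\s)", text)),
        "semicolon": text.count(";"),
        "accented_letters": sum(1 for ch in text if ch.isalpha() and _has_accent(ch)),
    }


if __name__ == "__main__":
    sample = "I agree.  Mostly; the caf\u00e9 was fine -- really."
    print(punctuation_features(sample))
```

Raw counts should be divided by text length before comparing authors, otherwise prolific posters look distinctive for the wrong reason.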

  • I manually determined there was an individual posing as two people (playing both the antagonist and the adversary) because they consistently misspelt certain words such as "definitely" as "defiantly".

    Fingerprinting certain linguistic traits, mapping them to time zones, and confirming that the two accounts' posting times overlapped partially but never exactly worked exceedingly well. Someone can't easily maintain a fluent conversation with themselves across two accounts, but they can get close, either through unnatural delays between sentences or by never interacting with the "other" party at the same time.
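The timing check described above could be sketched roughly like this. It is an illustration under my own assumptions, not the procedure the commenter used: the function names are hypothetical, timestamps are assumed to be timezone-aware `datetime` objects, and "partial overlap but never exact" is approximated by comparing active hours and the smallest gap between the two accounts' posts.

```python
from bisect import bisect_left
from datetime import datetime, timezone
from typing import List, Optional


def active_hours(timestamps: List[datetime]) -> set:
    """Set of UTC hours of the day in which an account has posted."""
    return {ts.astimezone(timezone.utc).hour for ts in timestamps}


def active_hour_overlap(a: List[datetime], b: List[datetime]) -> float:
    """Jaccard overlap of the two accounts' active hours (0.0 to 1.0)."""
    hours_a, hours_b = active_hours(a), active_hours(b)
    union = hours_a | hours_b
    return len(hours_a & hours_b) / len(union) if union else 0.0


def min_gap_seconds(a: List[datetime], b: List[datetime]) -> Optional[float]:
    """Smallest gap between any post by one account and any post by the other."""
    if not a or not b:
        return None
    sorted_b = sorted(b)
    best = None
    for ts in a:
        i = bisect_left(sorted_b, ts)
        # only the neighbours on either side of ts can be the closest post
        for j in (i - 1, i):
            if 0 <= j < len(sorted_b):
                gap = abs((ts - sorted_b[j]).total_seconds())
                if best is None or gap < best:
                    best = gap
    return best
```

Two accounts with similar active hours whose posts never land within a few minutes of each other would fit the pattern described, though on its own that is weak evidence and needs the linguistic match alongside it.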

Simplicity is the greatest form of sophistication! Great work!

One small nit from a user experience point of view: it'd be easier on the eyes if you just truncated those cosine similarity scores (or whatever score you're using) after the, say, 5th digit. Showing the entire float is kinda messy to my eyes.
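If the site happens to be written in Python (an assumption, the thread does not say), shortening the displayed score is a one-liner. Note that this rounds to five decimal places rather than strictly truncating; `format_score` is a hypothetical helper name.

```python
def format_score(score: float, digits: int = 5) -> str:
    """Format a similarity score with a fixed number of decimal places (rounded)."""
    return f"{score:.{digits}f}"


if __name__ == "__main__":
    print(format_score(0.8660254037844387))  # prints 0.86603
```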

Don’t sell yourself short. Simplicity is smart. It’s astonishing how often the simplest thing turns out to be exponentially more effective than the so-called smart thing.

I can’t get over how phenomenal this is. Please put every one of your side project ideas into production!

Cool, and thanks for the clarification. I ask mainly because of OpenAI's request limit, which makes many scalable ideas unfeasible.