Comment by marginalia_nu

13 hours ago

Fwiw I did some more comparisons, looking for words disproportionately favored by noob comments:

    word         noob     new     p-value
    -----------------------------------------
    ai           14.93%   7.87%   p=0.00016
    actually     12.53%   5.34%   p=1.1e-05
    code         11.47%   6.04%   p=0.00081
    real         10.93%   2.95%   p=2.6e-08
    built        10.93%   2.11%   p=2.1e-10
    data         8.93%    3.51%   p=6.1e-05
    tools        7.6%     2.67%   p=5.5e-05
    agent        7.47%    2.95%   p=0.00024
    app          7.2%     3.09%   p=0.00078
    tool         6.8%     1.83%   p=8.5e-06
    model        6.8%     2.39%   p=0.00013
    agents       6.67%    2.11%   p=5.2e-05
    api          6.53%    1.12%   p=2.7e-07
    building     6.13%    1.54%   p=1.3e-05
    full         6.0%     1.97%   p=0.00017
    across       5.87%    1.4%    p=1.3e-05
    interesting  5.33%    1.54%   p=0.00014
    answer       5.2%     1.4%    p=9.6e-05
    simple       4.93%    1.54%   p=0.00043
    project      4.8%     1.26%   p=0.00015

Worth pointing out that calculating p-values on a wide set of metrics and selecting for those under $threshold (a form of p-hacking) is not statistically sound. Who cares, though: we are not an academic journal, and this is just a pill of knowledge.

The idea is that, even when there is no real effect, each test has a ~1/20 chance of producing p < 0.05, so if you run enough tests you are bound to get false positives. In academia it's definitely not something you'd do, but I think here it's fine.
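One common way to guard against those false positives is a Bonferroni correction: divide the threshold by the number of tests that were run. A minimal sketch, using a few p-values from the table above. The number of words tested (`num_tests=1000`) is purely an assumption for illustration, since the thread does not say how many words were compared.

```python
# p-values copied from three rows of the table above.
P_VALUES = {"built": 2.1e-10, "api": 2.7e-07, "code": 0.00081}


def bonferroni_significant(p_values, num_tests, alpha=0.05):
    """Return {word: bool}, True if p is below the corrected threshold alpha / num_tests."""
    threshold = alpha / num_tests
    return {word: p < threshold for word, p in p_values.items()}


# ASSUMPTION: 1000 candidate words were tested; the real number is not given in the thread.
result = bonferroni_significant(P_VALUES, num_tests=1000)
for word, significant in result.items():
    print(word, "survives correction" if significant else "does not survive correction")
```

With that assumed test count the corrected threshold is 0.05 / 1000, so the smallest p-values would still hold up while borderline ones would not.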

@OP have you considered calculating Cohen's effect size? p only tells us that, given the magnitude of the differences and the number of samples, we are "pretty sure" the difference is real. Cohen's `d` tells us how big the difference is on a "standard" scale.
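Since the table compares two proportions rather than two means, the closest analogue of Cohen's `d` is Cohen's `h`, which is the difference between the arcsine-transformed proportions. A rough sketch, assuming the percentages in the table can be read as plain proportions (the function name is just illustrative):

```python
import math


def cohens_h(p1, p2):
    """Cohen's h effect size for two proportions, each given as a fraction in [0, 1]."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Proportions taken from the "noob" and "new" columns of the table above.
h_built = cohens_h(0.1093, 0.0211)  # "built"
h_ai = cohens_h(0.1493, 0.0787)     # "ai"
print(f"built: h = {h_built:.2f}, ai: h = {h_ai:.2f}")
```

By Cohen's usual rule of thumb, h around 0.2 is a small effect, 0.5 medium, and 0.8 large, so a tiny p-value can still go with a fairly modest effect.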

Actually building full, real AI app project code across simple API data tools helps built model agents answer an interesting tool — an agent.

It's funny - some months ago I noticed that I use the word "actually" a lot, and started trying to curb it in my writing. Not for any AI-related reason, but because it is almost always a meaningless filler word, and I find that being concise helps get my points across more clearly.

e.g. "The body of the template is parsed, but not actually type-checked until the template is used." -> "but not typechecked until the template is used." The word "actually" here has a pleasant academic tone, but adds no meaning.

  • I try to curb my usage of 'actually' too. Like you I came to think of it as an indirect, fluffy discourse marker that should be replaced with more direct language.

    I'm totally fine with the word itself, but not with overusing it or placing it where it clearly doesn't belong. And I did that a lot, I think. I suspect that if you reviewed my HN comments, you'd find them littered with 'actually'. Also "I think...", "I feel like..." and other kinds of... passive, redundant, unnecessary noise.

    Like, no kidding I think the thing I'm expressing. Why state that?

    Another problem with "actually" is that it can seem condescending or unnecessarily contradictory. While I'm often trying to fluff up prose to soften disagreement (not a great habit), I'm inadvertently making it seem more off-putting than direct yet kind statements would. It can seem to attempt to shift authority to the speaker, if somewhat implicitly. Rather than stating that you disagree along with what you believe or adding information to discourse, you're suggesting that what you're saying somehow deviates from what the person you're speaking to would otherwise believe or expect. That's kind of weird to do, in my opinion. I'm very guilty of it, though I never had the intent of coming across this way.

    It can also seem kind of re-directive or evasive at times, like you don't want to get to the point, or you want to avoid the cost of disagreement. It's often used to hedge statements that shouldn't be hedged. This is mainly what led me to realize I should use it less. I hedge just about everything I say rather than simply state it and own it. When you're a hedger and you embed the odd 'actually' in there, you get a weird mix of evasive or contradictory hedging going on. That's poor and indirect communication.

    • > Like, no kidding I think the thing I'm expressing. Why state that?

      One reason might be to acknowledge that you're not being prescriptive, but leaving room for a subjective POV in situations that call for it.

      Likewise, the GP's use of "actually" acknowledges the contrast between what one might expect (that some preliminary type-checking might happen during initial parsing) and what in fact happens (no type checks occur until the template is used.) It doesn't seem out of line in that case.

    • > Like, no kidding I think the thing I'm expressing. Why state that?

      I agree but it's not always clear whether you're stating an opinion or attempting to state a fact. Some folks would reply to a comment like this with "citation needed" but wouldn't otherwise have said that if the comment had opened with "I think."

  • Actually, this specific example usage of "actually" could have a meaning. It depends.

    "The body of the template is parsed, but, contrary to popular belief, not actually type-checked until the template is used."

    One can omit the "contrary to popular belief", but the "actually" would still need to stay, as it hints at the "contrary to popular belief".

    It's not as simple as "it's not needed there".

    The lack of recognition of perceived Noise as an actual part of the Signal eventually destroys the Signal.

  • I find various verbal tics come and go in my speech and writing over time.

    Lately "I mean" has been jumping out at me.

    It really only bothers me when I notice I've used it for multiple comments in the same thread or, worse, multiple times in the same comment.

    • I used to use "honestly" quite a bit and then noticed how unnecessary it was (does it ever improve a sentence?) and how overused it is on Reddit.

      I've also pretty much dropped "just" from my vocabulary when I'm talking about an alternative way to do something.

Thank you, marginalia_nu, for the article and this comment (word stats).

I got a similar feeling. I'm new here, but I have a feeling that some comments are bot-generated.

Such low p-values are strong evidence that something is going on.

Hypothesis (after your recent word statistics): some bots are "bumping up" AI-related subjects. Maybe some companies using LLM tools want to promote some of their products ;)

marginalia_nu, respect for your work :)

Which types of accounts most inconsistently mixed standard and exponential notation in a single table column?

You've built an interesting statistic from gathering data across the project. The real answer: ai models and agentic apps make building spam tools more simple than ever. All you actually need is just some trivial api automation code.

  • I bet every single AI-startup dude who does it thinks they've stumbled on a brilliant, original, gold-mine of an idea to use AI to shill their product/service on internet forums, or to astroturf against "AI Haters".

  • Well done.

    Do all the models have this style of talking? Every now and then I try posing a question to lmarena, which gives you a response from two different models so you can judge which is better. I feel like transitions like "The real answer...", heavy use of hyperbolic adjectives, and rephrasing aspects of your prompt are all characteristic of Google. Most other models are much more to the point.

I have mixed feelings about the word "actually", as it is/was one of my favorites. Others like "for instance" and "interestingly" seem to be getting there too...

There are a couple of extra steps to be made to get to the root of it (openclaw)

How many new accounts are submitting github links as their first post?

How many new accounts include a first comment that is copied from the other side of the link?

Look at the timings between first commit, last commit, and account creation. Many happen in quick succession and in this order. Fastest I've seen so far is 25m from first commit to first post on HN, with account creation in between.
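The timing check described above could be sketched roughly as follows. The function name, the one-hour cutoff, and the example timestamps are all made up for illustration. Only the ordering (first commit, last commit, account creation, first post) and the 25-minute figure come from the comment.

```python
from datetime import datetime, timedelta


def looks_rushed(first_commit, last_commit, account_created, first_post,
                 max_gap=timedelta(hours=1)):
    """True if the events happen in the described order and within max_gap overall.

    ASSUMPTION: max_gap is an arbitrary cutoff, not a value taken from the thread.
    """
    in_order = first_commit <= last_commit <= account_created <= first_post
    return in_order and (first_post - first_commit) <= max_gap


# Hypothetical timestamps: 25 minutes from first commit to first post.
start = datetime(2026, 1, 1, 12, 0)
rushed = looks_rushed(
    first_commit=start,
    last_commit=start + timedelta(minutes=10),
    account_created=start + timedelta(minutes=20),
    first_post=start + timedelta(minutes=25),
)
print(rushed)
```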

Can you elaborate on the column meanings? "Noob" and "new" mean nothing to me.

  • Maybe that means you're a net newbie (noobie, noob).

    noob = new user

    new = I think this might be a mistake? Surely noob should be compared to olds

    p-value = the probability of seeing a difference at least this large by chance if there were no real difference. In academic science a value < 0.05 is considered "statistically significant".

  • It's in the original article. "New" comments are any new comments from any account. "Noob" comments are new comments from new accounts.

Data analyses of HN-related things like this are always so fun to read. Thanks for making this!

I have a quick question: can you please tell me what the age of "new" accounts is in your analysis?

Because I have been called AI sometimes, and that's sometimes because of the "age" of my comments (and I reasonably crash afterwards), but for context, I joined in 2024.

It's 2026 now, so it's almost 2 years. Would my account be considered new within your data or not?

Another minor point, but "actually"/"real" seem to me to have risen in usage by over 5 times. All of these words look like words that would be used to defend AI. I am almost certain that I saw the sentence "Actually, AI hype is real and so on..", definitely once, maybe even more than once.

Now for the word "real", I can't say this for certain and please take it with a grain of salt, but we Gen Z love saying this, and I am certain that I have seen comments on Reddit which just say "real". OpenAI/other models definitely treat Reddit data as some sort of gold, for what it's worth, so much so that they have special arrangements with Reddit.

So to me, it seems that the data has been poisoned with "real". I haven't really observed this phenomenon, but I will try to take a close look at whether ChatGPT is more likely to say "real" or not.

Fwiw, I asked ChatGPT to "defend the position, AI hype sucks" and it responded with the word "real"/"reality" 3 times in total.

(Another side fact: "real" is used so much by Gen Z. I personally watch channel shorts sometimes, https://www.youtube.com/@litteralyme0/shorts, which has thousands of videos atp whose title is only "real". This channel is sort of a meme of "ryan gosling literally me" and has its own niche lore with metroman lol)