Comment by svat
5 months ago
Try it here (you may have to create a Google Cloud project, but you don't have to enable billing or start the free trial):
https://news.ycombinator.com/item?id=41347188) below, but you can also use standard SQL if you prefer):
FROM `bigquery-public-data.hacker_news.full`
|> WHERE type = 'comment' AND timestamp < '2022-11-30'
|> AGGREGATE COUNT(*) AS total, COUNTIF(text LIKE '%—%') AS with_em GROUP BY `by`
|> EXTEND with_em / total AS fraction_with_em
|> ORDER BY fraction_with_em DESC
|> WHERE total > 100 AND fraction_with_em > 0.1
(I'm in place 47 of the 516 results, with 0.29 of my comments (258 of 875) having an em dash in them.)
Edit: As you also asked about timestamps:
FROM `bigquery-public-data.hacker_news.full`
|> WHERE type = 'comment' AND timestamp < '2022-11-30'
|> EXTEND text LIKE '%—%' AS has_em
|> AGGREGATE
COUNT(*) AS total,
COUNTIF(has_em) AS with_em,
MIN(timestamp) AS first_comment_timestamp,
MIN(IF(has_em, timestamp, NULL)) AS first_em_timestamp,
TIMESTAMP_SECONDS(CAST(AVG(time) AS INT64)) AS avg_comment_timestamp,
TIMESTAMP_SECONDS(CAST(AVG(IF(has_em, time, NULL)) AS INT64)) AS avg_em_timestamp,
GROUP BY `by`
|> EXTEND with_em / total AS fraction_with_em
|> ORDER BY fraction_with_em DESC
|> WHERE total > 100 AND fraction_with_em > 0.1
for most people the average timestamp is just the midpoint of when they started posting (with em dashes) and the cutoff date of 2022-11-30, and the top-place user zmgsabst stands out for having started only in late January 2022.
No comments yet
Contribute on Hacker News ↗