Comment by pmelendez

13 years ago

> "If there is one thing I wish I could do to improve HN it would be to detect this sort of middlebrow dismissal algorithmically."

Do you have some data that we could use to play with? That sounds like a nice problem to solve.

pg 13 years ago

You already have the data I'd use: the text of the comments.

If anyone wants to try to train a filter to detect this sort of comment, I'd be very interested to see the result.
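One way to read this suggestion is as a text classifier in the spirit of a Bayesian spam filter. Below is a minimal multinomial naive Bayes sketch with add-one smoothing. It assumes you already have comments hand-labeled as `dismissive` or `ok`; the labels, the tokenizer, and the four training comments are hypothetical, made up only to exercise the code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a deliberately simple, hypothetical tokenizer.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of word frequencies
        self.doc_counts = Counter()   # label -> number of training comments
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior: fraction of training comments carrying this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in tokenize(text):
            # Add-one smoothing so an unseen word doesn't zero out the score.
            score += math.log((counts[w] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, text):
        # Pick the label with the highest log-probability.
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Hypothetical toy training data.
nb = NaiveBayes()
nb.train("this has been done before, nothing new here", "dismissive")
nb.train("obviously this will never scale", "dismissive")
nb.train("great writeup, the benchmark section was really useful", "ok")
nb.train("how did you handle the caching layer", "ok")
print(nb.classify("nothing new, this will never work"))  # dismissive
```

With real data the interesting work is in the labels and features, not the classifier; a bag-of-words model like this one will mostly pick up surface vocabulary.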

sakai 13 years ago
I'll consider that a challenge...
To anyone who has experience getting comment data from HN: what's the fastest, most polite way to do this? And am I right in remembering that the site aggressively rate-limits crawlers?
- pg 13 years ago
  
  Don't crawl the site, please. The place to get data is the HNSearch API.
- ippisl 13 years ago
  
  There's a database (quite old) of HN posts and comments here:
  http://www.btscene.eu/details/2240774/Hacker+News+Database+o...
- malandrew 13 years ago
  
  Any plans to try to get a dataset for supervised ML? Perhaps collect the top four comments from every front-page thread and post a survey asking HNers to rate those comments for skepticism/dismissiveness?
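If a survey like that were run, each comment would end up with several ratings that need to be turned into training labels. A minimal sketch, assuming a hypothetical 1 to 5 dismissiveness scale, at least three raters per comment, and a mean-rating threshold of 3.5 (all three are assumptions, not anything HN collects):

```python
from statistics import mean


def label_comments(ratings, threshold=3.5, min_raters=3):
    """Map comment_id -> 'dismissive' or 'ok' from lists of 1-5 ratings.

    Comments with fewer than `min_raters` ratings are skipped, since a
    label from one or two people is too noisy to train on.
    """
    labels = {}
    for comment_id, scores in ratings.items():
        if len(scores) < min_raters:
            continue
        labels[comment_id] = "dismissive" if mean(scores) >= threshold else "ok"
    return labels


# Hypothetical survey responses.
survey = {
    "c1": [5, 4, 4],     # mean 4.33 -> dismissive
    "c2": [1, 2, 2, 1],  # mean 1.5  -> ok
    "c3": [5],           # too few raters, skipped
}
print(label_comments(survey))
# {'c1': 'dismissive', 'c2': 'ok'}
```

A mean is the simplest aggregate; a majority vote or an inter-rater agreement check would be reasonable alternatives if raters disagree a lot.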
derefr 13 years ago

I'd bet that upvotes on middlebrow dismissals would be highly correlated with downvotes on the article itself, if articles could be downvoted. In fact, I believe these comments rise to the top because people are downvoting the article vicariously, by upvoting a rebuttal, however vacuous. To confirm this hypothesis, you'd have to collect data by adding a downvote button to articles; it wouldn't necessarily have to affect article ranking ;)
(That might introduce a confounding factor, though: by relieving people's urge to downvote the article with a [nonfunctional] button that does just that, you might stop them from upvoting the dismissive comments. Hmm...)
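Checking this hypothesis would come down to a correlation between two per-thread numbers. A minimal sketch of Pearson's r, assuming you had, for each thread, the upvotes on its top dismissive comment and the (hypothetical) downvote count on the article. The input below is a perfectly linear toy series, not real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and standard deviations, all up to the same 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Toy per-thread values: comment upvotes vs. article downvotes.
comment_upvotes = [1, 2, 3, 4]
article_downvotes = [2, 4, 6, 8]
print(pearson_r(comment_upvotes, article_downvotes))  # approximately 1.0
```

On Python 3.10+, `statistics.correlation` computes the same coefficient. A high r would be consistent with the hypothesis but wouldn't by itself show that upvoting the rebuttal is a substitute for downvoting the article.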
hntester123 13 years ago
>You already have the data I'd use: the text of the comments.
Wouldn't that require real AI, though? For a minute I thought NLP (Natural Language Processing, not the other meaning(s) of the acronym) might help, but then realized it might not work when a comment quotes another comment. Note: I'm not an expert in any of those fields, just interested.
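The quoting problem has a cheap partial fix: drop quoted lines before scoring, so a comment is judged only on its own words. A minimal sketch, assuming quotes follow the leading-`>` convention used in this thread; quotes written inline or in quotation marks would slip through.

```python
def strip_quotes(comment: str) -> str:
    """Remove lines starting with '>' (HN-style quotes) and blank lines."""
    kept = [
        line
        for line in comment.splitlines()
        if line.strip() and not line.lstrip().startswith(">")
    ]
    return "\n".join(kept)


example = """> You already have the data I'd use: the text of the comments.
Wouldn't that require real AI though?"""
print(strip_quotes(example))
# Wouldn't that require real AI though?
```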
- adimitrov 13 years ago
  
  Sounds like a job for Sentiment Analysis [1]. Modern systems are pretty good at discerning negative from positive comments.
  You could probably find a way to separate negative from positive comments. Whether the resulting algorithm would be fine-grained enough to semi-reliably flag 'middlebrow dismissal,' I really don't know. Actually, as somebody who has worked on this stuff in the past, I don't think it would be very easy.
  [1] http://en.wikipedia.org/wiki/Sentiment_analysis
  
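As a rough illustration of the simplest kind of sentiment analysis, here is a lexicon-based scorer with naive negation handling. The word lists are tiny and hypothetical; real systems use large curated lexicons or trained models. As noted above, a negative score alone says nothing about whether a comment is a middlebrow dismissal.

```python
import re

# Tiny, hypothetical word lists; real lexicons have thousands of entries.
POSITIVE = {"great", "useful", "interesting", "nice", "clever"}
NEGATIVE = {"useless", "wrong", "pointless", "broken", "boring"}
NEGATORS = {"not", "no", "isn't", "don't"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / number of tokens.

    A sentiment word directly preceded by a negator has its polarity
    flipped, so "not useful" counts as negative.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    score = 0
    for i, tok in enumerate(tokens):
        polarity = int(tok in POSITIVE) - int(tok in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score / len(tokens)


print(sentiment_score("great writeup, really useful"))   # 0.5
print(sentiment_score("not useful, this is pointless"))  # -0.4
```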
heed 13 years ago

It might be easier to investigate submissions first and, if a trend emerges, filter at that level. That is, certain types of submissions might attract comments with a certain attitude.
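This amounts to grouping flagged comments by submission type and looking for groups with unusually high rates. A minimal sketch, assuming each comment has already been flagged (by a classifier or by hand) and that submissions are bucketed by something like their domain; the `(group, is_dismissive)` pair layout and the example domains are hypothetical.

```python
from collections import defaultdict


def dismissal_rate_by_group(comments):
    """comments: iterable of (group, is_dismissive) pairs.

    Returns {group: fraction of that group's comments flagged dismissive}.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, is_dismissive in comments:
        totals[group] += 1
        if is_dismissive:
            flagged[group] += 1
    return {g: flagged[g] / totals[g] for g in totals}


rates = dismissal_rate_by_group([
    ("example.com", True),
    ("example.com", False),
    ("example.org", False),
    ("example.org", False),
])
print(rates)
# {'example.com': 0.5, 'example.org': 0.0}
```

With real data, groups with only a handful of comments will show extreme rates by chance, so a minimum group size would be worth adding before reading anything into the numbers.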