Comment by coldtea

6 years ago

>I am sorry that your experience was so terrible, but the plural of anecdote is not data.

Actually it is. Data is just many individual anecdotes collected. They just need interpretation.

No, it really isn't.

Having a representative sample is essential to being able to do statistics. And collecting self-reported anecdotes does not constitute a valid sampling technique. It doesn't matter how you massage your observations afterwards; GIGO still holds, and what you received was statistical garbage.

  • >Having a representative sample is essential to being able to do statistics.

    That's orthogonal. Whether or not the collection is representative, the individual data elements are still anecdotes.

    (Plus, not all data is used for statistics, nor do we always have advance knowledge of what is representative -- e.g. when researching an unknown domain).

  • Submitted data is still data, just at worst biased. Which may or may not be important.

    The question is always what the bias is, and whether collecting much less data yourself is preferable. Your non-submission sampling tactic may be biased too. (E.g. telephone questionnaires select for people who have free time on demand. Emails select for people with bad spam filters who are on the mailing list. Walking around to ask has other limitations, such as range and, again, availability. Asking third parties may be biased too, just like asking first parties.)

    Usually when there are lots of unique submissions the question of bias or lack of representation can be put to rest.

    If, e.g., submissions are racially skewed compared to the baseline population, this can be taken into account. Likewise if there is bias due to some school districts responding less or more. You will have to handle these issues anyway.

    If you guess what the representative sample might be, you may be committing scientific fraud...
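To make "this can be taken into account" a bit more concrete, here is a minimal sketch of post-stratification, one common way to reweight submissions when the baseline population shares are known. The group names, the shares, and the responses below are made-up illustrative values, not data from this thread, and `poststratified_mean` is a hypothetical helper.

```python
from collections import defaultdict


def poststratified_mean(responses, population_share):
    """Mean of `value`, reweighting each group to its known population share.

    responses: list of (group, value) pairs from self-selected submissions.
    population_share: dict mapping group -> share of the baseline population.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in responses:
        totals[group] += value
        counts[group] += 1

    # A group with no submissions at all cannot be reweighted.
    missing = set(population_share) - set(counts)
    if missing:
        raise ValueError(f"no responses from groups: {sorted(missing)}")

    total_share = sum(population_share.values())
    return sum(
        (share / total_share) * (totals[group] / counts[group])
        for group, share in population_share.items()
    )


# Hypothetical example: district_a is over-represented among submissions
# (8 of 10 responses) although both districts are equally large.
responses = (
    [("district_a", 1.0)] * 6
    + [("district_a", 0.0)] * 2
    + [("district_b", 1.0)] * 1
    + [("district_b", 0.0)] * 1
)
population_share = {"district_a": 0.5, "district_b": 0.5}

raw_mean = sum(value for _, value in responses) / len(responses)
adjusted_mean = poststratified_mean(responses, population_share)
print(raw_mean, adjusted_mean)
```

Note that this only corrects for imbalance between the groups you measured. It does nothing about self-selection within a group, which is the part of the objection that reweighting cannot answer.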

  • You shouldn't blindly perform standard statistical analysis on such data with the usual techniques. But to declare it statistical garbage is simply going too far.

    In fact, anecdotes are the way in which we are able to make sense of the world at all. We do not as individuals do most of our learning via explicit statistical analysis.

    Your essential point, however, stands in the sense that one should certainly not act as if anecdotes are statistically unbiased. And your average person is terrible at proper statistical reasoning. People tend to over-emphasize their own experience. (Though this is evolutionarily and historically useful - a feature, not a bug). So, yes, someone presenting their own anecdote or set of anecdotes as data is often misguided.

    But there are, in fact, many studies (academic or industrial) that are just that: collections of anecdotes. Self-reported experiences via surveys, error reports, reviews, and the like, which can be mined for data or looked at to see if there are any patterns.

    There are issues such as "WEIRD" (Western, Educated, Industrialized, Rich, and Democratic) samples. https://www.cambridge.org/core/journals/behavioral-and-brain... There are three basic approaches: Ignore the problem and use the results as-is. Declare the sample hopelessly biased and throw it out altogether until you can find a more representative sample. Or acknowledge the bias in the sample, but continue to use it along with careful annotations as a low-confidence best-guess until better data come along. The last is the obvious best approach in an ideal world, though biases such as motivated reasoning and poor reporting by the media often mean that reporting such partial results can do more harm than good.

    For a more mathematically grounded approach, you could apply Bayesian reasoning: take into account your priors (including the best guess of your expected bias in your sample compared to the distribution from which you are sampling), and figure out exactly how much evidence each anecdote constitutes. It might not be much, but it's something.

    I'll close by mentioning that the quote is actually a misquote: http://blog.danwin.com/don-t-forget-the-plural-of-anecdote-i...

    “I said ‘The plural of anecdote is data’ some time in the 1969-70 academic year while teaching a graduate seminar at Stanford. The occasion was a student’s dismissal of a simple factual statement–by another student or me–as a mere anecdote.” - Wolfinger
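As a rough sketch of the Bayesian suggestion above ("figure out exactly how much evidence each anecdote constitutes"), the odds form of Bayes' rule makes the role of reporting bias explicit. The function names and all the probabilities are hypothetical, and treating anecdotes as independent is a simplifying assumption that self-selected reports often violate.

```python
def update_on_anecdote(prior, p_report_if_true, p_report_if_false):
    """Posterior probability of a hypothesis after seeing one anecdote.

    prior: probability of the hypothesis before the anecdote (0 < prior < 1).
    p_report_if_true: chance of hearing such an anecdote if the hypothesis holds.
    p_report_if_false: chance of hearing it anyway if the hypothesis is false
        (this is where your estimate of reporting bias goes).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if p_report_if_true <= 0.0 or p_report_if_false <= 0.0:
        raise ValueError("report probabilities must be positive")

    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = p_report_if_true / p_report_if_false
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def update_on_anecdotes(prior, anecdotes):
    """Apply update_on_anecdote repeatedly, ASSUMING the anecdotes are independent."""
    belief = prior
    for p_if_true, p_if_false in anecdotes:
        belief = update_on_anecdote(belief, p_if_true, p_if_false)
    return belief


# Hypothetical numbers: people complain fairly often even when nothing is
# wrong, so each complaint carries a likelihood ratio of only 1.5.
one = update_on_anecdote(0.2, 0.3, 0.2)
three = update_on_anecdotes(0.2, [(0.3, 0.2)] * 3)
print(one, three)
```

If the report is about as likely whether or not the hypothesis is true, the likelihood ratio is close to 1 and the anecdote moves your belief very little. That is the "it might not be much, but it's something" case.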