Comment by 0cf8612b2e1e

7 hours ago

Under the Known Limitations section

  deleted and dead are integers. They are stored as 0/1 rather than booleans.

Is there a technical reason to do this? You have the type right there.

By "to do this" do you mean to not use booleans? It's because the value does not represent a binary true or false but rather a means by which the item is deleted or dead. So not only would it not make sense semantically, it would break if a third means were introduced.

  • > It's because the value does not represent a binary true or false but rather a means by which the item is deleted or dead.

    "Deleted" and "dead" are separate columns.

    > So not only would it not make sense semantically, it would break if a third means were introduced.

    If that was the intention, it seems like a bad design decision to me. What you assume to be the reasoning is exactly what should be avoided: one column encoding both whether an item is deleted and how it was deleted.

    This is a limitation not because the bool value is represented (or rather presented) as an int, but because the type itself is an integer.

  • Funny, because the HackerNews API [0] does return booleans for those fields. That is, a state, not a type of deletion or death.

    [0] https://github.com/HackerNews/API

    • The API documents this but from a spot check I'm not sure when you'd get a response with deleted: false. For non-deleted items the deleted: key is simply absent (null). I suppose the data model can assume this is a not-null field with a default value of false but that doesn't feel right to me. I might handle that case in cleaning but I wouldn't do it in the extract.
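
A minimal sketch of that split, assuming the extract step hands over each item as a plain dict exactly as received. The `clean_item` helper is hypothetical and not part of the API or the dataset; only the `deleted` and `dead` field names come from the thread. It treats an absent key as false and also accepts the 0/1 integers mentioned under Known Limitations:

```python
from typing import Any


def clean_item(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of an extracted item with boolean deleted/dead flags.

    Hypothetical cleaning-step helper: the extract keeps the raw payload
    untouched, and the default of False is applied only here.
    """
    cleaned = dict(raw)  # shallow copy, so the raw extract is not mutated
    for flag in ("deleted", "dead"):
        value = raw.get(flag)  # None when the key is absent
        # Accept only a missing key, a real boolean, or a 0/1 integer.
        if value not in (None, True, False, 0, 1):
            raise ValueError(f"unexpected {flag} value: {value!r}")
        cleaned[flag] = bool(value)
    return cleaned
```

Keeping the default out of the extract means the raw data still records whether the key was actually present, and anything other than a missing key, a boolean, or 0/1 fails loudly instead of being silently coerced.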
