Comment by albedoa
4 hours ago
By "to do this" do you mean to not use booleans? It's because the value does not represent a binary true or false but rather a means by which the item is deleted or dead. So not only would it not make sense semantically, it would break if a third means were introduced.
> It's because the value does not represent a binary true or false but rather a means by which the item is deleted or dead.
"Deleted" and "dead" are separate columns.
> So not only would it not make sense semantically, it would break if a third means were introduced.
If that were the intention, it would seem like a bad design decision to me. And what you assume to be the reasoning is exactly what should be avoided, which makes it a bad thing.
This is a limitation not because the bool value is represented by an int (or rather "presented as" one), but because of the type itself being an integer.
Funny, because the HackerNews API [0] does return booleans for those fields. That is, a state, not a type of deletion or death.
[0] https://github.com/HackerNews/API
The API documents this but from a spot check I'm not sure when you'd get a response with deleted: false. For non-deleted items the deleted: key is simply absent (null). I suppose the data model can assume this is a not-null field with a default value of false but that doesn't feel right to me. I might handle that case in cleaning but I wouldn't do it in the extract.
I am always torn on a nullable boolean. I have gone both ways (leave as null or convert to false) depending on what it is representing.
In this particular case, I agree that you should record the rawest form, which would be a boolean column of trues and nulls, something Parquet handles perfectly well.
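To make the extract-versus-cleaning split concrete, here is a minimal Python sketch. It assumes the item JSON simply omits `deleted` and `dead` when they are not set, as the spot check above suggests. The function names and the sample payloads are made up for illustration and are not part of the HN API.

```python
import json


def extract_flags(raw_item: str) -> dict:
    """Extract step: keep the flags as sent (True, or None when the key is absent)."""
    item = json.loads(raw_item)
    return {
        "id": item["id"],
        "deleted": item.get("deleted"),  # None if the key is missing
        "dead": item.get("dead"),        # None if the key is missing
    }


def clean_flags(row: dict) -> dict:
    """Cleaning step: opt in to the 'not-null, default false' data model."""
    return {**row, "deleted": bool(row["deleted"]), "dead": bool(row["dead"])}


# Hypothetical payloads, shaped like API items but not real responses.
live = extract_flags('{"id": 1, "type": "comment"}')
gone = extract_flags('{"id": 2, "type": "comment", "deleted": true}')
```

The raw rows keep the true/null distinction, and the conversion to false only happens if a downstream step asks for it.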
It’s because Arc, like Lua, by design can’t store nil as a value in tables. And the value is either 't or nil. Hence it’s a boolean.
My fork of arc supports booleans directly.
In other words, I can guarantee beyond a shadow of a doubt that dead and deleted are both booleans, not integers.
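A rough Python stand-in for that behaviour (not Arc code) shows why a false flag would never appear in the output. It assumes that assigning nil to a key drops the key from the table. The class `NilFreeTable` and its methods are hypothetical names for this sketch.

```python
class NilFreeTable:
    """Toy table that cannot hold nil (None here) as a value."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Assumption: storing nil removes the key rather than storing it.
        if value is None:
            self._data.pop(key, None)
        else:
            self._data[key] = value

    def get(self, key):
        # A missing key reads back as nil.
        return self._data.get(key)

    def to_json_dict(self):
        # Only keys that currently hold a value are serialized.
        return dict(self._data)


item = NilFreeTable()
item.set("id", 1)
item.set("deleted", True)   # flag set: the key is present
item.set("deleted", None)   # flag cleared: the key disappears
```

Under that assumption the serialized item carries either `deleted: true` or no `deleted` key at all, which matches the trues-and-nulls shape discussed above.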