Comment by lblume

2 days ago

> But a field that is nullable in the schema and never null in practice is a silent lie.

This seems to be the central claim. But just as lies require intent, so does database design to some degree.

A column that is nullable but never null does not conclusively say anything, really. That's like saying a birthday column that is never before 1970 in the current data should be restricted to years after that date.

A nullable column signals that the data may be left empty, which is entirely different from the column actually being empty in the current data. It is the is-ought distinction: the potentiality of null ("nullability") is not to be confused with the actuality of current vacancy ("null values"). The programmer extracting data from the database must check for null whenever the column is nullable, not merely when it happens to be null right now; that is a basic principle of robust, fault-tolerant code.
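To make this concrete: SQL's three-valued logic means a nullable column already changes query semantics, even while no null is present yet. A minimal sketch, assuming a hypothetical `users` table with a nullable `middle_name` column:

```sql
-- NULL compares as UNKNOWN, so any row with a null middle_name is
-- silently excluded here the moment a null actually appears:
SELECT * FROM users WHERE middle_name <> 'X';

-- Robust code handles the null case explicitly, because the schema allows it:
SELECT * FROM users WHERE middle_name <> 'X' OR middle_name IS NULL;
```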

I don't think the author is talking generally about fields that could be NULL but just happen to never be so in the production DB. The piece is specifically about a new database field that is fully intended by its designer to be NOT NULL, which was made nullable only for migration purposes, and which was never updated to NOT NULL once the migration was complete. The point was not meant to be extended beyond that.

One could write a separate piece about maybe using that as a clue that the field could be NOT NULL'd in the future but that's not what this post is.
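For reference, the zero-downtime pattern being described usually looks roughly like this (Postgres-flavored SQL, hypothetical table and column names):

```sql
-- 1. Add the new column as nullable, so the ALTER is cheap and non-blocking.
ALTER TABLE orders ADD COLUMN currency text;

-- 2. Backfill existing rows while the application keeps writing.
UPDATE orders SET currency = 'USD' WHERE currency IS NULL;

-- 3. The step that gets forgotten, leaving the column nullable forever:
ALTER TABLE orders ALTER COLUMN currency SET NOT NULL;
```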

  • In that case we read it entirely differently — the last paragraph explicitly draws the inference that "Nullable fields that never hold nulls are often the result of incomplete migrations or forgotten cleanup steps". The author also proposes a script "to identify cases where the field could safely be changed to non-nullable". But alas, that cannot be done with full generality: the difference in intent cannot be captured purely by counting nulls, and certainly not by calculating percentages alone.
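    For concreteness, the core of such a script is a one-line aggregate per column, and it is exactly as blind to intent as argued above. A minimal sketch, assuming a hypothetical `users.middle_name`:

    ```sql
    -- COUNT(col) skips NULLs while COUNT(*) does not,
    -- so their difference is the null count.
    SELECT
        COUNT(*)                                            AS total_rows,
        COUNT(*) - COUNT(middle_name)                       AS null_rows,
        100.0 * (COUNT(*) - COUNT(middle_name)) / COUNT(*)  AS pct_null
    FROM users;
    -- pct_null = 0 says the column holds no nulls today; it says nothing
    -- about whether the designer intended it to allow nulls tomorrow.
    ```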

> That's like saying a birthday column that is never before 1970 in the current data should be restricted to years after that date.

No it's not, because they specifically started with the premise that the field was initially intentionally non-null and was only temporarily made nullable for migration purposes. That is obviously not the situation you are describing, right?

I think their point is that, for all intents and purposes, the column IS not nullable. It is nullable only as an artifact of making a live schema migration easier, with no blocking or downtime. But as far as the data model is concerned, it should not be nullable.

  • Sure, if one just leaves a column nullable out of negligence, one should check the actual requirements and enforce them to make invalid states unrepresentable. The author still makes too strong a claim, one that becomes detached from the migration aspect by insinuating that one can simply test the database empirically to check whether this is the case; that is what I disagree with.

The problem is that you end up in other places with the assumption that it's never null. So in the future, when you actually set it to null somewhere, it will blow up.

  • People who assume that a (nullable) value is never null because the sample at hand does not contain a null value _will_ learn their lesson the hard way, exactly the one you describe.

    But even NOT NULL does not always withstand time. I'd argue that use cases where the field _has_ to be null eventually emerge and somehow have to be accommodated. There is no easy way to work around that safely without either tons of work that duplicates lots of things or taking risks by adapting the code base.
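    The asymmetry is easy to state at the schema level (again Postgres-flavored, hypothetical names): relaxing the constraint is one statement, while auditing every consumer that assumed non-null is not.

    ```sql
    -- Loosening the schema is trivial...
    ALTER TABLE orders ALTER COLUMN currency DROP NOT NULL;
    -- ...but every query and application read that assumed
    -- "currency is always present" now has to be found and adapted.
    ```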