← Back to context

Comment by ceejayoz

9 years ago

> Can someone give me some examples of cases where there is a strong semantic difference that you'd want to represent with a blank string vs a null?

A null middle name means "we don't know their middle name, we added it to the schema after that user signed up or didn't ask them". An empty string means they explicitly don't have a middle name.

I entirely agree with your point with regard to a fixed-schema, row-based, relational-like datastore. But in a KV-store like DynamoDB, there's a different set of possibilities:

[1] key present, value undefined or null or some other special type

[2] key present, value is empty object of its expected type

[3] key absent

In schemaless datastores where the value's content somehow determines its datatype, it can be difficult to enforce a distinction between scenarios [1] and [2]. Meanwhile, in an externally-schema'd datastore (like most RDBMS), you don't have option [3]. I am familiar with the practice of mapping "we don't have a known-good value for this" to omitting the key in an KV-store, while in an RDBMS that semantic meaning is mapped to a NULL instead.

  • Right. In Dynamo you could store JSON with "middleName: null" and know that means "We asked, they didn't have it" as opposed to a record without a middleName key which means "we never asked".

    • And now you've just discovered why relational databases have a "null" concept that is separate from the "empty string" concept.

      In a relational table, all rows in the table have a "middleName" column. So you can't "omit" the column entirely for some rows like you can in a doc/KV store.

      "NULL" in a relational DB is to store, in a table where every record has a "middleName" column, the concept of "missing key" from a key/value store type layout.

      NULL is the relational DB's way of representing: "a record without a middleName key" exists here.

Although I think the limitation is an incontrovertible issue in other ways, that specific example sounds like a domain modelling blunder to me, since it is overloading a scalar value to encode value-object metadata.

  • Overloading a scalar for metadata is my complaint with it.

    In what ways do you think the limitation is an incontrovertible issue?

    • Well... just thinking out loud:

      It's an impedance mismatch with every programming language I personally know, which all permit the empty string. Even Erlang or Haskell, with their oddball strings that are actually lists, permit an empty/null list for this purpose. What might they be? Maybe it's a blank line from an array of chomped input lines, maybe it's a base case of a recursive operation (see also: you can't have an empty set, possibly related, maybe DynamoDB stores strings as lists...).

      The point is, sometimes we have the empty string as a legitimate scalar value, so rejecting them creates extra work for the developer and feels like a POLA* violation, even if there's a solid underlying technical reason why it is so (and maybe there is, maybe it's due to a Merkle tree for the values, as suggested in the predecessor Dynamo's original paper). It's the kind of thing that leads people to develop wrappers for things that really shouldn't need them, which is a gateway drug into all sorts of unnecessary abstractions and extra work.

      At the other end of the spectrum I've seen some hilaribad JSON structures, ones where a value might be absent, or null, or the empty string, or a real string, and these four things all have different meanings /o\.

      * Principle of Least Astonishment.