Comment by ceejayoz

9 years ago

> Can someone give me some examples of cases where there is a strong semantic difference that you'd want to represent with a blank string vs a null?

A null middle name means "we don't know their middle name, we added it to the schema after that user signed up or didn't ask them". An empty string means they explicitly don't have a middle name.

niftich 9 years ago

I entirely agree with your point with regard to a fixed-schema, row-based, relational-like datastore. But in a KV-store like DynamoDB, there's a different set of possibilities:

[1] key present, value undefined or null or some other special type

[2] key present, value is empty object of its expected type

[3] key absent

In schemaless datastores where the value's content somehow determines its datatype, it can be difficult to enforce a distinction between scenarios [1] and [2]. Meanwhile, in an externally-schema'd datastore (like most RDBMS), you don't have option [3]. I am familiar with the practice of mapping "we don't have a known-good value for this" to omitting the key in a KV-store, while in an RDBMS that semantic meaning is mapped to a NULL instead.
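A minimal sketch of how the three scenarios above could be told apart, using a plain Python dict as a stand-in for a schemaless item. The names `FieldState` and `classify` are hypothetical, this is not the DynamoDB API, and a fourth state is added for the ordinary case of a non-empty value:

```python
from enum import Enum


class FieldState(Enum):
    NULL = "key present, value null"        # scenario [1]
    EMPTY = "key present, value empty"      # scenario [2]
    ABSENT = "key absent"                   # scenario [3]
    VALUE = "key present, value non-empty"  # the ordinary case


def classify(item: dict, key: str) -> FieldState:
    """Report which scenario applies to `key` in a schemaless item."""
    if key not in item:
        return FieldState.ABSENT
    value = item[key]
    if value is None:
        return FieldState.NULL
    # "Empty object of its expected type": "", [] or {} in this sketch.
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return FieldState.EMPTY
    return FieldState.VALUE
```

With a fixed-schema row the first branch can never fire, which is why option [3] does not exist there.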

MBCook 9 years ago
Right. In Dynamo you could store JSON with "middleName: null" and know that means "We asked, they didn't have it" as opposed to a record without a middleName key which means "we never asked".
- pwg 9 years ago
  
  And now you've just discovered why relational databases have a "null" concept that is separate from the "empty string" concept.
  In a relational table, all rows in the table have a "middleName" column. So you can't "omit" the column entirely for some rows like you can in a doc/KV store.
  "NULL" in a relational DB exists to express, in a table where every record has a "middleName" column, the "missing key" concept from a key/value store type layout.
  NULL is the relational DB's way of saying: "this record has no middleName key".
- cdevs 9 years ago
  
  But my middle name IS null
  
petemill 9 years ago

Your answer is entirely correct. This should be the top comment.

inopinatus 9 years ago

Although I think the limitation is an incontrovertible issue in other ways, that specific example sounds like a domain modelling blunder to me, since it is overloading a scalar value to encode value-object metadata.

MBCook 9 years ago
Overloading a scalar for metadata is my complaint with it.
In what ways do you think the limitation is an incontrovertible issue?
- inopinatus 9 years ago
  
  Well... just thinking out loud:
  It's an impedance mismatch with every programming language I personally know, which all permit the empty string. Even Erlang or Haskell, with their oddball strings that are actually lists, permit an empty/null list for this purpose. Where might an empty string legitimately come from? Maybe it's a blank line from an array of chomped input lines, maybe it's a base case of a recursive operation (see also: you can't have an empty set, possibly related, maybe DynamoDB stores strings as lists...).
  The point is, sometimes we have the empty string as a legitimate scalar value, so rejecting it creates extra work for the developer and feels like a POLA* violation, even if there's a solid underlying technical reason why it is so (and maybe there is, maybe it's due to a Merkle tree for the values, as suggested in the predecessor Dynamo's original paper). It's the kind of thing that leads people to develop wrappers for things that really shouldn't need them, which is a gateway drug into all sorts of unnecessary abstractions and extra work.
  At the other end of the spectrum I've seen some hilaribad JSON structures, ones where a value might be absent, or null, or the empty string, or a real string, and these four things all have different meanings /o\.
  * Principle of Least Astonishment.
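For illustration, the kind of wrapper described above might look roughly like this, assuming a store that rejects empty string values. `EMPTY_SENTINEL`, `encode_item` and `decode_item` are hypothetical names, and the sentinel value is an arbitrary choice that real data must never contain:

```python
# Hypothetical marker stored in place of "", assumed never to occur in real data.
EMPTY_SENTINEL = "\u0000EMPTY"


def encode_item(item: dict) -> dict:
    """Swap empty strings for the sentinel before writing to the store."""
    return {k: (EMPTY_SENTINEL if v == "" else v) for k, v in item.items()}


def decode_item(stored: dict) -> dict:
    """Restore empty strings after reading from the store."""
    return {k: ("" if v == EMPTY_SENTINEL else v) for k, v in stored.items()}
```

Every read and write path now has to go through this pair, which is exactly the extra work being complained about.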