Comment by MBCook

9 years ago

Can someone give me some examples of cases where there is a strong semantic difference that you'd want to represent with a blank string vs a null?

I'm interested in abstract concepts, not "I'm trying to port from Oracle/Mongo/whatever and this is how it works". Assume your designing against some generic record store.

I'm having a ton of trouble thinking of a case where that's a big problem (again, ignoring trying to save time in porting) and I was wondering if someone could provide me a few examples.

It's the difference between "this value is unknown" and "this value is known and empty". It's similar to why JavaScript has both null and undefined.

Here's an example. Let's say a user is taking a survey and the last question is: "Please share any additional thoughts if you have them." A null could mean the user did not answer this question, probably because they haven't gotten to it yet. An empty string could mean the user was asked this question and actually didn't have anything to say.

  • That's basically the only example I could think of: tracking if someone filled out a form field.

    But in that case how can you tell the difference? I mean if they never clicked in the field or tabbed into it and submitted the form is that meaningfully different from they clicked into it and didn't type anything?

    I can think of ways to work around it (keep a boolean for 'was filled out', a number that means 'they got up to field 17', etc.) but I'm still not sure you can draw a conclusion of 'they skipped it' vs 'they didn't enter anything'.

    • You're assuming all the questions are being presented to the user at once. Imagine a UI that presents questions to a user one-by-one. If the user bails out of the survey early, the rest of the survey answers can be null.

      6 replies →

  • To be fair, this would often be modeled by omitting the field completely. The lack of the field suggests the user hasn't gotten that far. The presence of the field with a NULL value means they didn't provide anything.

In a relational database there's a clear difference between "I did a join and there was no value on the other table" and, "this cell was intentionally left blank".

I haven't used DynamoDB enough to tell if that applies here or not.

  • You basically don't join in Dynamo (as I remember), so that doesn't fit.

    Either way, you'd still need to give a good reason for "this space intentionally left blank" because I'm not seeing much in the well of compelling reasons.

> Can someone give me some examples of cases where there is a strong semantic difference that you'd want to represent with a blank string vs a null?

A null middle name means "we don't know their middle name, we added it to the schema after that user signed up or didn't ask them". An empty string means they explicitly don't have a middle name.

  • I entirely agree with your point with regard to a fixed-schema, row-based, relational-like datastore. But in a KV-store like DynamoDB, there's a different set of possibilities:

    [1] key present, value undefined or null or some other special type

    [2] key present, value is empty object of its expected type

    [3] key absent

    In schemaless datastores where the value's content somehow determines its datatype, it can be difficult to enforce a distinction between scenarios [1] and [2]. Meanwhile, in an externally-schema'd datastore (like most RDBMS), you don't have option [3]. I am familiar with the practice of mapping "we don't have a known-good value for this" to omitting the key in an KV-store, while in an RDBMS that semantic meaning is mapped to a NULL instead.

    • Right. In Dynamo you could store JSON with "middleName: null" and know that means "We asked, they didn't have it" as opposed to a record without a middleName key which means "we never asked".

      3 replies →

  • Although I think the limitation is an incontrovertible issue in other ways, that specific example sounds like a domain modelling blunder to me, since it is overloading a scalar value to encode value-object metadata.

    • Overloading a scalar for metadata is my complaint with it.

      In what ways do you think the limitation is an incontrovertible issue?

      1 reply →

A few thoughts.

0 <> NULL <> ""

They are all values that matter because they can exist.

Blank can have business meaning. Also, I suspect that there is a reason blank is separate from NULL in the ASCII table. They are different signals, and changing one to the other is fundamentally changing the meaning of the signal.

Blank often has business meaning on forms, like "leave this field blank to indicate the user has acknowledged the field but chose to enter a blank value" vs the "user never saw this field and no value was entered, or the system did not store a value for the user's input".

The old saying "NULL is not nothing"?

You could represent a path through a grid as a string using U=up, R=right, D=down, L=left. Then a solution to a maze could be 'UUUURRRRLRUULLD'. An empty string indicates a solution to a maze in 0 zero steps. A null string indicates that there is no solution.

  • I'll give you that, but it's a very contrived example. I'm trying to think more of real world problems.

    Many people have mentioned the idea of keeping track of if someone doesn't have a middle name versus you don't know their middle name. That makes more sense as an example (although I'm not sure how critical that is).

    • Treating the empty string as NULL, as ORACLE does, is the sane option on real world problems. Having "" different from NULL creates a lot of problems and the advantages are minimal. As you say, in the rare cases that you must tell WHY it is null, it is better to have a separated Boolean field, than to store "" vs NULL and give a meaning to each one.

      One of the first problems that you'll encounter if you use a DB with "" != NULL is at data display, because the natural display for NULL and "" are "an empty cell". Then you cannot tell if it is NULL or ""... then you must display something else for NULL... then...

      You have no such problem if "" is NULL, an empty cell always means NULL

      3 replies →

  • I think this is a bad use of `NULL`.

    You should instead use a list of all possible solutions, or a sublist of that list (if you only need one solution) because then the empty list can indicate there are no solutions.

    If you then want to do something you can iterate over the list: This will protect you from "null pointer" exceptions, promote more functional programming, and is usually less code as well.

        js: if(x !== null) { … }
        q: $[not null x;…;'`oops]
    

    versus:

        js: x.map((r) => …)
        q: {…} each x