2004-09-26

Darwen and Date on nulls

It's pretty common to hear arguments against null values in databases, but it's less common in my experience to hear about ways of getting rid of them. The trouble is that people do use nulls, and they use them to mean a variety of things, often several different things in the same context (such as both "unknown salary" and "no salary").

My argument against them would be that they're a bad smell because tests against null aren't intention-revealing.

A recent post on jikes-dev mentioned The Third Manifesto by Hugh Darwen and C.J. Date. I'd never heard of Darwen, but then everything you learn at university is 15 years out of date. (Despite the subject matter, and solely thanks to the brilliance of the lecturer in question, Derek Bridge, the database course at York was one of the best I took. If you're in the market for a university, you might want to try his. I learned things answering his exam questions, which isn't something I can say for many exams.)

Anyway, Darwen wrote a presentation, "How to Handle Missing Information without Using Nulls", and also a response to the criticism that having a separate table for each meaning of null hides information in table names. The latter ends:

[We] don't propose to have a separate table for each meaning of Nulls. There's no such thing as a Null. We cannot entertain the idea of having a table for any meaning of something that doesn't exist! We would rather characterise [sic] our design (a trifle loosely) as involving a relvar for every distinct kind of statement [predicate] that we wish to be represented in the database.

So, to paraphrase their loose characterization, they insist on intention-revealing replacements for the hacks people are committing with nulls.

This brings to mind the most common argument against item 27 of "Effective Java", the item about returning a zero-length array instead of null, to avoid special-case code in the caller. The argument goes that there's a distinction between an empty array and null, something along the lines of the difference between an environment variable set to the empty string and an unset environment variable. Or, if you prefer, the difference between having a basket containing no apples and having no basket.
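To make the item concrete, here is a minimal sketch of the zero-length array style. The Basket class and its method names are made up for illustration; they don't come from the book.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only.
class Basket {
    // Shared zero-length array, safe to reuse because it has no elements to modify.
    private static final String[] NO_APPLES = new String[0];

    private final List<String> apples = new ArrayList<>();

    void add(String apple) {
        apples.add(apple);
    }

    // Never returns null: an empty basket yields a zero-length array,
    // so callers can loop over the result without a special case.
    String[] getApples() {
        return apples.toArray(NO_APPLES);
    }
}
```

A caller can write for (String apple : basket.getApples()) without first testing for null. What this style can't express is the "no basket" case, which is exactly the objection.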

The counter-argument is exactly that made by Darwen and Date: that there's an extra predicate to be represented. In the environment variable example above, that means having boolean Environment.contains(String name) alongside String Environment.get(String name).
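A rough sketch of that pair of methods, assuming a map-backed Environment. The set method and the choice to throw NoSuchElementException from get on an unset name are my assumptions for the sake of a complete example, not part of the argument.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Sketch only: a map-backed environment with an explicit "is it set?" predicate.
class Environment {
    private final Map<String, String> variables = new HashMap<>();

    // Assumed helper so the example has some data to query.
    void set(String name, String value) {
        variables.put(name, value);
    }

    // The extra predicate: true even when the variable is set to "".
    boolean contains(String name) {
        return variables.containsKey(name);
    }

    // Only meaningful when contains(name) is true; asking for an unset
    // variable is treated here as a programming error (an assumption).
    String get(String name) {
        if (!contains(name)) {
            throw new NoSuchElementException("unset variable: " + name);
        }
        return variables.get(name);
    }
}
```

The caller's test now reads if (env.contains("HOME")) rather than if (value != null), which is the intention-revealing part.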

An alternative implementation is to return a maybe type (data Maybe a = Just a | Nothing), or – eschewing generics as one should – a more intention-revealing equivalent.
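For the generics-free route, a minimal sketch of what a more intention-revealing equivalent might look like. VariableLookup and all of its method names are hypothetical, chosen to fit the environment variable example.

```java
// Hypothetical result type: either a set variable with a value, or unset.
final class VariableLookup {
    // Single shared instance standing in for Nothing.
    private static final VariableLookup UNSET = new VariableLookup(null);

    private final String value;

    private VariableLookup(String value) {
        this.value = value;
    }

    // Plays the role of Just: the variable is set, possibly to "".
    static VariableLookup set(String value) {
        if (value == null) {
            throw new IllegalArgumentException("a set variable needs a value");
        }
        return new VariableLookup(value);
    }

    // Plays the role of Nothing.
    static VariableLookup unset() {
        return UNSET;
    }

    boolean isSet() {
        return this != UNSET;
    }

    // Only valid when isSet() is true.
    String value() {
        if (!isSet()) {
            throw new IllegalStateException("variable is unset");
        }
        return value;
    }
}
```

The null is still there, but it's confined to one private field, and callers only ever see isSet() and value().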

But somehow these are never very popular, because the code looks the same as it would if we just returned null: in each case there's the method invocation, the test, and the block to handle the failure case. The main difference is that these alternatives are slightly less idiomatic.

Sometimes, as is probably true for the environment variables case, we can get away with a getter method that takes a default value to return when there's no better answer. Only in languages like Ruby do we have a really convincing alternative: that of passing a block to handle the failure case directly.
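Both ideas can be sketched in Java, with caveats. DefaultingEnvironment and its method names are hypothetical, and the getOrElse variant assumes a Java version with lambdas, which postdates this post. It's only a rough analogue of passing a Ruby block.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical class for illustration only.
class DefaultingEnvironment {
    private final Map<String, String> variables = new HashMap<>();

    // Assumed helper so the example has some data to query.
    void set(String name, String value) {
        variables.put(name, value);
    }

    // Returns the variable's value, or defaultValue if the variable is unset.
    // A variable set to the empty string still returns "".
    String get(String name, String defaultValue) {
        return variables.containsKey(name) ? variables.get(name) : defaultValue;
    }

    // Block-style variant: the caller supplies the code to run when the
    // variable is unset, and that code runs only in the failure case.
    String getOrElse(String name, Supplier<String> whenUnset) {
        return variables.containsKey(name) ? variables.get(name) : whenUnset.get();
    }
}
```

With the default-value getter the caller writes env.get("EDITOR", "vi") and has no test or failure block at all. With getOrElse, the failure handling is passed in as env.getOrElse("EDITOR", () -> "vi").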

You could throw an exception, but nobody likes exceptions. They're like return codes, only much messier to deal with because they make you pay (syntactically) even in the normal case. (This is partly tongue-in-cheek; there are cases where unchecked exceptions are the best solution, such as iterations that can fail, but this isn't one of them.)

And that's all I have to say. No simple answer like the database guys give you. Just a bunch of choices, none of which I'd suggest for all applications.