Different types of languages deal with this “value” in diverse ways. You can have a more comprehensive list of what NULL
can mean on this website. What I like to think about NULL
is along the lines of invalid, as if some sort of garbage is stored there. It doesn’t mean it’s empty, it’s just mean that something is there, and it has no value to you.
Databases deal when storing this type in a similar way, PostgreSQL treats it as “unknown” while MySQL treats it as “no data“.
Both databases recommend using \N
to represent NULL
values where import or exporting of data is necessary.
When to use it
You don’t. Particularly, I DON’T recommend using NULL
.
NULL
doesn’t mean empty
So if you want to represent lack of data or optional fields use a default value. It’s bad sign of architecture having NULLABLE fields, there is an extra case to test and to write for. It adds unnecessary complexity.
However, there is one case where I do think NULL
is acceptable. And that is when working with MySQL date related fields. I will talk more about this further down.
How to Query it
MySQL doesn’t recognize field = NULL
because, remember, NULL
means invalid, not empty. Thus using it will not return any rows.
As much as NULL
value will never be equal to another NULL
, when using ORDER BY
, GROUP BY
and DISTINCT
, the server interprets the values as equal. Aggregators functions such as MIN()
, SUM()
and COUNT()
ignore NULL
values, except for COUNT(*)
that counts rows, and not columns.
When using ORDER BY
a column is nullable the NULL
values appear first if instructed as ASC
and in the end if DESC
is requested.
PostgreSQL on the other hand has an option to convert equal comparisons expressions to field IS NULL
, if enabled (transform_num_equals).
The ordering for ORDER BY
depends on indexing of the field, by default NULL
comes first, but you can specify when creating an index where the NULL
values should be: top or bottom.
For aggregators functions, PostgreSQL works the same way.
Performance
Having NOT NULL
columns permits similar performance on MySQL as an column = 1
do. However that doesn’t happen in LEFT JOIN
operations while a field could be NULL
. But this is for the type of queries where IS NULL
is used:
# Assuming that `active` column is NOT NULL SELECT * FROM users WHERE active = 1 OR active IS NULL;
Summing up directly from MySQL documentation:
Declare columns to be
NOT NULL
if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value isNULL
. You also save some storage space, one bit per column. If you really needNULL
values in your tables, use them. Just avoid the default setting that allowsNULL
values in every column.
The COALESCE()
function
This function return the first non-null result of a column. Keep in mind to perform this operation on non-indexed columns. It is slow by its nature, just a friendly warning of when you are using to be mindful of its fallback.
The exception to the rule
I think this applies to MySQL databases. DATE/DATETIME
should not be allowed to be NULL
if, and only if the sql_mode
directive NO_ZERO_DATE
is disabled.
What does it mean? NO_ZERO_DATE
doesn’t allow for 0000-00-00
to be inserted in a DATE/DATETIME
. MySQL 5.7 sql_mode
insures some restrictions into the database by default. If for instance you have DATE
column that is NOT NULL
and doesn’t pass a value to it, 0000-00-00
will be saved, because the column is NOT NULL
, BUT will give a warning:
mysql> DESCRIBE users; +------------+------------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+---------+----------------+ | id | int(10) unsigned | NO | PRI | NULL | auto_increment | | name | varchar(45) | NO | | NULL | | | created_at | datetime | NO | | NULL | | +------------+------------------+------+-----+---------+----------------+ 3 rows in set (0.00 sec) mysql> INSERT INTO users (name) VALUES ("Gabi"); Query OK, 1 row affected, 1 warning (0.01 sec) mysql> SHOW WARNINGS; +---------+------+-------------------------------------------------+ | Level | Code | Message | +---------+------+-------------------------------------------------+ | Warning | 1364 | Field "created_at" doesn't have a default value | +---------+------+-------------------------------------------------+ 1 row in set (0.00 sec)
Making this operation to make sure no zero date will be allowed as it is by default in MySQL 5.7:
SET @@GLOBAL.sql_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE;
When trying to insert a similar query:
mysql> INSERT INTO users (name) VALUES ("Blossom"); ERROR 1364 (HY000): Field "created_at" doesn't have a default value
To sum it up
If you have MySQL and NO_ZERO_DATE
is in your sql_mode
, you should ALWAYS use NOT NULL
. MySQL 5.7 brings the mode enabled by default among with other things, read more here.
If you don’t have it enabled for any other reason, then DATE/DATETIME
MAY be NULL
, because data integrity > performance in this case.
Again, 0000-00-00
IS NOT a valid date.
Another reason to not allow NULL is that is that is disambiguates non-linked IDs. Imagine that you have a parent-child relationship where each record can have a “parentID”. What does the root record have? parentID = NULL or parentID = 0 (assuming parentID is an INT). If the parentID is non-null, the answer is obvious – you don’t have to look at the code to see how the application is treating it. However, if the parentID column is nullable, you have to look at the application code to see how the team is treating it. And, you have to hope that the team is treating it consistently.
LikeLike
NULLs are used to represent “missing or inapplicable” information, and are a state rather than a value. They’re extremely useful when used appropriately.
Say you did a stock take on Monday, and counted 101 widgets. You did another stock take on Wednesday and counted 95 widgets. You should have done a stock take on Tuesday, but didn’t have time. What value to record for Tuesday’s count? There were definitely widgets there, so 0 isn’t true. The widget stock level has changed, so using 101 or 95 would be guesses. NULL is the correct choice to represent this – there was a value, but we just don’t know what it is.
LikeLike