Tag: postgres

Your SQL IS NOT JavaScript (neither PHP)

IS NULL, IS NOT NULL, truthiness among other assertions on MySQL and PostgreSQL

People like to complain about JavaScript, how can one thing be equal to the other, i.e. null == undefined evaluates to true unless you use the triple equals ===. Other dynamic typed languages have its peculiarities, such as Ruby where 0 == true evaluates to true, the reason for Ruby is that considers 0 as a value and any value evaluates to true.

How about SQL? The answer is… it depends.

Which database are you using?

Some may be more forgiving, like MySQL doing casts for you all over the place, or more strict like PostgreSQL where you can only compare the truthiness of something of the same type.

Assertion MySQL PostgreSQL
('A' = TRUE) IS TRUE 0 invalid input syntax for type boolean: “A”
('A' IS TRUE) IS TRUE 0 invalid input syntax for type boolean: “A”
(1 = TRUE) IS TRUE 1 operator does not exist: integer = boolean
(1 IS TRUE) IS TRUE 1 argument of IS TRUE must be type boolean, not type integer
('1' IS TRUE) IS TRUE 1 1
('0' IS FALSE) IS TRUE 1 1
(1 = '1') IS TRUE 1 1
(0 = '0') IS TRUE 1 1

Beyond the implications of wrong type comparison, as you can see on lines 1 through 4 where MySQL evaluates values as true or false, you should also worry about what the fact of a column being NULL may imply when comparing two columns.

Suddenly knowing if it is true or false doesn’t matter because you cornered yourself with a third possible value: NULL. By definition NULL is not a value neither a state, it should be considered “garbage”, and no column where you know its value and type would be garbage.

In this front both databases operate the same way:

Assertion SQL
(NULL = NULL) IS TRUE 0
(NULL = NULL) IS FALSE 0
(NULL IS NULL) IS TRUE 1
(0 = NULL) IS NULL 1
(1 = NULL) IS NULL 1
('A' = NULL) IS NULL 1
(TRUE = NULL) IS NULL 1
(FALSE = NULL) IS NULL 1
(0 IS NULL) IS TRUE 0
(0 IS NOT NULL) IS TRUE 0
(0 = NULL) IS TRUE 0
(0 = NULL) IS FALSE 0
(0 = NULL) IS NOT TRUE 1
(0 = NULL) IS NOT FALSE 1
('NULL' = NULL) IS TRUE 0
('NULL' IS NULL) IS TRUE 0

The highlighted parts are usually assumptions that people expect to behave differently, for example that zero is not equal to NULL. That catches many people by surprise. It shouldn’t catch you by surprise though, because 0 is a value. The default behavior of MySQL of casting NULL to 0, on a INTEGER NOT NULL column without a DEFAULT value taught a whole generation of developers that this assumption is true. The same applies when casting a string-based column that has no default value on a NOT NULL column to empty string.

Is that a spaceship operator?

While writing this post and taking a peek into MySQL documentation, I never noticed that a NULL-safe equal operator, <=>, existed, unfortunately the MySQL website only shows the documentation from 5.5 to 8.0, so I can’t be certain if this existed in prior releases.

SELECT
1 IS NOT NULL,
1 = NULL,
1 <=> NULL;
Assertion MySQL
1 IS NOT NULL 1
1 = NULL NULL
1 <=> NULL 0

The case where MySQL thinks it is PHP

Another behavior I discovered when doing some JOINs were the string and integer comparison. Both databases get SELECT 1 = '1'; as true, however, MySQL takes a step further:

SELECT 1 = '1a',
'1' * 3,
'1abc' + 4;
Assertion MySQL PostgreSQL
1 = '1a' 1 invalid input syntax for integer: “1a”
'1' * 3 3 3
'1abc' + 4 5 invalid input syntax for integer: “1abc”

Thus behaving like PHP. PHP is expected to change this behavior soon with this RFC. PostgreSQL, in this case, is forgiving in casting a string as an integer, only if there is an integer inside of the quotes. If you mix the integer with other characters, it throws an error as you can see above.

I Know!

No, I don’t. I thought I knew enough SQL. Apparently, I was wrong.

Independent of the language you are using, you should be aware that assertions that are true in your language may not be true in SQL and the other way as well.

Using Active Record migrations beyond SQLite

Using Active Record migrations beyond SQLite

SQLite is really a good tool to set up quick proof of concepts and small applications; however it’s not the most robust solution on the market for working with relational databases. In the open source community two databases take the top of the list: PostgreSQL and MySQL.

I did a small project for my studies. I was using SQLite as I didn’t need much out of it. Curious, I decided to see how the application would behave on other databases and decided to try PostgreSQL and MySQL. I had two problems to solve, and this post is about the first one: how to deal with the migrations. They were as follows:

class CreateArtists < ActiveRecord::Migration
def change
create_table :artists do |t|
t.string :name
t.timestamps
end
add_index :artists, :name
end
end
class CreateSongs < ActiveRecord::Migration
def change
create_table :songs do |t|
t.string :title
t.integer :artist_id
t.timestamps
end
add_index :songs, :title
add_foreign_key :songs, :artists
end
end

Active Record automatically put the field id in all of its tables, that’s why it is omitted on the migrations.

In PostgreSQL it went smoothly, all the migrations ran without any hiccup, except on MySQL, it gave me an error!

StandardError: An error has occurred, all later migrations canceled:

Column `artist_id` on table `songs` has a type of `int(11)`.
This does not match column `id` on `artists`, which has type `bigint(20)`.
To resolve this issue, change the type of the `artist_id` column on `songs` to be :integer. (For example `t.integer artist_id`).

Original message: Mysql2::Error: Cannot add foreign key constraint: ALTER TABLE `songs` ADD CONSTRAINT `fk_rails_5ce8fd4cc7`
FOREIGN KEY (`artist_id`)
REFERENCES `artists` (`id`)

The problem, beyond generating an ineligible name for an index: fk_rails_5ce8fd4cc7, is that artist_id on my table was as INT. The first thing I checked was to see if the artist.id was UNSIGNED and if my foreign key was also unsigned. They weren’t, but since were both signed, it wouldn’t throw an error. Looking more closely to the error message I noticed that the type in my foreign key column did not match the type on the primary key on the other table. Little did I know that Active Record generates the id field not as an INT, but as BIGINT.

I decided to go back and look at PostgreSQL, and to my surprise, and up to now I still am not sure of why, PostgreSQL did allow the column type mismatch where MySQL threw an error.

To fix it, I had to change the migration as follows:

class CreateSongs < ActiveRecord::Migration
def change
create_table :songs do |t|
t.string :title
t.integer :artist_id, limit: 8
t.timestamps
end
add_index :songs, :title
add_foreign_key :songs, :artists
end
end

Digging online, I found out how to create a bigint field with AR. According to the post, this would only work on MySQL, which they did, but I found it also worked with PostgreSQL (I tested MySQL 5.7 and Postgres 9.6): t.integer :artist_id, limit: 8.

The limit is used to set a maximum length for string types or number of bytes for numbers.

Why type matching is important

As an INT let’s say you can fit your number inside an espresso cup. Sure you can use the Starbucks Venti size cup to fit your coffee, but the full content of a Venti would never fit an espresso cup.

In the specific domain I am working on if I had a big list of Artists, and happen to have an artist which ID was higher than 2,147,483,647 (signed, and for both PostgreSQL and MySQL), I would get an error when trying to insert it into the Songs table since an Artist id can be up to 8 bytes (9,223,372,036,854,775,807).

Example:

Queen has its Artist id as: 21474836481 (which is a BIGINT)

Trying to insert “We Will Rock you” in the artist_id column for songs:

INSERT INTO songs (title, artist_id, created_at, updated_at)
VALUES ('We will Rock you', 21474836481, now(), now());

We get:

********** Error **********

ERROR: integer out of range
SQL state: 22003

This is the kind of problem we don’t usually notice in the beginning, and more often than not while the application is in production for even years, but this can happen and will happen if we don’t pay attention to foreign key types.

After that change, all the migrations ran smoothly. And I could actually move forward to the next problem (and post): Filtering a song title or artist name.

NOT NULL all the things!

Different types of languages deal with this “value” in diverse ways. You can have a more comprehensive list of what NULL can mean on this website. What I like to think about NULL is along the lines of invalid, as if some sort of garbage is stored there. It doesn’t mean it’s empty, it’s just mean that something is there, and it has no value to you.

Databases deal when storing this type in a similar way, PostgreSQL treats it as “unknown” while MySQL treats it as “no data“.

Both databases recommend using \N to represent NULL values where import or exporting of data is necessary.

When to use it

You don’t. Particularly, I DON’T recommend using NULL.

NULL doesn’t mean empty

So if you want to represent lack of data or optional fields use a default value. It’s bad sign of architecture having NULLABLE fields, there is an extra case to test and to write for. It adds unnecessary complexity.

However, there is one case where I do think NULL is acceptable. And that is when working with MySQL date related fields. I will talk more about this further down.

How to Query it

MySQL doesn’t recognize field = NULL because, remember, NULL means invalid, not empty. Thus using it will not return any rows.

As much as NULL value will never be equal to another NULL, when using ORDER BY, GROUP BY and DISTINCT, the server interprets the values as equal. Aggregators functions such as MIN(), SUM() and COUNT() ignore NULL values, except for COUNT(*) that counts rows, and not columns.

When using ORDER BY a column is nullable the NULL values appear first if instructed as ASC and in the end if DESC is requested.

PostgreSQL on the other hand has an option to convert equal comparisons expressions to field IS NULL, if enabled (transform_num_equals).

The ordering for ORDER BY depends on indexing of the field, by default NULL comes first, but you can specify when creating an index where the NULL values should be: top or bottom.

For aggregators functions, PostgreSQL works the same way.

Performance

Having NOT NULL columns permits similar performance on MySQL as an column = 1 do. However that doesn’t happen in LEFT JOIN operations while a field could be NULL. But this is for the type of queries where IS NULL is used:

# Assuming that `active` column is NOT NULL
SELECT * FROM users WHERE active = 1 OR active IS NULL;

Summing up directly from MySQL documentation:

Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.

The COALESCE() function

This function return the first non-null result of a column. Keep in mind to perform this operation on non-indexed columns. It is slow by its nature, just a friendly warning of when you are using to be mindful of its fallback.

The exception to the rule

I think this applies to MySQL databases. DATE/DATETIME should not be allowed to be NULL if, and only if the sql_mode directive NO_ZERO_DATE is disabled.

What does it mean? NO_ZERO_DATE doesn’t allow for 0000-00-00 to be inserted in a DATE/DATETIME. MySQL 5.7 sql_mode insures some restrictions into the database by default. If for instance you have DATE column that is NOT NULL and doesn’t pass a value to it, 0000-00-00 will be saved, because the column is NOT NULL, BUT will give a warning:

mysql> DESCRIBE users;
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(45) | NO | | NULL | |
| created_at | datetime | NO | | NULL | |
+------------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> INSERT INTO users (name) VALUES ("Gabi");
Query OK, 1 row affected, 1 warning (0.01 sec)

mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------+
| Warning | 1364 | Field "created_at" doesn't have a default value |
+---------+------+-------------------------------------------------+
1 row in set (0.00 sec)

Making this operation to make sure no zero date will be allowed as it is by default in MySQL 5.7:

SET @@GLOBAL.sql_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE;

When trying to insert a similar query:

mysql> INSERT INTO users (name) VALUES ("Blossom");
ERROR 1364 (HY000): Field "created_at" doesn't have a default value

To sum it up

If you have MySQL and NO_ZERO_DATE is in your sql_mode, you should ALWAYS use NOT NULL. MySQL 5.7 brings the mode enabled by default among with other things, read more here.

If you don’t have it enabled for any other reason, then DATE/DATETIME MAY be NULL, because data integrity > performance in this case.

Again, 0000-00-00 IS NOT a valid date.