Tag: activerecord

Using Active Record migrations beyond SQLite

Using Active Record migrations beyond SQLite

SQLite is really a good tool to set up quick proof of concepts and small applications; however it’s not the most robust solution on the market for working with relational databases. In the open source community two databases take the top of the list: PostgreSQL and MySQL.

I did a small project for my studies. I was using SQLite as I didn’t need much out of it. Curious, I decided to see how the application would behave on other databases and decided to try PostgreSQL and MySQL. I had two problems to solve, and this post is about the first one: how to deal with the migrations. They were as follows:

class CreateArtists < ActiveRecord::Migration
def change
create_table :artists do |t|
t.string :name
t.timestamps
end
add_index :artists, :name
end
end
class CreateSongs < ActiveRecord::Migration
def change
create_table :songs do |t|
t.string :title
t.integer :artist_id
t.timestamps
end
add_index :songs, :title
add_foreign_key :songs, :artists
end
end

Active Record automatically put the field id in all of its tables, that’s why it is omitted on the migrations.

In PostgreSQL it went smoothly, all the migrations ran without any hiccup, except on MySQL, it gave me an error!

[code lang=text]
StandardError: An error has occurred, all later migrations canceled:

Column `artist_id` on table `songs` has a type of `int(11)`.
This does not match column `id` on `artists`, which has type `bigint(20)`.
To resolve this issue, change the type of the `artist_id` column on `songs` to be :integer. (For example `t.integer artist_id`).

Original message: Mysql2::Error: Cannot add foreign key constraint: ALTER TABLE `songs` ADD CONSTRAINT `fk_rails_5ce8fd4cc7`
FOREIGN KEY (`artist_id`)
REFERENCES `artists` (`id`)
[/code]

The problem, beyond generating an ineligible name for an index: fk_rails_5ce8fd4cc7, is that artist_id on my table was as INT. The first thing I checked was to see if the artist.id was UNSIGNED and if my foreign key was also unsigned. They weren’t, but since were both signed, it wouldn’t throw an error. Looking more closely to the error message I noticed that the type in my foreign key column did not match the type on the primary key on the other table. Little did I know that Active Record generates the id field not as an INT, but as BIGINT.

I decided to go back and look at PostgreSQL, and to my surprise, and up to now I still am not sure of why, PostgreSQL did allow the column type mismatch where MySQL threw an error.

To fix it, I had to change the migration as follows:

class CreateSongs < ActiveRecord::Migration
def change
create_table :songs do |t|
t.string :title
t.integer :artist_id, limit: 8
t.timestamps
end
add_index :songs, :title
add_foreign_key :songs, :artists
end
end

Digging online, I found out how to create a bigint field with AR. According to the post, this would only work on MySQL, which they did, but I found it also worked with PostgreSQL (I tested MySQL 5.7 and Postgres 9.6): t.integer :artist_id, limit: 8.

The limit is used to set a maximum length for string types or number of bytes for numbers.

Why type matching is important

As an INT let’s say you can fit your number inside an espresso cup. Sure you can use the Starbucks Venti size cup to fit your coffee, but the full content of a Venti would never fit an espresso cup.

In the specific domain I am working on if I had a big list of Artists, and happen to have an artist which ID was higher than 2,147,483,647 (signed, and for both PostgreSQL and MySQL), I would get an error when trying to insert it into the Songs table since an Artist id can be up to 8 bytes (9,223,372,036,854,775,807).

Example:

Queen has its Artist id as: 21474836481 (which is a BIGINT)

Trying to insert “We Will Rock you” in the artist_id column for songs:

INSERT INTO songs (title, artist_id, created_at, updated_at)
VALUES ('We will Rock you', 21474836481, now(), now());

We get:

[code lang=text]
********** Error **********

ERROR: integer out of range
SQL state: 22003
[/code]

This is the kind of problem we don’t usually notice in the beginning, and more often than not while the application is in production for even years, but this can happen and will happen if we don’t pay attention to foreign key types.

After that change, all the migrations ran smoothly. And I could actually move forward to the next problem (and post): Filtering a song title or artist name.

ActiveRecord: Has Many Through Through Relationship

Developers in general love when stuff works. Having a solution that can solve about 80% of your problems can leave time for you to deal with the other 20%.

But this post is not about Active Record vs. Data Mapper or any thing like it. Each one has its use case where it’s best applicable and it depends on you (or your team) to decide which to use. Keep in mind that with Active Record (AR), domain concerns and persistence concerns are mixed together and that with Data Mapper (DM), domain concerns and persistence concerns are kept separate.

Let’s talk about magic. How magical AR can be and how it can make your life easier. The beauty of programming is that two different individuals can reach the same result using different routes even if using the same tools. The convention over configuration that some frameworks like Laravel and Rails use makes everything feel so effortless, while actually under the hood, there is a lot going on.

Solving a code challenge

This week I was given the following schema on this code challenge:

Database Mapping

With four models:

  • Boat:
    • belongs to a Captain
    • has many records of BoatClassification
    • has many records of  Classification throughBoatClassification
  • Captain
    • has many records of Boat
  • BoatClassification
    • belongs to a Boat
    • belongs to a Classification
  • Classification
    • has many records of BoatClassification
    • has many records of Boat through  BoatClassification

And here is the code in Ruby:

### app/models/boat.rb ###
class Boat < ActiveRecord::Base
belongs_to :captain
has_many :boat_classifications
has_many :classifications, through: :boat_classifications
end
### app/models/captain.rb ###
class Captain < ActiveRecord::Base
has_many :boats
end
### app/models/boat_classification.rb ###
class BoatClassification < ActiveRecord::Base
belongs_to :boat
belongs_to :classification
end
### app/models/classification.rb ###
class Classification < ActiveRecord::Base
has_many :boat_classifications
has_many :boats, through: :boat_classifications
end
view raw models.rb hosted with ❤ by GitHub

The models were given to me as shown above, including the relationships. Stuff started easy, like:

Class: Boat -> Retrieve all boats without a Captain:

Boat.where(captain_id: nil) which translates to:

SELECT `boats`.*
FROM `boats`
WHERE `boats`.`captain_id` IS NULL
view raw ar_snippet_sql1.sql hosted with ❤ by GitHub

But then, stuff started to get a bit more complicated…

Class: Boat -> Retrieve all boats with three Classifications:

My thought: This one I got it! The code already showed me the has many through from Boat to Classification, now what I need to do is GROUP BY boats.id and all will be fine…

Boat.joins(:classifications).group("boats.id").having("count(classifications.id) = ?", 3)
view raw ar_snippet_ruby_1.rb hosted with ❤ by GitHub

Active Record saved me from a lot of trouble from doing the following query:

SELECT `boats`.*
FROM `boats`
INNER JOIN `boat_classifications`
ON `boat_classifications`.`boat_id` = `boats`.`id`
INNER JOIN `classifications`
ON `classifications`.`id` = `boat_classifications`.`classification_id`
GROUP BY `boats`.`id`
HAVING count(classifications.id) = 3
view raw ar_snippet_sql_2.sql hosted with ❤ by GitHub

This are 8 lines of code translated into one!

magic trick

Ok, I thought, we are going places with this. Give me one more!

Class: Captain -> Retrieve all Captains that pilot a specific Classification of a Boat

I knew the SQL code for this one! It’s easy when coding to end up doing chained queries with subqueries inside. I wanted to avoid that as much as possible knowing that would be able to solve the question with joins.

Reading the documentation I saw that Ruby’s Active Record gives us tools to avoid this kind of situation, one of them are the relationships. It’s possible to say looking at the classes and diagrams that Classification and Captain have a nested relationship. A has many through through if you like, yes that’s “through” twice.

Diagram

One way to remember is to look at the model Captain and Boat:

class Captain < ActiveRecord::Base
has_many :boats
end
class Boat < ActiveRecord::Base
belongs_to :captain
has_many :boat_classifications
has_many :classifications, through: :boat_classifications
end
view raw ar_snippet_ruby_2.rb hosted with ❤ by GitHub

:boats is a relationship for Boat. This means I can do a join, (specifically a nested one):

Captain -> Boat -> BoatClassification -> Classification

Captain.joins(boats: {boat_classifications: :classification})
view raw ar_snippet_ruby_3.rb hosted with ❤ by GitHub

confused, oh wait!

Ok, that makes sense, through :boats I have access to :boat_classifications which in turn has access to the :classification relationship. But, :boats also has access to :classifications, making this possible:

Captain -> Boat ->> Classification

Captain.joins(boats: :classifications)
view raw ar_snippet_ruby_4.rb hosted with ❤ by GitHub

And we finally add the filter to the query:

Captain.joins(boats: :classifications).where(classifications: {name: 'Sailboat'})
view raw ar_snippet_ruby_5.rb hosted with ❤ by GitHub

Saving us from having to write this:

SELECT
`captains`.*
FROM
`captains`
INNER JOIN
`boats` ON `boats`.`captain_id` = `captains`.`id`
INNER JOIN
`boat_classifications` ON `boat_classifications`.`boat_id` = `boats`.`id`
INNER JOIN
`classifications` ON `classifications`.`id` = `boat_classifications`.`classification_id`
WHERE
`classifications`.`name` = 'Sailboat'
view raw ar_snippet_sql_3.sql hosted with ❤ by GitHub

TL;DR;

  • Chaining method calls on the model’s class always returns the model itself
  • Hashes are more used than you would imagine
  • Avoid subqueries
  • Magic happens through relationships (which saves you from the subqueries)
  • Putting things on diagrams is not a question of being fancy, but rather to be able to better visualize problems.

Bonus – Performance

If we had used the subquery for searching on the last category we would have:

SELECT
`captains`.*
FROM
`captains`
WHERE
`captains`.`id` IN (SELECT
`boats`.`captain_id`
FROM
`boats`
WHERE
`boats`.`id` IN (SELECT
`boat_classifications`.`boat_id`
FROM
`boat_classifications`
WHERE
`boat_classifications`.`classification_id` IN (SELECT
`classifications`.`id`
FROM
`classifications`
WHERE
`classifications`.`name` = 'Sailboat')));
view raw ar_snippet_sql_4.sql hosted with ❤ by GitHub

Doing a query cost analysis on it, by adding an EXPLAIN in the beginning of the query, it returns:

SubqueriesAR

But using the correct relationships we have:

InnerJoinsAR.png

Don’t worry much about the numbers, but look more at the colors, by using the existing foreign keys we avoid doing a full table scan on the tables, even with a join of four tables the query plan showed that subqueries are 30% more slower than using the existing indexes and relationships.