Tag: nested join

ActiveRecord: Has Many Through Through Relationship

Developers in general love when stuff works. Having a solution that can solve about 80% of your problems can leave time for you to deal with the other 20%.

But this post is not about Active Record vs. Data Mapper or any thing like it. Each one has its use case where it’s best applicable and it depends on you (or your team) to decide which to use. Keep in mind that with Active Record (AR), domain concerns and persistence concerns are mixed together and that with Data Mapper (DM), domain concerns and persistence concerns are kept separate.

Let’s talk about magic. How magical AR can be and how it can make your life easier. The beauty of programming is that two different individuals can reach the same result using different routes even if using the same tools. The convention over configuration that some frameworks like Laravel and Rails use makes everything feel so effortless, while actually under the hood, there is a lot going on.

Solving a code challenge

This week I was given the following schema on this code challenge:

Database Mapping

With four models:

  • Boat:
    • belongs to a Captain
    • has many records of BoatClassification
    • has many records of  Classification throughBoatClassification
  • Captain
    • has many records of Boat
  • BoatClassification
    • belongs to a Boat
    • belongs to a Classification
  • Classification
    • has many records of BoatClassification
    • has many records of Boat through  BoatClassification

And here is the code in Ruby:

### app/models/boat.rb ###
class Boat < ActiveRecord::Base
belongs_to :captain
has_many :boat_classifications
has_many :classifications, through: :boat_classifications
end
### app/models/captain.rb ###
class Captain < ActiveRecord::Base
has_many :boats
end
### app/models/boat_classification.rb ###
class BoatClassification < ActiveRecord::Base
belongs_to :boat
belongs_to :classification
end
### app/models/classification.rb ###
class Classification < ActiveRecord::Base
has_many :boat_classifications
has_many :boats, through: :boat_classifications
end
view raw models.rb hosted with ❤ by GitHub

The models were given to me as shown above, including the relationships. Stuff started easy, like:

Class: Boat -> Retrieve all boats without a Captain:

Boat.where(captain_id: nil) which translates to:

SELECT `boats`.*
FROM `boats`
WHERE `boats`.`captain_id` IS NULL

But then, stuff started to get a bit more complicated…

Class: Boat -> Retrieve all boats with three Classifications:

My thought: This one I got it! The code already showed me the has many through from Boat to Classification, now what I need to do is GROUP BY boats.id and all will be fine…

Boat.joins(:classifications).group("boats.id").having("count(classifications.id) = ?", 3)

Active Record saved me from a lot of trouble from doing the following query:

SELECT `boats`.*
FROM `boats`
INNER JOIN `boat_classifications`
ON `boat_classifications`.`boat_id` = `boats`.`id`
INNER JOIN `classifications`
ON `classifications`.`id` = `boat_classifications`.`classification_id`
GROUP BY `boats`.`id`
HAVING count(classifications.id) = 3

This are 8 lines of code translated into one!

magic trick

Ok, I thought, we are going places with this. Give me one more!

Class: Captain -> Retrieve all Captains that pilot a specific Classification of a Boat

I knew the SQL code for this one! It’s easy when coding to end up doing chained queries with subqueries inside. I wanted to avoid that as much as possible knowing that would be able to solve the question with joins.

Reading the documentation I saw that Ruby’s Active Record gives us tools to avoid this kind of situation, one of them are the relationships. It’s possible to say looking at the classes and diagrams that Classification and Captain have a nested relationship. A has many through through if you like, yes that’s “through” twice.

Diagram

One way to remember is to look at the model Captain and Boat:

class Captain < ActiveRecord::Base
has_many :boats
end
class Boat < ActiveRecord::Base
belongs_to :captain
has_many :boat_classifications
has_many :classifications, through: :boat_classifications
end

:boats is a relationship for Boat. This means I can do a join, (specifically a nested one):

Captain -> Boat -> BoatClassification -> Classification

Captain.joins(boats: {boat_classifications: :classification})

confused, oh wait!

Ok, that makes sense, through :boats I have access to :boat_classifications which in turn has access to the :classification relationship. But, :boats also has access to :classifications, making this possible:

Captain -> Boat ->> Classification

Captain.joins(boats: :classifications)

And we finally add the filter to the query:

Captain.joins(boats: :classifications).where(classifications: {name: 'Sailboat'})

Saving us from having to write this:

SELECT
`captains`.*
FROM
`captains`
INNER JOIN
`boats` ON `boats`.`captain_id` = `captains`.`id`
INNER JOIN
`boat_classifications` ON `boat_classifications`.`boat_id` = `boats`.`id`
INNER JOIN
`classifications` ON `classifications`.`id` = `boat_classifications`.`classification_id`
WHERE
`classifications`.`name` = 'Sailboat'

TL;DR;

  • Chaining method calls on the model’s class always returns the model itself
  • Hashes are more used than you would imagine
  • Avoid subqueries
  • Magic happens through relationships (which saves you from the subqueries)
  • Putting things on diagrams is not a question of being fancy, but rather to be able to better visualize problems.

Bonus – Performance

If we had used the subquery for searching on the last category we would have:

SELECT
`captains`.*
FROM
`captains`
WHERE
`captains`.`id` IN (SELECT
`boats`.`captain_id`
FROM
`boats`
WHERE
`boats`.`id` IN (SELECT
`boat_classifications`.`boat_id`
FROM
`boat_classifications`
WHERE
`boat_classifications`.`classification_id` IN (SELECT
`classifications`.`id`
FROM
`classifications`
WHERE
`classifications`.`name` = 'Sailboat')));

Doing a query cost analysis on it, by adding an EXPLAIN in the beginning of the query, it returns:

SubqueriesAR

But using the correct relationships we have:

InnerJoinsAR.png

Don’t worry much about the numbers, but look more at the colors, by using the existing foreign keys we avoid doing a full table scan on the tables, even with a join of four tables the query plan showed that subqueries are 30% more slower than using the existing indexes and relationships.