A lot of people ask me to teach them how to do query analysis and performance. The truth is: there isn’t a script to follow. The following paragraphs are a brain dump on what usually goes on my mind when I am debugging and analyzing.
Please comment on what you think I should focus on to cover here.
TL; DR;
- It’s just a messy post with database-y stuff
- This post doesn’t have a conclusion, it is just me laying down my thoughts on performance and optimizations.
Thoughts
Query performance is a really difficult subject to talk about. Mostly because because SQL is a declarative language, leaving it up to the Optimizer to decide which way is the best to retrieve the information needed and that is based in so many variables.
The most common problem regarding optimization I see, comes not from the Database itself, but how we handle the requests on the application layer, the following for instance would cause N+1 problems:
Code example:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
users = User.all | |
users.each do |user| | |
puts "Name #{user.name}" | |
puts "Addresses: " | |
user.addresses.each do |address| | |
puts address.street | |
puts "#{address.city} – #{address.state}" | |
end | |
end |
Although seemingly innocent at first, this code could easily slow down performance on the database due to the amount of requests that would be made.
You also need to know about the intricacies of indexes, which one is the best, if you have a composite index, which should go first, and what happens if I only use one of the fields of a two column indexes in my search? Does it still uses the index somehow? Another rule of thumb is that if an index is a BTREE
, on a single column, you can use it either ASC
or DESC
.
Or better yet: why my transactions are taking so long to complete? Does it have too many indexes on the table? Is any other query locking table X?
Even a single INNER JOIN
could be highly costly if joining two large tables.
Why are you saving that JSON in a TEXT
field? Since we are on the subject, you really need the JSON in the relational database and not in a document store?
You don’t need to port all your data from PostgreSQL/MySQL to MongoDB if you want to have MongoDB on your stack. Everything has its place, relational data on relational databases and non-relational data on non relational databases. I even find unfair benchmarks between a SQL database and a NoSQL one. They were made to solve different problems, you can’t possibly have the same use case for both of them.
No, it’s not ok to have category_1
, category_2
, ...
, category_n
as columns on your products
table.
Avoid as much as possible nullable fields.
Relationships should also explicitly live on the RDBMS, not only on your model, if you have a user_id
on your addresses
table, tell the database so, naming it user_id
doesn’t automatically create the foreign key.
You need:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ALTER TABLE addresses ADD CONSTRAINT `fk_user_id` FOREIGN KEY (`user_id`) REFERENCES users (`id`); |
Or your migration should look something like this:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
class CreateUsers < ActiveRecord::Migration[5.1] | |
def change | |
create_table :users do |t| | |
t.string :name | |
t.timestamps | |
end | |
end | |
end | |
class CreateAddresses < ActiveRecord::Migration[5.1] | |
def change | |
create_table :addresses do |t| | |
t.text :street | |
t.string :city | |
t.string :state | |
t.string :zipcode | |
t.integer :user_id | |
t.timestamps | |
end | |
add_foreign_key :addresses, :users | |
end | |
end |
Line 24: adds to the table addresses
a foreign key from users
.
End
And you, what you think is missing in this blogpost? What do you want to get deeper on?
“Relationships should also explicitly live on the RDBMS, not only on your model, […]”
Terrifies me that in 2018 we still need to say this (And yes, I’ve seen it).
Fantastic article. Thank you.
LikeLiked by 1 person
Innodb is a really good kv store, so sometimes it makes a lot of sense to use MySQL as a kv instead of the badly designed storage engines in most nosql stores.
LikeLike
I’ve been at several orgs where they advocated removing all foreign key constraints in the db for performance reasons. In some cases they did dev with the constraints there to ensure that the logic was sound, but in others they just kinda hoped. I’m curious as to your implication that adding foreign keys is a boon to performance instead of a penalty? Or were you just saying that specific one was an anti-pattern vs. a performance win?
LikeLike
You can have the fastest database in the world, however if you can’t ensure data consistency that means nothing. The cost spent on sanitizing, normalizing the data afterwards to use in other scenarios, like business intelligence, ends up as high as fixing the damn issue of the foreign key.
I can see why foreign keys can slow down performance on writes, but there is also the gain of insuring the data is right and the query optimizer will assume a lot of things given the existence of a foreign key. (Interesting article: https://www.scarydba.com/2010/11/22/do-foreign-key-constraints-help-performance/)
Organizations that advocate for extinguishing this usually, that I’ve seen, have a bad design regarding values in columns.
NULL
is not a value, hence the problem when you have a nullable field that is a FK. There are ways to circumvent that, but banishing foreign keys is not one of them. You should not useNULL
as value on your table. Use a flag instead if you wish to indicate the presence or absence of something. Don’t build assumptions on top ofNULL
.LikeLike
Also complimenting my comment from before. If you want to save the performance case of having the database do not lookup for the value before
INSERT
/UPDATE
don’t forget that whatever cost you saved by not having the foreign key, it is an addedSELECT
to verify the integrity the database would be doing for you.LikeLike