Tag: database

Gabi’s Office Hours a.k.a. the Hallway Track

Office Hours featuring gabidavila
Book your time on gabi.tips/slots

Since travel to deliver content and awareness is being held for a while (COVID-19), I am missing the interaction I would have with people at random times during a conference, the exchange of ideas and impromptu problem solving.

The hallway track of a conference has always been my favorite part: the networking, exchanging ideas, getting feedback about products, all of that is harder to replicate without talks and social the setting of a conference.

I decided to experiment and try to find new ways to interact with “my” audience, the idea of Office Hours came, which originally is not mine, however I like the fact that I can have a technical conversation with someone that has questions about databases (not just about Cloud SQL) and also provide a window of opportunity for you to give me feedback on our products.

Examples of things we can talk about:

  • Query Performance
  • Best Practices for Migrations
  • Where to store your data
  • Should I put my Server on Kubernetes?
  • How do I migrate my data to the cloud?

The pilot started with 30 min sessions with the availability to talk to up to 10 people weekly (5h/week). However I think it is more productive a change to up to 3h/week and talking to 9 people in 20 min slots in more timezones.

Click here to book your time: gabi.tips/slots.

Tips for a good appointment:

  • Write down your questions and add it to the booking tool.
  • If a lengthy context is needed to understand your problem, please add the information in the booking tool, however do not send to me:
    • PII – Personal Identifiable Information
    • Your SQL Dump
    • Your intellectual property
    • No database credentials
  • Have defined scope of what you want to talk about, I can’t solve everything in 20 minutes.
  • Do not double book, other people should also take advantage of this

Disclaimers

This is not a guaranteed consultancy agreement, this is just people talking informally about tech problems and possible solutions, information shared and explained are guides, you are responsible for weighing your options and if any advice is executed, the outcome is your responsibility.

Strip your TEXT Field

TEXT fields are a nightmare. For you and for your server. It is slow to retrieve, and if you are doing searches on it, be prepared, things are going to get bumpy.

rigby_pc

If you use MySQL with a MyISAM engine, this may not be an issue for you, you can create a FULLTEXT index, your only problem is if you want to add a new column, an alter table can take forever, since MySQL creates a new table and copies the old data to the new table. For those who uses MySQL with an InnoDB engine, prepare because, you’ll have more issues. Indexes can’t be FULLTEXT and if you do need an index you must inform the length of it. It defeats the purpose of you doing the search in that field.

The observation above is only true for MySQL 5.5 or below, since version 5.6 MySQL does support FULLTEXT indexes on InnoDB – thanks Davey Shafik for the correction.

My TEXT field is not searchable, I just use it to store a big string

In that case you will only have trouble when it comes the time to add a new column or index to that table. As I said, when executing an ALTER TABLE statement, MySQL will create a new table with the new modifications and reinsert the data. Once I did this in a huge table with a TEXT field, it took 2 days. So, be careful.

Maintenance can be an issue too and I had problems in the past with MyISAM corrupting table and losing a lot of data. In the new versions of MySQL things seems to have changed, I think since the 5.1 version I didn’t face that again.

Why the statement LIKE is so slow?

Take this table as example:

CREATE TABLE `options` (
`id` INT,
`user_id` INT,
`value` TEXT,
`created` DATETIME
) DEFAULT CHARSET=utf8;

Well, one of the first things we learn when programming is that reading from the hard disk is costly while reading from the memory is super fast. For every row stored in a table like the one above, the field value will not be stored inline like the id value. It will store a reference to a file containing the actual data, MySQL stores 4MB of data into a TEXT field.

Having that in mind, every time you use a LIKE statement you are reading from your server disk, which is pretty slow (you can have a super server with high IOPS, but this is not the case for the majority). Unless you use the MyISAM engine and create a FULLTEXT index into the value field:

ALTER TABLE `options`
ADD FULLTEXT INDEX `ix_value` (`value` ASC);

Alternatives

Making the search fast

The best solution it is to use a search service like Elastic (previously known as ElasticSearch) or CloudSearch. This of course will add another layer to your application, but it is a good tradeoff.

You will need to index the data through one of the services in an extra step of the table insert. One thing for sure, the search will be FAST!

ohh_adventure_time

Making the retrieval fast (no search)

You can do what the database does already: store a reference in the value field for a text file. Ideally this text file would be stored in a CDN like Akamai or Amazon S3.

Storing in the database

If you don’t want to add another layer, you can achieve a compromise by removing the TEXT field from the table and adding it to a second one.

It would look like this:

CREATE TABLE `options` (
`id` INT,
`user_id` INT,
`created` DATETIME
)Engine=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `options_text` (
`option_id` INT,
`value` TEXT
) Engine=MyISAM DEFAULT CHARSET=utf8;

That way you can create a FULLTEXT index in the table and still be able to work with better data consistency provided by InnoDB. You can use InnoDB engine on the second table too, but it is not needed unless you want to use the constraint in the foreign keys.

To retrieve it, a INNER JOIN or LEFT JOIN between the two tables:

SELECT
`options`.`id` AS id,
`options`.`user_id` AS user_id,
`options`.`created` AS created,
`options_text`.`value` AS value
FROM
`options`
INNER JOIN
`options_text` ON `options`.`id` = `options_text`.`option_id`

Don’t be fooled, the search into options_text will still be slow, but it will be better than if it all were stored in the same table.

Conclusion

There is not a right way of doing this, you must consider all the implications of either approach and see what works better for you.

Keep this in mind: the first two proposed ideas need to add an extra step in the CRUD operations, and only you know how problematic can be to mess with that in a legacy application, for example. You won’t have much trouble to implement with a clean code.

I recommend you read more about BLOB and TEXT fields performance and storage in this article from Peter Zaitsev in the Percona blog.

nice_jake_and_fin