A replication user is necessary to set up the Primary/Replica relationship. This is a short step, but it needs a bit of extra attention.
From the MySQL 5.7 documentation (highlights are my own):
Although you do not have to create an account specifically for replication, you should be aware that the replication user name and password are stored in plain text in the master info repository file or table (see Section 16.2.4.2, “Slave Status Logs”). Therefore, you may want to create a separate account that has privileges only for the replication process, to minimize the possibility of compromise to other accounts.
The following command specifically will allow replication from all databases and tables connecting from all hosts. For security reasons you may want to limit access to replication only to the IP address of the server doing the replication.
Log into the MySQL console using a user with GRANT privileges in the primary server and execute the following:
CREATE USER 'replication'@'%' IDENTIFIED BY 'mysupersecretpassword';
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';
My advice is instead of using the % wildcard, set up the IP address of your replica.
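As a sketch, assuming the replica's address is 192.168.1.10 (a hypothetical IP; replace it with your own), the commands would become:

```sql
-- 192.168.1.10 is a hypothetical replica address; substitute your replica's IP
CREATE USER 'replication'@'192.168.1.10' IDENTIFIED BY 'mysupersecretpassword';
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'192.168.1.10';
```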
This user will be added to the primary's MASTER_USER option, and in theory it could be any user as long as it also has the REPLICATION SLAVE privilege. After that, the replica will connect to the primary and perform a handshake with those credentials; if they match, the primary will allow replication to occur.
See something wrong in this tutorial? Please don’t hesitate to message me through the comments or the contact page.
Generated Columns are a feature released in MySQL 5.7. They can be used in CREATE TABLE or ALTER TABLE statements. They are a way of storing data without actually sending it through INSERT or UPDATE clauses in SQL: the database resolves what the data will be.
There are two types of Generated Columns: Virtual and Stored. Their expressions work with literals, operators, and deterministic built-in functions. Besides that, they can be indexed, but they don't allow subqueries in their expressions.
A Generated Column works within the table domain. If you need subqueries on a particular column, you may have to look at Views.
The basic example
As an example I am going to use an e-commerce database, based on the kind of schemas I have seen and worked with in the past. You will probably have at least these tables or something similar:
users – stores user info
products – stores product info like price and description
orders – stores the user_id, date of order
orders_items – stores product_id, order_id, quantity and price at the time of purchase
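The embedded schema is not shown here, but a minimal sketch of those tables (column names are illustrative, not the original definitions) could look like:

```sql
CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `products` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `description` varchar(255) NOT NULL,
  `price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `orders` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```

The orders_items table is shown in full further down.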
One example is to get the total of an orders_items row, something like a total_item_price column that would store the value of product_price * quantity to show the summed amount for an item. Some databases have a MONEY type to store prices; with MySQL it is recommended to use DECIMAL.
People solve this problem in different ways:
store the calculated price on a new column to make it easier to retrieve;
create a view;
or they calculate it in the application itself, which might cause problems due to how the language handles floats. There are libraries to deal with money values in many languages and frameworks; however, the overhead of converting each row into a money object can be costly depending on the amount of data being transferred.
Another way I’ve seen is to calculate the total amount for the orders_items row in the query itself, as product_price * quantity:
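The embedded query is not shown; a sketch of the idea, computed at read time against the orders_items table, would look something like:

```sql
-- Calculate the line total at query time instead of storing it
SELECT oi.order_id,
       oi.product_id,
       oi.product_price * oi.quantity AS total_item_price
FROM orders_items oi
WHERE oi.order_id = 1;
```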
Virtual Columns
They take no disk space, except when a Virtual Column is used in a secondary index.
They are an INPLACE operation: the table definition is changed without having to recopy all the data. More info.
The values are calculated on the fly during read operations and BEFORE triggers.
Consider using Virtual Columns for data that changes frequently. The cost of a Virtual Column comes at read time: the server has to compute the column's value every time the table is read.
Stored Columns
They do use disk space.
Adding one has the same cost as adding a new regular column, so it is a COPY operation.
Values are updated in every INSERT and UPDATE statement.
You should consider using Stored Columns when the data doesn't change significantly, or at all, after creation, like the orders_items example above. Once a purchase is made, the price of the product is stored and never changed, and neither is the quantity. Given that, we could create total_item_price as a Stored Column.
The code
Creating a table
-- Virtual Column
CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`total_item_price` decimal(10,2) AS (`quantity` * `product_price`),
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;
-- Stored Column
CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`total_item_price` decimal(10,2) AS (`quantity` * `product_price`) STORED,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;
Notice how the `total_item_price` definition changes between the two statements: there is a new keyword, AS, followed by an expression, and in the second statement a STORED keyword as well. Both are generated columns; if nothing is specified, the column will be VIRTUAL.
Altering a table
It uses the same syntax as adding a regular column, just with "AS (expression)" added after the data type:
-- `full_name` as VIRTUAL COLUMN
ALTER TABLE users
ADD COLUMN `full_name` VARCHAR(500)
AS (CONCAT_WS(" ", `first_name`, `last_name`));
-- `total_item_price` as STORED COLUMN
ALTER TABLE orders_items
ADD COLUMN `total_item_price` DECIMAL(10, 2)
AS (`quantity` * `product_price`) STORED;
JSON fields
It is also possible to extract data from JSON fields using generated columns. Since the JSON functions are built-in, JSON_EXTRACT and JSON_UNQUOTE, as well as the -> and ->> operators, work as expressions for a generated column:
-- Stored Columns
ALTER TABLE `twitter_users`
ADD COLUMN `location` VARCHAR(255)
AS (response->>"$.location") STORED;
Final considerations
When the type is STORED, it must be specified after the expression; otherwise the column defaults to VIRTUAL.
Generated columns can be indexed as follows, whether stored, virtual, or extracted from a JSON field:
ALTER TABLE users
ADD INDEX `ix_full_name` (`full_name`);
Disclaimer: this post takes into consideration that strict mode is enabled on the server
VARCHAR and CHAR are used to store strings. VARCHAR stores strings of varying length, while CHAR always uses the exact declared size no matter the length of the string. For example, with a single-byte character set, CHAR(4) will always take 4 bytes, whereas VARCHAR(4) will take up to 5 bytes (up to 4 for the string plus 1 length byte). See the documentation.
When we create a table like this one:
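The embedded example is not shown; a hedged sketch of such a table, with an explicit charset, might be:

```sql
-- With utf8mb4, each character may take up to 4 bytes on disk,
-- so VARCHAR(63) can occupy up to 252 bytes plus the length byte
CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(63) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```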
Inside the parentheses we specify the length of the VARCHAR field in characters. However, the maximum size of the field in bytes depends on the CHARSET and COLLATION of the table. You can also specify a different collation per column.
The number of length bytes required by a VARCHAR column must remain the same. For VARCHAR values of 0 to 255, one length byte is required to encode the value. For VARCHAR values of 256 bytes or more, two length bytes are required. As a result, in-place ALTER TABLE only supports increasing VARCHAR size from 0 to 255 bytes or increasing VARCHAR size from a value equal to or greater than 256 bytes. In-place ALTER TABLE does not support increasing VARCHAR size from less than 256 bytes to a value equal to or greater than 256 bytes. In this case, the number of required length bytes would change from 1 to 2, which is only supported by a table copy (ALGORITHM=COPY) (…)
The highlighted section is true but a bit misleading: a change between VARCHAR(1) and VARCHAR(255) will only be INPLACE if you are using a single-byte charset such as latin1. If you are using utf8mb4, for instance, that range drops to VARCHAR(1) through VARCHAR(63). The reason is that in the worst case each utf8mb4 character takes 4 bytes, making VARCHAR(63) less than 256 bytes and anything larger 256 bytes or more.
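The embedded example is not shown; a sketch of the difference, assuming a hypothetical users.name column currently declared VARCHAR(20) on a utf8mb4 table, could be:

```sql
-- 63 * 4 = 252 bytes still fits in one length byte, so this can run in place:
ALTER TABLE users MODIFY `name` VARCHAR(63) NOT NULL, ALGORITHM=INPLACE;

-- 64 * 4 = 256 bytes needs two length bytes, so INPLACE is rejected
-- and a full table copy is required instead:
ALTER TABLE users MODIFY `name` VARCHAR(64) NOT NULL, ALGORITHM=COPY;
```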
Online DDL changes are supported, but you must pay attention to your field size in bytes, which is not the same as the number inside the parentheses (the character count).
I found this while working on some field size changes, and I didn't think the documentation was clear enough in the highlighted passage.
Different programming languages deal with this “value” in different ways. You can find a more comprehensive list of what NULL can mean on this website. I like to think of NULL as meaning invalid, as if some sort of garbage were stored there. It doesn't mean the field is empty; it means something is there, and it has no value to you.
Databases store this type in a similar way: PostgreSQL treats it as “unknown”, while MySQL treats it as “no data”.
Both databases recommend using \N to represent NULL values where import or exporting of data is necessary.
When to use it
You don’t. Personally, I DON’T recommend using NULL.
NULL doesn’t mean empty
So if you want to represent a lack of data or optional fields, use a default value. Having NULLABLE fields is a sign of bad architecture: there is an extra case to test and write code for. It adds unnecessary complexity.
However, there is one case where I do think NULL is acceptable. And that is when working with MySQL date related fields. I will talk more about this further down.
How to Query it
MySQL doesn't match rows with field = NULL because, remember, NULL means invalid, not empty; comparing anything to NULL yields NULL. Using it will therefore never return any rows. Use field IS NULL instead.
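A quick sketch of the difference, assuming a hypothetical nullable deleted_at column on the users table:

```sql
-- Returns no rows: NULL is never equal to anything, not even another NULL
SELECT * FROM users WHERE deleted_at = NULL;

-- The correct way to match NULL values
SELECT * FROM users WHERE deleted_at IS NULL;
```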
Although a NULL value is never equal to another NULL, the server treats NULL values as equal when using ORDER BY, GROUP BY and DISTINCT. Aggregate functions such as MIN(), SUM() and COUNT() ignore NULL values, except for COUNT(*), which counts rows, not column values.
When you ORDER BY a nullable column, the NULL values appear first with ASC and last with DESC.
PostgreSQL, on the other hand, has an option (transform_null_equals) that, if enabled, converts field = NULL comparisons to field IS NULL.
The ordering for ORDER BY depends on indexing of the field, by default NULL comes first, but you can specify when creating an index where the NULL values should be: top or bottom.
Aggregate functions in PostgreSQL work the same way.
Performance
With NOT NULL columns, MySQL can optimize a column IS NULL test the same way it optimizes column = 1. That optimization is not available, however, when the column can end up NULL, as in LEFT JOIN operations. This matters for queries of this shape:
# Assuming that `active` column is NOT NULL
SELECT * FROM users WHERE active = 1 OR active IS NULL;
Summing up, directly from the MySQL documentation:
Declare columns to be NOT NULL if possible. It makes SQL operations faster, by enabling better use of indexes and eliminating overhead for testing whether each value is NULL. You also save some storage space, one bit per column. If you really need NULL values in your tables, use them. Just avoid the default setting that allows NULL values in every column.
The COALESCE() function
This function returns the first non-NULL value among its arguments. Keep in mind that it is computed row by row and cannot take advantage of an index on the wrapped column, so it is slow by nature; a friendly warning to be mindful of its cost when using it as a fallback.
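A minimal sketch, assuming a hypothetical nullable nickname column on the users table:

```sql
-- Falls back to name when nickname is NULL, then to a literal
SELECT COALESCE(nickname, name, 'anonymous') AS display_name
FROM users;
```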
The exception to the rule
I think this applies to MySQL databases: DATE/DATETIME should be allowed to be NULL if, and only if, the sql_mode directive NO_ZERO_DATE is disabled.
What does that mean? NO_ZERO_DATE doesn't allow 0000-00-00 to be inserted into a DATE/DATETIME column. MySQL 5.7's default sql_mode enforces some restrictions on the database. If, for instance, you have a DATE column that is NOT NULL and you don't pass a value to it, 0000-00-00 will be saved, because the column is NOT NULL, BUT you will get a warning:
mysql> DESCRIBE users;
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(45) | NO | | NULL | |
| created_at | datetime | NO | | NULL | |
+------------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
mysql> INSERT INTO users (name) VALUES ("Gabi");
Query OK, 1 row affected, 1 warning (0.01 sec)
mysql> SHOW WARNINGS;
+---------+------+-------------------------------------------------+
| Level | Code | Message |
+---------+------+-------------------------------------------------+
| Warning | 1364 | Field "created_at" doesn't have a default value |
+---------+------+-------------------------------------------------+
1 row in set (0.00 sec)
To make sure no zero date is allowed, as is the default in MySQL 5.7, set the mode explicitly:
SET @@GLOBAL.sql_mode = "STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE";
Now, trying a similar insert fails:
mysql> INSERT INTO users (name) VALUES ("Blossom");
ERROR 1364 (HY000): Field "created_at" doesn't have a default value
To sum it up
If you have MySQL and NO_ZERO_DATE is in your sql_mode, you should ALWAYS use NOT NULL. MySQL 5.7 ships with the mode enabled by default, along with other changes; read more here.
If you don’t have it enabled for some reason, then DATE/DATETIME MAY be NULL, because data integrity > performance in this case.
MySQL 5.7 is full of new features, like generated columns, virtual indexes and JSON fields! But it also came with some changes to the default configuration. When running:
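The embedded snippet is not shown; checking the current mode would look something like this (on a default 5.7 install the result should resemble the list below):

```sql
SELECT @@GLOBAL.sql_mode;
-- ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,
-- ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
```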
What I want to talk about is the ONLY_FULL_GROUP_BY mode. This mode rejects queries that refer to nonaggregated columns that are neither named in the GROUP BY clause nor functionally dependent on it. Before MySQL 5.7.5, ONLY_FULL_GROUP_BY was disabled by default; now it is enabled.
You know the drill…
This is a simple statement, people use it everywhere, it shouldn’t be that hard to use, right?
Given the following schema:
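The embedded schema is not shown here, but judging from the queries and results below, it must be roughly this (a hedged sketch, column types are assumptions):

```sql
CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `email` varchar(255) NOT NULL,
  `country` char(2) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

CREATE TABLE `comments` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `post_id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
```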
Suppose I want to list all users that commented on post_id = 1, on MySQL 5.6:
SELECT *
FROM comments c
INNER JOIN users u ON c.user_id = u.id
WHERE c.post_id = 1
GROUP BY c.user_id;
And this is the result:
Same query running on 5.7.11 gives the following results:
[42000][1055] Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'blog.c.id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
What does it mean?
What MySQL is complaining about is this: you grouped rows by c.user_id, but there is more than one candidate value for the c.id column within each group. Since you didn't use an aggregate, such as min(c.id), it doesn't know which value to return.
Previous versions of MySQL would solve this “magically”. This change is not MySQL being temperamental with you; it is the implementation of long-established industry standards (SQL-92 and SQL-99). Relying on the results that previous versions returned for that query is not smart: those results are unpredictable and totally arbitrary.
MySQL extends the standard SQL use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
How do I fix it?
It will make your query more verbose, but it will make it right. There are two ways of doing this.
One way is to use aggregators on the fields you need to retrieve, grouping by the email field, for instance:
SELECT any_value(u.id) AS user_id,
       any_value(u.name) AS name,
       u.email,
       any_value(u.country) AS country,
       any_value(u.created) AS registration_date,
       max(c.created) AS last_comment,
       count(*) AS total_comments
FROM comments c
INNER JOIN users u ON c.user_id = u.id
WHERE c.post_id = 1
GROUP BY u.email;
Another way is to name the fields that will be unique in the GROUP BY clause:
SELECT u.id AS user_id,
       u.name,
       u.email,
       u.country,
       u.created AS registration_date,
       max(c.created) AS last_comment,
       count(*) AS total_comments
FROM comments c
INNER JOIN users u ON c.user_id = u.id
WHERE c.post_id = 1
GROUP BY u.email, u.id, u.name, u.country, u.created;
Result for both queries:
+---------+---------------+----------------------+---------+---------------------+---------------------+----------------+
| user_id | name          | email                | country | registration_date   | last_comment        | total_comments |
+---------+---------------+----------------------+---------+---------------------+---------------------+----------------+
| 2       | Bart Simpson  | bart@simpsons.com    | US      | 2016-03-03 20:07:28 | 2016-03-03 21:21:08 | 2              |
| 1       | Lisa Simpson  | lisa@simpsons.com    | US      | 2016-03-03 20:07:23 | 2016-03-03 21:20:50 | 2              |
| 3       | Homer Simpson | nobrain@simpsons.com | US      | 2016-03-03 20:07:38 | 2016-03-03 21:20:56 | 1              |
+---------+---------------+----------------------+---------+---------------------+---------------------+----------------+
3 rows in set (0.00 sec)
In other words, both queries follow the SQL-92 specification:
The SQL/92 standard for GROUP BY requires the following:
A column used in an expression of the SELECT clause must be in the GROUP BY clause. Otherwise, the expression using that column is an aggregate function.
A GROUP BY expression can only contain column names from the select list, but not those used only as arguments for vector aggregates.
The results of a standard GROUP BY with vector aggregate functions produce one row with one value per group.
In version 5.7.5, MySQL also implemented functional dependency detection from SQL-99, which means that if such a relationship exists between the grouped column and the others (between id and name, for example), the query is legal. This is the case, for example, when you group by a primary key or a foreign key:
SELECT u.id AS user_id,
       u.name,
       u.email,
       u.country,
       u.created AS registration_date,
       max(c.created) AS last_comment,
       count(*) AS total_comments
FROM comments c
INNER JOIN users u ON c.user_id = u.id
WHERE c.post_id = 1
GROUP BY u.id;
You can read more details about how MySQL handles GROUP BY in their documentation.
TL;DR;
According to the documentation, this configuration is enabled by default because GROUP BY processing has become more sophisticated and now includes detection of functional dependencies. It also brings MySQL closer to SQL best practices, with the bonus of removing the “magic” element from grouping: grouped values are no longer arbitrarily selected.
Disabling ONLY_FULL_GROUP_BY
If you are upgrading your database server and want to avoid possible breakage, you can disable ONLY_FULL_GROUP_BY by removing it from your sql_mode.
Changing in runtime
SET @@GLOBAL.sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
A restart is not necessary, but a reconnection is.
Change permanently
If you want to disable it permanently, add/edit the following in your my.cnf file:
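The original snippet is not shown; a hedged sketch of the entry, under the [mysqld] section and listing every 5.7 default mode except ONLY_FULL_GROUP_BY, would be:

```ini
[mysqld]
sql-mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
```

After editing the file, restart the server for the change to take effect.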