Tag: mysql 5.7

How to setup a Replication User

A replication user is needed to set up the Primary/Replica relationship. It is a short step, but it deserves a bit more attention.

From the MySQL 5.7 documentation (highlights are my own):

Although you do not have to create an account specifically for replication, you should be aware that the replication user name and password are stored in plain text in the master info repository file or table (see Section 16.2.4.2, “Slave Status Logs”). Therefore, you may want to create a separate account that has privileges only for the replication process, to minimize the possibility of compromise to other accounts.

The following commands allow replication of all databases and tables for connections from any host. For security reasons, you may want to restrict replication access to the IP address of the server doing the replication.

Log into the MySQL console on the primary server using a user with GRANT privileges and execute the following:

CREATE USER 'replication'@'%' IDENTIFIED BY 'mysupersecretpassword';
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'%';

My advice is instead of using the % wildcard, set up the IP address of your replica.
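As a minimal sketch of that advice, assuming the replica connects from 192.0.2.10 (a made-up documentation address; replace it with the real IP of your replica), the same two statements could look like this:

```sql
-- Hypothetical replica address; substitute the real IP of your replica.
CREATE USER 'replication'@'192.0.2.10' IDENTIFIED BY 'mysupersecretpassword';
GRANT REPLICATION SLAVE ON *.* TO 'replication'@'192.0.2.10';

-- Check what the account is allowed to do.
SHOW GRANTS FOR 'replication'@'192.0.2.10';
```

With this in place, the same user name connecting from any other host will not match the account.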

This user is the one given in the MASTER_USER option when the replica is pointed at the primary, and in theory it could be any user as long as it also has the REPLICATION SLAVE privilege. After that, the replica connects to the primary and performs a handshake with those credentials; if they match, the primary allows replication to occur.
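For illustration, this is roughly where those credentials end up being used: MASTER_USER is an option of the CHANGE MASTER TO statement, which is run on the replica. The host, binary log file name and position below are placeholders, not values from this tutorial.

```sql
-- Run on the replica. Host, binary log file and position are placeholders;
-- take the real values from SHOW MASTER STATUS on the primary.
CHANGE MASTER TO
  MASTER_HOST = '192.0.2.1',
  MASTER_USER = 'replication',
  MASTER_PASSWORD = 'mysupersecretpassword',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 4;

START SLAVE;

-- Slave_IO_Running and Slave_SQL_Running should both report Yes.
SHOW SLAVE STATUS\G
```

The \G terminator is a mysql command-line client feature that prints the result vertically; in other clients, end the statement with a semicolon instead.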

See something wrong in this tutorial? Please don’t hesitate to message me through the comments or the contact page.

Understanding Generated Columns

The Theory

Generated Columns are a feature introduced in MySQL 5.7. They can be defined in CREATE TABLE or ALTER TABLE statements. They are a way of storing data without actually sending it through INSERT or UPDATE statements: the database works out what the value will be.

There are two types of Generated Columns: Virtual and Stored. They work with:

  • mathematical expressions (product_price * quantity)
  • built-in functions (RIGHT(), CONCAT(), FROM_UNIXTIME(), JSON_EXTRACT())
  • literals (“2”, “new”, 0)

Besides that, they can be indexed, but their expressions cannot contain subqueries.
A Generated Column works within the table domain. If you need subqueries on a particular column, you may have to look at Views.

The basic example

As an example, I am going to use an e-commerce database, based on what I have seen and worked with in the past. You will probably have at least these tables or something similar:

  • users – stores user info
  • products – stores product info like price and description
  • orders – stores the user_id, date of order
  • orders_items – stores product_id, order_id, quantity and price at the time of purchase

This is the whole DB: Gist.

Notice the orders_items definition:

CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

The retrieval would bring:

SELECT
`id`, `order_id`, `product_id`, `product_price`, `quantity`
FROM `orders_items`
LIMIT 5;

id  order_id  product_id  product_price  quantity
1   369       1304        202.18         7
2   4         1736        250.40         3
3   270       1404        29.89          5
4   256       179         190.40         10
5   107       1911        146.98         1

One common need is the total for each orders_items row: something like total_item_price, holding the value of product_price * quantity, to show the amount for that item line. Some databases have a MONEY type for storing prices; in MySQL the recommendation is to use DECIMAL.

People solve this problem in different ways:

  • store the calculated price on a new column to make it easier to retrieve;
  • create a view;
  • or they calculate in the application itself, which in this case might cause problems due to how the language handles floats. There are libraries to deal with money values in a lot of languages and frameworks, however, the overhead of converting each row into a money object could be costly depending on the amount of data being transferred.

Another way I’ve seen: people calculate the total for the orders_items row in the query itself, as product_price * quantity:

SELECT
`id`,
`order_id`,
`product_id`,
`product_price`,
`quantity`,
`product_price` * `quantity` AS total_item_price
FROM `orders_items`
LIMIT 5;

id  order_id  product_id  product_price  quantity  total_item_price
1   369       1304        202.18         7         1415.26
2   4         1736        250.40         3         751.20
3   270       1404        29.89          5         149.45
4   256       179         190.40         10        1904.00
5   107       1911        146.98         1         146.98

Virtual Columns

  • They take no disk space, except when a Virtual Column is used in a Secondary Index.
  • Adding one is an INPLACE operation: the table definition is changed without having to copy all the data again. More info.
  • The values are calculated on the fly when rows are read, immediately after any BEFORE triggers.

Consider using Virtual Columns for data that changes often. The cost of a Virtual Column comes at read time: every time the table is read, the server has to compute what the column value will be.

Stored Columns

  • They do use disk space.
  • Adding one has the same cost as adding a regular column, so it is a COPY operation.
  • Values are updated in every INSERT and UPDATE statement.

You should consider using Stored Columns when the data doesn’t change significantly, or at all, after creation, as in the example above with the orders_items table. Once a purchase is made, the price of the product is stored and does not change, and neither does the quantity. Given this, we could create total_item_price as a Stored Column.

The code

Creating a table

-- Virtual Column

CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`total_item_price` decimal(10,2) AS (`quantity` * `product_price`),
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

-- Stored Column

CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`total_item_price` decimal(10,2) AS (`quantity` * `product_price`) STORED,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

Notice how the definition of total_item_price differs in the two statements: there is another keyword, AS, followed by an expression, and in the second statement there is also a STORED keyword. Both are generated columns; if nothing is specified, the column will be VIRTUAL.

Altering a table

It uses the same syntax as adding a column, just adding the “AS (expression)” after the data type:

-- `full_name` as VIRTUAL COLUMN
ALTER TABLE users
ADD COLUMN `full_name` VARCHAR(500)
AS (CONCAT_WS(" ", `first_name`, `last_name`));

-- `total_item_price` as STORED COLUMN
ALTER TABLE orders_items
ADD COLUMN `total_item_price` DECIMAL(10, 2)
AS (`quantity` * `product_price`) STORED;
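A quick way to see a generated column at work is to insert a row without mentioning the column and read it back. This sketch assumes orders_items has total_item_price defined as above and that no foreign keys reject the made-up ids:

```sql
-- The generated column is left out of the column list; MySQL computes it.
INSERT INTO orders_items (order_id, product_id, product_price, quantity)
VALUES (1, 1, 10.50, 3);

-- total_item_price comes back as 31.50 (3 * 10.50).
SELECT product_price, quantity, total_item_price
FROM orders_items
WHERE id = LAST_INSERT_ID();
```

Supplying an explicit value for total_item_price is rejected: the only value accepted for a generated column in an INSERT or UPDATE is DEFAULT.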

JSON fields

It is also possible to extract data from JSON fields using generated columns. As the JSON functions are built-in, JSON_EXTRACT and JSON_UNQUOTE, as well as “->” and “->>”, work as expressions for a generated column:

-- Stored Columns
ALTER TABLE `twitter_users`
ADD COLUMN `location` VARCHAR(255)
AS (response->>"$.location") STORED;

Final considerations

When the type is STORED, the keyword must be specified after the expression; otherwise the column defaults to VIRTUAL.

Generated columns can be indexed as follows, whether they are stored, virtual or extracted from a JSON field:

ALTER TABLE users
ADD INDEX `ix_full_name` (`full_name`);

This is the same syntax used for normal columns.
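To check that the index is picked up, one option is to look at the plan for a lookup on the generated column. This is only a sketch: the name searched for is made up, and the optimizer's choice depends on your data.

```sql
-- If ix_full_name is used, it shows up in the `key` column of the output.
EXPLAIN
SELECT id, full_name
FROM users
WHERE full_name = 'Jane Doe';
```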

Varchar fields on MySQL 5.7

Disclaimer: this post takes into consideration that strict mode is enabled on the server

VARCHAR and CHAR are used to store strings. VARCHAR stores values of varying length, while CHAR always uses the same amount of storage no matter the size of the string. For example, in a single-byte character set such as latin1, CHAR(4) will always store 4 bytes, whereas VARCHAR(4) will store up to 5 bytes (the string plus one length byte). See documentation.

When we create a table like this one:

CREATE TABLE `blog`.`users` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NOT NULL,
`email` VARCHAR(255) NOT NULL,
`created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`));

We put inside the parentheses the length of the field in characters for the VARCHAR field. However, the maximum size in bytes of the field will depend on the CHARSET and COLLATION of the table. You can also specify a different collation for a column.

For instance:

  • latin1: 1 byte per character.
  • utf8 (MySQL's 3-byte utf8mb3): 1 to 3 bytes per character.
  • utf8mb4: 1 to 4 bytes per character.
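The per-character maximum for the character sets available on your server can be read from information_schema, for example:

```sql
-- MAXLEN is the maximum number of bytes one character can take.
SELECT CHARACTER_SET_NAME, MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME IN ('latin1', 'utf8', 'utf8mb4');
```

On a stock MySQL 5.7 server this reports 1 for latin1, 3 for utf8 and 4 for utf8mb4.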

Why this is important to know

The new Online DDL changes for VARCHAR fields are documented as follows:

The number of length bytes required by a VARCHAR column must remain the same. For VARCHAR values of 0 to 255, one length byte is required to encode the value. For VARCHAR values of 256 bytes or more, two length bytes are required. As a result, in-place ALTER TABLE only supports increasing VARCHAR size from 0 to 255 bytes or increasing VARCHAR size from a value equal to or greater than 256 bytes. In-place ALTER TABLE does not support increasing VARCHAR size from less than 256 bytes to a value equal to or greater than 256 bytes. In this case, the number of required length bytes would change from 1 to 2, which is only supported by a table copy (ALGORITHM=COPY) (…)

The highlighted section is true, but a bit misleading: changes between VARCHAR(1) and VARCHAR(255) will only be INPLACE if you are using a single-byte charset such as latin1. If you are using utf8mb4, for instance, the range shrinks to VARCHAR(1) through VARCHAR(63). The reason is that the worst case counts: a utf8mb4 field is sized as if every character took 4 bytes, so VARCHAR(63) is 252 bytes and needs one length byte, while VARCHAR(64) is 256 bytes and needs two. With MySQL's 3-byte utf8, the same reasoning puts the limit at VARCHAR(85).
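One way to check this on your own server is to request ALGORITHM=INPLACE explicitly, so that MySQL refuses the change instead of silently falling back to a table copy. The sketch below uses a throwaway table (varchar_test is a made-up name) with a 4-byte character set:

```sql
-- Throwaway table, at most 4 bytes per character.
CREATE TABLE varchar_test (
  name VARCHAR(45) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

-- 63 * 4 = 252 bytes: still one length byte, so this should succeed in place.
ALTER TABLE varchar_test
  MODIFY name VARCHAR(63) NOT NULL,
  ALGORITHM=INPLACE;

-- 64 * 4 = 256 bytes: needs two length bytes, so this should be rejected
-- with an error saying ALGORITHM=INPLACE is not supported.
ALTER TABLE varchar_test
  MODIFY name VARCHAR(64) NOT NULL,
  ALGORITHM=INPLACE;

DROP TABLE varchar_test;
```

Without the ALGORITHM clause the second ALTER TABLE would still go through, only as a full table copy, which is easy to miss on a small table and painful on a large one.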

More clarification on the Source Code for MySQL:

/*
Copy a field part into an output buffer.
SYNOPSIS
Field::get_key_image()
buff [out] output buffer
length output buffer size
type itMBR for geometry blobs, otherwise itRAW
DESCRIPTION
This function makes a copy of field part of size equal to or
less than "length" parameter value.
For fields of string types (CHAR, VARCHAR, TEXT) the rest of buffer
is padded by zero byte.
NOTES
For variable length character fields (i.e. UTF-8) the "length"
parameter means a number of output buffer bytes as if all field
characters have maximal possible size (mbmaxlen). In the other words,
"length" parameter is a number of characters multiplied by
field_charset->mbmaxlen.
RETURN
Number of copied bytes (excluding padded zero bytes -- see above).
*/
virtual size_t get_key_image(uchar *buff, size_t length, imagetype type)
{
get_image(buff, length, &my_charset_bin);
return length;
}
virtual void set_key_image(const uchar *buff, size_t length)
{ set_image(buff,length, &my_charset_bin); }
inline longlong val_int_offset(uint row_offset)
{
ptr+=row_offset;
longlong tmp=val_int();
ptr-=row_offset;
return tmp;
}
inline longlong val_int(const uchar *new_ptr)
{
uchar *old_ptr= ptr;
longlong return_value;
ptr= (uchar*) new_ptr;
return_value= val_int();
ptr= old_ptr;
return return_value;
}
inline String *val_str(String *str, const uchar *new_ptr)
{
uchar *old_ptr= ptr;
ptr= (uchar*) new_ptr;
val_str(str);
ptr= old_ptr;
return str;
}
virtual bool send_binary(Protocol *protocol);
virtual bool send_text(Protocol *protocol);
virtual uchar *pack(uchar *to, const uchar *from,
uint max_length, bool low_byte_first);

Conclusion

Online DDL changes are supported, but you must pay attention to your field size in bytes, which is not the number inside the parentheses or the character count.

I found this while changing the size of some fields, and I didn’t think the documentation was clear enough in the highlighted example.