Understanding Generated Columns

The Theory

Generated Columns is a feature released on MySQL 5.7. They can be used during CREATE TABLE or ALTER TABLE statements. It is a way of storing data without actually sending it through the INSERT or UPDATE clauses in SQL. The database resolves what the data will be.

There are two types of Generated Columns: Virtual and Stored. They work with:

  • mathematical expressions (product_price * quantity)
  • built-in functions (RIGHT(), CONCAT(), FROM_UNIXTIME(), JSON_EXTRACT())
  • literals (“2”, “new”, 0)

Besides that, they can be indexed but they don’t allow subqueries in it.
A Generated Column works within the table domain. If you need subqueries on a particular column, you may have to look at Views.

The basic example

As an example I am going to use an e-commerce database as based on my past experience of what I have seen and worked. You will probably have at least these tables or something similar:

  • users – stores user info
  • products – stores product info like price and description
  • orders – stores the user_id, date of order
  • orders_items – stores product_id, order_id, quantity and price at the time of purchase

This is the whole DB: Gist.

Notice the order_items definition:

CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` varchar(45) NOT NULL DEFAULT 'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

The retrieval would bring:

SELECT
`id`, `order_id`, `product_id`, `product_price`, `quantity`
FROM `orders_items`
LIMIT 5;
id order_id product_id product_price quantity
1 369 1304 202.18 7
2 4 1736 250.40 3
3 270 1404 29.89 5
4 256 179 190.40 10
5 107 1911 146.98 1

One example is to get the total of that order_item row, something like total_item_price that would store the value of product_price * quantity to show how much the summed amount of an item would be. Some databases have the MONEY type to store price, as with MySQL it is recommended to work with DECIMAL.

People solve this problem in different ways:

  • store the calculated price on a new column to make it easier to retrieve;
  • create a view;
  • or they calculate in the application itself, which in this case might cause problems due to how the language handles floats. There are libraries to deal with money values in a lot of languages and frameworks, however, the overhead of converting each row into a money object could be costly depending on the amount of data being transferred.

Another way I’ve seen is: people calculate in the query the total amount for the orders_items row as product_price * quantity:

SELECT
`id`,
`order_id`,
`product_id`,
`product_price`,
`quantity`,
`product_price` * `quantity` AS total_item_price
FROM `orders_items`
LIMIT 5;
id order_id product_id product_price quantity total_item_price
1 369 1304 202.18 7 1415.26
2 4 1736 250.40 3 751.20
3 270 1404 29.89 5 149.45
4 256 179 190.40 10 1904.00
5 107 1911 146.98 1 146.98

Virtual Columns

  • They take no disk space, except when using a Virtual Column as in a Secondary Index.
  • They are an INPLACE operation: it means the table definition is changed without having to recopy all the data again. More info.
  • The values are calculated on the fly during read operations and BEFORE triggers.

Consider using virtual columns for data where changes happens in a significant number of times. The cost of a Virtual Column comes from reading a table constantly and the server has to compute every time what that column value will be.

Stored Columns

  • They do use disk space.
  • It has the same cost of adding a new column, so it is a COPY operation
  • Values are updated in every INSERT and UPDATE statement.

You should consider using Stored Columns for when the data doesn’t change significantly or at all after creation, like for instance, the example above with the orders_items table. Once a purchase is made, the price of the product is stored, not being changed, neither the quantity. Considering this information we could create total_item_price as a Stored Column.

The code

Creating a table

[code lang=SQL]
— Virtual Column

CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`total_item_price` decimal(10,2) AS (`quantity` * `product_price`),
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` varchar(45) NOT NULL DEFAULT 'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;

— Stored Column

CREATE TABLE `orders_items` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` int(10) unsigned NOT NULL,
`product_id` int(10) unsigned NOT NULL,
`product_price` decimal(10,2) unsigned NOT NULL DEFAULT '0.00',
`quantity` int(10) unsigned NOT NULL DEFAULT 1,
`total_item_price` decimal(10,2) AS (`quantity` * `product_price`) STORED,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` varchar(45) NOT NULL DEFAULT 'CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4;
[/code]

Notice how the definition changes on line 9 and 23: you have another keyword, AS, then an expression and specifically on line 23 you see a STORED keyword. In both lines they are generated columns, if nothing is specified will be a VIRTUAL column.

Altering a table

It uses the same syntax as adding a column, just adding the “AS (expression)” after the data type:

[code lang=SQL]
— `full_name` as VIRTUAL COLUMN
ALTER TABLE users
ADD COLUMN `full_name` VARCHAR(500)
AS (CONCAT_WS(" ", `first_name`, `last_name`));

— `total_item_price` as STORED COLUMN
ALTER TABLE orders_items
ADD COLUMN `total_item_price` DECIMAL(10, 2)
AS (`quantity` * `product_price`) STORED;
[/code]

JSON fields

It is also possible to extract data from JSON fields using generated columns. As the functions for JSON are built-in, JSON_EXTRACT and JSON_UNQUOTE as well “->” and “->>” work as expressions for a generated column:

[code lang=SQL]
— Stored Columns
ALTER TABLE `twitter_users`
ADD COLUMN `location` VARCHAR(255)
AS (response->>"$.location") STORED;
[/code]

Final considerations

When the type is STORED, it must be specified after the expression otherwise the default behaviour will be to be VIRTUAL.

Generated columns can have indexes created as the following, no matter if stored, virtual or extracted from a JSON field:

[code lang=SQL]
ALTER TABLE users
ADD INDEX `ix_full_name` (`full_name`);
[/code]

Which is the same syntax for normal columns.

8 thoughts on “Understanding Generated Columns

  1. Thanks for writing this 🙂

    Might be useful to clarify that you have to put the expression after the AS in parentheses, otherwise MySQL will give you a syntax error with no further clues as to what’s going on. I was scratching my head for a few minutes on that one.

  2. Hello Gabriela
    I hope you still monitor this page. Very good format and explanations. i dabble in mysql, MariaDB, and laravel, and have a problem you might help me solve.
    I’m trying to either set a column definition to compute a value, or compute the value on insert with sql. I need to count values in another column (color) that are equal to the color value being inserted (say, yellow), and then increment the value in a “colorcount” column. So, for example, if I’m inserting a row with color = yellow, and there are 20 previous rows of yellow, my colorcount column for the new row should be 21. Am I making any sense?
    (It’s really just a workaround for not being able to generate row numbers for subqueries in Laravel, which would be optimal.)
    I’ve been trying something like this and getting various errors:
    INSERT INTO draws VALUES (null, ‘yellow’, (SELECT (count(‘color’) WHERE ‘color’=’yellow’)+1);
    That or something very similar worked for me a long time ago in different DB tables.
    Any hints you might provide would be greatly appreciated. Thanks
    Nate

  3. Thank you for the good overview. My company is about to upgrade to 5.7.x and I’m just trying to get a sense of the landscape of features that I might be able to leverage. I started looking into virtual columns related to the new JSON features. This is interesting stuff; though, I am not quite sure how I will be able to leverage it yet. Right now, the thing that feels most compelling is being able to add a virtual expression to a secondary index (imagine something like storing the “email domain” based on the user’s email address). For “stored” values, I am not quite sure why I wouldn’t just add the column and move the computation to the application layer. I suppose that’s 6-of-one, half-a-dozen of the other? Anyway, good writeup.

Leave a Reply