Day: September 8, 2016

Varchar fields on MySQL 5.7

Disclaimer: this post takes into consideration that strict mode is enabled on the server

VARCHAR  and  CHAR  are used to store strings. VARCHAR stores varying length and CHAR always use the same exact size no matter the size of the string. For example, CHAR(4) will always store 4 bytes, whereas VARCHAR(4) will store up to 5 bytes. See documentation.

When we create a table like this one:

CREATE TABLE `blog`.`users` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NOT NULL,
`email` VARCHAR(255) NOT NULL,
`created_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`));

We put inside the parentheses the length of the field in characters for the VARCHAR field. However, the maximum size in bytes of the field will depend on the CHARSET and COLLATION of the table. You can also specify a different collation for a column.

For instance:

  • latin1: 1 to 2 bytes per character.
  • utf8: 1 to 4 bytes per character.

Why this is important to know

The new Online DDL changes for VARCHAR fields are documented as follows:

The number of length bytes required by a VARCHAR column must remain the same. For VARCHAR values of 0 to 255, one length byte is required to encode the value. For VARCHAR values of 256 bytes or more, two length bytes are required. As a result, in-place ALTER TABLE only supports increasing VARCHAR size from 0 to 255 bytes or increasing VARCHAR size from a value equal to or greater than 256 bytes. In-place ALTER TABLE does not support increasing VARCHAR size from less than 256 bytes to a value equal to or greater than 256 bytes. In this case, the number of required length bytes would change from 1 to 2, which is only supported by a table copy (ALGORITHM=COPY) (…)

The section highlighted is true, however, a bit misleading, changes between VARCHAR(1) and VARCHAR(255) will only be INPLACE if you are using latin1 charset. If you are using utf8 for instance that range drops from VARCHAR(1) to VARCHAR(63). The reason behind this is because in worst case scenario that field with utf8 will count each character as 4 bytes, making VARCHAR(63) < 256 bytes  and VARCHAR(>63) >= 256 bytes.

More clarification on the Source Code for MySQL:

/*
Copy a field part into an output buffer.
SYNOPSIS
Field::get_key_image()
buff [out] output buffer
length output buffer size
type itMBR for geometry blobs, otherwise itRAW
DESCRIPTION
This function makes a copy of field part of size equal to or
less than "length" parameter value.
For fields of string types (CHAR, VARCHAR, TEXT) the rest of buffer
is padded by zero byte.
NOTES
For variable length character fields (i.e. UTF-8) the "length"
parameter means a number of output buffer bytes as if all field
characters have maximal possible size (mbmaxlen). In the other words,
"length" parameter is a number of characters multiplied by
field_charset->mbmaxlen.
RETURN
Number of copied bytes (excluding padded zero bytes -- see above).
*/
virtual size_t get_key_image(uchar *buff, size_t length, imagetype type)
{
get_image(buff, length, &my_charset_bin);
return length;
}
virtual void set_key_image(const uchar *buff, size_t length)
{ set_image(buff,length, &my_charset_bin); }
inline longlong val_int_offset(uint row_offset)
{
ptr+=row_offset;
longlong tmp=val_int();
ptr-=row_offset;
return tmp;
}
inline longlong val_int(const uchar *new_ptr)
{
uchar *old_ptr= ptr;
longlong return_value;
ptr= (uchar*) new_ptr;
return_value= val_int();
ptr= old_ptr;
return return_value;
}
inline String *val_str(String *str, const uchar *new_ptr)
{
uchar *old_ptr= ptr;
ptr= (uchar*) new_ptr;
val_str(str);
ptr= old_ptr;
return str;
}
virtual bool send_binary(Protocol *protocol);
virtual bool send_text(Protocol *protocol);
virtual uchar *pack(uchar *to, const uchar *from,
uint max_length, bool low_byte_first);

Conclusion

Online DDL changes are supported, but you must pay attention to your field size in bytes. Which it doesn’t mean it is the size inside the parentheses or the character count.

I found this while meddling with some change in size fields and I didn’t think the documentation was clear enough in the highlighted example.