Jan
20

MySQL Datatypes

MySQL No Comments

Choosing the wrong data type to store your data can have a dramatic effect on performance. This post will cover the general guidelines for choosing the correct data type along with summaries of the various data types available in MySQL.

General

  • Smaller is better (usually) – smaller data types take up less space in memory and on disk. Choose the smallest data type you don’t think you’ll exceed.
  • Strings are slow – character data types require extra processing to handle character sets and collations, use the built in data types for dates/times and store IP addresses as integers.
  • Avoid NULL – nullable columns require extra processing and use more storage space; not to mention storing NULL values is bad database design in 9/10 cases.

Integers

MySQL has five integer data types ranging from 8 to 64 bits, all of which can be signed or unsigned. The “width” setting is only used for formatting purposes by the MySQL interactive tools (the command-line client for example). It does not restrict the range of allowed values and has no bearing on how the values are stored or processed during computation; INT(1) is identical to INT(20).

Type Bytes Minimum Value Maximum Value
(Signed/Unsigned) (Signed/Unsigned)
TINYINT 1 -128 127
0 255
SMALLINT 2 -32,768 32,767
0 65,535
MEDIUMINT 3 -8,388,608 8,388,607
0 16,777,215
INT 4 -2,147,483,648 2,147,483,647
0 4,294,967,295
BIGINT 8 -9,223,372,036,854,775,808 9,223,372,036,854,775,807
0 18,446,744,073,709,551,615

Real Numbers

MySQL has three datatypes for storing real numbers. FLOAT and DOUBLE are used to represent approximate numeric values and DECIMAL is used to store exact numeric values.

DECIMAL should be used in cases where it is important to preserve exact precision, for example with monetary data, as you can specify the number of digits allowed before and after the decimal point. The precision choosen will affect the space required to store the value with digits being packed into binary strings (nine digits per four bytes).

Type Bytes Minimum Value Maximum Value
FLOAT 4 ±1.175494351E-38 ±3.402823466E+38
DOUBLE 8 ±2.2250738585072014E-308 ±1.7976931348623157E+308
DECIMAL Varies The maximum range of DECIMAL values is the same as for DOUBLE, but the actual range for a given DECIMAL column is constrained by the choice of precision.

Strings

The two main string data types are VARCHAR and CHAR.

VARCHAR columns store variable length strings and can require less storage space than CHAR columns as they only use as much space as they need. VARCHAR uses 1 or 2 extra bytes to record the value’s length. VARCHAR columns generally help performance as they use less space, however, extra work is needed if an UPDATE causes a row to grow and no longer fit in its original location.

VARCHAR is generally worth using when the maximum column length is much greater than the average length, or when updates are rare. CHAR columns are used for fixed length fields such as MD5 hashes, very short values such as Y or N and for data that’s changed frequently.

The above defines how the values are stored on disk, not necessarily in memory. MySQL often allocates fixed-size chunks of memory to hold values internally, which may lead to performance issues if being over-generous with the size of VARCHAR columns.

Type Bytes Maximum Length
CHAR L 255
VARCHAR L + 1/2 65,535

BLOB and TEXT

BLOB and TEXT data types are designed to store large amount of binary and character data respectively. There are four sub-types for each determining how much data each type can hold. When sorting BLOB and TEXT columns, MySQL sorts only the first max_sort_length bytes of the column. In addition, MySQL can’t index the full length of these columns and can’t use the indexes for sorting.

Type Bytes Maximum Length
TINYBLOB / TINYTEXT L + 1 255
BLOB / TEXT L + 2 65,535
MEDIUMBLOB / MEDIUMTEXT L + 3 16,777,215
LONGBLOB / LONGTEXT L + 4 4,294,967,295

Dates and Times

MySQL has five datatypes for storing temporal values, DATETIME, TIMESTAMP, DATE, TIME and YEAR. The last three (DATE, TIME and YEAR) should be sufficiently self explanatory in regards of what types of value they store. DATETIME and TIMESTAMP are used to store both date and time values in a single column, and are very similar.

DATETIME columns can hold a large range of values, are independent of timezone and are packed into an integer in YYYYMMDDHHMMSS format.

TIMESTAMP columns store the number of seconds elasped since the UNIX epoch (Midnight, 1st January 1970), the same as a UNIX timestamp. The values displayed also depend on the current time zone settings. TIMESTAMP columns also have special behavior when inserting and updating rows.

Type Bytes Minimum Value Maximum Value
TIMESTAMP 4 1970-01-01 00:00:01 2038-01-09 03:14:07
DATETIME 8 1000-01-01 00:00:00 9999-12-31 23:59:59
DATE 3 1000-01-01 9999-12-31
TIME 3 -838:59:59 838:59:59
YEAR 1 1901 2155

Enums

Enums columns are typically used for string values where the range of permitted values is quite small, e.g. state, gender, status, etc. Each Enum column can store up to 65.535 distinct values, with each value being stored as an integer representing its position in the field definition list. An Enum column therefore requires one or two bytes depending on the number of items in the list. Enum columns are sorted using the integer values and not the actual string values.