Jan
20

MySQL Datatypes

MySQL No Comments

Choosing the wrong data type to store your data can have a dramatic effect on performance. This post will cover the general guidelines for choosing the correct data type along with summaries of the various data types available in MySQL.

General

  • Smaller is better (usually) – smaller data types take up less space in memory and on disk. Choose the smallest data type you don’t think you’ll exceed.
  • Strings are slow – character data types require extra processing to handle character sets and collations, use the built in data types for dates/times and store IP addresses as integers.
  • Avoid NULL – nullable columns require extra processing and use more storage space; not to mention storing NULL values is bad database design in 9/10 cases.

Integers

MySQL has five integer data types ranging from 8 to 64 bits, all of which can be signed or unsigned. The “width” setting is only used for formatting purposes by the MySQL interactive tools (the command-line client for example). It does not restrict the range of allowed values and has no bearing on how the values are stored or processed during computation; INT(1) is identical to INT(20).

Type Bytes Minimum Value Maximum Value
(Signed/Unsigned) (Signed/Unsigned)
TINYINT 1 -128 127
0 255
SMALLINT 2 -32,768 32,767
0 65,535
MEDIUMINT 3 -8,388,608 8,388,607
0 16,777,215
INT 4 -2,147,483,648 2,147,483,647
0 4,294,967,295
BIGINT 8 -9,223,372,036,854,775,808 9,223,372,036,854,775,807
0 18,446,744,073,709,551,615

Real Numbers

MySQL has three datatypes for storing real numbers. FLOAT and DOUBLE are used to represent approximate numeric values and DECIMAL is used to store exact numeric values.

DECIMAL should be used in cases where it is important to preserve exact precision, for example with monetary data, as you can specify the number of digits allowed before and after the decimal point. The precision choosen will affect the space required to store the value with digits being packed into binary strings (nine digits per four bytes).

Type Bytes Minimum Value Maximum Value
FLOAT 4 ±1.175494351E-38 ±3.402823466E+38
DOUBLE 8 ±2.2250738585072014E-308 ±1.7976931348623157E+308
DECIMAL Varies The maximum range of DECIMAL values is the same as for DOUBLE, but the actual range for a given DECIMAL column is constrained by the choice of precision.

Strings

The two main string data types are VARCHAR and CHAR.

VARCHAR columns store variable length strings and can require less storage space than CHAR columns as they only use as much space as they need. VARCHAR uses 1 or 2 extra bytes to record the value’s length. VARCHAR columns generally help performance as they use less space, however, extra work is needed if an UPDATE causes a row to grow and no longer fit in its original location.

VARCHAR is generally worth using when the maximum column length is much greater than the average length, or when updates are rare. CHAR columns are used for fixed length fields such as MD5 hashes, very short values such as Y or N and for data that’s changed frequently.

The above defines how the values are stored on disk, not necessarily in memory. MySQL often allocates fixed-size chunks of memory to hold values internally, which may lead to performance issues if being over-generous with the size of VARCHAR columns.

Type Bytes Maximum Length
CHAR L 255
VARCHAR L + 1/2 65,535

BLOB and TEXT

BLOB and TEXT data types are designed to store large amount of binary and character data respectively. There are four sub-types for each determining how much data each type can hold. When sorting BLOB and TEXT columns, MySQL sorts only the first max_sort_length bytes of the column. In addition, MySQL can’t index the full length of these columns and can’t use the indexes for sorting.

Type Bytes Maximum Length
TINYBLOB / TINYTEXT L + 1 255
BLOB / TEXT L + 2 65,535
MEDIUMBLOB / MEDIUMTEXT L + 3 16,777,215
LONGBLOB / LONGTEXT L + 4 4,294,967,295

Dates and Times

MySQL has five datatypes for storing temporal values, DATETIME, TIMESTAMP, DATE, TIME and YEAR. The last three (DATE, TIME and YEAR) should be sufficiently self explanatory in regards of what types of value they store. DATETIME and TIMESTAMP are used to store both date and time values in a single column, and are very similar.

DATETIME columns can hold a large range of values, are independent of timezone and are packed into an integer in YYYYMMDDHHMMSS format.

TIMESTAMP columns store the number of seconds elasped since the UNIX epoch (Midnight, 1st January 1970), the same as a UNIX timestamp. The values displayed also depend on the current time zone settings. TIMESTAMP columns also have special behavior when inserting and updating rows.

Type Bytes Minimum Value Maximum Value
TIMESTAMP 4 1970-01-01 00:00:01 2038-01-09 03:14:07
DATETIME 8 1000-01-01 00:00:00 9999-12-31 23:59:59
DATE 3 1000-01-01 9999-12-31
TIME 3 -838:59:59 838:59:59
YEAR 1 1901 2155

Enums

Enums columns are typically used for string values where the range of permitted values is quite small, e.g. state, gender, status, etc. Each Enum column can store up to 65.535 distinct values, with each value being stored as an integer representing its position in the field definition list. An Enum column therefore requires one or two bytes depending on the number of items in the list. Enum columns are sorted using the integer values and not the actual string values.

Nov
5

MySQL Storage Engines

MySQL 2 Comments

Storage engines are responsible for storing and retrieving all the data stored “in” MySQL. Each storage engine has its own strengths and weaknesses that determine its suitability in a particular situation.

MyISAM

MyISAM is the default storage engine in MySQL and provides a good comprimise between performance and features.

Pros:

  • platform independent
  • concurrent inserts
  • full-text indexes
  • compression

Cons:

  • no transations
  • table-level locking
  • potentially long repair times

Good For:

  • applications with many reads and few writes.

InnoDB

InnoDB is a transactional storage engine that uses MVCC and row-level locking, and includes automatic crash recovery.

Pros:

  • ACID transactions
  • row-level locking and MVCC
  • crash recovery
  • clustered indexes
  • foreign key constraints

Cons:

  • alterations to table structure can be slow on large tables

Good For:

  • online ordering and other transaction based applications.

Memory

Memory tables store all their data in memory which means they are very fast because there is no waiting for disk I/O. They also use hash indexes which makes them very fast for lookup queries. The table definition of a Memory table will survive a server restart, but all data will be lost.

Pros:

  • very fast

Cons:

  • uses fixed-length rows which can waste memory
  • table-level locking
  • no support for TEXT or BLOB datatypes
  • no transactions

Used for:

  • lookup or mapping tables
  • caching results of periodically aggregated data
  • intermediate results when analysing data

Archive

The archive engine is optimised for high-speed inserting and data compression. It supports only INSERT and SELECT queries and doesn’t support DELETE, REPLACE, or UPDATE queries, or indexes. Rows are buffered and compressed using zlib as they are inserted which means much less disk I/O than MyISAM tables.

Pros:

  • fast INSERTs
  • compression

Cons:

  • no support for indexes, SELECTs will perform a full table scan.

Uses:

  • storing large amounts of rarely accessed data in a very small footprint, e.g. logs and audit records

CSV

The CSV storage engine stores data in text files using comma-seperated values. Other applications can open the table data file directly and read the contents. Likewise, if an application exports a CSV and saves it in the server’s data directory, the server can read the file straight away. CSV tables do not support indexes.

Uses:

  • data interchange and certain types of logging.

Other Engines

There are several other storage engines available.

  • Blackhole – Essentially a no-op storage engine, all INSERTS are discarded although they are recorded in the binary log and can be replayed on slaves.
  • Federated – Federated tables refer to tables on a remote MySQL server.
  • NDB Cluster – A specialised storage engine designed for high-performance with redundancy and load-balancing capabilities.
  • Falcon – A next-generation storage engine designed for todays hardware (64bit CPUs and plenty of memory).
  • Maria – A replacement for MyISAM that includes transactions, row-level locking, MVCC, and better crash recovery.

Blackhole, Federated and NDB Cluster are suitable only for specific purposes and should only be used after careful consideration. Falcon and Maria are the two modern storage engines although neither are currently considered production-stable.