By: Juan Carlos Olamendy Turruellas

Introduction

Today it's very common to store a lot of data in our MySQL databases. Common examples are: storing tracking information about your website to do analytics later, logging facts about your business, or just recording geolocation information provided by phones. As a database grows exponentially, then some queries (even the optimized one) are getting slower.

One technique used to speed up the performance when you deal with very big tables is partitioning.

The idea behind partitioning isn't to use multiple servers to host your data but to divide up the big tables/indexes into smaller segments (logical data spaces) stored in different physical data files, so that the queries access only a fraction of the data; thus there is less for them to scan, balancing the workload and potentially making queries faster.

In this article, I'll explain the concepts behind data partitioning. Later, these principles and concepts will be illustrated using real-world examples so that you can apply this technique in your own solutions.

Key Concepts

Conceptually, there are two types of partitioning: horizontal and vertical. In horizontal partitioning, a partition function is applied to all rows in a table in order to store the rows in different partition segments in the same node. In vertical partitioning, a partition function is applied to all rows in a table in order to distribute the data in different nodes in the cluster.

At the time of this writing, MySQL supports only horizontal partitioning.

The main points to consider when partitioning your data horizontally in MySQL are:

  • Multiples physical data files represent a single logical table in a single node. You can notice that there are no multiple nodes involved.
  • When a query is executed, MySQL knows in which partitions the data may reside, so the query hits only these particular partitions. Note that you don’t want your queries to hit all the partitions. This would be a big performance problem that could degrade your system; a the worst-case scenario, when partitioning would not add any value to system performance. When queries are executed, they should hit the smallest number of partitions possible (just a subset of the whole data). To achieve this goal, you can rewrite your queries to fit the actual partitioning schema or create a better partitioning schema for your data.
  • You need to define a partition key and a partition function. The partition function evaluates an expression containing the partition key in order to select the appropriate partition in which to store the row. In MySQL, all columns that are part of the partition key must present in every unique key of the table. The expression of the partition function can be any valid SQL expression. There are functions (any date function that returns an integer) that can be used in the expression such as: TO_DAYS(), YEAR(), TO_SECONDS(), WEEKDAY(), DAYOFYEAR(), MONTH() and UNIX_TIMESTAMP.
  • When partitioning the data, we cannot execute a partial operation for partitioning only a subset of the data. Partitioning a table means partitioning the data and the index; we cannot partition only the data without partitioning the index.
  • Partitions of the same table must use the same storage engine. If we partition an InnoDB table, all the partitions will be InnoDB.
  • In order to improve the performance and scalability of the system, we can balance the IO workload by distributing the partitions in different disks/volumes. We can achieve this using the DATA DIRECTORY and INDEX DIRECTORY options of the PARTITION clause.

Benefits

The key benefits of data partitioning are:

  • Improve the performance and scalability by balancing the IO workload.
  • Improve the system capability because we can store more data in one table than can be held on a single volume. That’s because the data is physically distributed.
  • Improve data management because we can delete or archive data by dealing with a particular partition without dealing with the rest of the table. For example, dropping a partition is faster than a DELETE query.

Demo

Let’s start illustrating the partitioning concepts using some real-world examples.

There are several types of partitioning in MySQL such as:

  • RANGE
  • HASH
  • LIST
  • KEY

In each section, I’ll cover each partitioning type.

Range Partitioning

In this partitioning scheme, we define several ranges (using the VALUES LESS THAN operator) and each row will be assigned to one of them. In this way, the ranges should be contiguous but they cannot overlap each other.

This scheme is very useful for representing ranges of time for tracking, logging, recording transactions, etc.

Let’s use this scheme for storing transactions related to sale orders, as shown in the listing 01.

MySQL> CREATE TABLE testing_db.sales_order (

    id int NOT NULL,

    product varchar(20) NOT NULL,

    created datetime NOT NULL,

    amount decimal(8,2),

    PRIMARY KEY(id, product, created)

)

PARTITION BY RANGE( YEAR(created) )

( PARTITION part_less_than_2015 VALUES LESS THAN (2015),

  PARTITION part_2015 VALUES LESS THAN (2016),

  PARTITION part_2016 VALUES LESS THAN (2017),

  PARTITION part_greater_than_2017 VALUES LESS THAN MAXVALUE

);

Listing 01

You can verify that MySQL has actually created a data file for each partition by listing the testing_db database directory as shown in the listing 02.

$ ls -alh /var/lib/mysql/testing_db

total 408K

drwxr-x---  2 mysql mysql 4.0K Jul  5 22:35 .

drwxr-x--- 12 mysql mysql 4.0K Jul  5 16:12 ..

-rw-r-----  1 mysql mysql   65 Jul  5 16:12 db.opt

-rw-r-----  1 mysql mysql 8.5K Jul  5 22:35 sales_order.frm

-rw-r-----  1 mysql mysql  96K Jul  5 22:35 sales_order#P#part_2015.ibd

-rw-r-----  1 mysql mysql  96K Jul  5 22:35 sales_order#P#part_2016.ibd

-rw-r-----  1 mysql mysql  96K Jul  5 22:35 sales_order#P#part_greater_than_2017.ibd

-rw-r-----  1 mysql mysql  96K Jul  5 22:35 sales_order#P#part_less_than_2015.ibd

Listing 02

Note the MAXVALUE clause used in the range definition. This is not mandatory, but it works as a catch all; in this case for orders equal to or greater than the max value (year 2017).

If we don’t include the MAXVALUE clause, and if we try to insert an order for the year 2020, then it would fail, as there wouldn’t be any partition definition for that value.

You can see the utility of the YEAR function for defining partitions. We can use any function that returns an INTEGER, such as UNIX_TIMESTAMP, as shown in listing 03.

MySQL> CREATE TABLE testing_db.sales_order (

    id int NOT NULL,

    product varchar(20) NOT NULL,

    created timestamp NOT NULL,

    amount decimal(8,2),

    PRIMARY KEY(id, product, created)

)

PARTITION BY RANGE(UNIX_TIMESTAMP (created) )

( PARTITION part_less_than_2015 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01')),

  PARTITION part_2015 VALUES LESS THAN (UNIX_TIMESTAMP('2016-01-01')),

  PARTITION part_2016 VALUES LESS THAN (UNIX_TIMESTAMP('2017-01-01')),

  PARTITION part_greater_than_2017 VALUES LESS THAN MAXVALUE

);

Listing 03

We can query a table indicating a particular partition to be used, in order not to use the whole table, as shown in listing 04.

MySQL> SELECT *,'part_2015'

FROM testing_db.sales_order PARTITION (part_2015);

Listing 04

We can also see the number of rows per partition using the following query, shown in listing 05.

MySQL> SELECT PARTITION_NAME, TABLE_ROWS

FROM INFORMATION_SCHEMA.PARTITIONS

WHERE TABLE_NAME='sales_order';

Listing 05

We can delete the rows of a partition without affecting the rest of the dataset in the table, as shown in listing 06.

MySQL> ALTER TABLE testing_db.sales_order TRUNCATE PARTITION part_2015;

Listing 06

We can also drop a partition, as shown in listing 07.

MySQL> ALTER TABLE testing_db.sales_order DROP PARTITION part_2015;

Listing 07

Another important concept in table partitioning is data pruning. If we want to show all the orders before 2017, then MySQL optimizer will leave unnecessary partitions out of the execution plan, dramatically improving the execution time (latency).

We can check using an EXPLAIN statement to visualize the execution plan and the involved partitions (marked in bold) as shown in listing 08.

MySQL> EXPLAIN partitions SELECT * FROM testing_db.sales_order WHERE created < '2017-01-01';

+----+-------------+-------------+-------------------------------+------+---------------+------+---------+------+------+----------+-------------+

| id | select_type | table       | partitions                    | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |

+----+-------------+-------------+-------------------------------+------+---------------+------+---------+------+------+----------+-------------+ |  1 | SIMPLE      | sales_order | part_less_than_2015,part_2016 | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | Using where | +----+-------------+-------------+-------------------------------+------+---------------+------+---------+------+------+----------+-------------+

Listing 08

List Partitioning

This scheme is similar to RANGE, but in this case, a particular partition is selected based on columns matching one of a set of discrete values, using the VALUES IN clause where the value list is a comma-separated list of integers, as shown in listing 09.

MySQL> CREATE TABLE testing_db.sales_order (

    id int NOT NULL,

    product varchar(20) NOT NULL,

    region int NOT NULL,

    created datetime NOT NULL,

    amount decimal(8,2),

    PRIMARY KEY(id, region)

)

PARTITION BY LIST (region)(

    PARTITION east_south_region VALUES IN(1, 2),

    PARTITION west_region VALUES IN(3)

);

Listing 09

Notice that in this partitioning schema, there is no catch-all expression (like the MAXVALUE expression in RANGE). We should cover all possible elements in the criteria list to prevent insertion errors.

Hash Partitioning

In this scheme, a particular partition is selected based on the value returned by a user-defined expression. The expression can be any expression that yields a nonnegative integer value. This schema is used mainly to evenly distribute data among the number of partitions.

Let’s see an example in listing 10.

MySQL> CREATE TABLE testing_db.sales_order (

    id int NOT NULL,

    product varchar(20) NOT NULL,

    region int NOT NULL,

    created datetime NOT NULL,

    amount decimal(8,2),

    PRIMARY KEY(id, region)

) PARTITION BY HASH (region)

PARTITIONS 4;

Listing 10

The hash expression can also use a function, as shown in listing 11.

MySQL> CREATE TABLE testing_db.sales_order (

    id int NOT NULL,

    product varchar(20) NOT NULL,

    region int NOT NULL,

    created datetime NOT NULL,

    amount decimal(8,2),

    PRIMARY KEY(id, region, created)

) PARTITION BY HASH (MONTH(created))

PARTITIONS 12;

Listing 11

Key Partitioning

This scheme is similar to the HASH scheme, with the difference that the hashing function is provided by MySQL. MySQL employs its own internal hashing function, which is based on the same algorithm as PASSWORD.

If we don’t explicitly specify the partitioning columns part of the key, then MySQL will automatically use the primary key or a unique key as the partitioning column.

If no unique keys are available, the statement will fail.

Let’s see an example in listing 12.

MySQL> CREATE TABLE testing_db.sales_order (

    id int NOT NULL,

    product varchar(20) NOT NULL,

    region int NOT NULL,

    created datetime NOT NULL,

    amount decimal(8,2),

    PRIMARY KEY(id, region, created)

)

PARTITION BY KEY()

PARTITIONS 4;

Listing 12

Conclusion

In this article, I've explained the idea and benefits related to transportable tablespace mechanism in InnoDB as an alternative for moving large datasets between MySQL instances.

I’ve also illustrated the partitioning concepts through common real-world examples. Now you can start using this amazing feature in your own database environment to speed up your queries and scale your systems.