分类: Mysql/postgreSQL
2009-08-05 21:32:37
mysql> create table services (sub_id int, service_name varchar (30), service_parm int, primary key (sub_id, service_name))-------------------------------------------------------------
engine = ndb partition by key (sub_id);
By default, Cluster will partition based on primary key
When adding rows to a table that’s using MySQL Cluster as the storage engine, each row is assigned to a partition where that partition is mastered by a particular data node in the Cluster. The best performance comes when all of the data required to satisfy a transaction is held within a single partition so that it can be satisfied within a single data node rather than being bounced back and forth between multiple nodes where extra latency will be introduced.
By default, Cluster partions the data by hashing the primary key. This is not always optimal.
For example, if we have 2 tables, the first using a single-column primary key (sub_id) and the second using a composite key (sub_id, service_name)…
mysql> describe names;
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| sub_id | int(11) | NO | PRI | NULL | |
| name | varchar(30) | YES | | NULL | |
+--------+-------------+------+-----+---------+-------+
mysql> describe services;
+--------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| sub_id | int(11) | NO | PRI | 0 | |
| service_name | varchar(30) | NO | PRI | | |
| service_parm | int(11) | YES | | NULL | |
+--------------+-------------+------+-----+---------+-------+
If we then add data to these (initially empty) tables, we can then use the ‘explain’ command to see which partitions (and hence phyical hosts) are used to store the data for this single subscriber…
mysql> insert into names values (1,'Billy');
mysql> insert into services values (1,'VoIP',20),(1,'Video',654),(1,'IM',878),(1,'ssh',666);
mysql> explain partitions select * from names where sub_id=1;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | names | p3 | const | PRIMARY | PRIMARY | 4 | const | 1 | |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
mysql> explain partitions select * from services where sub_id=1;
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | services | p0,p1,p2,p3 | ref | PRIMARY | PRIMARY | 4 | const | 10 | |
+----+-------------+----------+-------------+------+---------------+---------+---------+-------+------+-------+
The service records for the same subscriber (sub_id = 1) are split accross 4 diffent partitions (p0, p1, p2 & p3). This means that the query results in messages being passed backwards and forwards between the 4 different data nodes which cnsumes extra CPU time and incurs extra latency.
We can override the default behaviour by telling Cluster which fields should be fed into the hash algorithm. For our example, it’s reasonable to expect a transaction to access multiple records for the same subscriber (identified by their sub_id) and so the application will perform best if all of the rows for that sub_id are held in the same partition…
mysql> drop table services;
mysql> create table services (sub_id int, service_name varchar (30), service_parm int, primary key (sub_id, service_name)) engine = ndb
-> partition by key (sub_id);
mysql> insert into services values (1,'VoIP',20),(1,'Video',654),(1,'IM',878),(1,'ssh',666);
mysql> explain partitions select * from services where sub_id=1;
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | services | p3 | ref | PRIMARY | PRIMARY | 4 | const | 10 | |
+----+-------------+----------+------------+------+---------------+---------+---------+-------+------+-------+
Now all of the rows for sub_id=1 from the services table are now held within a single partition (p3) which is the same as that holding the row for the same sub_id in the names table. Note that it wasn’t necessary to drop, recreate and re-provision the services table, the following command would have had the same effect:
mysql> alter table services partition by key (sub_id);