[ZT]分形索引 TokuDB for Mysql-tclwp-ChinaUnix博客

Posted by Vadim

(Note: Review was done as part of our consulting practice, but is totally independent and fully reflects our opinion)

I had a chance to take look TokuDB (the name of the Tokutek storage engine), and run some benchmarks. Tuning of TokuDB is much easier than InnoDB, there only few parameters to change, and actually out-of-box things running pretty well.

There are some rumors circulating that TokuDB is ”.. only an in memory or read-only engine, and that’s why inserts are so fast”. This is not actually the case, as TokuDB is a disk-based, read-write transactional storage engine that is based on special “fractal tree indexes”. Fractal Trees are a drop-in-replacement for a B-tree (based on current research in data structures by professors at Stony Brook, Rutgers, and MIT). I can’t say exactly how it is improved, because the engine itself is closed source.

Along with its “fractal tree indexes”, TokuDB also uses compression, which significantly ( Graph 1) reduces dataset and decreases the amount of IO operations. The benefit of small size is also that TokuDB can keep in memory much more records then InnoDB / MyISAM. Actually in internal cache records are stored in uncompressed form, but OS Cache can keep compressed pages in memory. For the data set we tested, TokuDB used 6.2x less disk space than InnoDB, and 5.5x less disk space than MyISAM.

For tests I used Dell PowerEdge R900, with RAID 10 on 8 disks (2.5″ SAS disks, 15K RPMS) and 32GB of physical RAM, but restricted on kernel level to 4GB to emulate case B-Tree does not fit into memory.
As benchmark software I tried iiBench, which you can take there

What makes fractal indexes so interesting is the amount of IO operations to update index tree is significantly less than for usual B-Tree index. It’s as if Fractal Trees turn random IO into sequential IO. This is why you see the results that you do in iiBench test ( Graph 2), and the number of inserts/sec is almost linear even when table size bigger than available memory. For the last 10M rows inserted, InnoDB averaged 1,555 rows/sec while TokuDB averaged 16,437 rows/sec - about 10.6x faster. One consequence of having such fast indexes, is that you can maintain a richer set of indexes at a given incoming data rate, enabling much higher query performance.

Beside iiBench we run benchmarks of SELECT queries again one of our click analyzing schema, in two modes - 1. data size is much more then memory and 2. data fits into memory.

As you see in IO-bound case TokuDB outperforms InnoDB 1.4-2.5x times, but CPU-bound is not so good. I think there we meet one of current restrictions of TokuDB - SERIALIZABLE isolation level for transactions.

Speaking about restrictions, the current problems I see are:
- Transactions only support the SERIALIZABLE isolation level. Beside it TokuDB does not scale well on multi-cores even in only SELECT queries. What this practically means it that you can’t get benefit of multi-core boxes running concurrent threads. Tokutek plans to fix this in one of the next releases.
- We did not tested wide range of queries, but by design expect there may be not good results for some kind of queries, i.e. point select queries, as in this case TokuDB has to read and decompress big portion of data.
- Despite Inserts and Deletes are fast, updates are not expected to show the same performance gain, as to update we need to read data, and in this case - read previous comment.
- The version we tested did not yet support recovery logs. The code for it is ready, and will be available in a release soon.
- The ways to do a backup is mysqldump/mysqlhotcopy. It is not fully transparent backup, as it applied TABLE LOCK on copying table. When recovery logs are supported, I guess it will be possible to run LVM backup. Actually I would say backup is only partially the problem of storage engine. The biggest problem is that MySQL does not yet provide an interface for that. This is going to be fixed in MySQL 6.0, but I can’t yet say how it will work with mix of storage engines.
- The Tokutek engine I tested comes in binary form and mysqld binary does not contain InnoDB. Tokutek tells me that InnoDB will be included in a future release.

With all the given advantages and drawbacks, I see a good practical usage of TokuDB for log analyzing and log reporting queries. By log analyzing I mean any kind of log producing application, it can be from simple apache logs put into mysql, application performance logs to more complex log like clicks, user movements and actions on site, visits tracking etc. While it may sound like an easy and trivial task, it is not at all. The more logs there are, the more space they take, and we have had setups where logs are 80% of total database size. Also there is the problem of being able to run custom reporting queries on logs. To do this, you often need many, often complex indexes which gives us the problem of random IO, waste of RAM memory and slow inserts. This is where I think Tokutek appears to be positioned to do quite good at. There are operation issues which make things more complex and, probably, I would not put yet TokuDB on customer production boxes, but it may good fit to non-critical slave where you can run analyzing queries.

April 28, 2009

Detailed review of Tokutek storage engine