
Wednesday, December 10, 2008

InnoDB insert performance

Stop reading now if you don't like handwaving.

I have taken the Tokutek challenge and run iibench. This test matches behavior that occasionally matters to me: reloading all tables of a database via INSERT statements (ALTER TABLE does similar work, as it makes a copy of the table and rebuilds all of its indexes). The iibench workload has one table with a primary key index and two secondary indexes. 1B rows are inserted in primary key order. The other indexed columns have random values.
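
For reference, a minimal sketch of an iibench-style table. The column and index names here are illustrative, not the actual benchmark schema:

CREATE TABLE iibench_like (
  id     BIGINT NOT NULL AUTO_INCREMENT,  -- inserted in PK order
  rand_a INT NOT NULL,                    -- random value, indexed
  rand_b INT NOT NULL,                    -- random value, indexed
  filler VARCHAR(255) NOT NULL,
  PRIMARY KEY (id),
  KEY ix_rand_a (rand_a),
  KEY ix_rand_b (rand_b)
) ENGINE=InnoDB;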

How fast should this run on InnoDB?

This analysis assumes that the table is large enough that the data does not all fit in the buffer cache. On a typical DBMS that updates rows in place and implements a NO FORCE and STEAL policy for the buffer manager, each insert is likely to require 1 read and 1 write per index. The read fetches the leaf block that must be updated and the write flushes a dirty page from the buffer cache to make room for the block being read.

The primary key leaf blocks should not require reads as the inserts are done in primary key order. The writes for the row should be amortized over many inserts for both index-organized and heap tables.

Each insert needs 4 IOs once the table is much larger than the buffer cache: 1 read and 1 write for each of the two secondary indexes. If I run with a server that can do 1000 IOPs, then I expect to insert 1000/4 = 250 rows/second.

Alas, Heikki is clever. InnoDB has an insert buffer. IO for secondary index maintenance after UPDATE and INSERT statements is deferred when leaf blocks are not in the buffer cache. Instead, a change record is written to the insert buffer in (index-id, key) order. This clusters changes for the same leaf blocks. Changes are applied from the insert buffer by a background thread.
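
One way to watch the insert buffer at work is the INSERT BUFFER AND ADAPTIVE HASH INDEX section of the InnoDB status output, which reports its size and merge activity (the exact fields vary by version):

SHOW INNODB STATUS\G
-- or the standard form of the same command:
SHOW ENGINE INNODB STATUS\G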

On real servers that I watch in production this provides a 4:1 or 8:1 reduction in IO for secondary index maintenance. But what impact does it have on iibench? If the insert buffer reduces IO by 8:1, then the row insert rate should be 8X faster -- 2000 rows/second rather than 250.

The insert buffer works when it is large enough to buffer multiple changes to an index leaf block. We can construct a scenario where the insert buffer does not help, but it works for the iibench case. After 1B rows have been inserted into the iibench test table, the secondary indexes use ~6M pages and each index entry requires ~50 bytes. The insert buffer can use half of the InnoDB buffer pool, which is several hundred thousand pages for a buffer pool close to 10GB. From the output of SHOW INNODB STATUS, about 50% of the space allocated for the insert buffer was used for entries. The result is that the insert buffer can store ~50M entries when the buffer pool is near 10GB for the iibench test. When there are 50M insert buffer entries for 6M index pages, there will be multiple entries per leaf page, so the insert buffer should help performance. I want to use (# insert buffer entries / # index pages) as the estimate for the IO reduction rate, but tests I have run suggest that there are other factors.
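
A back-of-the-envelope version of that estimate, assuming the default 16KB InnoDB page size (per-entry overhead in the insert buffer will shift these numbers somewhat):

  ~10GB buffer pool / 16KB pages        ~= 655K pages
  insert buffer can use half the pool   ~= 327K pages
  ~50% of that space holds entries      ~= 164K pages of entries
  16KB page / ~50 bytes per entry       ~= 320 entries per page
  164K pages * 320 entries per page     ~= 52M entries (the ~50M above)
  ~50M entries / ~6M index leaf pages   ~= 8 buffered changes per leaf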

The test ran for 382431 seconds (6373 minutes or 106 hours and 14 minutes or 4.42 days). The results are close to what my handwaving suggests can be achieved. The row insert rate near the end of this test was ~1800/second. The results are from a server with:
  • 8 CPU cores
  • 16GB RAM
  • 10 disks using SW RAID 0
  • InnoDB with the patch to use multiple background IO threads
  • innodb_read_io_threads=4, innodb_write_io_threads=4 (a Google patch)
  • innodb_log_files_in_group=3, innodb_log_file_size=1300M
  • innodb_flush_log_at_trx_commit=2
  • innodb_io_capacity (another Google patch)
  • innodb_doublewrite=0
  • innodb_buffer_pool_size=12G
  • innodb_max_dirty_pages_pct=90

10 comments:

Bradley C. Kuszmaul said...

Wow!

Anonymous said...

Which Google patches were used, and why were they used instead of stock innodb?

Mark Callaghan said...

I used features that provide the my.cnf parameters: innodb_io_capacity, innodb_read_io_threads and innodb_write_io_threads. These help InnoDB use a server's IO capacity when it has more than 1 disk. Without these patches, servers run with the equivalent of innodb_io_capacity=100, innodb_read_io_threads=1 and innodb_write_io_threads=1.
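
A quick way to check whether a given build carries these parameters (stock InnoDB of that era does not):

-- these variables only exist in builds with the patches applied
SHOW VARIABLES LIKE 'innodb_io_capacity';
SHOW VARIABLES LIKE 'innodb_%_io_threads';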

Mikael Ronstrom said...

You didn't mention what innodb_io_capacity number you used. Have you seen this parameter change performance, and if so, where? (We've seen some positive impact from it using DBT2, but nothing remarkable.)

Mark Callaghan said...

innodb_io_capacity=1000. The relevant InnoDB parameters that I used are listed above, except the value for that one was missing.

One way to see the impact of this is to use a large buffer cache, make most of the pages dirty and then let the server be idle. The pending IO (dirty page flushes, insert buffer page flushes, purges of deleted rows) is done much faster when the value of innodb_io_capacity matches the server's capacity and is >> 100. Setting innodb_{write,read}_io_threads should also be done in this case.
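
For example, the dirty page backlog can be watched draining while the server is idle (Innodb_buffer_pool_pages_dirty is a standard status counter):

-- run periodically on the idle server; the count should fall much
-- faster when innodb_io_capacity matches the hardware
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';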

A more complex case where this helps is when the server does a lot of IO over a long period of time and accumulates a lot of pending IO: dirty page writes, rows to purge and a large insert buffer. For many benchmarks you might accumulate many dirty pages, but you might not get a large insert buffer or rows to purge.

So you need a long-running test case that gets the server into a steady state; with a larger value for innodb_io_capacity, the server can handle more pending IO without falling over.

InnoDB falls over when the background IO code cannot keep up with a workload. When the server falls over, transactions might stall waiting for page flushes to complete. In the worst case, you have to remove all load from the server to let the pending IO be reduced.

More details to come in another post. But you can see some of the problems for yourself by running the iibench test. Eventually, the insert buffer uses half of the buffer pool.

Anonymous said...

What's a good way to judge where we should set these values? We currently use 6 15krpm SAS Raptors in RAID 10.

Mark Callaghan said...

Experiment, or call in the experts from MySQL, Proven Scaling and Percona. I am just starting to try these out. 6 fast SCSI disks should be able to do ~1000 IOPs, so you can try out the patch using OurDelta or Percona builds; I think they have the innodb_io_capacity feature. A lot more work remains to be done. For example, InnoDB has 256 slots for prefetch reads and dirty page writes, and requests block waiting for an empty slot when there are none.

Peter said...

Mark,

Couple of interesting notes here

1) The order of the data for other indexes is also very important. Sometimes you will see correlations (like timestamp vs ID), or the key cardinality is low (like only 50 states), which can change the math a lot and increase buffer pool hits a lot.

2) Server IO capacity depends a lot on concurrency. If you're doing inserts from a single thread, for example, writes can be batched in parallel in the background, but reads will be serialized. There is read-ahead, of course, but in my experience it is rarely triggered for truly random accesses.

One other thing... do you have multiple insert buffer merge threads in InnoDB?

Mark Callaghan said...

Peter,
I will answer in another blog post as there are a lot of details. The read prefetch thread(s) should be useful for insert buffer merges, but they are not really useful today.

 