A few people have asked me how fast is. As with so many other things, it depends. This article shows how long it takes to checksum real data on a production server I help manage, which might give you a rough idea of how long it’ll take on your servers.
The server and workload
This server is a replication master running MySQL 5.0.38. It is a Dell Poweredge 1800 series with dual Xeon 3.4GHz processors and 2GB RAM, with three 15K SCSI hard drives in a RAID5 configuration. It serves about 40GB of data in InnoDB tables and about 25GB in MyISAM.
I can’t say too much about the workload, but I’ll tell you what I can. At the time I ran these checksums, it was running many rollup and LOAD DATA INFILE
queries on the tables I checksummed. These tend to do a lot of updates, deletes, and inserts. There are also several processes that REPLACE
or INSERT.. ON DUPLICATE KEY UPDATE
large parts of certain tables which are in the 2-8GB range. At the same time, there are processes running rapid-fire single-row lookups and GROUP BY
queries against all or part of these tables. And certain other larger-than-memory tables elsewhere in the server were being updated too, probably flushing the cache.
In other words, these results are under heavy load, and not scientific or repeatable at all (there was definitely heavier load on some of the test runs than others). But it gives you an idea. Your mileage may vary.
The results
The following table shows the number of seconds it took to checksum several heavily-used InnoDB tables with various checksum strategies: MySQL’s built-in CHECKSUM TABLE
, the ACCUM strategy with replication as an INSERT/SELECT
(acquires share-mode locks on the whole table for replication consistency), the ACCUM strategy as a plain SELECT
without share-mode locks, and the BIT_XOR strategy. If you wish to know more about these strategies and what they do, the MySQL Table Checksum documentation explains them in great detail.
The last column shows how long it takes to run COUNT(*)
on the tables in question. As you can see, taking a checksum is sometimes not that much more expensive than a simple COUNT()
on InnoDB.
Rows | Table Size | CHECKSUM | Replicate/ACCUM | ACCUM | BIT_XOR | COUNT |
---|---|---|---|---|---|---|
49,152 | 317 | 0 | 0 | 0 | 0 | 0 |
1,589,248 | 14,472 | 0 | 1 | 0 | 1 | 1 |
100,843,520 | 638,144 | 15 | 12 | 13 | 25 | 8 |
332,316,672 | 504,652 | 52 | 56 | 43 | 77 | 4 |
2,167,996,416 | 4,341,475 | 258 | 303 | 335 | 541 | 151 |
318,636,032 | 517,406 | 31 | 10 | 45 | 92 | 1 |
1,064,960 | 3,105 | 1 | 1 | 1 | 0 | 0 |
2,818,048 | 2,369 | 0 | 1 | 1 | 1 | 0 |
Conclusion
How fast is it? On this server, I’m getting about 2GB in roughly five minutes with the ACCUM strategy, or an average of around .44GB/min, under heavy load.
In real terms, the checksum algorithms I’ve designed are within roughly an order of magnitude of the speed of COUNT(*)
on InnoDB, and as the table gets larger, this difference decreases to about half, probably because of InnoDB’s caches. I think that’s the most interesting result from these tests.
It’s also interesting that my checksum algorithms are almost as fast as MySQL’s own CHECKSUM TABLE
command on this data. This makes sense, given that they both essentially scan the whole table, but I didn’t expect it, given that CHECKSUM TABLE
is compiled into the server.
Written by Xaprb
May 12th, 2007 at 10:21 am