Following Restore one table from an ALL database dump and Restore a Single Table From mysqldump, I would like to add my own thoughts and comments on the subject.
I also wish to note performance issues with the two suggested solutions, and offer improvements.
Problem relevanceWhile the problem is interesting, I just want to note that it is relevant in very specific database dimensions. Too small - and it doesn't matter how you solve it (e.g. just open vi/emacs and copy+paste). Too big - and it would not be worthwhile to restore from mysqldump anyway. I would suggest that the problem is interesting in the whereabouts of a few dozen GB worth of data.
Problem recapGiven a dump file (generated by mysqldump), how do you restore a single table, without making any changes to other tables?
Let's review the two referenced solutions. I'll be using the employees db on for testing. I'll choose a very small table to restore: departments (only a few rows in this table).
Security based solutionChris offers to create a special purpose account, which will only have write (CREATE, INSERT, etc.) privileges on the particular table to restore. Cool hack! But, I'm afraid, not too efficient, for two reasons:
- MySQL needs to process all irrelevant queries (ALTER, INSERT, ...) only to disallow them due to access violation errors.
- Assuming restore is from remote host, we overload the network with all said irrelevant queries.
Just how inefficient? Let's time it:
- mysql> grant usage on *.* to 'restoreuser'@'localhost';
- mysql> grant select on *.* to 'restoreuser'@'localhost';
- mysql> grant all on employees.departments to 'restoreuser'@'localhost';
- $ time mysql --user=restoreuser --socket=/tmp/mysql_sandbox21701.sock --force employees < /tmp/employees.sql
- ...
- ERROR 1142 (42000) at line 343: INSERT command denied to user 'restoreuser'@'localhost' for table 'titles'
- ERROR 1142 (42000) at line 344: ALTER command denied to user 'restoreuser'@'localhost' for table 'titles'
- ...
- (lot
So, at about 30 seconds to restore a 9 rows table.
Text filtering based solution.gtowey offers parsing the dump file beforehand:
- First, parse with grep, to detect rows where tables are referenced within dump file
- Second, parse with sed, extracting relevant rows.
Let's time this one:
- $ time grep -n 'Table structure' /tmp/employees.sql
- 23:-- Table structure for table `departments`
- 48:-- Table structure for table `dept_emp`
- 89:-- Table structure for table `dept_manager`
- 117:-- Table structure for table `employees`
- 161:-- Table structure for table `salaries`
- 301:-- Table structure for table `titles`
- real 0m0.397s
- user 0m0.232s
- sys 0m0.164s
- $ time sed -n 23,48p /tmp/employees.sql | ./use employees
- real 0m0.562s
- user 0m0.380s
- sys 0m0.176s
Much faster: about 1 second, compared to 30 seconds from above.
Nevertheless, I find two issues here:
- A correctness problem: this solution somewhat assumes that there's only a single table with desired name. I say "somewhat" since it leaves this for the user.
- An efficiency problem: it reads the dump file twice. First parsing it with grep, then with sed.
sed is much stronger than presented. In fact, the inquiry made by grep in gtowey's solution can be easily handled by sed:
- $ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees
- real 0m0.573s
- user 0m0.416s
- sys 0m0.152s
So, the "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" part tells sed to only print those rows starting from the departments table structure, and ending in the next table structure (this is for clarity: had department been the last table, there would not be a next table, but we could nevertheless solve this using other anchors).
And, we only do it in 0.57 seconds: about half the time of previous attempt.
Now, just to be more correct, we only wish to consider the employees.department table. So, assuming there's more than one database dumped (and, by consequence, USE statements in the dump-file), we use:
Further notes
- $ time sed -n "/^-- Table structure for table \`departments\`/,/^-- Table structure for table/p" /tmp/employees.sql | ./use employees
- real 0m0.573s
- user 0m0.416s
- sys 0m0.152s
- All tests used warmed-up caches.
- The sharp eyed readers would notice that departments is the first table in the dump file. Would that give an unfair advantage to the parsing-based restore methods? The answer is no. I've created an xdepartments table, to be located at the end of the dump. The difference in time is neglectful and inconclusive; we're still at ~0.58-0.59 seconds. The effect will be more visible on really large dumps; but then, so would the security-based effects.