mydumper & myloader : fast backup and restore

At PalominoDB we do not normally use mysqldump for regular backups, but only in some circumstances (for example MySQL upgrade).
Lately we gave a try to mydumper as an alternative to mysqldump, and results are quite promising.
We found that mydumper performs very fast exporting both small and large datasets!!
We also found that the with large datasets restore with myloader doesn't perform a lot better than simple restore from mysqldump SQL dump: this depends from the storage engine and not from the client used to restore.

On a box we run 2 tests:
1) with a dataset that was fitting in the InnoDB buffer pool;
2) with a dataset larger than the InnoDB buffer pool.

TEST #1

We created 128 tables of 1M rows each, for a total dataset of 31GB on disk:
shell$ time ./sysbench --test=tests/db/parallel_prepare.lua --oltp-tables-count=128 --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --num-threads=12 run real 22m0.013s
user 204m22.054s
sys 0m37.430s

Doing the backup with mydumper:
shell$ time ./mydumper -t 8 -B sbtest
real 0m29.807s
user 2m35.111s
sys 0m26.102s

... and with mysqldump:
shell$ time mysqldump --single-transaction sbtest > sbtest.sql
real 6m24.607s
user 5m19.355s
sys 0m46.761s

Within this test, mydumper looks around 13 times faster than mysqldump.

We also tried compression, but I/O was fast enough to make compression only an unnecessary overhead: in other words, on that hardware and with this dataset, mydumper with compression was slower than mydumper without compression.

To complete the test, we tried recovery time, after deleting and recreating and empty database:
shell$ mysql -e "drop database sbtest ; create database sbtest"
shell$ time ./myloader -t 8 -d export-20110720-090954
real 9m12.548s
user 0m55.193s
sys 0m28.316s

shell$ mysql -e "drop database sbtest ; create database sbtest"
shell$ time ( echo "SET SQL_LOG_BIN=0;" ; cat sbtest.sql ) | mysql sbtest
real 46m46.140s
user 9m3.604s
sys 0m48.256s

With this dataset, restore time using myloader was 5 times faster than using the SQL dump from mysqldump.

TEST #2

Test #2 is very similar to test #1 , but with some different in the dataset:
48 tables instead of 128 tables;
10M rows on each table instead of 1M rows;
a total dataset on disk of 114GB instead of 31GB.

First, we created the tables with sysbench:
shell$ time ./sysbench --test=tests/db/parallel_prepare.lua --oltp-tables-count=48 --oltp-table-size=10000000 --mysql-table-engine=innodb --mysql-user=root --num-threads=12 run
real 107m24.657s
user 689m2.852s
sys 2m11.980s

Backup with mydumper:
shell$ time ./mydumper -t 8 -B sbtest
real 7m42.703s
user 15m14.873s
sys 2m20.203s

The size of the backup is quite big because not compressed: 91GB
On average, mydumper was writing on disk at around 200MB/s.

Backup with mysqldump:
shell$ time mysqldump --single-transaction sbtest > sbtest.sql
real 32m53.972s
user 20m29.853s
sys 2m47.674s

mydumper was again faster than mysqldump , but not as much as in the previous test: only 4 times faster.

Was now the time to measure recovery time:
shell$ mysql -e "drop database sbtest ; create database sbtest"
shell$ time ./myloader -t 6 -d export-20110720-171706
real 130m58.403s
user 4m5.209s
sys 1m51.801s

shell$ mysql -e "drop database sbtest ; create database sbtest"
shell$ time ( echo "SET SQL_LOG_BIN=0;" ; cat sbtest.sql ) | mysql sbtest
real 204m18.121s
user 34m33.520s
sys 3m43.826s

myloader is just a bit more than 50% times faster than importing the SQL dump from mysqdump

Conclusion from second test:
a) With larger dataset, mydumper slows down because the system does more I/O as the dataset doesn't fit in memory, but still way faster than mysqldump.
b) With larger dataset, load time with myloader slowed down a lot. Although, the root cause of the performance drop isn't mydumper , but:
- more I/O (dataset + dump don't fit in RAM);
- InnoDB inserts rate degenerates with bigger tables.

On the topic of InnoDB inserts rate degeneration with big tables , probably another blog post will follow.

Notes on hardware and configuration:
CPU: 2 x 6cores with HT enabled
96 GB of RAM
FusionIO

innodb_buffer_pool_size=64G
innodb_log_file_size=2047M
innodb_io_capacity=4000
innodb_flush_log_at_trx_commit=2
(binlog disabled)

Comments

Nice post. I have been testing MyDumper and it's import with and without MyLoader.The main difference in your MySQL parameters and mine would be the "innodb_buffer_pool_size" because I have it set to 6G.You can read about my testings here:http://nlightme.com/?p=224http://nlightme.com/?p=144

Fri, 10/11/2013 - 11:31

Also, look at the simple technique to restore from mysqldump dump files at http://www.geeksww.com/tutorials/database_management_systems/mysql/tips_...you do not need to install or learn any special software 

Fri, 06/29/2012 - 05:23

Happy New Year!

Had some time to do a quick test of myloader queries per transaction values and seems any value > 1000 was slightly faster.


mysql vb387mysql < mysqldumpvb387.sql
real: 80.79s cpu: 9% maxmem: 22272 KB cswaits: 16011


myloader -t 2 -d export-20120102-050257 -B vb387mydumper
real: 53.11s cpu: 3% maxmem: 21360 KB cswaits: 11832

myloader -t 2 -d export-20120102-050257 -B vb387mydumper2k -q 2000
real: 49.59s cpu: 4% maxmem: 21360 KB cswaits: 11939

myloader -t 2 -d export-20120102-050257 -B vb387mydumper3k -q 3000
real: 51.83s cpu: 3% maxmem: 21360 KB cswaits: 11986

myloader -t 2 -d export-20120102-050257 -B vb387mydumper4k -q 4000
real: 51.11s cpu: 4% maxmem: 21344 KB cswaits: 11947

myloader -t 2 -d export-20120102-050257 -B vb387mydumper5k -q 5000
real: 48.52s cpu: 4% maxmem: 21360 KB cswaits: 12128

myloader -t 2 -d export-20120102-050257 -B vb387mydumper6k -q 6000
real: 47.43s cpu: 4% maxmem: 21344 KB cswaits: 11945

Sun, 01/01/2012 - 15:30

@LinuxJedi: you and George really made me curious about --queries-per-transaction , I need to find the time for more testing.

@Stewart: I will try to find the time for this too, but I think the result depends from the size of the dataset.
If the dataset fits completely (or close) in the InnoDB bufferpool , mydumper will transfer data from RAM to disk : this should be way faster than copying files from disk to disk.
By contrast, if the InnoDB buffer pool is way smaller than the dataset, copying files with xtrabackup should be faster.

To make things more interesting, I think it depends also from the ratio data/index: xtrabackup copies data and index, mydumper dumps only data.

Rene Cannao
Thu, 08/04/2011 - 07:40

Interesting timing results. I'd love to see a comparison with xtrabackup in there too. I'm guessing it may even be possible for mydumper to beat xtrabackup in backup times in certain situations (although unlikely on restore for those situations).

Wed, 08/03/2011 - 19:28

Hi Rene,

Thanks for the benchmarking. I have had some fantastic feedback over the last few weeks and I'll try and make many improvements over the next couple of months.

I'll also have a play with various things and see if I can find a way of improving myloader's performance. --queries-per-transaction default may not be the best yet, it was more 30seconds of guesswork at time of coding than anything else.

Wed, 08/03/2011 - 12:47

Thanks Rene, looking forward to more tests/results :D

Wed, 08/03/2011 - 08:12

Bret, the tool by maatkit was mk-parallel-dump, but it is now defined deprecated and to be avoided in production environment.

Rene Cannao
Wed, 08/03/2011 - 04:49

Hiya.Wasnt there a tool by maatkit that did the same work and / or functionality? I actually think it became depreciated.Brent

Brent
Wed, 08/03/2011 - 04:38

George,

I wanted to evenly divide the number of tables. Being 48 tables, possible values smaller or equal the number of cores were 4, 6, 8 and 12.
During the dump I tried 8 and 12: the result was almost identical, and I published only one of them.

Also during the load there was almost no difference between 6 and 8 and I published one of them, but I didn't try 12 (didn't want to overload InnoDB).
The reason is that as tables grow bigger and bigger, performance decrease because the bottleneck becomes InnoDB that spends time in updating and maintaining indexes.
With 10M rows tables the difference between 1 connection (using mysql to import the SQL dump) and myloader with 6 threads drops to only 50%, while with 1M rows tables myloader is 5 times faster. Bigger the tables become, smaller is the difference between standard mysql restore and myloader.

I didn't try changing --queries-per-transaction , but I will give it a try.

Rene Cannao
Tue, 08/02/2011 - 14:02

Rene thanks for the test results. Just curious for a 12 cpu core server, what was specific reason for choosing 8 threads for mydumper for 1st test's back ups and recovery test but only 6 threads for 2nd test's recovery tests ?

./myloader -t 6 -d export-20110720-171706

Why not 8 threads for myloader for 2nd test's recovery test ? And how much faster is using 8 threads for the 2nd test's recovery tests ?

Would raising the --queries-per-transaction value in myloader from default 1000 help ?

Number of INSERT queries to execute per transaction during restore, default is 1000

thanks

George

Tue, 08/02/2011 - 13:25

Reply

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options

By submitting this form, you accept the Mollom privacy policy.
Website by Digital Loom