Benchmarking NDB vs Galera
Inspired by the benchmark in this post, we decided to run some NDB vs Galera benchmarks for ourselves.
We confirmed that NDB does not perform well on m1.large instances. In fact, the performance is unacceptable: no setup should ever have a minimum latency of 220ms, so m1.large instances are not an option. The instances appear to be CPU bound, yet CPU utilization never goes above ~50%. Maybe top/vmstat can’t be trusted in this virtualized environment?
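To test that suspicion, one thing worth checking is CPU steal time. This is the time a virtual CPU was ready to run but the hypervisor was serving another guest. `vmstat` reports it in the `st` column, and `top` shows it as `st` in its CPU summary line. The helper below is a minimal sketch, and `steal_pct` is our own hypothetical name. It computes the steal percentage between two samples of the aggregate `cpu` line of Linux's `/proc/stat`, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal. It only suggests where to look; the numbers in this post were not based on it.

```shell
#!/usr/bin/env bash
# Percentage of CPU time stolen by the hypervisor between two /proc/stat
# "cpu" lines. Counters after the "cpu" label: user nice system idle iowait
# irq softirq steal [guest guest_nice]; guest time is already included in
# user/nice, so only the first eight counters are summed.
cpu_totals() {
  # Prints "<sum of first 8 counters> <steal>" for one "cpu ..." line.
  echo "$1" | awk '{ t = 0; for (i = 2; i <= 9; i++) t += $i; print t, $9 }'
}

steal_pct() {
  read -r total1 steal1 <<< "$(cpu_totals "$1")"
  read -r total2 steal2 <<< "$(cpu_totals "$2")"
  # Steal delta as a share of all elapsed CPU time, one decimal place.
  awk -v dt=$((total2 - total1)) -v ds=$((steal2 - steal1)) \
    'BEGIN { if (dt > 0) printf "%.1f\n", 100 * ds / dt; else print "0.0" }'
}
```

On a live system, a five-second sample would be `steal_pct "$(grep '^cpu ' /proc/stat)" "$(sleep 5; grep '^cpu ' /proc/stat)"`.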
So, why not use m1.xlarge instances? This sounds like a better plan!
As in the original post, our dataset is 15 tables of 2M rows each, created with:
./sysbench --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --mysql-table-engine=ndbcluster --mysql-user=user --mysql-host=host1 prepare
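For a sense of scale, here is a back-of-the-envelope estimate of the raw row data this creates. It is a minimal sketch with several assumptions. It assumes the sysbench 0.5 `sbtest` layout: 4-byte `id` and `k` integers, `c` CHAR(120) and `pad` CHAR(60), in a single-byte character set. It also ignores indexes and per-row and per-page overhead, so actual InnoDB or NDB memory and disk usage will be higher. The function names are our own.

```shell
#!/usr/bin/env bash
# Rough lower bound on raw row data for a sysbench oltp dataset.
# Assumption: sbtest rows are INT id (4) + INT k (4) + CHAR(120) c + CHAR(60) pad,
# single-byte charset, with no index or storage-engine overhead counted.
ROW_BYTES=$((4 + 4 + 120 + 60))

raw_dataset_bytes() {
  tables=$1
  rows_per_table=$2
  echo $((tables * rows_per_table * ROW_BYTES))
}

# Same value in whole MiB (integer division, rounds down).
raw_dataset_mib() {
  echo $(( $(raw_dataset_bytes "$1" "$2") / 1024 / 1024 ))
}

raw_dataset_bytes 15 2000000   # prints 5640000000
raw_dataset_mib 15 2000000     # prints 5378
```

Under these assumptions, the 15 x 2M dataset is about 5.3 GiB of raw row data before any storage-engine overhead.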
The benchmark against NDB was executed with:
for i in 8 16 32 64 128 256
do
./sysbench --report-interval=30 --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --rand-init=on --oltp-read-only=off --rand-type=uniform --max-requests=0 --mysql-user=user --mysql-port=3306 --mysql-host=host1,host2 --mysql-table-engine=ndbcluster --max-time=600 --num-threads=$i run > ndb_2_nodes_$i.txt
done
After we shut down NDB, we started Galera and recreated the tables, but sysbench kept failing. Hingo suggested using --oltp-auto-inc=off, which worked.
Our benchmark against Galera was executed with:
for i in 8 16 32 64 128 256
do
./sysbench --report-interval=30 --test=tests/db/oltp.lua --oltp-tables-count=15 --oltp-table-size=2000000 --rand-init=on --oltp-read-only=off --rand-type=uniform --max-requests=0 --mysql-user=user --mysql-port=3306 --mysql-host=host1,host2 --mysql-table-engine=ndbcluster --max-time=600 --num-threads=$i --oltp-auto-inc=off run > galera_2_nodes_$i.txt
done
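The two loops differ only in the output-file prefix and the extra `--oltp-auto-inc=off` option, which makes them easy to mix up when copying and pasting. Below is a sketch of a single helper that runs the same thread sweep for either cluster. `run_sweep` and the `SYSBENCH` override are our own hypothetical additions; every sysbench option is copied unchanged from the loops above.

```shell
#!/usr/bin/env bash
# Run the thread sweep used above; only the output prefix and any extra
# sysbench options differ between the NDB and Galera runs.
# SYSBENCH may be overridden (e.g. with a stub command) for a dry run.
SYSBENCH=${SYSBENCH:-./sysbench}

run_sweep() {
  prefix=$1; shift           # e.g. ndb_2_nodes or galera_2_nodes
  for i in 8 16 32 64 128 256
  do
    "$SYSBENCH" --report-interval=30 --test=tests/db/oltp.lua \
      --oltp-tables-count=15 --oltp-table-size=2000000 --rand-init=on \
      --oltp-read-only=off --rand-type=uniform --max-requests=0 \
      --mysql-user=user --mysql-port=3306 --mysql-host=host1,host2 \
      --mysql-table-engine=ndbcluster --max-time=600 --num-threads=$i \
      "$@" run > "${prefix}_$i.txt"
  done
}

# run_sweep ndb_2_nodes
# run_sweep galera_2_nodes --oltp-auto-inc=off
```

Setting `SYSBENCH=echo` turns this into a dry run that writes the generated command lines into the output files instead of benchmarking.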
Below are graphs of the average throughput reported at the end of each 10-minute run, and of the 95th percentile response time.
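To build graphs like these, the final summary of each run has to be extracted from the output files. This is a minimal sketch with two assumptions. It assumes the sysbench 0.5 summary format: a `transactions: N (X per sec.)` line and an `approx.  95 percentile: Yms` line. It also assumes the `<prefix>_<threads>.txt` file naming used above. `summarize` is our own hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print one CSV row (threads, transactions/sec, 95th percentile in ms) per
# sysbench output file. Thread count is taken from the "_<threads>.txt" suffix.
summarize() {
  echo "threads,tps,p95_ms"
  for f in "$@"; do
    threads=${f##*_}; threads=${threads%.txt}
    awk -v t="$threads" '
      # "transactions:  60000  (99.99 per sec.)" -> 99.99
      /^ *transactions:/ { gsub(/\(/, "", $3); tps = $3 }
      # "approx.  95 percentile:   85.50ms" -> 85.50
      /approx\. +95 percentile:/ { p95 = $NF; sub(/ms$/, "", p95) }
      END { print t "," tps "," p95 }' "$f"
  done
}
```

For example, `summarize ndb_2_nodes_{8,16,32,64,128,256}.txt > ndb_2_nodes.csv`. Listing the files explicitly keeps the rows in thread order, which a plain glob would not, because 128 sorts before 16.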




Galera clearly performs better than NDB with 2 instances!
But things become very interesting when we graph the reports generated every 30 seconds.
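These periodic reports are the lines sysbench prints once per `--report-interval`. The sketch below turns them into CSV for plotting. It assumes the sysbench 0.5 line format `[  30s] threads: 8, tps: ..., response time: ...ms (95%)`. `intervals_to_csv` is a hypothetical helper of ours. It looks values up by their labels, so extra columns on the line do not break it.

```shell
#!/usr/bin/env bash
# Convert sysbench interval report lines into "seconds,tps,p95_ms" CSV rows.
intervals_to_csv() {
  echo "seconds,tps,p95_ms"
  awk '/^\[ *[0-9]+s]/ {
    # "[  30s] threads: ..." -> 30
    sec = $0; sub(/^\[ */, "", sec); sub(/s].*/, "", sec)
    for (i = 1; i <= NF; i++) {
      if ($i == "tps:")  { tps = $(i + 1); sub(/,$/, "", tps) }
      if ($i == "time:") { p95 = $(i + 1); sub(/ms$/, "", p95) }
    }
    print sec "," tps "," p95
  }' "$1"
}
```

For example, `intervals_to_csv galera_2_nodes_64.txt > galera_64_intervals.csv` produces a time series in which stalls like the ones graphed below show up as rows with very low tps.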


Surprised? What is going on here?
Here we see that even though the workload fits completely in the buffer pool, the high transaction rate causes aggressive flushing.
We assume the benchmark in the Galera blog post was CPU bound, whereas ours is I/O bound.
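One way to watch the aggressive flushing happen is to track the InnoDB checkpoint age. This is the distance between the current log sequence number and the last checkpoint. As it approaches the capacity of the redo logs (roughly `innodb_log_file_size` x `innodb_log_files_in_group`), InnoDB has to flush dirty pages aggressively. The sketch below parses the `LOG` section of `SHOW ENGINE INNODB STATUS`. It assumes the MySQL 5.5-style single-number lines `Log sequence number N` and `Last checkpoint at N`. `checkpoint_age` is our own hypothetical helper.

```shell
#!/usr/bin/env bash
# Checkpoint age in bytes = "Log sequence number" - "Last checkpoint at",
# read from SHOW ENGINE INNODB STATUS text on stdin (MySQL 5.5-style lines).
checkpoint_age() {
  awk '/Log sequence number/ { lsn = $4 }
       /Last checkpoint at/  { ckpt = $4 }
       END { printf "%.0f\n", lsn - ckpt }'
}

# Live usage (needs a mysql client with suitable credentials; check that
# your client prints the status text with real newlines):
#   mysql -e 'SHOW ENGINE INNODB STATUS\G' | checkpoint_age
```

Sampling this every few seconds during a run and plotting it next to the tps time series would show whether the throughput drops line up with the checkpoint age hitting its limit.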
We then added 2 more nodes (m1.xlarge instances), kept the dataset at 15 tables x 2M rows, and re-ran the benchmark with NDB and Galera. Galera performance stalls because of I/O. In fact, Galera performed worse on 4 nodes than on 2 nodes; we assume this is because the whole cluster runs at the speed of its slowest node.
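The assumption that the whole cluster runs at the speed of its slowest node can be checked on each node through Galera's flow-control status counters. The sketch below reads two values from `SHOW GLOBAL STATUS`. The first is `wsrep_flow_control_paused`, the fraction of time replication has been paused by flow control. The second is `wsrep_local_recv_queue`, the current length of the node's receive queue. A node whose queue keeps growing while the others pause is a likely straggler. The `MYSQL` variable and `flow_control_report` are our own hypothetical names, and the host names are placeholders.

```shell
#!/usr/bin/env bash
# Print flow-control indicators for each Galera node given as an argument.
# MYSQL can point to a client wrapper (credentials, port, ...) or a stub.
MYSQL=${MYSQL:-mysql}

flow_control_report() {
  for host in "$@"; do
    # -N: no column names; each row is "name<TAB>value".
    "$MYSQL" -h "$host" -N -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_flow_control_paused', 'wsrep_local_recv_queue')" |
      awk -v h="$host" '{ printf "%s %s=%s\n", h, $1, $2 }'
  done
}

# flow_control_report host1 host2 host3 host4
```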
NDB performance keeps growing as nodes are added, so we added another 2 nodes for NDB only (6 nodes total).


The graphs show that NDB scales better than Galera, which is not what we expected to find.
It is perhaps unfair to say that NDB scales better than Galera. It is more accurate to say that NDB checkpoints put less stress on I/O than InnoDB checkpoints, so the bottleneck is in InnoDB and not in Galera itself. To be more precise, the bottleneck is slow I/O.
The following graph shows performance with 512 threads on 4 nodes (NDB and Galera) and on 6 nodes (NDB only). Data was collected every 30 seconds.

Comments
Andy,
that won't work.
From http://dev.mysql.com/doc/refman/5.5/en/partitioning-limitations-storage-engines.html :
Partitioning by KEY (including LINEAR KEY) is the only type of partitioning supported for the NDBCLUSTER storage engine. It is not possible in MySQL Cluster NDB 7.2 to create a MySQL Cluster table using any partitioning type other than [LINEAR] KEY, and attempting to do so fails with an error.
my.cnf used for Galera : http://pastebin.com/raw.php?i=p2Ax108D
my.cnf used for NDB : http://pastebin.com/raw.php?i=e61muuYG
config.ini (for 2 nodes) : http://pastebin.com/raw.php?i=u7pkAdrS
Rene,
Great stuff! Thanks for sharing this.
I am curious, could you tell us:
- a few more details on the configuration of InnoDB/Galera and NDB
- your opinion on the network latency and how it affects the benchmarks
Thanks again for sharing!
-ivan
Did you use PARTITION BY RANGE for NDB? As pointed out here (http://openlife.cc/blogs/2012/march/comments-codership-galera-vs-ndb-clo...), the default NDB partitioning scheme (hashing on the key) isn't a good fit for sysbench.
Hi Rene,
This is a very good benchmark you have done. For me it was always well known that NDB is very sensitive to latency, including (apparently) CPU scheduling latency. But it is always a bit of a surprise to see how poor this is in the cloud. At the same time, it is always a surprise to see how well Galera performs under poor network latency, even for clustering over continents.
Now, when you say that you didn't expect NDB to scale better, this is of course a matter of viewpoint. You use 4-6 NDB nodes to match the performance of 2 Galera nodes. But it is true that NDB scales when you add more nodes whereas Galera didn't, and there is a very natural explanation: NDB does sharding and Galera does not. When you add more NDB nodes, the write load is distributed over more shards, and you get more performance. With Galera, all writes still go to all nodes, so the situation stays the same (or becomes worse if you get a weaker node).
For Galera scale-out, the following are true:
The last graph is quite typical, and I've seen similar behavior on disk-bound workloads myself. (But your graph is nicer :-) This is the behavior you get from InnoDB when it becomes heavily disk-bound. However, Galera adds its own "signature" to this graph. When InnoDB becomes stuck, the Galera slave appliers are blocked; you can see this with SHOW PROCESSLIST. Committed transactions fill the Galera slave queue, and flow control kicks in. At this point you cannot commit anything on any node in the cluster until the queues are emptied again. This is by design and is the opposite of slave lag: Galera is a tightly coupled cluster, so when any one slave has an issue, everyone has to wait for it. My guess is that the regularity you see in the graph comes from Galera flow control; InnoDB by itself tends to produce the same behavior, but much more irregularly.
Finally, it would be nice to know about your my.cnf. Did you try a larger InnoDB buffer pool? In my tests it helped a lot, but that was on bare metal. The values of wsrep_slave_threads and innodb_flush_log_at_trx_commit would also be interesting (neither should be 1 :-)
Hi Rene, thanks for giving it a ride. Could you post the my.cnf you used for Galera benchmark?
Very nice post!
Krishna