Benchmarking Postgres on AWS 4,000 PIOPs EBS instances

Introduction

Disk I/O is frequently the performance bottleneck with relational databases. With AWS recently releasing 4,000 PIOPs EBS volumes, I wanted to do some benchmarking with pgbench and PostgreSQL 9.2. Prior to this release, the maximum available I/O capacity was 2,000 IOPs per volume. EBS I/O is read and written in 16 KB chunks, with performance limited by both the I/O capacity of the EBS volumes and the network bandwidth between an EC2 instance and the EBS network. My goal isn't to provide a PostgreSQL tuning guide, an EC2 tuning guide, or a database deathmatch complete with graphs; I'll just be displaying what kind of performance is available out-of-the-box without substantive tuning. In other words, this is an exploratory benchmark, not a comparative benchmark. I would have liked to compare the performance of 4,000 PIOPs EBS volumes with 2,000 PIOPs EBS volumes, but I ran out of time, so that will have to wait for a follow-up post.

Setup

Region

I conducted my testing in AWS' São Paulo region. One benefit of testing in sa-east-1 is that spot prices for larger instances are (anecdotally) more stable than in us-east. Unfortunately, sa-east-1 doesn't have any cluster compute (CC) instances available. CC instances have twice the bandwidth to the EBS network of non-CC EC2 instances. That additional bandwidth allows you to construct larger software RAID volumes. My cocktail napkin calculations show that it should be possible to reach 50,000 PIOPs on an EBS-backed CC instance without much of a problem.
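
For the curious, the napkin math is simple: assuming every I/O moves a full 16 KB, 50,000 IOPs works out to roughly 800 MB/s of traffic to the EBS network, which is why the extra bandwidth available to CC instances is what makes an array that large plausible.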

EC2 instances

I tested with three EC2 instances: an m1.large from which to run pgbench, an m2.2xlarge with four EBS volumes, and an m1.xlarge with one EBS volume. All EBS volumes are 400 GB with 4,000 provisioned IOPs. The m1.large instance was an on-demand instance; the other instances — the pgbench target database servers — were spot instances with a maximum bid of $0.05. (In one case the first spot instance was terminated and had to be rebuilt.) Some brief testing showed that having an external machine drive the benchmark was critical to getting the best results.
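
For reference, volumes like these can be provisioned from the command line as well as from the console; a sketch using the aws command-line tools, where the availability zone, volume ID, and instance ID are placeholders:

aws ec2 create-volume --availability-zone sa-east-1a --size 400 --volume-type io1 --iops 4000
aws ec2 attach-volume --volume-id vol-xxxxxxxx --instance-id i-xxxxxxxx --device /dev/sdf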

Operating System

All EC2 instances are running Ubuntu 12.10. A custom sysctl.conf tuned the Sys V shared memory settings and set swappiness to zero and memory overcommit to two.

kernel.shmmax = 13355443200
kernel.shmall = 13355443200
vm.swappiness = 0
vm.overcommit_memory = 2
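
If you're following along, the settings can be applied immediately, without a reboot, by reloading /etc/sysctl.conf:

sysctl -p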

Packages

The following packages were installed via apt-get:

  • htop
  • xfsprogs
  • debian-keyring
  • mdadm
  • postgresql-9.2
  • postgresql-contrib-9.2

In order to install the PostgreSQL packages, a pgdg.list file containing

deb http://apt.postgresql.org/pub/repos/apt/ squeeze-pgdg main

was placed in /etc/apt/sources.list.d and the following commands were run:

gpg --keyserver pgp.mit.edu --recv-keys ACCC4CF8
gpg --armor --export ACCC4CF8 | apt-key add -
apt-get update
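
With the repository in place, the packages listed above can be installed in one shot:

apt-get install htop xfsprogs debian-keyring mdadm postgresql-9.2 postgresql-contrib-9.2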

RAID and Filesystems

For the one volume instance, I simply created an XFS file system and mounted it on /mnt/benchmark.

mkdir /mnt/benchmark
mkfs.xfs /dev/xvdf
mount -t xfs /dev/xvdf /mnt/benchmark
echo "/dev/xvdf    /mnt/benchmark    xfs    defaults    1 2" >> /etc/fstab

For the four volume instance it was only slightly more involved. mkfs.xfs inspects the underlying md device and determines appropriate stripe unit and stripe width values on its own. Below are the commands for assembling a four volume mdadm RAID 0 array that is mounted on boot (assuming you've already attached the EBS volumes to your EC2 instance). Running dpkg-reconfigure mdadm rebuilds the initrd image so the array is assembled at boot.

mkdir /mnt/benchmark
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/xvdf /dev/xvdg /dev/xvdh /dev/xvdi
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
mkfs.xfs /dev/md0
echo "/dev/md0    /mnt/benchmark    xfs    defaults    1 2" >> /etc/fstab
dpkg-reconfigure mdadm
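
Before benchmarking, it's worth sanity-checking the array and the stripe geometry mkfs.xfs picked (the last command assumes the filesystem is already mounted):

cat /proc/mdstat
mdadm --detail /dev/md0
xfs_info /mnt/benchmark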

Benchmarking

pgbench is a utility included in the postgresql-contrib-9.2 package. It approximates the TPC-B benchmark and can be looked at as a database stress test whose output is measured in transactions per second. It involves a significant amount of disk I/O with transactions that run for relatively short amounts of time. vacuumdb was run before each pgbench iteration. For each database server, pgbench was run mimicking 16, 32, 48, 64, 80, and 96 clients. At each of those client values, pgbench iterated ten times in steps of 100 from 100 to 1,000 transactions per client.

It's important to realize that pgbench's stress test is not typical of a web application workload; most consumer-facing web applications could achieve much higher rates than those mentioned here. The only pgbench results against AWS EBS volumes that I'm aware of (or that are quickly googleable) are from early 2012 and, at their best, achieve rates 50% slower than the lowest rates found here. I drove the benchmark using a very small, very unfancy bash script. An example of the pgbench command line would be:

pgbench -h $DBHOST -j4 -r -Mextended -n -c48 -t600 -U$DBUSER
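
The driver itself was nothing elaborate; below is a rough sketch of that sort of loop rather than the original script. DBHOST, DBUSER, and DBNAME are placeholders, and the pgbench tables are assumed to have been initialized beforehand with pgbench -i.

#!/bin/bash
# Sketch of a pgbench driver loop; DBHOST, DBUSER, and DBNAME are placeholders.
DBHOST=10.0.0.10
DBUSER=benchmark
DBNAME=benchmark

for clients in 16 32 48 64 80 96; do
    for txns in $(seq 100 100 1000); do
        # Vacuum before each iteration so bloat from the previous run doesn't skew the next.
        vacuumdb -h $DBHOST -U $DBUSER -z $DBNAME
        pgbench -h $DBHOST -j4 -r -Mextended -n -c$clients -t$txns -U$DBUSER $DBNAME \
            >> pgbench-${clients}c-${txns}t.log
    done
done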

m1.xlarge with single 4,000 PIOPs volume

This instance hit its maximum transaction volume when running fewer than 48 concurrent clients and under 500 transactions per client. While the transaction throughput never dropped precipitously at any point, loads outside of that range exhibited varying performance. Even at its worst, though, this instance handled between 600 and 700 transactions/second.

m2.2xlarge with four 4,000 PIOPs volumes

I was impressed; at no point did the benchmark stress this instance — the tps rate was between 1,700 and 1,900 in most situations, with peaks up to 2,200 transactions per second. If I were asked to blindly size a "big" PostgreSQL database server running on AWS, this is probably where I would start. It's not so large that you have operational issues like worrying about MTBFs for ten-volume RAID arrays or trying to snapshot 4TB of disk space, but it is large enough to absorb a substantial amount of traffic.

Graphs and Tabular Data

Single 4,000 PIOPs volume tps

The spread of transactions/second irrespective of number of clients.

[Graph: box plot of transactions per second, single 4,000 PIOPs volume]

Data grouped by number of concurrent clients, with each bar representing an increase of 100 transactions per client, ranging from 100 to 1,000.

[Graph: bar graph of transactions per second grouped by concurrent clients, single 4,000 PIOPs volume]

Progression of tps by individual level of concurrency. The x-axis tick marks measure single pgbench runs from 100 transactions per client to 1,000 transactions per client.

[Graph: six subgraphs of transactions per second by each level of concurrency, single 4,000 PIOPs volume]

Raw tabular data: m1.xlarge with a single 4,000 PIOPs EBS volume

txns/client      100   200   300   400   500   600   700   800   900  1000
clients
16              1455  1283  1183   653  1197   533   631  1009   923   648
32              1500  1242  1232   757   747   630  1067   665   688   709
48               281   864   899   705  1029   749   736   593   766   641
64               944  1281   704  1010   739   596   778   662   820   612
80               815   893  1055   809   597   801   684   708   736   663
96               939   889   774   772   798   682   725   662   776   708

Four 4,000 PIOPs volumes tps

Again, a box plot of the data with a y-axis of transactions/second.

[Graph: box plot of transactions per second, four 4,000 PIOPs volumes]

Data grouped by number of concurrent clients, with each bar representing an increase of 100 transactions per client, ranging from 100 to 1,000.

[Graph: bar graph of transactions per second grouped by concurrent clients, four 4,000 PIOPs volumes]

TPS by number of concurrent clients. The x-axis ticks mark pgbench runs progressing from 100 transactions per client to 1,000 transactions per client.

[Graph: six subgraphs of transactions per second by each level of concurrency, four 4,000 PIOPs volumes]

Raw tabular data: m2.2xlarge with four 4,000 PIOPs EBS volumes

txns/client      100   200   300   400   500   600   700   800   900  1000
clients
16              1487  1617  1877  1415  1388  1882  1897  1771  1267  1785
32              1804  2083  2160  1791  1259  1997  2230  1501  1717  1918
48              1810  2152  1296  1951  2117  1775  1709  1803  1817  1847
64              1810  1580  1568  2056  1811  1784  1849  1909  1942  1658
80              1802  2044  1467  2142  1645  1896  1933  1740  1821  1851
96              1595  1403  2047  1731  1783  1859  1708  1896  1751  1801

Comments

Well, my more-than-two-year-old laptop with an i5-2410M and an SSDSA2CW160G3 has pgbench scores comparable to the m2.2xlarge with 4,000 PIOPs, which costs ~$1,200 a month.

Anonymous
Wed, 07/10/2013 - 09:39

The random page cost was changed to 1 for this test. Sequential and random page costs aren't relative to the number of IOPs; they're just constants used by the planner. Basically, what you need is to tell Postgres to consider random seeks and sequential scans to have the same cost. If you want to give sequential scans a lower cost, you'll also need to decrease the random page cost to the same value.

Emanuel Calvo
Tue, 05/28/2013 - 06:29

Jay, did you modify the sequential page cost at all? Given the I/O, I would think it should be less than the default.

Fri, 05/10/2013 - 11:24

Hi, Josh, 

Here's the postgresql.conf from the 4 volume test.

data_directory = '/mnt/benchmark/data'
hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'         
ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'     
external_pid_file = '/var/run/postgresql/9.2-main.pid'    
listen_addresses = '*'                                    
port = 5432                                               
max_connections = 250                                     
unix_socket_directory = '/var/run/postgresql'             
ssl = true                                                
ssl_cert_file = '/etc/ssl/certs/ssl-cert-snakeoil.pem'    
ssl_key_file = '/etc/ssl/private/ssl-cert-snakeoil.key'   
shared_buffers = 8GB                                      
work_mem = 4MB                                            
maintenance_work_mem = 1500MB                             
effective_io_concurrency = 4                              
checkpoint_segments = 30                                  
checkpoint_timeout = 45min                                
checkpoint_completion_target = 0.9                        
effective_cache_size = 22000MB
log_line_prefix = '%t '                                   
log_timezone = 'UTC'
datestyle = 'iso, mdy'
timezone = 'UTC'
lc_messages = 'en_US.UTF-8'                               
lc_monetary = 'en_US.UTF-8'                               
lc_numeric = 'en_US.UTF-8'                                
lc_time = 'en_US.UTF-8'                                   
default_text_search_config = 'pg_catalog.english'
Jay Edwards
Thu, 05/09/2013 - 14:42

Emmanuel, how did you tune PostgreSQL for this? You don't say anything about postgresql.conf settings.

Wed, 05/08/2013 - 20:23
