At PalominoDB, we are not just another remote DBA; we are an integral part of your webops, DBA and BI teams.

Blog

Postgresql 9.1 - Part 1: General Features

General scope

 
Postgresql 9.1 runs over the theme “features, innovation and extensibility” and it really does. This version was born to overcome Postgresql 9.0 ‘s limitations and known bugs in replication. If you are developing over 9.0, it’s time to think seriously about preparing your code for Postgresql 9.1.
 
The intent of this series of posts are not to be another release features posts. I offer a vision based on my personal experience and focus on the features that I saw exciting for the most of the projects where I’m involved. If you want to read an excellent general article about the new features of this version, web to [2].
 
At the moment of this post, the last PostgreSQL version is 9.1.1 . It includes 11 commits to fix GIST memory leaks, VACUUM improvements, catalog fixes and others. A description of the minor release can be check at [3].
 
The main features included are:

  • Synchronous Replication
  • Foreign Data Support
  • Per Column collation support
  • True SSI (Serializable Snapshot Isolation)
  • Unlogged tables for ephemeral data
  • Writable Common Table Expressions
  • K-nearest-neighbor added to GIST indexes
  • Se-linux integration with the SECURITY LEVEL command
  • Update the PL/Python server-side language
  • To come: PGXN Client for install extensions easily from the command line. More information: http://pgxn.org/faq/  The source will be on https://github.com/pgxn/pgxn-client



Some of these features could be considered minor, but many think they are very cool while using 9.1  in their environments.

Considerations before migrating


If you are an old Pg user, you may already know the migration risks listed on the next page. Still, I advise that you note and carefully learn about these risks. Many users freeze their developments to older versions simply because they didn’t know how to solve new issues. The most notable case is when 8.3 stopped using implicit casts for some datatypes and many queries didn’t work as a result.  



There are some important changes that could affect your queries, so take a pen and note:

  • The default value of standard_conforming_strings is now turned on by default. That means that backslashes are normal characters (which is the SQL standard behavior). So, if you have backslashes in your SQL code, you must add E’’ strings. For example: E’Don’t’
  • Function-style and attribute-style data type casts were disallowed for composite types. If you have code like value_composite.text or text(value_composite), you will need to use CAST or :: operator.
  • Whereas before the checks were skipped, domains are now based on arrays when they are updated, which results in a rechecking of the constraints.
  • String_to_array function returns now an empty array for a zero-length string (before it returned NULL). The same function splits into characters if you use the NULL separator.
  • The inclusion of the INSTEAD OF action for triggers will require you to recheck the logic of your triggers.
  • If you are an actual 9.0 replication user, you may know that in 9.1 you can control the side effects of VACUUM operations during big queries execution and replication. This is a really important improvement. Basically, if you run a big query in the slave server and the master starts a VACUUM process, the slave server can request the master postpone the cleanup of death rows that are being used by the query.


Brief description of main features


Don’t worry about the details, we’ll cover each feature in future posts.

  • Synchronous Replication
    • This feature enhances the durability of the data. Only one server can be synchronous with the master, the rest of the replicated servers will be asynchronous. If the actual synchronous server goes down, another server will become synchronous (using a list of servers in synchronous_standby_names).  Failover is not automatic, so you must use external tools to activate the standby sync server, one of the most popular is pgpool [4].
  • Foreign Data Support
    • The feature of Foreign Data Wrappers has been included since 8.4, but now it is possible to reach data from any database where a plugin exists. Included in the contribs, is a file called file_fwd, which connects CSV files to a linked table. Basically it provides an interface to connect to external data. In my opinion, this is perhaps one of the most useful features of this versions, especially if you have different data sources in your environment.
  • Serializable Snapshot Isolation
    • This new level of serialization is the strictest. Postgres now supports READ COMMITED, REPEATABLE READ (old serializable) and SERIALIZABLE. It uses predicate locking to keep the lock if the write would have an impact on the result. You will not need explicit locks to use this level, due to the automatic protection provided.
  • Unlogged tables
    • Postgres uses the WAL log to have a log of all the data changes to prevent data loss and guarantee consistency in the event of a crash, but it consumes resources and sometimes we have data that we can recover from other sources or that is ephemeral. In these cases, creation of unlogged tables allows the database to have tables without logging into the WAL, reducing the writes to disk. Otherwise, this data will not be replicated, due to the mechanism of replication used by Postgres (through WAL records shipping).
  • Writable Common Table Expressions
    • CTE was included in 8.4 version, but in this version, it was improved to allow you to use writes inside the CTE (WITH clause). This could save a lot of code in your functions.
  • K-nearest-neighbor added to GIST indexes
    • Postgres supports multiple types of indexes; one of them is GiST (Generalized Search Tree). With 9.1, we can define a ‘distance’ for datatypes and use it for with a GiST index. Right now, this feature is implemented for point, pg_trgm contrib and others btree_gist datatypes. The operator for distance is <-> . Another feature you will enjoy is that LIKE and ILIKE operators can use the tgrm index without scanning the whole table.
  • SE-Linux integration
    • Postgres is now the first database to be fully integrated with military security-grade. SECURITY LABEL applies a security label to a database object. This facility is intended to allow integration with label-based mandatory access control (MAC) systems such as SE-Linux instead of the more traditional access control - discretionary with users and groups. (DAC).



References:

[1] http://www.postgresql.org/docs/9.1/static/release-9-1.html
[2] http://wiki.postgresql.org/wiki/What%27s_new_in_PostgreSQL_9.1
[3] http://www.postgresql.org/docs/9.1/static/release-9-1-1.html
[4] http://pgpool.projects.postgresql.org/


What Community Resources should be at Oracle OpenWorld?

A short while ago I posted about the Oracle OpenWorld Schedule Matrix of MySQL sessions (in PDF and HTML formats).  We have printed up a (small) number of schedules to have on hand at the MySQL Community kiosk at the User Group Pavillion in Moscone West.

Yes, you read that correctly -- the User Group Pavillion will include a MySQL Community kiosk this year!  Sarah and I have been coordinating the effort to staff the kiosk and figure out what we need to provide.

Sadly, it's just a kiosk (same as all the other User Group organizations get), so we cannot have a ton of flyers there.  To that end, we have created a QR code that resolves to www.kimtag.com/MySQL, which is where we are putting many links.  

To that end, we'd like your help figuring out what we have missed.  In order to keep the list of links as short and relevant as possible we have put as many aggregate links as we could, for example we link to planet.mysql.com instead of individual blogs, and we are only listing the major conferences with over 500 attendees expected.  The links at www.kimtag.com/MySQL as of the time of this blog writing are:

- MySQL sessions at OOW

- Planet MySQL

- dev.mysql.com (docs, etc)

- mysql.com

- MySQL User Groups (forge.mysql.com list) - so if you have a user group, make sure to update the forge page!

- Percona Live 2012 Conference & Expo

- MySQL videos on YouTube

- IOUG MySQL Council

- OurSQL Podcast Blog

- OurSQL iTunes link

- MySQL Experts podcast

- Book: MySQL Administrator's bible*

- Book: High Performance MySQL

- Book: Expert PHP/MySQL

- Book: MySQL High Availability

 

If you think of a link we should put on there, please comment below.

 

For what it's worth, the paper we will have will be:

- The current day's schedule

- A flyer about Percona Live 2012 MySQL Conference & Expo

- A poster of the QR code and a few small paper slips with the QR code

- IOUG MySQL Council business cards

And even that is stretching it, as there will be a laptop at the kiosk provided by Oracle and the kiosk is 24 inches x 24 inches, about 61 centimeters x 61 centimeters.
* Note that I have ordered the books with the MySQL Administrator's Bible first because it's for beginner/intermediate users, whereas High Performance MySQL is for intermediate/advanced users.

Videos from OSCon Data and OSCon 2011

There are 28 videos, all linked below, on the OSCon and OSCon data 2011 playlist that I have put online for free (with permission from the presenters and O'Reilly).  O'Reilly videos are available from the conference proceedings website.  Probably the best way to find all the videos in one place is to search for the 'oscon' tag on YouTube.

How do I choose what talks to film?  Well, to make it easiest on me, I choose what room to film, and then all I have to do is change the tapes every session.  This minimizes (but not completely eliminates) techical issues.  For OSCon Data it was simple - there were 5 rooms, O'Reilly was professionally recording one room, another room was "Products and Services", which left 3 rooms -- and I had 3 video cameras*.
Following is the list of videos I took, in alphabetical order.  Each link takes you to the YouTube page, which shows the presenters, description, and links to the slides (if available) and official O'Reilly Conference page:


* If there had been no technical difficulties whatsoever, there would be 38 videos on this list - 2 from OSCon (which are on the list) and 36 from OSCon data.  Unfortunately, 10 videos did not come out - either I missed the tape change, the audio could not be heard, or permission was not given by the presenters.  Note that in the latter case, presenters just never responded -- I did not have one presenter withhold permisson, though a few have not responded to my request for permission.

Beware: Default charset for mysqldump is utf8, regardless of server default charset

 

I ran into this issue a while ago, and was reminded of it again recently.  mysqldump uses a default charset of utf8, even when the default charset of the server is set differently.  Why does this matter?

The problem exists more in the fact that if you have string data that is in latin1 format, you are allowed to put in non-Latin characters. This can lead to lost data, especially when upgrading a major series (e.g. 5.0 to 5.1 or 5.1 to 5.5), because you're supposed to export and import the data.

Also, when importing a backup of an InnoDB table, if there is an error with one of the parts of the INSERT, the whole INSERT statement rolls back.  I have experienced major data loss because the garbled characters cause an error when INSERTed, and it causes perfectly fine data *not* to import because they're in the same INSERT statement as the garbled data.

For example:

First, set variables such on a MySQL server (5.0 or 5.1, I haven't tested on 5.5):

mysql> show global variables like '%char%';

+--------------------------+----------------------------+

| Variable_name            | Value                      |

+--------------------------+----------------------------+

| character_set_client     | latin1                     | 

| character_set_connection | latin1                     | 

| character_set_database   | latin1                     | 

| character_set_filesystem | binary                     | 

| character_set_results    | latin1                     | 

| character_set_server     | latin1                     | 

| character_set_system     | utf8                       | 

| character_sets_dir       | /usr/share/mysql/charsets/ | 

+--------------------------+----------------------------+

8 rows in set (0.00 sec)

 

Then create these tables with data:

 

CREATE TABLE `test_utf8` (

  `kwid` int(10) unsigned NOT NULL default '0',

  `keyword` varchar(80) NOT NULL default ''

) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 

INSERT INTO `test_utf8` VALUES

(1,'watching'),(2,'poet'),(3,'просмотра'),(4,'Поэту');

 

CREATE TABLE `test_latin1` (

  `kwid` int(10) unsigned NOT NULL default '0',

  `keyword` varchar(80) NOT NULL default ''

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

 

INSERT INTO `test_latin1` VALUES

(1,'watching'),(2,'poet'),(3,'просмотра'),(4,'Поэту');

 

Now compare:

mysqldump test > test_export_utf8.sql

mysqldump --default-character-set=latin1 test > test_export_latin1.sql

 

Note that the test export with the default character set of utf8 has mojibake whereas the export with latin1 does not.

 

So be *extremely* careful when using mysqldump - whether for backups or while upgrading.  You can checksum your data before and after  an export/import with mysqldump to be sure that your data is the same.

 

Advocating For Our Clients - Part 1 Cultural

 

What do companies need from their database professional, MySQL or otherwise? How can we exceed those expectations as a remote team?  This is my first in a series of blog posts discussing exactly how we do so at PalominoDB - regardless if the technology is MySQL, MongoDB, Cassandra, Oracle or ottherwise.

Cultural:  

The majority of our clients are start-ups.  Some are small teams experiencing their first three year growth pains, while others are in the three to seven year period, have proven the effectiveness of their business model, yet retain a strong sense of start-up culture and personality.  When looking for staff, their focus is rightly on people who have passion, drive and personalities that fit with their unique corporate cultures.  How can a remote resource, much less one that is not dedicated to one company full-time, understand not just clients’ product and technology, but the people, the schedules and the drive that support clients' success?

Quite often clients want PalominoDB to have a single point of contact who gives us individual tasks and who functions as a filter between their organization and ours.  While we will work with whatever model is requested, this method builds a certain level of isolation that can limit our effectiveness in the bigger picture.  Being in operations requires a certain push and pull with engineering and product groups to meet business demands and ensure availability and performance. We also require knowledge of a company's business goals and project portfolio.  Otherwise, how can we react with urgency at the appropriate times?  How can we know which issues require escalation and which require push back? 

Once we understand company strategy and priorities, we can start to tailor the decisions we make to our clients’ needs. For example, if we know a particular system is crucial to the success of a client's key project, we are much more inclined to work until 2 am to complete the project or to meet a release deadline. If we know that two months of late nights and weekend work are crucial to helping a client with a customer launch, beat the competition and grow successful, my staff and I willingly devise a plan to support that customer. However, if we perceive that a customer’s demands for last minute changes or large amounts of off- hours work come from poor planning, poor communication or a poorly prioritized product plan, we are much more inclined to put our efforts into improving the underlying processes around change management and project planning. 

As operational professionals, we understand the importance of urgency and the product delivery speed that the modern start-up must work with.  Because of the breadth of our experience, we also know that if production teams had their way, all tasks would be P1s, all reports would be real time and there would never be any downtime (and all work would happen for free) and we act in accordance with this desire to the best of our ability.  When we know that the task we are working on is crucial to our clients’ ability to maintain their competitive edge, we are motivated to work the 12 hour days needed to get it done.  Alternatively, when we know a date is flexible, we can choose not to tax our operations team and evoke the risks associated with working too many hours and making crucial decisions under fatigue. As CEO and principal at PalominoDB, it is my job to work within my clients availability and take care of my staff  Work-life balance is not simply a concept to which I pay lip-service; I believe that a happy, rested and alert operations staff is essential to customer up-time and to keeping human mistakes to a minimum. 

Another question we often get asked is how do we as remote team members correctly align with business so that we can support them at their pace and intensity?  We’ve had the most success with regular knowledge- shares and participation in operational team meetings.  While taking part in our clients’ company-wide sessions has been unnecessary, we have found that attending operations and architecture team meetings where information can be shared down and around is an excellent start.  Getting onto operational team distribution lists is another method we use to learn about what is going on.  Obviously, every DBA on our team cannot do these things for every one of PalominoDB's clients, but the primary DBA assigned to a particular client can, and, as they filter out relevant details, they can share pertinent information with the rest of the team. 

Having that primary DBA serve as a client’s advocate is crucial, and a point I will continually discuss in my writings.  It is the primary DBA who asks questions when information is not forthcoming, who reviews the org charts and introduce themselves to engineers, project managers and QA/release folks.  The primary DBA gets contact info from all of these folk, documents it in CRM, and plugs it into GTalk, Skype or whatever medium is appropriate.  The primary DBA will hang out in a clients’ IRC and campfire rooms and soak up everything they see.  The primary DBA even reads powerpoints (yes really)! Finally, and most importantly, the primary DBA makes on-site visits.  PalominoDB’s operations team makes it a point to try and come on-site at least once every two months.  Some of that time is spent in meetings and some of that time is spent simply dining or hanging out.  Regardless, these on-site visits allow us all to connect, to put names to faces, and to get to know each other. It helps to ensure that our clients understand that PalominoDB is not a faceless company full of replaceable DBAs.  We are a company made up of individuals with skills, quirks, personalities (usually BIG ONES), and we know our clients are the same. 

Does this take time? Yes. However, I ask our clients to think about the savings in cost they accrue by using us instead of maintaining a full-time staff.  The extra time spent on meetings, emails and IRC conversations does not add much in overall cost, yet it is invaluable when building relationships.  Constant contact replaces the water-cooler meetings and impromptu conversations at lunch. That small investment of time in camaraderie and in team-building pays-off in more ways than you can imagine.

MySQL Content at Oracle OpenWorld - Session Matrix

While the online content catalog and schedule builder are great tools to help plan out what sessions I want to see at Oracle OpenWorld, what I really want is a matrix of only the MySQL content, preferably in a matrix that easily shows all the sessions per time period.

So I decided to make the matrix myself - view the HTML online at http://technocation.org/files/doc/2011_OOW_MySQL_Content.html

Or download a PDF (one page per day) at http://technocation.org/files/doc/2011_OOW_MySQL_Content.pdf

If you have feedback, please let me know in the comments or via the e-mail address on the matrix.  These documents are for personal use unless other arrangements have been made.

To see full descriptions, click on a speaker's name to be sent to the content catalog's page for that speaker, then click on the session to get the full description.

Disclosure: Truth About MySQL 2012 Conference Planning

I love Percona Live.  I think it is a great meeting of the minds.  However, I do not think it is a good replacement for the big April MySQL conference.  In fact, neither does Baron Schwartz:

"The conference is organized and owned by MySQL, not the users. It isn’t a community event. It isn’t about you and me first and foremost. It’s about a company trying to successfully build a business, and other companies paying to be sponsors and show their products in the expo hall."
Baron Schwartz, April 23, 2008.
http://www.xaprb.com/blog/2008/04/23/like-it-or-not-it-is-the-mysql-conference-and-expo/

Switch "MySQL" for "Percona", and that is exactly what Baron said in today's announcement:

"Emphasis on business. We need a place where vendors, both open-source and closed-source, can showcase their products and services. This is the hand that feeds all of us. It’s good for Percona’s business, and it’s good for everyone else’s too."

...except in 2008, Baron was calling for a community conference because what was good for business was not good for the community.  

Let me pull out that last sentence from the quote.  Read it to yourself, and tell me if you feel warm fuzzies replacing "Percona" with "Oracle" or "MySQL":

"It’s good for _______’s business, and it’s good for everyone else’s too."

Here is a question for you:  Do you think Oracle will send lots of engineers to talk about current and future plans for MySQL to a conference that is open about saying "It is good for our business?"  

It does not matter if it is Percona, Blue Gecko, or PalominoDB, if Oracle has any business sense (and they make quite a bit of money, so signs are that they have business sense) they will not send engineers to a competitor's company-branded conference.

What does Percona's founder, Peter Zaitsev, have to say about the conference?  He's not really happy about it either:

"I would like to see the conference which is focused on the product users interests rather than business interests of any particular company (or personal interests of small group of people), I would like it to be affordable so more people can attend and I’d like to see it open so everyone is invited to contribute and process is as open as possible. "
Peter Zaitzev, April 23, 2008
http://www.mysqlperformanceblog.com/2008/04/23/conference-for-mysql-users/

Peter, I would like to see that too.  In fact, a small group of folks, including Giuseppe and myself, tried to make that happen, and we had SkySQL, MariaDB, Oracle and IOUG all supporting us. 

Giuseppe called for disclosure about the conference, so I will disclose this:  Baron was not truthful when he said "To the best of our knowledge, no one else was planning one".

Giuseppe asked for full disclosure, so here is a copy and paste of a Skype conversation I had with Percona's Tom Basil on June 29th, 2011:

 

 

Sheeri K. Cabral 6/29/11 12:10 PM 

I think we should develop some ideas in case O'Reilly doesn't end up having a MySQL conference....last year the announcement was late, and it was in May, and I'm starting to think they might not be doing a conference this year, since we haven't heard anything yet.

6/29/11 12:10 PM

if that's the case, I'd like to have a conference anyway, and I'd like to explore options with you, because we need a community-run conference (not Collaborate, but maybe a co-located summit).  And obviously you've had success with Percona Live, but a multi-day conference is really different.

6/29/11 12:11 PM

(FWIW I told Colin Charles in May that I was willing to help co-chair the conference, so I'm still willing to give my support in that area).

Tom Basil 6/29/11 12:14 PM 

Sheeri, can't talk now

Sheeri K. Cabral 6/29/11 12:14 PM 

*nod*  can we schedule a conf call maybe?

Tom Basil 6/29/11 12:14 PM 

Headed out in just few min

6/29/11 12:14 PM

yes, next week

6/29/11 12:14 PM

We tried to schedule conference calls but for almost 6 weeks we were pushed back.  
I had indeed offered to chair or co-chair a conference with Colin Charles, but I simply cannot lend a lot of logistical support to a Percona-branded conference.  I volunteer a lot, but it is for the benefit of the MySQL Community, not for the benefit of a company I do not work for.  
I hope that my openness and candor does not blacklist me from Percona events; I am a popular and sought-after speaker at MySQL events and it will be a loss to the community if that happens.
Percona should keep its Percona Live series.  However, the world's most popular open source database deserves and needs exactly what Peter, Percona's founder, said just a few years ago: "the conference which is focused on the product users interests rather than business interests of any particular company".
I believe Percona Live Santa Clara will be a successful event, and I will try to be a part of it.  I hope it will be as successful for the community as it will be for Percona's business.

 

Final Videos of Open DB Camp Online:

The final videos from Open DB Camp back in May in Sardinia, Italy are now online.  The full matrix of sessions, videos and slides can be found on the schedule page.

Hands on JDBC by Sandro Pinna - video

"MySQL Plugins, What are They? How you can use them to do wonders" by Sergei Golubchek of MariaDB - video

The State of Open Source Databases by Kaj Arnö of SkySQL - video

Coming soon, videos from OSCon Data!

Upcoming Free IOUG Webinar: Securing MySQL

Next week I will give a free IOUG webinar on Securing MySQL on Wednesday, August 10, 2011 from 11:00 AM - 12:00 PM CDT (17:00 GMT):

Securing MySQL is extremely important, but often is not done properly. I will explain the different ways to secure MySQL. In addition to securing users and privileges, file permissions and encrypted connectivity will be discussed. The MySQL server options that contribute to MySQL security will be pointed out, along with tips for eliminating unsecure external scripts. For those who want more auditing capabilities, this session will explain how to see all login attempts (successful and not) and how to lock out accounts with repeated failed logins. The session will conclude with guidelines about how to create security policies for your organization.

To register for this webinar, visit https://www1.gotomeeting.com/register/979260992.  

mydumper & myloader : fast backup and restore

At PalominoDB we do not normally use mysqldump for regular backups, but only in some circumstances (for example MySQL upgrade).
Lately we gave a try to mydumper as an alternative to mysqldump, and results are quite promising.
We found that mydumper performs very fast exporting both small and large datasets!!
We also found that the with large datasets restore with myloader doesn't perform a lot better than simple restore from mysqldump SQL dump: this depends from the storage engine and not from the client used to restore.

On a box we run 2 tests:
1) with a dataset that was fitting in the InnoDB buffer pool;
2) with a dataset larger than the InnoDB buffer pool.

TEST #1

We created 128 tables of 1M rows each, for a total dataset of 31GB on disk:
shell$ time ./sysbench --test=tests/db/parallel_prepare.lua --oltp-tables-count=128 --oltp-table-size=1000000 --mysql-table-engine=innodb --mysql-user=root --num-threads=12 run real 22m0.013s
user 204m22.054s
sys 0m37.430s

Doing the backup with mydumper:
shell$ time ./mydumper -t 8 -B sbtest
real 0m29.807s
user 2m35.111s
sys 0m26.102s

... and with mysqldump:
shell$ time mysqldump --single-transaction sbtest > sbtest.sql
real 6m24.607s
user 5m19.355s
sys 0m46.761s

Within this test, mydumper looks around 13 times faster than mysqldump.

We also tried compression, but I/O was fast enough to make compression only an unnecessary overhead: in other words, on that hardware and with this dataset, mydumper with compression was slower than mydumper without compression.

To complete the test, we tried recovery time, after deleting and recreating and empty database:
shell$ mysql -e "drop database sbtest ; create database sbtest"
shell$ time ./myloader -t 8 -d export-20110720-090954
real 9m12.548s
user 0m55.193s
sys 0m28.316s

shell$ mysql -e "drop database sbtest ; create database sbtest"
shell$ time ( echo "SET SQL_LOG_BIN=0;" ; cat sbtest.sql ) | mysql sbtest
real 46m46.140s
user 9m3.604s
sys 0m48.256s

With this dataset, restore time using myloader was 5 times faster than using the SQL dump from mysqldump.

TEST #2

Test #2 is very similar to test #1 , but with some different in the dataset:
48 tables instead of 128 tables;
10M rows on each table instead of 1M rows;
a total dataset on disk of 114GB instead of 31GB.

First, we created the tables with sysbench:
shell$ time ./sysbench --test=tests/db/parallel_prepare.lua --oltp-tables-count=48 --oltp-table-size=10000000 --mysql-table-engine=innodb --mysql-user=root --num-threads=12 run
real 107m24.657s
user 689m2.852s
sys 2m11.980s

Backup with mydumper:
shell$ time ./mydumper -t 8 -B sbtest
real 7m42.703s
user 15m14.873s
sys 2m20.203s

The size of the backup is quite big because not compressed: 91GB
On average, mydumper was writing on disk at around 200MB/s.

Backup with mysqldump:
shell$ time mysqldump --single-transaction sbtest > sbtest.sql
real 32m53.972s
user 20m29.853s
sys 2m47.674s

mydumper was again faster than mysqldump , but not as much as in the previous test: only 4 times faster.

Was now the time to measure recovery time:
shell$ mysql -e "drop database sbtest ; create database sbtest"
shell$ time ./myloader -t 6 -d export-20110720-171706
real 130m58.403s
user 4m5.209s
sys 1m51.801s

shell$ mysql -e "drop database sbtest ; create database sbtest"
shell$ time ( echo "SET SQL_LOG_BIN=0;" ; cat sbtest.sql ) | mysql sbtest
real 204m18.121s
user 34m33.520s
sys 3m43.826s

myloader is just a bit more than 50% times faster than importing the SQL dump from mysqdump

Conclusion from second test:
a) With larger dataset, mydumper slows down because the system does more I/O as the dataset doesn't fit in memory, but still way faster than mysqldump.
b) With larger dataset, load time with myloader slowed down a lot. Although, the root cause of the performance drop isn't mydumper , but:
- more I/O (dataset + dump don't fit in RAM);
- InnoDB inserts rate degenerates with bigger tables.

On the topic of InnoDB inserts rate degeneration with big tables , probably another blog post will follow.

Notes on hardware and configuration:
CPU: 2 x 6cores with HT enabled
96 GB of RAM
FusionIO

innodb_buffer_pool_size=64G
innodb_log_file_size=2047M
innodb_io_capacity=4000
innodb_flush_log_at_trx_commit=2
(binlog disabled)

Website by Digital Loom