Important Caveats When Using pt-online-schema-change


Using pt-online-schema-change is a great way to change a table without downtime or switching over to a secondary database first.  While using it on production systems, we found some interesting issues to consider.


1) There were foreign_keys referencing the table we were altering.  There are two options to handle this.  They are specified with "--update-foreign-keys-method rebuild_constraints|drop_old_table".  One is to rebuild the tables with foreign keys.  This does lock the tables, and will block reads and writes while it is happening.  I chose to use drop_old_table.  This method disables foreign key checks, then drops the original table and renames the new table before reenabling foreign key checks.  Normally, pt-online-schema-change just renames the tables, then drops the old table, so locking is minimal.  We did find that there was locking while dropping the original table and renaming the new table.  This is something to be cautious with, especially with large tables.  In theory, you could run an online change to drop the foreign key constraints on the child tables prior to working on the parent table.


2) When testing, this ran in 15 minutes against an unladen database and took over 24 minutes against production.  You could add --sleep n, where n is a number of seconds, if you need it to add less load to production.  


3) One limit we also saw is that pt-online-schema-change will only work on tables with at least one unique single column index, such as a single column primary key.  Often partitioned tables don't have a unique single column index.



Here is some of the output of a sample pt-online-schema-change...

## Install pt-online-schema-change and pre-requisites...
chmod 755 pt-online-schema-change
yum install perl-DBI
yum install perl-DBD-MySQL
time ./pt-online-schema-change h=,D=my_schema,t=my_table -P3306 --alter "add column signature varbinary(255) NULL DEFAULT NULL; add column signature_version varchar(4) NULL DEFAULT NULL" --child-tables fk_table1,fk_table2 --update-foreign-keys-method drop_old_table -pXXXX --bin-log
# 2011-10-14T03:20:28 ./pt-online-schema-change started
# 2011-10-14T03:20:28 USE `my_schema`
# 2011-10-14T03:20:28 Alter table my_table using temporary table __tmp_my_table
# 2011-10-14T03:20:28 Checking if table my_table can be altered
# 2011-10-14T03:20:28 SHOW TRIGGERS FROM `my_schema` LIKE 'my_table'
# 2011-10-14T03:20:28 Table my_table can be altered
# 2011-10-14T03:20:28 Chunk column id, index PRIMARY
# 2011-10-14T03:20:28 Chunked table my_table into 25685 chunks
# 2011-10-14T03:20:28 User-specified child tables: fk_table1, fk_table2
# 2011-10-14T03:20:28 Starting online schema change
# 2011-10-14T03:20:28 CREATE TABLE `my_schema`.`__tmp_my_table` LIKE `my_schema`.`my_table`
# 2011-10-14T03:20:28 ALTER TABLE `my_schema`.`__tmp_my_table` add column signature varbinary(255) NULL DEFAULT NULL
# 2011-10-14T03:20:29 ALTER TABLE `my_schema`.`__tmp_my_table`  add column signature_version varchar(4) NULL DEFAULT NULL
# 2011-10-14T03:20:30 Shared columns: id, col1, col2, col3, col4, col5
# 2011-10-14T03:20:30 Calling OSCCaptureSync::capture()
# 2011-10-14T03:20:30 CREATE TRIGGER mk_osc_del AFTER DELETE ON `my_schema`.`my_table` FOR EACH ROW DELETE IGNORE FROM `my_schema`.`__tmp_my_table` WHERE `my_schema`.`__tmp_my_table`.id =
# 2011-10-14T03:20:30 CREATE TRIGGER mk_osc_upd AFTER UPDATE ON `my_schema`.`my_table` FOR EACH ROW REPLACE INTO `my_schema`.`__tmp_my_table` (id, col1, col2, col3, col4, col5) VALUES (, NEW.col1, NEW.col2, NEW.col3, NEW.col4, NEW.col5)
# 2011-10-14T03:20:30 CREATE TRIGGER mk_osc_ins AFTER INSERT ON `my_schema`.`my_table` FOR EACH ROW REPLACE INTO `my_schema`.`__tmp_my_table` (id, col1, col2, col3, col4, col5) VALUES(, NEW.col1, NEW.col2, NEW.col3, NEW.col4, NEW.col5)
# 2011-10-14T03:20:30 Calling CopyRowsInsertSelect::copy()
Copying rows:   1% 24:52 remain
Copying rows:   3% 25:01 remain
Copying rows:  96% 00:43 remain
Copying rows:  98% 00:15 remain
# 2011-10-14T03:44:14 Calling OSCCaptureSync::sync()
# 2011-10-14T03:44:14 Renaming foreign key constraints in child table
# 2011-10-14T03:44:14 SET foreign_key_checks=0
# 2011-10-14T03:44:14 DROP TABLE `my_schema`.`my_table`
# 2011-10-14T03:44:49 RENAME TABLE `my_schema`.`__tmp_my_table` TO `my_schema`.`my_table`
# 2011-10-14T03:44:49 Calling CopyRowsInsertSelect::cleanup()
# 2011-10-14T03:44:49 Calling OSCCaptureSync::cleanup()
# 2011-10-14T03:44:49 DROP TRIGGER IF EXISTS `my_schema`.`mk_osc_del`
# 2011-10-14T03:44:49 DROP TRIGGER IF EXISTS `my_schema`.`mk_osc_ins`
# 2011-10-14T03:44:49 DROP TRIGGER IF EXISTS `my_schema`.`mk_osc_upd`
# 2011-10-14T03:44:49 ./pt-online-schema-change ended, exit status 0
real    24m20.777s
user    0m3.936s
sys     0m1.216s

Indexing text columns with GIST or GIN to optimize LIKE, ILIKE using pg_trgm in Postgres 9.1 (Part 1)

“The pg_trgm module provides functions and operators for determining the similarity of ASCII alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.”
This is the introduction to the official documentation of the extension at [1]. Note, that I used the EXTENSION terminology instead of CONTRIB, as in PostgreSQL 9.1.  Now we’ll use the CREATE EXTENSION command to include this module in the database. This new methodology allows us to manage modules installations/uninstallations with only a few commands.

The idea of this post is to show you how KNN GIST and pg_trgm could be used together to obtain interesting results. First, let’s start with some basic elements of pg_trgm.


Installing the module is easy. If you are installing through the use of source code, you must compile the module and once you get access to the database execute:


That’s it! Installation complete!

What is a Trigram and how use them?

A trigram is a group of three consecutive characters in a string that can be used to detect the similarity of two words (for example) or the ‘distance’ between them.

When we talk about ‘distance’, 0 means in the same place and 1 is very far. When we talk about similarity, 1 is equal and 0 is totally different. In other words, distance is 1 minus the similarity value. These concepts about distance and similarity, are necessary to start without confusion.

What’s is a trigram? A trigram is a group of three consecutive characters from a string, used to know the similarity between two strings by counting the trigrams they share.

A trigram looks like this:

palominodb=# select show_trgm('PalominoDB CO');
{"  c","  p"," co"," pa",alo,"co ","db ",ino,lom,min,nod,odb,omi,pal}
(1 row)

In the pg_trgm extension we have functions and operators. show_limit and set_limit are functions used to set up and show the similarity threshold for the % operator. This operator takes the form “string1 % string2” and returns a boolean type (“t” if the similarity is greater than the similarity threshold, otherwise it returns “f”).

palominodb=# select show_limit();

palominodb=# select set_limit(0.3), show_limit();
set_limit | show_limit
         0.3 |    0.3

In the following example we’ll see the use of each one. Operator % will return true if the similarity of the strings is greater than similarity threshold returned by show_limit function. In this example, both string are equal, in consequence, the operation will return true:

palominodb=# select similarity('Palomino','Palomino')  AS Similarity,
                                  'Palomino'<->'Palomino'              AS distance,
                                  'Palomino' % 'Palomino'              AS SimilarOrNot;
-[ RECORD 1 ]+--
similarity          | 1
distance     | 0
similarornot     | t

Index Support and usage

Now let’s discuss combining GIST or GIN and pg_trgm. pg_trgm includes an operator class to support searches using similarity, like, or ilike operators. GIN and GIST have several differences. If you don’t know which to choose, just remember a few rules: GIN searches quicker than GIST but is slower to update; if you have a write-intensive table use GIST. GIN is better for static data. Please be aware, however, that they don’t support exact matching with the equals operator!  You can do an exact match using like/ilike with no wildcards.  If you want to use the equals operator(=), you must create a standard BTREE index on the pertinent column.

In the following examples, we’ll show a table only with a GIST index. As you can see, if you want to match the exact value with equal operator, it will scan the whole table:

palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto = 'Palomino';
                                                QUERY PLAN
Seq Scan on texto_busqueda  (cost=0.00..90.15 rows=1 width=136) (actual time=16.835..16.846 rows=1 loops=1)
  Filter: (texto = 'Palomino'::text)
Total runtime: 17.094 ms
(3 rows)

But, if we use LIKE operator, index scan will be activated:

palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto like 'Palomino';
                                                               QUERY PLAN
Index Scan using texto_busqueda_texto_idx on texto_busqueda  (cost=0.00..8.27 rows=1 width=136) (actual time=0.374..1.780 rows=1 loops=1)
  Index Cond: (texto ~~ 'Palomino'::text)
Total runtime: 1.979 ms
(3 rows)

palominodb=# EXPLAIN ANALYZE  SELECT id, texto FROM texto_busqueda WHERE texto like '%Palomino%';
                                                               QUERY PLAN
Index Scan using texto_busqueda_texto_idx on texto_busqueda  (cost=0.00..8.27 rows=1 width=136) (actual time=0.171..1.732 rows=1 loops=1)
  Index Cond: (texto ~~ '%Palomino%'::text)
Total runtime: 1.882 ms
(3 rows)

To use an index for  match equal strings, we need to create a BTREE index. But in case of BTREE there is a limitation of 8191 bytes per index row. So, if you have very large text columns you will not allowed to create a BTREE index without using functional indexes.

We get this result because, unlike BTREE indexes, the search string is not left-anchored.

The creation of indexes with the pg_trgm operator class is simple:

CREATE INDEX ON texto_busqueda USING GIST(texto gist_trgm_ops);
CREATE INDEX ON texto_busqueda USING GIN(texto gin_trgm_ops);

If you want a more comprehensive understanding of GIST or GIN implementation on Postgres, you can download the the source code and read  src/backend/access/gist/README and src/backend/access/gin/README.

Another useful technique

Combining % operator to get the strings that have a similarity greater than the established threshold and similarity function -that returns the similarity-, we can get ordered from the most similar to the less one discarding all the strings that aren’t similar enough:

palominodb=# SELECT ctid, similarity(texto, 'Palominodb') AS simil
palominodb-#  FROM texto_busqueda
palominodb-#  WHERE texto % 'Palominodb'
palominodb-#  ORDER BY simil DESC;
 ctid  |  simil
(55,3) | 0.666667
(1 row)

The same query, but with its EXPLAIN plan shows that the generated index is used in the condition:

palominodb=# EXPLAIN ANALYZE  SELECT ctid, similarity(texto, 'Palominodb') AS sml
palominodb-#  FROM texto_busqueda
palominodb-#  WHERE texto % 'Palominodb'
palominodb-#  ORDER BY sml DESC;
                                                             QUERY PLAN
Sort  (cost=14.26..14.27 rows=3 width=138) (actual time=3.428..3.437 rows=1 loops=1)
  Sort Key: (similarity(texto, 'Palominodb'::text))
  Sort Method: quicksort  Memory: 17kB
  ->  Bitmap Heap Scan on texto_busqueda  (cost=4.28..14.24 rows=3 width=138) (actual time=3.336..3.383 rows=1 loops=1)
        Recheck Cond: (texto % 'Palominodb'::text)
        ->  Bitmap Index Scan on texto_busqueda_texto_idx  (cost=0.00..4.28 rows=3 width=0) (actual time=3.278..3.278 rows=1 loops=1)
              Index Cond: (texto % 'Palominodb'::text)
Total runtime: 3.578 ms
(8 rows)

Well, this is the first part. Hope you enjoyed the reading and I’ll wait your comments and feedback!



Why are your indexes larger than your actual data?

I sometimes encounter this phenomenon when a database has tables in which the indexes are larger—sometimes much larger—than the actual data stored in the table. Often, the table becomes unwieldy, which generates a number of questions: Should I partition? Should I shard? Should I use a different database entirely?


The problem is especially common with InnoDB tables because of:

* Clustered indexes - in Innodb, the primary key is included in every secondary key. If the primary key is large, it could have a detrimental effect on the entire table.

* Slight index overhead - "each index record contains a five-byte header that may be preceded by a variable-length header. The header is used to link together consecutive records, and also in row-level locking." 

* The indexes are fragmented (here’s what you can do about this problem).

When comparing to MyISAM, which compresses its string indexes, InnoDB looks quite big.


Here is an actual example:

 mysql> SELECT engine, count(*) tables, concat(round(sum(table_rows)/1000000,2),'M') rows, concat(round(sum(data_length)/(1024*1024*1024),2),'G') data, concat(round(sum(index_length)/(1024*1024*1024),2),'G') idx, concat(round(sum(data_length+index_length)/(1024*1024*1024),2),'G') total_size, round(sum(index_length)/sum(data_length),2) idxfrac

FROM information_schema.TABLES GROUP BY engine

ORDER BY sum(data_length+index_length) DESC LIMIT 10;


| engine     | tables | rows    | data   | idx     | total_size | idxfrac |


| InnoDB     |    960 | 278.31M | 75.76G | 119.27G | 195.03G    |    1.57 |

| MyISAM     |   1054 | 140.75M | 16.43G | 8.21G   | 24.64G     |    0.50 |

| MRG_MYISAM |     18 | 4.46M   | 0.48G  | 0.00G   | 0.48G      |    0.00 |

| MEMORY     |     44 | NULL    | 0.00G  | 0.00G   | 0.00G      |    NULL |

| NULL       |      7 | NULL    | NULL   | NULL    | NULL       |    NULL |



mysql> SELECT CONCAT(table_schema, '.', table_name),  CONCAT(ROUND(table_rows / 1000000, 2), 'M')  rows,  CONCAT(ROUND(data_length / ( 1024 * 1024 * 1024 ), 2), 'G')                    DATA, CONCAT(ROUND(index_length / ( 1024 * 1024 * 1024 ), 2), 'G')  idx, CONCAT(ROUND(( data_length + index_length ) / ( 1024 * 1024 * 1024 ), 2), 'G') total_size, ROUND(index_length / data_length, 2) idxfrac FROM  information_schema.TABLES ORDER  BY data_length + index_length DESC LIMIT  3;


| CONCAT(table_schema, '.', table_name) | rows   | DATA  | idx    | total_size | idxfrac |


| db1.big_old_table              | 48.48M | 8.38G | 27.25G | 35.63G     |    3.25 |

| db1.big_new_table            | 38.63M | 6.32G | 18.13G | 24.44G     |    2.87 |

| db1.another_big_table       | 17.83M | 2.14G | 9.92G  | 12.06G     |    4.64 |



Warning: be careful when you run the above queries. Try to run them on an available and non-production server whenever possible as they may seriously slow down your server.


As you can see, the differences between the index sizes vary between 3-4.5 times the size of the data.


My experience suggests the root causes of why the indexes are bigger than the data, could be any of the following:

1) Too many indexes

2) Indexes on big columns - like varchar(255)

3) Redundant or duplicate indexes

4) Combination of all of the above


And what are the root causes of these problems? Here are some potential answers, garnered, once again, from personal experience:

1) A happy trigger on the phpmyadmin "add index" button

2) When the table was small, some queries were slow and indexes were added indiscriminately until these queries were fast (which explains the previous point)

3) Someone tried numerous different combinations of multi-column indexes to speed up different queries using the same table(s)

4) Additional indexes could have been used to solve bad (after-the-fact) schema design issues


Please note that there could be more causes.  This is only a small sample of possibilities.


How to solve the problem?

This is something I am still learning, but so far I have discovered that:

1) You need to find out which indexes are not being used any more. To do that, you can use pt-index-usage. What you would need is the general log (or your slow log at long_query_time=0) and you would need to run it through pt-index-usage. This will generate a list of ALTER TABLE statements. This tool can also generate the findings or the decisions for the results into database of your choice (meaning you need to give it a database server if you want that extra data).

This report logs the indexes that were used by the queries in the general/slow log and suggests that you DROP the ones that are not. What's important to remember is that queries which did not run during the time you took your log - such as weekly or monthly reports - will not show up in the results.

I find this tool useful, because I can use it with the same log I used for pt-digest-query. However, I do not blindly run all the drop table statements, I just use them to help me understand the indexes better.

I sometimes use pt-duplicate-key-checker, but usually, pt-index-usage finds all or almost all of what pt-duplicate-key-checker finds.


2) I pick a table from the top table query I ran and investigate the indexes on it. I look for duplicate indexes or an index with similar columns (such as index1(a,b), index2(a), index3(b,a,c)) and try to determine why they are there.

Next, I try to go over the queries that access the table from the general/slow log and see how they use the indexes. There are no real short cuts to this process. Usually, I focus on low hanging fruit and I am very careful not to negatively impact other queries that I may not know about.

What I try to do is gather the uniqueness/cardinality of the indexes and determine how they help queries.


Here is an example to explain better:

show create table users(


fullname VARCHAR(255) DEFAULT '',

active ENUM('Y','N') DEFAULT 'Y',


KEY index1(fullname)



Let’s pretend that the users table has 100k rows.  Your query searches on the `fullname` column, which also has an index. 

In the case of most queries, the names may be unique enough to give back a few rows. But names like "John Smith" may give back an unwieldy number of results. If you are only interested in active users, then you can make a change like this:


ALTER TABLE users DROP INDEX index1, ADD INDEX index1(fullname,active);


By doing so, you increase the cardinality of the index because the additional column `active` may reduce the number of results yielded. In the case of the example, there may be over 100 John Smiths, but only 30 that are active.


3) Try to reduce the size of the queries. In the case of the earlier example, `fullname` is a varchar(255). If we use UTF8 for the table, then the index for it will use 767byes. We must determine what the fewest amount of characters that we can get away with is. This could be difficult if we don’t have any data to use for comparison.

If I were running this query, I would use the following process to  determine what that number might be:


mysql> set @unique_total=0;

mysql> select count(distinct fullname) into @unique_total from users;

0 row in set (6 min 28.29 sec)


mysql> select @unique_total;


| @unique_total |


|        100448 |


1 row in set (0.00 sec)


mysql> select round(count(distinct left(fullname,200))/@unique_total*100,2) as c200, round(count(distinct left(fullname,100))/@unique_total*100,2) as c100,

    -> round(count(distinct left(fullname,75))/@unique_total*100,2) as c75, round(count(distinct left(fullname,50))/@unique_total*100,2) as c50,

    -> round(count(distinct left(fullname,30))/@unique_total*100,2) as c30, round(count(distinct left(fullname,20))/@unique_total*100,2) as c20 from users;


| c200   | c100   | c75    | c50   | c30   | c20   |


| 100.00 | 100.00 | 100.00 | 99.97 | 99.81 | 92.86 |


1 row in set (4 min 13.49 sec)


You can see that if I limit the index to 30 characters, it will be very precise. If I try a lower number, it will become less and less precise. Technically, I would aim for an index that’s limited to over 95% of the existing data.


If you set the number for the index’s characters too low, this may have a negative effect.

To explain what I mean by that, let’s take the full name of the actor Daniel Michael Blake Day-Lewis, and let’s say that our index is limited to 20 characters.  If we search “Daniel Michael Blake Day-Lewis,” we may get back results from the database for:

Daniel Michael Blake Day-Lewis

Daniel Michael Blake

Daniel Michael Blake Smith

Daniel Michael Blake Willis


The database will have to further search these rows after it has been brought up from the storage engine. So using a limited index in this case will require the database to do additional work.


4) If at all possible, try to change a VARCHAR column/index that has a defined number of rows to an ENUM (1 or 2 bytes) or normalize it to another table and keep the id (usually an INT) in the original table. Doing so can save a ton of space (remember the VARCHAR and UTF8 comment I made above?). Check the cardinality of your VARCHAR column using SHOW INDEX and see if that column’s cardinality is between 0-255 or 0-65535 to determine if it’s a good candidate for this conversion.


There are many more techniques for and things you should know about going over indexes. I hope that this post will encourage you to learn more of them. If you are interested in doing so, I will be speaking at Percona Live London, were I will cover this subject (and others) in more depth. Hope to see you there!

TechRepublic Article: "Are tech certs valuable or not?"

I normally do not repost articles and make teeny tiny blog posts (I have noticed a few bloggers who do that, and it bothers me), but I found a short article on TechRepublic called Are tech certs valuable or not? to be quite interesting.

The author did an informal poll of a few CBS interactive higher-ups, and it was interesting to note that in addition to the expected "it depends" and "experience is the most important thing" and "beware the person with many certs and no experience" answers, the MySQL certification was specifically noted as being good:

Doug Lane, Director of Site Reliability and Performance Engineering, says, “The MySQL DBA certifications are valuable to me, since we support so many instances of that. Frankly, the Java and MS certs became less valuable due to the number of those who were just paper certified but not actually practitioners.”

Interestingly, he says the reason the MySQL certifications are valuable is because CBS interactive supports so many instances of MySQL, but he implies that folks who have MySQL DBA certifications also have experience (as opposed to Java/MS certs).

Change Views DEFINER without ALTER VIEW : how to fix thousands of views!

Recently I faced an interesting challenge: modify over 26k views on a single mysqld instance.

Altering the view one by one was far from an easy task, so a I had to look for an efficient way to do it. Read to find out more.



Views have security context and their own access control.

When a view is created, a DEFINER is assigned to it (by default, the user that creates the view), and a SQL SECURITY that specifies the security context (definer by default).

Assume you use 'user123'@'192.168.1.%' , and you issue the follows (a very simple view):



Behind the scene, this becomes:

CREATE ALGORITHM=UNDEFINED DEFINER=`user123`@`192.168.1.%` SQL SECURITY DEFINER VIEW `view1` AS select <omitted> from `tbl1` ;





So far, all good.

What if your application change user to `user456`@`192.168.1.%` ?

The result can be very disappointing (even if expected):

mysql> SELECT * FROM view1;

ERROR 1045 (28000): Access denied for user 'user456'@'192.168.1.%' (using password: YES)


What if you use a highly privileged user like root?

mysql> SELECT * FROM view1;

ERROR 1449 (HY000): The user specified as a definer ('user123'@'192.168.1.%') does not exist

Not much more luck, we are sort of stuck until we change permission.


ALTER VIEW has the same syntax of CREATE VIEW , ( just 's/CREATE/VIEW/' ) , therefore there is no way to change only the definer: all the statement that define the view need to be re-issued.


Even if recreating the view is an easy task, isn't that easy if you thousands and thousands of views to fix.



There is a dirty way to do this, but it does its job!

Each view is a .frm file in the database directory.

Changing the definer is easy as editing the file changing definer_user and/or definer_host .


This is the procedure to update all the views, no matter their number (26k views updates in only few minutes):

shell> cd /var/lib/mysql

shell> for i in `find . -name "*frm"` ; do if [ `cat $i | egrep '^definer_user=user123$' | wc -l` -ne 0 ]; then echo $i ; fi ; done > /tmp/list_views.txt

# don't forget to make a backup of your views!

shell> tar cf /tmp/views_backup.tar /tmp/list_views.txt

shell> for i in `cat /tmp/list_views.txt` ; do tar rf /tmp/views_backup.tar $i ; done

shell> for i in `cat /tmp/list_views.txt` ; do cp -a $i $i.old && sed -e 's/^definer_user=user123$/definer_user=user456/' $i.old > $i && rm $i.old ; done



Describing the above procedure:

- /tmp/list_views.txt lists all the views that need to be modified ;

- /tmp/views_backup.tar is a backup with all the frm listed in above list ;

- for each view (file) : copy the .frm file as .frm.old , applies the change using sed , remove the .frm.old file

- close all opened tables (included view) so the changes take effects.



How Oracle Has Done Nothing to Change MySQL

Last night at the Oracle OpenWorld MySQL Community Reception, there were lots of old and new friends milling about.  It occurred to me that there is one very important thing Oracle has NOT changed about the MySQL world - the rock stars and higher-ups are still readily accessible.

One of the things I love about being in the open source community is that you can have an in-depth conversation with someone, and only later on find out that this person is famous.  For the most part, rock stars and important people are readily accessible.  They stay in the same hotels that attendees do, they take the same elevators, they are not whisked away by bodyguards, and they do not play the "don't you know who I am?" card.

Now, it's not surprising that the community rock stars like Mark Callaghan, Baron Schwartz, Giuseppe Maxia and Sarah Novotny are still as accessible as ever.  However, Ed Screven and Thomas Ulin were also around for the party, and I can confirm that Thomas was one of the last dozen or so to leave (Ronald Bradford and I closed out the party and were the last to leave).

So, kudos to Oracle for not keeping your VIPs locked up in a bunker.  I am very glad to see this aspect of open source culture still thriving.

Liveblogging at OOW: State of the Dolphin

Thomas Ulin, VP of MySQL Enginnering, speaks about "State of the Dolphin" at 2011 Oracle OpenWorld.  There are some pretty cool new features in MySQL 5.6 development milestone release 2, and they are all quite stable, which is exciting.  They want to add more features before going GA with MySQL 5.6, but the ones in there are pretty ready to go.

"The 15-minute rule [MySQL can be installed in 15 minutes] is now down to 3 minutes for the full MySQL stack."  Download one package, and a GUI helps you install and configure everything.

MySQL Enterprise HA: Windows Server Failover Clustering, uses Windows Server Failover Clustering from Microsoft and the cluster is managed through the Windows tools.

Ulin talked a lot about how MySQL is good on Windows, and how it is better and lower TCO than Microsoft SQL Server.  They are focusing on Visual Studio, MS Office Integration, Entity Framework, Windows Administration tooling, and more.

Talks about the parts of MySQL Enterprise Edition:

MySQL Workbench

MySQL Backup - completely new from the ground up, Ulin does not have slides but there is a presentation on Wednesday I am looking forward to going to.  

MySQL Enterprise Security: New a few weeks ago, new authentication modules - PAM (uses LDAP, Kerberos, etc) or Windows authentication.  A good slide with how it works:

The application server sends the authentication information to MySQL, which is sent to the PAM library, which verifies the credentials, and returns "yes/no" to the mysql user.  "CREATE USER joe IDENTIFIED WITH pam_plugin AS 'joe';" - NOTE: specific privileges are still managed on the MySQL level (e.g. not everyone gets the SUPER privilege!)

MySQL Enterprise Scalability - thread pooling.  Pool contains a configurable number of thread groups (default=16) and each group manages up to 4096 re-usable threads.  Each connection asssigned to a  group via round robin.  20x better scalability read/write using thread pooling, 3x better scalability for read-only workloads.  There is an API so the thread pooling can be implemented manually.

OVM template for MySQL - on Oracle Linux.



5.6 DMR 2 (development milestone release)

Close enough to be an RC or GA, they are very confident.  But they are not releasing as GA because they want to add more features.

New features - builds on MySQL 5.5 by improving:

optimizer - better performance, scalability

 - filesort optimizations with small limits

 - avoids creating intermediate sorted files by producing an ordered result set using a single table scan and sort on the fly.  

index condition pushdowns

 - was a 6.0 feature

 - good for composite indexes

 - why wouldn't you have this turned on, btw?  

batched key access and multi-range read

 - Improves performance of disk-bound JOIN queries

 - handles batches of keys instead of traditional nested-loop joins which do one key at a time.  Takes advantage of sequential reads.

postponed materialization

 - of views/subqueries in the FROM clause (aka derived tables)

 - Allows fast EXPLAINs for views/subqueries

 - Avoid materialization when possible, faster bail out

 - A key can be generated for derived tables.  

 - Unsurprisingly, this optimization is huge - 240x better execution times (e.g. drop from 8 minutes to about 2 seconds)


 Persistent Optimizer Statistics

 Optimizer traces


performance schema: better instrumentation

- up to about 500 instrumentation points

 - Statements/stages - where do my most resource-intensive queries spend the most time?

 - TAble/Index I/O, Table Locks - which tables/indexes cause the most load/contention?

 - Users/Host/accounts - using the most resources

 - Network I/O

 - Sumaries - agg by thread, user, host, account or object


innodb - better transactional throughput

 - new I_S tables: Metrics, Systems, Buffer pool info

 - Dump/restore buffer pool

 - Limit insert buffer size (ibuf)


replication for HA, data integrity

 - Better data integrity: crash-safe slaves, replication checksums, crash-safe binlog

 - Better performance: multi-threaded slaves, reduced binlog size for row-based binary logging

- Extra flexibility - time-delayed replication

- Simpler troubleshooting - row-based replication logging of original query

- Enhanged monitoring/management.

"NotOnlySQL" (NoSQL) for more flexibility


 - ipv6 improvements

 - Unicode support for Windows commandline client

 - import/export to/from partitioned tables

 - explicit partition selection

 - GIS/MyISAM: Precise spatial operations


MySQL Cluster 7.2: DMR 2

Close enough to be an RC or GA, they are very confident.  But they are not releasing as GA because they want to add more features.

 - 70x higher complex query performance using Adaptive Query Localization (example was an 11 way join)

 - native memcached API - with no application changes.  reuses standard memcached clients & libraries.  Also eliminates cache invalidation slamming MySQL

 - MSQL 5.5 Server Integration - previous versions were integrated with MySQL 5.1, so in MySQL Cluster 7.2 you have access to the new features - upgrading can be done in an online fashion with the rolling upgrade.

 - Multi-geographical site clustering - DR and data locality, no passive resources.  Can split data nodes and node groups across data centers and have auto-failover.

 - Simplified active/active replication - eliminates requirement for application and schema changes.  There are transaction-level rollbacks.  This helps with conflict detection when more than one master is written to.

 - Consolidated privileges for easier provisioning and administration

MySQL Enterprise Oracle Certifications

 - All the Oracle Fusion MiddleWare ship with the MySQL 5.x JDBC driver.

MySQL is being put into or being supported by a lot of Oracle products, including Fusion MiddleWare, including Oracle E-business suite.  This means you can use MySQL as a data source.

Coming soon: using the Auditing API in MySQL 5.5, Oracle will be adding an audit plugin that can (among other things) produce an audit stream that Oracle Audit Vault can then use.

Postgresql 9.1 - Part 1: General Features

General scope

Postgresql 9.1 runs over the theme “features, innovation and extensibility” and it really does. This version was born to overcome Postgresql 9.0 ‘s limitations and known bugs in replication. If you are developing over 9.0, it’s time to think seriously about preparing your code for Postgresql 9.1.
The intent of this series of posts are not to be another release features posts. I offer a vision based on my personal experience and focus on the features that I saw exciting for the most of the projects where I’m involved. If you want to read an excellent general article about the new features of this version, web to [2].
At the moment of this post, the last PostgreSQL version is 9.1.1 . It includes 11 commits to fix GIST memory leaks, VACUUM improvements, catalog fixes and others. A description of the minor release can be check at [3].
The main features included are:

  • Synchronous Replication
  • Foreign Data Support
  • Per Column collation support
  • True SSI (Serializable Snapshot Isolation)
  • Unlogged tables for ephemeral data
  • Writable Common Table Expressions
  • K-nearest-neighbor added to GIST indexes
  • Se-linux integration with the SECURITY LEVEL command
  • Update the PL/Python server-side language
  • To come: PGXN Client for install extensions easily from the command line. More information:  The source will be on

Some of these features could be considered minor, but many think they are very cool while using 9.1  in their environments.

Considerations before migrating

If you are an old Pg user, you may already know the migration risks listed on the next page. Still, I advise that you note and carefully learn about these risks. Many users freeze their developments to older versions simply because they didn’t know how to solve new issues. The most notable case is when 8.3 stopped using implicit casts for some datatypes and many queries didn’t work as a result.  

There are some important changes that could affect your queries, so take a pen and note:

  • The default value of standard_conforming_strings is now turned on by default. That means that backslashes are normal characters (which is the SQL standard behavior). So, if you have backslashes in your SQL code, you must add E’’ strings. For example: E’Don’t’
  • Function-style and attribute-style data type casts were disallowed for composite types. If you have code like value_composite.text or text(value_composite), you will need to use CAST or :: operator.
  • Whereas before the checks were skipped, domains are now based on arrays when they are updated, which results in a rechecking of the constraints.
  • String_to_array function returns now an empty array for a zero-length string (before it returned NULL). The same function splits into characters if you use the NULL separator.
  • The inclusion of the INSTEAD OF action for triggers will require you to recheck the logic of your triggers.
  • If you are an actual 9.0 replication user, you may know that in 9.1 you can control the side effects of VACUUM operations during big queries execution and replication. This is a really important improvement. Basically, if you run a big query in the slave server and the master starts a VACUUM process, the slave server can request the master postpone the cleanup of death rows that are being used by the query.

Brief description of main features

Don’t worry about the details, we’ll cover each feature in future posts.

  • Synchronous Replication
    • This feature enhances the durability of the data. Only one server can be synchronous with the master, the rest of the replicated servers will be asynchronous. If the actual synchronous server goes down, another server will become synchronous (using a list of servers in synchronous_standby_names).  Failover is not automatic, so you must use external tools to activate the standby sync server, one of the most popular is pgpool [4].
  • Foreign Data Support
    • The feature of Foreign Data Wrappers has been included since 8.4, but now it is possible to reach data from any database where a plugin exists. Included in the contribs, is a file called file_fwd, which connects CSV files to a linked table. Basically it provides an interface to connect to external data. In my opinion, this is perhaps one of the most useful features of this versions, especially if you have different data sources in your environment.
  • Serializable Snapshot Isolation
    • This new level of serialization is the strictest. Postgres now supports READ COMMITED, REPEATABLE READ (old serializable) and SERIALIZABLE. It uses predicate locking to keep the lock if the write would have an impact on the result. You will not need explicit locks to use this level, due to the automatic protection provided.
  • Unlogged tables
    • Postgres uses the WAL log to have a log of all the data changes to prevent data loss and guarantee consistency in the event of a crash, but it consumes resources and sometimes we have data that we can recover from other sources or that is ephemeral. In these cases, creation of unlogged tables allows the database to have tables without logging into the WAL, reducing the writes to disk. Otherwise, this data will not be replicated, due to the mechanism of replication used by Postgres (through WAL records shipping).
  • Writable Common Table Expressions
    • CTE was included in 8.4 version, but in this version, it was improved to allow you to use writes inside the CTE (WITH clause). This could save a lot of code in your functions.
  • K-nearest-neighbor added to GIST indexes
    • Postgres supports multiple types of indexes; one of them is GiST (Generalized Search Tree). With 9.1, we can define a ‘distance’ for datatypes and use it for with a GiST index. Right now, this feature is implemented for point, pg_trgm contrib and others btree_gist datatypes. The operator for distance is <-> . Another feature you will enjoy is that LIKE and ILIKE operators can use the tgrm index without scanning the whole table.
  • SE-Linux integration
    • Postgres is now the first database to be fully integrated with military security-grade. SECURITY LABEL applies a security label to a database object. This facility is intended to allow integration with label-based mandatory access control (MAC) systems such as SE-Linux instead of the more traditional access control - discretionary with users and groups. (DAC).



What Community Resources should be at Oracle OpenWorld?

A short while ago I posted about the Oracle OpenWorld Schedule Matrix of MySQL sessions (in PDF and HTML formats).  We have printed up a (small) number of schedules to have on hand at the MySQL Community kiosk at the User Group Pavillion in Moscone West.

Yes, you read that correctly -- the User Group Pavillion will include a MySQL Community kiosk this year!  Sarah and I have been coordinating the effort to staff the kiosk and figure out what we need to provide.

Sadly, it's just a kiosk (same as all the other User Group organizations get), so we cannot have a ton of flyers there.  To that end, we have created a QR code that resolves to, which is where we are putting many links.  

To that end, we'd like your help figuring out what we have missed.  In order to keep the list of links as short and relevant as possible we have put as many aggregate links as we could, for example we link to instead of individual blogs, and we are only listing the major conferences with over 500 attendees expected.  The links at as of the time of this blog writing are:

- MySQL sessions at OOW

- Planet MySQL

- (docs, etc)


- MySQL User Groups ( list) - so if you have a user group, make sure to update the forge page!

- Percona Live 2012 Conference & Expo

- MySQL videos on YouTube

- IOUG MySQL Council

- OurSQL Podcast Blog

- OurSQL iTunes link

- MySQL Experts podcast

- Book: MySQL Administrator's bible*

- Book: High Performance MySQL

- Book: Expert PHP/MySQL

- Book: MySQL High Availability


If you think of a link we should put on there, please comment below.


For what it's worth, the paper we will have will be:

- The current day's schedule

- A flyer about Percona Live 2012 MySQL Conference & Expo

- A poster of the QR code and a few small paper slips with the QR code

- IOUG MySQL Council business cards

And even that is stretching it, as there will be a laptop at the kiosk provided by Oracle and the kiosk is 24 inches x 24 inches, about 61 centimeters x 61 centimeters.
* Note that I have ordered the books with the MySQL Administrator's Bible first because it's for beginner/intermediate users, whereas High Performance MySQL is for intermediate/advanced users.

Beware: Default charset for mysqldump is utf8, regardless of server default charset


I ran into this issue a while ago, and was reminded of it again recently.  mysqldump uses a default charset of utf8, even when the default charset of the server is set differently.  Why does this matter?

The problem exists more in the fact that if you have string data that is in latin1 format, you are allowed to put in non-Latin characters. This can lead to lost data, especially when upgrading a major series (e.g. 5.0 to 5.1 or 5.1 to 5.5), because you're supposed to export and import the data.

Also, when importing a backup of an InnoDB table, if there is an error with one of the parts of the INSERT, the whole INSERT statement rolls back.  I have experienced major data loss because the garbled characters cause an error when INSERTed, and it causes perfectly fine data *not* to import because they're in the same INSERT statement as the garbled data.

For example:

First, set variables such on a MySQL server (5.0 or 5.1, I haven't tested on 5.5):

mysql> show global variables like '%char%';


| Variable_name            | Value                      |


| character_set_client     | latin1                     | 

| character_set_connection | latin1                     | 

| character_set_database   | latin1                     | 

| character_set_filesystem | binary                     | 

| character_set_results    | latin1                     | 

| character_set_server     | latin1                     | 

| character_set_system     | utf8                       | 

| character_sets_dir       | /usr/share/mysql/charsets/ | 


8 rows in set (0.00 sec)


Then create these tables with data:


CREATE TABLE `test_utf8` (

  `kwid` int(10) unsigned NOT NULL default '0',

  `keyword` varchar(80) NOT NULL default ''






CREATE TABLE `test_latin1` (

  `kwid` int(10) unsigned NOT NULL default '0',

  `keyword` varchar(80) NOT NULL default ''



INSERT INTO `test_latin1` VALUES



Now compare:

mysqldump test > test_export_utf8.sql

mysqldump --default-character-set=latin1 test > test_export_latin1.sql


Note that the test export with the default character set of utf8 has mojibake whereas the export with latin1 does not.


So be *extremely* careful when using mysqldump - whether for backups or while upgrading.  You can checksum your data before and after  an export/import with mysqldump to be sure that your data is the same.


Syndicate content
Website by Digital Loom