Next week, Monday January 10th 2011, MySQL's Bill Papp will be speaking about MySQL 5.5. This is not to be missed! Pizza and soda will be served, so please RSVP if you plan on attending. Attendance is free.
If you are near the Boston area, you can RSVP for the meeting at http://www.meetup.com/mysqlbos/calendar/15169986/. That URL also has all the details about the location (an MIT classroom). To RSVP anonymously, please log in to the Meetup site with the e-mail address "admin at sheeri dot com" and the password "guest".
Here is the description Bill gives of this presentation:
Now, more than ever, databases have to scale linearly to handle high spikes in usage, deliver 24 x 7 continuous availability and low latency. At the same time, many workloads need to maintain the data consistency delivered by transactional integrity.
MySQL Cluster is the industry's leading real-time, write-scalable transactional database, combining 99.999% availability with the low TCO of open source. With a distributed shared-nothing architecture and no single point of failure, MySQL Cluster can scale linearly to meet the unprecedented demands of next generation web, e-commerce, and telecommunications / network services.
Typical use cases in web services include:
* User profile management for service authentication and authorization
* eCommerce (customer and order data, payment processing, product inventory tracking, etc.)
* Web Session Management
* Social Networks (user profiles, authentication, real-time status updates, etc.)
Today it was announced that MySQL 5.5 is GA. This is very exciting news; we have not had a GA announcement since MySQL 5.1 two years ago. There are tons of new features, mostly performance enhancements, that I am looking forward to. Here is why I am excited about MySQL 5.5:
First and foremost: the fact that it exists in a GA release. [Note: a MySQL/Sun/Oracle timeline follows; skip ahead if you just want the features in MySQL 5.5 I am excited about.] Sun bought MySQL in January 2008, and said it was not going to change anything in the first year. MySQL 5.1 was released in November of 2008, so presumably that was on target with what MySQL AB would have done had they not been bought out by Sun.
A few months later, in April 2009, a "preview" version of MySQL 5.4 was announced. Simultaneously, it was announced that Oracle was buying Sun (and thus, was acquiring MySQL). This acquisition did not actually happen until January 2010, however, so all of the development in 2009 was still under Sun.
Then in April 2010, a year after the MySQL 5.4 announcement, we were again teased, this time with an announcement of a MySQL 5.5 beta version. We had heard this song and dance just a year ago, and all we knew at that point was that we had absolutely no confidence in what the version number of the next GA release would actually be. The official word was "MySQL 5.5" but I was feeling "I'll believe it when I see it".
So I am particularly happy to see Oracle doing good work in the MySQL sphere. As much as Oracle has a lot to learn about the open source road and has some tough times ahead of it in that regard, I am glad that, at least for now, Oracle is getting MySQL right.
Better InnoDB I/O usage - One of the biggest advantages of the Percona build is getting more background threads. In MySQL 5.1 and earlier, there is one background thread each for reading and writing. In MySQL 5.5, you can configure how many you want. The default is 4 of each; the manual says the variables are numeric but does not give the permitted range (though I believe the range is 1-64, based on the Percona build documentation). The variable names are innodb_read_io_threads and innodb_write_io_threads; they are global and static, so changes require a server restart.
innodb_io_capacity allows you to change the number of IOPS (I/O operations per second) InnoDB assumes for background tasks. Previously the value was compiled in at 100 IOPS. In MySQL 5.5 it defaults to 200, and it can be changed. The manual shows a minimum value of 100 but no maximum (the Percona build documentation shows a maximum of 999999999). It is global and dynamic.
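As a sketch, these settings might be tuned together in my.cnf; the values below are illustrative, not recommendations:

```
[mysqld]
# MySQL 5.5 background I/O threads (static; require a restart)
innodb_read_io_threads  = 8
innodb_write_io_threads = 8
# Background flushing rate (dynamic; can also be set with SET GLOBAL)
innodb_io_capacity      = 400
```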
The default storage engine for new tables is InnoDB rather than MyISAM.
There is support for an interface for an optional semisynchronous replication plugin. Semisynchronous replication means that a master will not consider a commit completed until at least one slave shows that it has logged the transaction.
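As an illustration, loading and enabling the semisynchronous plugins that ship with MySQL 5.5 might look like this (the .so file names assume a Unix build; Windows builds use .dll):

```sql
-- On the master:
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- On each slave:
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
```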
The performance schema is a new feature for monitoring MySQL Server execution at a low level. The performance_schema is off by default and is a global, static variable, meaning a restart is needed to turn it on. The manual states that the performance_schema is low overhead, though I am curious to see exactly what the overhead is.
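A quick sketch of turning it on and peeking at one of its summary tables (the table name is from the 5.5 manual; the query itself is just an example):

```sql
-- In my.cnf, under [mysqld]:
--   performance_schema=ON
-- Then, after a restart, the top wait events by total time:
SELECT event_name, count_star, sum_timer_wait
  FROM performance_schema.events_waits_summary_global_by_event_name
 ORDER BY sum_timer_wait DESC
 LIMIT 10;
```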
What excites me about the performance schema is that up until now almost all of the internal monitoring for MySQL has been counters showing very basic information. The exceptions to this (for me) are some of the numbers in SHOW ENGINE INNODB STATUS and Jeremy Cole's* SHOW PROFILE/S syntax. The performance schema has crazy levels of detail similar to SHOW PROFILE/S, but at a global server level. This is definitely going to challenge me and help me be a better DBA!
Partitioning is much improved. This is not surprising, since partitioning was new in 5.1 and not as feature-rich as I would have liked it to be (although even in the 5.1 tree it has improved greatly). Specifically, they have added two new partitioning types, RANGE COLUMNS and LIST COLUMNS. These enable optimized partitioning of ranges and lists based on dates, datetimes, and strings. The partitioning key for ranges/lists can be based on up to 16 fields, and there is partition pruning optimization when the columns in the partitioning key are compared to constants.
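For instance, a sketch of RANGE COLUMNS partitioning directly on a DATE column, which in 5.1 required wrapping the column in a function like TO_DAYS() (table and column names here are made up):

```sql
CREATE TABLE orders (
  order_id   INT NOT NULL,
  order_date DATE NOT NULL
)
PARTITION BY RANGE COLUMNS (order_date) (
  PARTITION p2009 VALUES LESS THAN ('2010-01-01'),
  PARTITION p2010 VALUES LESS THAN ('2011-01-01'),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);
```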
Pluggable MySQL Authentication - MySQL 5.5 supports plugin development for external authentication. I would be much more excited about this if there were actually plugins developed for authentication via LDAP, Kerberos, PAM and Active Directory. I am fairly sure that, given the strength of the MySQL community, plugins will be developed fairly quickly, though.
Proxy users allow one person to have the permissions of another. This is important to be able to grant permissions to externally authorized users. Part of me thinks that the MySQL ecosystem will come up with clever hacks using this new feature.
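As a sketch of the new GRANT PROXY privilege (the user names here are hypothetical):

```sql
-- external_user authenticates through an authentication plugin,
-- then operates with the privileges of app_user:
GRANT PROXY ON 'app_user'@'localhost' TO 'external_user'@'localhost';
```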
One note: lots of deprecated variables have been removed, so check your error logs if MySQL won't start up after an upgrade. For example, table_type has been removed in favor of storage_engine. Similarly, the TYPE option for CREATE/ALTER TABLE no longer works, which means your internal scripts might break. LOAD TABLE ... FROM MASTER is gone (it was MyISAM-only anyway), and BACKUP/RESTORE TABLE is gone (very, very old).
The old, non-secure "put the replication parameters in the my.cnf" days are gone: --master-host, --master-user, --master-password, --master-port, etc. have been removed after being deprecated for a while.
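The replacement is to set these at runtime with CHANGE MASTER TO; a sketch (the host and credentials are placeholders):

```sql
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_PORT=3306,
  MASTER_USER='repl',
  MASTER_PASSWORD='repl_password';
START SLAVE;
```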
Some tools also have removed deprecated features, including mysqlbinlog, mysqldump and mysqld_multi.
All in all I am extremely excited about this release. I wonder if a lot of the improved performance is adding in the features implemented in the Percona build; I also wonder if Percona is going to come out with a MySQL 5.5 build.
* Apparently, Jeremy Cole is now a MySQL Database Architect for Twitter (starting Nov 15 2010).
At the November 2010 Boston User Group, Jacob Nikom presented his solution for database change management using MySQL. This system is designed to keep a repository of tables and data that an application needs to function. For example, an application might need at least one administrative user, or a table with a list of countries. This is NOT a tool for database configuration management.
We recently had a client come to us and ask us to improve their MySQL monitoring in Zabbix. So, we did. The approach we took was to port the MySQL script from the superb mysql-cacti-templates project to work with Zabbix. This works out well, because Zabbix is like Cacti and Nagios combined, and what we wound up with are some templates that can alert us when InnoDB uncheckpointed bytes start climbing rapidly.
In addition to the above benefit, you also get every MySQL graph that comes from the mysql-cacti-templates project, making Zabbix with appaloosa-zabbix-templates* a first-class replacement for Cacti when it comes to MySQL trending.
Finally, the same client is also generously donating time to create templates for Memcached/Membase and Bind9. So, look forward to more out of this project as we go forward into next year.
* Did you notice we really like our horses, here at PalominoDB?
JMX, if you don't know, is one of the things that makes Java really great. It allows you to write your Java app and instrument it as if instrumentation were free. It's not free, of course, but it's so low cost that in Cassandra there's a JMX "bean" that counts every single read and write to a column family. And you can have this - in production. However, it's generally somewhat difficult to get data from JMX into another programming language. Depending on how the remote side (the JMX agent) is configured, it could be expecting to exchange raw Java objects. Not great for interoperability. In our particular case, we wanted to get data from Cassandra into Cacti. We tried several different generic JMX clients with varying levels of success; however, none of them were fast.
Most of them could only list the attributes for one bean at a time, and required spawning an entire JVM for each request. Because we knew we were going to be collecting nearly every JMX statistic in a short amount of time, this wouldn't cut it.
Our solution was to create a new client application specifically for doing this kind of monitoring. Unimaginatively, we called it 'JMXClient', and it has a few distinguishing features:
It's fast. 993 attributes can be retrieved in 1 second. Other solutions we saw did 10 attributes in 1.5 or even 2 seconds.
It has no dependencies other than a Java 1.5 or 1.6 install. Some solutions required external Jar files, or even ant!
It outputs results in PHP's native serialization format. All other solutions assumed a human would be consuming the data.
For us, all three of these items were a big win. Future plans include adding or replacing the PHP serialization output with JSON or some other cross-platform/language format.
I've been working a lot with XtraDB versions of MySQL over the last year, and more and more I have begun to leverage the information_schema instrumentation that has been put in place.
Today, I went into a system with this install (admittedly a very recent one)
Server version: 5.1.51-rel11.5-log Percona Server (GPL), 11.5, Revision 132
Having been considering upgrading RAM on the server, I went to look at how the innodb_buffer_pool was currently being used. The information_schema tables seem to have changed dramatically, particularly the innodb_buffer_pool-related tables.
At this point, I'm at a loss as to how to get information on which tables/indexes are using space (and how much) in the buffer pool. I had used INNODB_BUFFER_POOL_PAGES_INDEX to do this in the past. Hopefully documentation will be forthcoming on the new table structures. Admittedly, I haven't dug too deeply into the new structures (since they are not documented). I'm curious whether anyone else has worked with the newer table structures.
MySQL events have been in GA for 2 years. Events are not used as much as I would like -- often organizations will have cron jobs or scheduled tasks to perform ETL, maintenance and small aggregation tasks. For example, a website that shows how many users are logged in might update a "cache table" with a count of active sessions every 5 minutes.
Events are great because they are a part of the database. They only run when the database is running, and they are backed up with the database. There is no need to store a password in a file or in a cron listing.
Events and replication can be a tricky matter. Events can change data, and data changes are replicated. In the example of an event that updates a table with a count of active sessions every 5 minutes, let's assume that the event itself would run on a primary master (write) server. All slaves (including a secondary master) would receive the table update.
The tricky part is that CREATE EVENT statements are replicated. There are three ways to avoid an event that changes data running on more than just the master:
0) Do not replicate the CREATE EVENT, or replicate it but then delete the event on any slaves. I do not like this approach because if you promote one of the slaves to be a master, the event will not be in place.
1) Replicate the CREATE EVENT, knowing that the SLAVESIDE_DISABLED bit will be set even if you do not specify DISABLE ON SLAVE in the CREATE statement. If you promote a slave to be a master, all you need to do is ALTER EVENT event_name ENABLE for each event that needs to be turned on. This is the method I prefer, even though it still requires remembering to turn on the events when a slave is promoted, and to turn off events on the old master.
2) Replicate the CREATE EVENT, and enable the event right away, but set the event scheduler off. This is easier if there are lots of events, because all that needs to be done when a slave is promoted to a master is:
mysql> SET GLOBAL event_scheduler=ON;
However, this means that you cannot have any events running on the slave itself - the event scheduler is all-or-nothing: either all the events run, or none do.
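To make the session-count example concrete, here is a sketch of such an event (the table and column names are invented for illustration):

```sql
CREATE EVENT update_active_sessions
  ON SCHEDULE EVERY 5 MINUTE
  DO
    UPDATE session_stats
       SET active_count = (SELECT COUNT(*) FROM sessions
                            WHERE last_seen > NOW() - INTERVAL 5 MINUTE);
```

When this statement replicates, the slave stores the event with status SLAVESIDE_DISABLED, as described in option 1 above.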
Part 2 will discuss some specific issues I have run into with events and replication.
I was asked "What's the deal with Oracle removing InnoDB?" I had not heard this, so I did some research. It took less than 5 minutes to figure out what happened, and it was easy to see where the confusion is.
On the MySQL products page at http://mysql.com/products/ the matrix of MySQL editions includes "MySQL Classic" which is free, "MySQL Standard" which costs $2k per year, "MySQL Enterprise" which costs $5k per year and "MySQL Cluster Carrier Grade" which costs $10k per year.
Indeed, the "MySQL Classic" does not include InnoDB. What happened was that folks assumed that, because it was free, it was the MySQL Community edition we all know and love.
This is not true. How do I know? Because just above the matrix is a set of links to each edition, and if you click the "MySQL Classic" link you get to http://mysql.com/products/classic/ which explains "MySQL Classic Edition is the ideal embedded database for ISVs, OEMs and VARs developing read-intensive applications using the MyISAM storage engine."
So I'm creating a new cluster, and after setting up I needed to get my tokens. As we're told in http://wiki.apache.org/cassandra/Operations:
Using a strong hash function means RandomPartitioner keys will, on average, be evenly spread across the Token space, but you can still have imbalances if your Tokens do not divide up the range evenly, so you should specify InitialToken to your first nodes as i * (2**127 / N) for i = 0 .. N-1. In Cassandra 0.7, you should specify initial_token in cassandra.yaml.
Here's a nice simple code snippet to figure out your RandomPartitioner tokens based on the size of your cluster:
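A minimal sketch in Python, following the i * (2**127 / N) formula from the wiki quote (the function name is mine; integer division keeps the tokens exact):

```python
# RandomPartitioner initial tokens for an N-node cluster:
# token_i = i * (2**127 / N) for i = 0 .. N-1.
def initial_tokens(node_count):
    return [i * (2 ** 127 // node_count) for i in range(node_count)]

# Print the initial_token value for each node in a 4-node cluster.
for i, token in enumerate(initial_tokens(4)):
    print("node %d: initial_token = %d" % (i, token))
```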