Blog

Continuous Deployment Architecture For Ruby on Rails

 

We have seen agile development become more popular in recent years, thanks in part to frameworks like Ruby on Rails and the continuous integration environments that have grown up around them. These frameworks encourage quick testing and make it easy to deploy across clusters. As deployments happen more often, we look for ways to minimize the disruption to users. Cluster architecture already contains the components we need: with multiple servers for each role, we can update a subset of the system while the rest serves the business.

We recently worked with the DevOps team at one of our clients to develop an architecture that allows them to run rolling changes through the cluster using the Ruby on Rails framework. There are two principal components of change: application code on the application servers and database structure on the database servers. The application servers can easily be targeted specifically through deploy.rb. The database side of things, however, is a bit more complicated.
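For example, a minimal sketch in deploy.rb might split the application tier into two named groups so each half can be addressed on its own (the host and role names below are hypothetical):

role :app_a, "app1.example.com", "app2.example.com"
role :app_b, "app3.example.com", "app4.example.com"

desc "Restart only the A half of the application tier"
task :restart_app_a, :roles => :app_a do
  # Passenger-style restart, limited to the :app_a hosts
  run "touch #{current_path}/tmp/restart.txt"
end

An entire cap deploy can likewise be limited to specific machines with Capistrano's HOSTS or HOSTFILTER environment variables.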

In order to have zero downtime, half the application servers are taken offline and updated, then the DDL changes are applied to the standby master/slave database pair. Traffic is then switched over to the updated servers.

HAProxy is placed between the application and the MySQL database servers to act as a router for database traffic (see http://palominodb.com/blog/2011/12/01/using-haproxy-mysql-failovers for specifics), providing the ability to flip the active and standby roles of the two server pairs (as well as providing a high-availability solution). Since there are slave relationships, we needed to be able to pause replication through the deployment application (Capistrano). We accomplished this by adding a Rake task file, deployutil.rake, under lib/tasks:

namespace :deployutil do
  desc 'Checks the replication status of the primary database'
  task :replication_test => :environment do
    # find the state of replication
    mysql_res = ActiveRecord::Base.connection.execute("SHOW SLAVE STATUS")
    mysql_res.each_hash do |row|
      if row['Slave_IO_Running'] == "Yes" and row['Slave_SQL_Running'] == "Yes" and row['Seconds_Behind_Master'].to_s == "0"
        puts "ReplicationGood"
      elsif row['Seconds_Behind_Master'].blank?
        puts "ReplicationBroken"
      else
        puts "ReplicationBehind_" + row['Seconds_Behind_Master'].to_s
      end
    end
    mysql_res.free
  end

  task :start_slave => :environment do
    ActiveRecord::Base.connection.execute("START SLAVE")
  end

  task :stop_slave => :environment do
    ActiveRecord::Base.connection.execute("STOP SLAVE")
  end
end

 

We can then create tasks in deploy.rb to call these:

 desc "shows the current passive slave replication status"

 task :get_slave_replication_status, :roles => :cron do

   dbrails_env = fetch(:rails_env) + '_passiveslave'

   # find the state of replication

   set :slave_replication_status, capture("cd #{latest_release} ;

RAILS_ENV=#{dbrails_env} rake deployutil:replication_test").chomp

 end

 

 desc "stop passive slave replication"

 task :stop_passiveslave_repl, :roles => :cron do

   dbrails_env = fetch(:rails_env) + '_passiveslave'

   run "cd #{latest_release} ; RAILS_ENV=#{dbrails_env} rake

deployutil:stop_slave"

 end

 etc....
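A task along these lines (a sketch; the task name and matching logic are ours, not taken from the client's deploy.rb) can then gate the switchover on replication health:

  desc "abort the deployment unless the passive slave is caught up"
  task :ensure_passive_slave_current, :roles => :cron do
    get_slave_replication_status
    unless fetch(:slave_replication_status) =~ /ReplicationGood/
      abort "Passive slave is not ready: #{fetch(:slave_replication_status)}"
    end
  end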

We also want to be able to limit changes to specific databases so that they won't go into the binary logs and propagate when the slaves are turned back on. See http://palominodb.com/blog/2011/11/21/rails-and-database-session-variables for details on how to do this through an extension of ActiveRecord. A word of caution here: setting sql_log_bin=0 to skip logging these changes invalidates using the binlogs for point-in-time recovery, so you will need a full backup after the change.

Master/Slave Replication In Limited Bandwidth Scenarios

Database replication is quite an important component of the database architectures of many of our clients. Managed properly, it offers real-time redundancy and fault-tolerance, and allows for increased scalability. When replication goes wrong, however, it can turn into a morass of data conflicts, lag and strange little mysteries to distract your DBA team.

PalominoDB recently helped a client out of one of these strange little replication mysteries, and we document it here to help you avoid getting stuck in the same sort of situation.

The symptoms: Our slave database (which we’ll call DB_Slave) was lagging. The logs revealed that the value of Seconds_Behind_Master was jumping from 0 to a random high value, and then back to 0.

What we knew: DB_Slave connects to the master (which we’ll call DB_Master), and DB_Master sends binlog events to DB_Slave.

Because this data is pushed from the master rather than pulled by the slave, there was a small window where, in theory, a replication channel could be broken and DB_Slave wouldn't notice until "slave-net-timeout" seconds (default 3600) had passed.

We also knew that Seconds_Behind_Master is calculated not as the difference between the current datetime and the datetime of the binlog event being processed, but as the difference between the timestamp of the latest event read from the master's binary log and the timestamp of the event currently being executed from the relay log. That means that if DB_Master is not sending binlog events to DB_Slave (for example, due to a network issue), the slave could be lagging behind the master without being aware of it at all.

Further, if DB_Slave then received some binlog events from DB_Master, and realized at that point that it was lagging, it could still fail to notice if DB_Master once again stopped pushing binlog events. 

Investigation: Our theory that there was in fact a network issue was supported by evidence from the logs. We saw that SHOW SLAVE STATUS didn't show any progress in Master_Log_File and Read_Master_Log_Pos: that is, DB_Slave was not receiving any data from DB_Master.
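A quick way to watch for that condition (a sketch; run the first statement on the slave and the second on the master, and compare positions over successive runs):

-- On DB_Slave: if Master_Log_File / Read_Master_Log_Pos stop advancing
-- between runs, the I/O thread is no longer receiving events.
SHOW SLAVE STATUS\G

-- On DB_Master: File / Position should keep moving ahead of what the
-- slave reports it has read.
SHOW MASTER STATUS\G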

We also found this in the processlist of DB_Master:

Id: 104062295   User: slave_user   Host: [IP Address]:10124   db: NULL   Command: Binlog Dump   Time: 1140   State: Writing to net   Info: NULL

The status "Writing to net" means that DB_Master is sending (or trying to send) data to DB_Slave.

In a healthy replication channel, the status would be "Has sent all binlog to slave; waiting for binlog to be updated".

 

We suspected this might be an issue related to network bandwidth, but when we ran a series of STOP SLAVE / START SLAVE processes, we noticed that the number of threads in status “Writing to net” was increasing, with up to 20 concurrent open threads. Connections simply weren’t being closed. Further investigation revealed that those connections were staying in CLOSE_WAIT status.
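Two quick checks for spotting this kind of connection leak (a sketch, run on DB_Master; port 3306 is an assumption):

-- count the replication dump threads the master thinks it is still serving
SELECT COUNT(*) FROM information_schema.PROCESSLIST WHERE COMMAND = 'Binlog Dump';

# and, from the shell, count the leftover sockets stuck in CLOSE_WAIT
netstat -tn | grep ':3306' | grep -c CLOSE_WAIT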

Based on our observations we concluded that the limited network bandwidth was at fault, and we needed a solution to compress the transmission between the master and slave databases. Luckily MySQL provides a convenient parameter, which is dynamic and is set on the slave: slave_compressed_protocol=1

Here is how to implement it:

 

mysql> show global variables like 'slave_compressed_protocol';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| slave_compressed_protocol | OFF   |
+---------------------------+-------+
1 row in set (0.00 sec)

mysql> stop slave;
Query OK, 0 rows affected (0.25 sec)

mysql> set global slave_compressed_protocol=1;
Query OK, 0 rows affected (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'slave_compressed_protocol';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| slave_compressed_protocol | ON    |
+---------------------------+-------+

Resolution: We set slave_compressed_protocol=1 and restarted the slave, and replication started catching up at a surprising speed. Even while it was catching up, we noticed the same behavior we'd seen in the past: Seconds_Behind_Master jumped from 0 to a random high value and back to 0, the I/O thread was behind, and we didn't see a high load on the server.

Our monitoring graphs showed that network traffic did not increase much, while CPU usage and commands/sec had surprisingly high jumps.

In short, DB_Slave was processing more queries because the IO thread was receiving more data from DB_Master, thanks to compression. 

Live Blogging at MongoSV

Dwight Merriman, CEO of 10gen, speaks about the growth of the MongoDB community. The conference has doubled in size from 500 to 1,100+ attendees.

Eliot Horowitz, CTO of 10gen, demos the MongoDB 2.2 Aggregation Framework, which simplifies aggregating data in MongoDB. He pulls in the MongoDB Twitter feed to populate data and sums it using: runCommand({aggregate: … })

The “aggregate” command lands in the nightly builds tonight.

Cooper Bethea, Site Reliability Engineer at Foursquare, speaks on Experiences Deploying MongoDB on AWS.

All data stored in MongoDB
8 production MongoDB clusters
Two of the larger clusters: users (8 shards) and check-ins (12 shards)
Check-ins: ~80 inserts/sec, ~2.5k ops/sec, 45MB/s outbound at peak
Users: ~250 updates/sec, ~4k ops/sec, 46MB/s outbound at peak
Only one unsharded cluster; the others are fully sharded using replica sets

All servers in EC2
mongos runs on the mongod instances
The config servers run on three separate instances
mongod working set contained in RAM
mongod backing store: 4 EBS volumes in RAID 0

Problem: fragmentation leads to bloat
mongod RAM footprint grows (data size, index size, storage size)

Solution: order the replica set members by dataSize + indexSize, uptime DESC. Run --repair on secondary nodes one at a time. Primary nodes require a stepDown() first, which is more delicate.

Problem: EBS performance degrades
Symptoms: %util on one volume > 90
qr/qw counts spike
fault rates > 10 in mongostat
sometimes:  topless counts spike

Solution:
KILL IT! Stop the mongod process if it is a secondary node; stepDown() + stop if it is the primary.
Rebuild from scratch.

How long does it take? ~1 hour
Working set in RAM

Problem: a fresh mongod has not paged in all data
Solution: run queries
db.checkins.find({unused_key:1}).explain()

Running cat on the data files and sending the output to /dev/null works too, unless your data set is larger than RAM.

Hashing Algorithm in MySQL PASSWORD()

Recently we had a question from a customer: what is the hashing algorithm implemented in PASSWORD() ?

The manual doesn't give a straight answer on either of these two pages:

http://dev.mysql.com/doc/refman/5.1/en/encryption-functions.html#function_password

http://dev.mysql.com/doc/refman/5.1/en/password-hashing.html

 

It is enough to dig a bit more to find the answer at http://forge.mysql.com/wiki/MySQL_Internals_ClientServer_Protocol#4.1_and_later, which specifies that "mysql.user.Password stores SHA1(SHA1(password))".

 

Instead of blindly trusting the documentation (even if I believe it is correct), I did some tests and was confused by the first result:

mysql> SELECT PASSWORD("this_is_a_random_string") `pass`\G
pass: *12E76A751EFA43A177049262A2EE36DA327D8E50

mysql> SELECT SHA1(SHA1("this_is_a_random_string")) `pass`\G
pass: 9b653fd9fb63e1655786bfa3b3e00b0913dfc177

So it looked like SHA1(SHA1(password)) wasn't PASSWORD(password), at least in this test.

The best documentation is the source code, so I read the source and understood why my previous test was incorrect: the second SHA1() is applied to the binary data returned by the first SHA1(), not to its hex representation. Therefore, in SQL, I have to UNHEX() it before applying the second SHA1(). In fact:

mysql> SELECT SHA1(UNHEX(SHA1("this_is_a_random_string"))) `pass`\G
pass: 12e76a751efa43a177049262a2ee36da327d8e50

 

So yes, I confirmed that mysql.user.Password stores SHA1(SHA1(password)). I also hope this post is useful for understanding how MySQL implements PASSWORD().
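For the curious, the same construction is easy to reproduce outside MySQL; here is a small Ruby sketch (the leading '*' is the marker MySQL prepends to 4.1-style password hashes):

require 'digest/sha1'

password = "this_is_a_random_string"
# the second SHA1 is applied to the *binary* digest of the first, then hex-encoded
hash = "*" + Digest::SHA1.hexdigest(Digest::SHA1.digest(password)).upcase
puts hash   # => *12E76A751EFA43A177049262A2EE36DA327D8E50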

Building libMemcached RPMs

A client running CentOS 5.4 Amazon EC2 instances needed the latest libMemcached version installed. With the inclusion of the "make rpm" target, libMemcached makes it easy to build the RPMs by doing the following:

Spin up a new CentOS Amazon EC2 instance,

As root on the new instance:

yum install @development-tools
yum install fedora-packager
/usr/sbin/useradd makerpm

Now change to the makerpm user and build the RPMs:

su - makerpm
rpmdev-setuptree
tree
wget http://launchpad.net/libmemcached/1.0/1.0.2/+download/libmemcached-1.0.2.tar.gz
tar -zxf libmemcached-1.0.2.tar.gz
cd libmemcached-1.0.2
./configure && make rpm
find . -name '*rpm*'

References:
http://libmemcached.org
http://fedoraproject.org/wiki/How_to_create_an_RPM_package
https://launchpad.net/libmemcached/+download

Using HAProxy for MySQL failovers

There are a number of solutions available for MySQL architecture that provide automatic failover, but the vast majority involve potentially significant and complex changes to existing configurations. Fortunately, HAProxy can be leveraged for this purpose with a minimum of impact to the current system.  Alex Williams posted this clever solution a couple years ago in his blog.  Here we take a closer look at the details of implementing it into an already existing system.

HAProxy is a free, open-source load balancer, proxy, and high-availability application for the TCP and HTTP protocols. Since it was built mostly to handle web traffic, it has robust rule writing using HTTP components to check on the health of systems. The key to Alex's solution is creating xinetd services on the database servers that respond with HTTP messages. The HTTP check used to determine database status works like this:

  1. The HAProxy server sends HTTP requests to the database servers at configured intervals to specified ports
  2. The /etc/services file on the database servers maps those ports to services in their /etc/xinetd.d directory
  3. The services can call any specified script, so we build scripts that connect to the databases and check for whatever conditions we choose.

The services then return an OK or a Service Unavailable response according to those conditions.  Code for these scripts is included in Alex's article.
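As a rough sketch of the moving parts (the service name, port, and script path below are hypothetical; the real check scripts are in Alex's article):

# /etc/services on each database server: map a port to the check service
mysqlchk        9200/tcp        # MySQL health check for HAProxy

# /etc/xinetd.d/mysqlchk: xinetd runs the check script for every connection
service mysqlchk
{
        disable       = no
        socket_type   = stream
        port          = 9200
        wait          = no
        user          = nobody
        server        = /opt/scripts/mysqlchk.sh
        only_from     = 0.0.0.0/0
        per_source    = UNLIMITED
}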

Our database configuration for this implementation is two pairs of Master-Slave databases in an Active-Passive relationship, with Master-Master replication between the sets.  Siloing the passive Master-Slave pair provides a hot spare as well as continuous uptime during deployments, provided we have a means of swapping the active/passive roles of each pair.  To accomplish the latter, we built two HAProxy configuration files, haproxy_pair1.cfg and haproxy_pair2.cfg, the only difference between them being which Master-Slave pair is active.  Having two files that indicate which pair is active also gives immediate visibility into the current configuration.

Just as in Alex's sample configuration, our web application uses two DNS entries, appmaster and appslave, for writes and reads respectively.  The IPs for these addresses are attached to our HAProxy server, allowing HAProxy to bind to them when it starts and then route traffic to the appropriate database server.
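A trimmed-down sketch of what the active-pair configuration might look like (IPs and server names are hypothetical, and most tuning options are omitted):

# haproxy_pair1.cfg: pair1 is active; pair2 only takes over if pair1's check fails
listen appmaster 192.168.0.10:3306
        mode tcp
        option httpchk
        server pair1-master 10.0.1.11:3306 check port 9200 inter 5000
        server pair2-master 10.0.2.11:3306 check port 9200 inter 5000 backup

listen appslave 192.168.0.11:3306
        mode tcp
        option httpchk
        server pair1-slave  10.0.1.12:3306 check port 9200 inter 5000
        server pair2-slave  10.0.2.12:3306 check port 9200 inter 5000 backup

haproxy_pair2.cfg would be identical except that the backup flags are moved to the pair1 servers.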

On our HAProxy server we then customized the /etc/init.d/haproxy script to handle an additional parameter, "flipactive", which gives us the ability to swap the database pairs:

flipactive() {
        # first detect which cfg file haproxy is using
        ps_out=`ps -ef | grep haproxy | grep cfg`
        # if pair2 is active, switch to pair1
        if [[ "$ps_out" =~ pair2 ]]; then
             # -st starts a new process on the other config and tells the old one (by pid) to stop
             /usr/local/sbin/haproxy -f /etc/haproxy/haproxy_pair1.cfg -p /var/run/haproxy.pid -st $(cat /var/run/haproxy.pid)
        # if pair1 is active, switch to pair2
        else
             /usr/local/sbin/haproxy -f /etc/haproxy/haproxy_pair2.cfg -p /var/run/haproxy.pid -st $(cat /var/run/haproxy.pid)
        fi
}
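Swapping which pair is active is then a single command on the HAProxy host:

/etc/init.d/haproxy flipactive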

The rest of the configuration details are covered pretty well in Alex's article as well as in the HAProxy documentation.

We successfully developed and implemented this technique with the devops team at SlideShare last month to build rolling DDL scripting to multiple databases using Capistrano.  It allowed them to have explicit control over which database was being updated, thereby giving them the means necessary to update one database while the other served the current application code.

Last Day at PalominoDB

I have been the Community Liaison and a Senior DBA at PalominoDB for 15 months, and doing remote DBA work for 4 years.  In that time I have learned that "consultant" need not be a dirty word, and that in the DBA world it is actually extremely valuable to have a remote DBA with lots of outside experience, and a team of remote DBAs for when your primary contact is sick or goes on holiday.

As with everything, there are downsides to remote database management.  Even though there is a lot of architecture experience among the remote DBAs I know, we are not often invited to architecture meetings.  This is because time is the unit of currency, and while sitting in an hour-long meeting to give 5 minutes of feedback can save hours down the road, it's hard to see that.  Many clients have gotten around this by having all DDL checked and performed by remote DBAs, and that helps a lot.

There is also no ownership - we can recommend solutions and technologies, but the client makes the actual decision about whether something needs to be done or not.  I look forward to actually owning architecture after 4 years of making "strong recommendations".

Since folks will ask, I have taken a job as a Senior DBA/Architect with Mozilla, starting Monday.  A former co-worker told me about the job; I was not particularly looking for anything, but I was intrigued.

I have said before that it is hard to find a good in-house DBA if you are not a huge company like Facebook or Google or Yahoo, and that is still true.  At Mozilla, they are 100% open and public about their ideas, and they do a lot of behind-the-scenes work.  Sound familiar?

They also allow their developers to develop on whatever platforms work best.  Their biggest database is their crash reporting database (and they do read it, so do submit your crashes).  They have MySQL, PostgreSQL, and MongoDB, and are starting to move some applications around, as developers are not always aware of what platforms will work best.  There is another DBA, so I will not be alone, but I expect to be just as challenged at Mozilla as I have been at PalominoDB and Pythian.

So, to keep up with PalominoDB, you can:

- like the PalominoDB page on Facebook

- follow @palominodb on Twitter

- connect with Laine on LinkedIn

- follow PalominoDB on LinkedIn

- continue to read PalominoDB's blog and Planet MySQL

 

To keep up with me, you can:

- follow @sheeri on Twitter

- read my blog and Planet MySQL

- connect with me on LinkedIn

- subscribe to the OurSQL podcast (details at http://www.oursql.com/?page_id=2)

Rails and Database Session Variables

Ruby's ActiveRecord provides a very robust means of manipulating database table structures especially in the context of automated deployments such as Capistrano.

When there is a master-slave database configuration, it may not always be prudent to perform DDL statements on the master and let them propagate to the slave. On high-volume sites where a Rails deployment may involve multiple migrations, replication lag can occur and cause significant problems. In addition, when no downtime window is scheduled, a DDL rollout may break the currently running application code, so a targeted deployment may be more prudent.

In MySQL, the solution is to set the session variable SQL_LOG_BIN = 0, which causes subsequent statements not to be written to the binary log and therefore not to be replicated to slaves.  Unfortunately, ActiveRecord's connection options do not accommodate the setting of database session variables.  One way to set this variable is to have developers explicitly set it in the migration files:

class AddAccounts < ActiveRecord::Migration
  def self.up
    execute("SET SESSION sql_log_bin=0")
    create_table :api_accounts do |t|
      t.column "user_id", :integer
      t.column "name", :text
      t.column "account", :text
      t.column "created_at", :datetime
      t.column "updated_at", :datetime
    end
  end

  def self.down
    execute("SET SESSION sql_log_bin=0")
    drop_table :api_accounts
  end
end

But since this would be a hassle for developers and is easily overlooked, we can instead monkey patch the ActiveRecord::Migration class from a Rails initializer so that the variable is set whenever migrations are invoked (for example, by Capistrano during a deploy).  So we constructed a file, config/initializers/active_record.rb:

#opens the ActiveRecord::Migration class
#use alias_method to add functionality to the ActiveRecord::Migration.migrate method
class ActiveRecord::Migration
  class << self
    alias_method :migrate_conn, :migrate
    def migrate(direction)
      ActiveRecord::Base.connection_pool.with_connection do |conn|
        @connection = conn
        connection.execute("SET SQL_LOG_BIN=0")
        @connection = nil
      end
      migrate_conn(direction)
    end
  end
end
Note that the database user that invokes the migrations needs to have the SUPER privilege granted to it in order to be able to set this variable.
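For example (a sketch; 'deploy'@'app-host' is a hypothetical account name), as a MySQL administrative user:

GRANT SUPER ON *.* TO 'deploy'@'app-host';
FLUSH PRIVILEGES;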

We successfully developed and implemented this technique with the devops team at SlideShare last month to build rolling DDL scripting to multiple databases using Capistrano.  It allowed them to have explicit control over which database was being updated, thereby giving them the means necessary to update one database while the other served the current application code.     

 

PostgreSQL 9.1 Presentation Slides available

Our resident Postgres expert Emanuel spoke at the Libre Software World Conference last week about PostgreSQL 9.1, and as the Community Liaison for PalominoDB for a few more days, I am posting the slides so that folks can download them.  They are in PPT format:

 

http://dev.palominodb.com/docs/LSWC_POstgresql_91_English.ppt

Enjoy, and remember that all of our presentation slides and videos are available at any time from our Community -> Presentations page.  

 

New England Database Summit

The New England Database Summit is an all-day, conference-style event where participants from the research community and industry in the New England area come together to present ideas and discuss their research and experiences working on data-related problems.  It is an academic conference with real-world applications, and it covers any type of database.

The 5th annual NEDB Summit will be held at MIT in Cambridge, MA (room 32-123) on Friday, February 3, 2012.  Anyone who would like to present a poster is welcome to do so (registration required), or you can submit a short paper for review.  We plan to accept 8-10 papers for presentation (15 minutes each) at the meeting.  All posters will be accepted.

For more details, and to register and / or upload a paper, see:

http://db.csail.mit.edu/nedbday12/
