Liveblogging at OSCON Data: Drizzle, Virtualizing and Scaling MySQL for the Future

Brian Aker presents "Drizzle, Virtualizing and Scaling MySQL for the Future" at OSCon Data 2011 #drizzle


2005 MySQL 5.0 released - web developers wanted tons of features that were not in the release (making replication better for instance)

2008 Sun buys MySQL

2008 MySQL 6.0 is forked to become Drizzle

2009 Oracle buys Sun

2010 Drizzle developers leave Oracle

2011 First GA release, Drizzle7

MySQL's Architecture - monolithic kernel, not very modular, lots of interdependence.

Drizzle has a microkernel, which includes a listener, parser, optimizer, executor, storage system, and logging/error reporting.

Drizzle can accept SQL and http blog streaming, and memcached and gearman can easily talk to Drizzle.

Drizzle has tried to have no "gotchas"

- If you make a new field with NOT NULL, MySQL makes new values NULL.  Drizzle does not do this.

- No hacky ALTER TABLE

- Real datetime (64 bit), including microseconds

- IPV6 (apparently this is a strong reason for people switching, to support IPV6)

- No updates that complete halfway

- Default character set is UTF-8, default collation is utf8_general_ci (in MySQL the charset is latin1 by default and the collation is latin1_swedish_ci - "ci" meaning case-insensitive)


- In MySQL, replication is kind of hacky [this is my summary and opinion, but it's basically what Brian said]

- Drizzle replication is Google Protocol Buffer based

- Replicates row transformations

- Integrates with RabbitMQ, Cassandra, Memcached, Gearman -- right now.

Drizzle's analog of the MySQL binary log (DML):

- DML is stored transactionally by delta in Drizzle

- InnoDB is already logging, so there's no need to add another log for the binary log.  Drizzle just logs DML to the transaction log.


- supports Drizzle, MySQL, SQLite

- Asynchronous

- BSD, so Lawyer-free

What else?

- No-cost authentication (PAM, LDAP, htaccess, ...)

- Table functions (new data dictionary, including performance and thread information).  INFORMATION_SCHEMA in Drizzle is *exactly* what's specified in the SQL standard.

- Data types - native type for UUID, boolean, all known types (except SET, because it's broken by design)

- Drizzle has assertions: you can ask what the type of the result of combining multiple data types will be.

- About 80 conflicts in the Drizzle parser as opposed to about 200 in the MySQL parser

Roadmap - Drizzle7+

- Replication - faster than MySQL and also allows multiple masters.


Virtualizing a database gives you about a 40% performance hit.  How can costs be cut?  In MySQL 5.0 the Instance Manager was created to solve that, but it hasn't really been worked on.  Drizzle has instead worked on virtualizing databases internally.

- So drizzle now has catalogs.  

- Each catalog has its own set of users, its own schema with tables, etc.

- A catalog is its own sandbox; there is no syntax that allows you to connect from one catalog to another, so there are no security problems.

- Cuts the 30-40% hit from virtualizing

- Single instance maintenance - only 1 OS and 1 database to configure, unlike VMs

    - Currently only one database configuration so there's one global config for shared memory such as innodb buffer pool, but that will change in the future.

- Still allows for I/O spread on SAN/NAS


In Drizzle 7.1, Percona's xtrabackup supports Drizzle and ships with it.  xtrabackup supports full and partial backups with no locking, and is a single solution for point-in-time recovery.  Because the transaction log is stored in the database, replication is automatically consistent with the database.  It does not currently do incremental backups with the transaction logs, but that's planned for the future.


- consolidates standard testing tasks, server/test management, reporting, REGRESSION TESTING

- extended architecture allows for complex testing scenarios

- pluggable - supports new testing tools

- randgen, sql-bench, crashme, sysbench, standard drizzle-test-run suite

- Keeping tools and testing configurations in-tree facilitates testing for everyone

- supported by SkySQL


Dynamic SQL/execute()

- New UTF-8 parser

- Being extended to allow for plugging in application servers.


>120 developers since day 1

An average of 26-36 developers commit each month


Bugs database -

Liveblogging at OSCON Data: MongoDB Schema Design

Dwight Merriman gives "MongoDB Schema Design" at OSCon Data 2011.




relational / document-oriented

database / database

table / collection

row / JSON (BSON) document

index / index

column / field (dynamic/not predeclared)

SQL / Mongo query language (JSON)

Join / Embedding & Linking

Primary Key / _id field


Schema design is coupled with what you want to do:

- Dynamic queries

- Secondary indexes

- Atomic updates

- Map Reduce


- no joins

- atomic operations are at the single document level only

- types of queries/updates to do

- sharding

- speed


This is the command-line mongo interface, but all of this can be done in any (modern) programming language.

post = {author: "Herge",

date: new Date(),

text: "Destination Moon",

tags: ["comic", "adventure"]}

> db.posts.insert(post)

"posts" is the collection name.  Documents are analogous to rows but can be more complex.  Documents for one collection are grouped together.

> db.posts.find()

{ _id: ObjectId("4c4ba5c0672...."),


author: "Herge",

date: new Date(),

text: "Destination Moon",

tags: ["comic", "adventure"]}

_id must exist and must be unique -- if you don't create an _id, one will be made for you: a 12-byte BSON ObjectId, shorter than a normal UUID, but that's OK because you don't need it to be globally unique, just unique on this db cluster.
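To picture how those 12 bytes are laid out, here is a toy Python sketch assembling an ObjectId-style value (4-byte timestamp, 3-byte machine hash, 2-byte pid, 3-byte counter, per the classic ObjectId layout); the function and counter are my own illustration, not MongoDB's driver code:

```python
import hashlib
import os
import socket
import struct
import time

_counter = 0

def make_object_id():
    """Assemble a 12-byte ObjectId-style value:
    4-byte timestamp + 3-byte machine hash + 2-byte pid + 3-byte counter."""
    global _counter
    ts = struct.pack(">I", int(time.time()))                          # 4 bytes
    machine = hashlib.md5(socket.gethostname().encode()).digest()[:3] # 3 bytes
    pid = struct.pack(">H", os.getpid() % 0xFFFF)                     # 2 bytes
    _counter = (_counter + 1) % 0xFFFFFF
    counter = struct.pack(">I", _counter)[1:]                         # low 3 bytes
    return (ts + machine + pid + counter).hex()                       # 24 hex chars

oid = make_object_id()
print(len(oid))  # 24 hex characters = 12 bytes
```

The counter makes ids generated on the same machine and process unique; uniqueness across the cluster comes from the machine and pid fields.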

Secondary index, on "author":

> db.posts.ensureIndex({author: 1}) -- "1" means ascending, -1 is descending

> db.posts.find({author: 'Herge'}).explain() -- shows you the explain plan


Multi-key indexes

//build an index on the "tags" array

> db.posts.ensureIndex({tags: 1})

Arrays are exploded and every element of the array will be indexed, and added separately to the B-tree data structure of the index.

> db.posts.find({tags: 'comic'})

MongoDB assumes, when you query an array, that you mean you're looking for an array item that matches.
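The "arrays are exploded" behavior can be sketched in a few lines of Python, with a plain dict standing in for the index B-tree (my own illustration, not MongoDB internals):

```python
from collections import defaultdict

def build_multikey_index(docs, field):
    """Toy multikey index: every element of an array field gets its own
    index entry, so a query on one element finds the whole document."""
    index = defaultdict(list)
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]            # scalar fields index normally
        for v in values:
            index[v].append(doc["_id"])
    return index

posts = [
    {"_id": 1, "tags": ["comic", "adventure"]},
    {"_id": 2, "tags": ["comic"]},
]
idx = build_multikey_index(posts, "tags")
print(idx["comic"])      # [1, 2] -- both posts match the single tag
```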


Query operators

Conditional operators:


$ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte
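To make the operator semantics concrete, here is a toy Python evaluator for a few of these conditional operators (a sketch of the matching logic only, not MongoDB's query engine):

```python
# Map a few MongoDB-style conditional operators onto Python predicates.
OPS = {
    "$gt":  lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt":  lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$ne":  lambda a, b: a != b,
    "$in":  lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}

def matches(doc, query):
    """Return True if doc satisfies every clause in the query dict."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):       # operator form: {field: {$op: arg}}
            if not all(OPS[op](value, arg) for op, arg in cond.items()):
                return False
        elif value != cond:              # plain equality form
            return False
    return True

doc = {"author": "Herge", "votes": 60}
print(matches(doc, {"votes": {"$gt": 50}}))          # True
print(matches(doc, {"author": {"$in": ["Kyle"]}}))   # False
```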











Update operators:

$set, $inc, $push, $pop, $pull, $pushAll, $pullAll

Extending the Schema:

new_comment = {author: "Kyle",

date: new Date(),

text: "great book!",

votes: 5}

> db.posts.update({ text: "Destination Moon" }, -- this is the WHERE filter

{ '$push': {comments: new_comment}, -- do this

'$inc': {comments_count: 1}}) -- and do this

If you push the comments array without it being there, it will create it without a problem.
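That create-on-first-push behavior is easy to mimic in Python; here is a toy subset of the update operators (my own sketch of the semantics, not driver code):

```python
def apply_update(doc, update):
    """Tiny subset of MongoDB update operators: $push and $inc.
    Missing fields are created on the fly, as described above."""
    for field, value in update.get("$push", {}).items():
        doc.setdefault(field, []).append(value)   # creates the array if absent
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount   # counts from 0 if absent
    return doc

post = {"text": "Destination Moon"}               # no comments array yet
apply_update(post, {"$push": {"comments": {"author": "Kyle"}},
                    "$inc": {"comments_count": 1}})
print(post["comments_count"])   # 1
```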


> db.posts.ensureIndex({"":1 })

> db.posts.find({"Kyle"})

> db.posts.find({"comments.text": "good book"})

The 'dot' operator


Find all posts where a comment has more than 50 votes:

> db.posts.find({"comments.votes": {$gt: 50}})

Not as robust as all the operators in SQL, but it's pretty good, and more concise than SQL.  Over time more expressions will be added.

Find all posts where a comment has more than 50 votes, ordered by author ascending:

> db.posts.find({"comments.votes": {$gt: 50}}).sort({author: 1})

No functional indexes (indexes on functions of fields).

If you index a non-existent field, queries on it return null (which is necessary because not all documents have the same fields).
From a schema design standpoint, the point of MongoDB is to make the documents rich.  He puts up an example of a sales order with many line items, an address field that has name, street, and zip, and a cc field that has number and expiration date.

There is model inheritance; for instance, if you have:

> db.shapes.find()

{_id: 1, type: "circle", area: 3.14, radius: 1}

{_id: 2, type: "square", area: 4, d: 2}

{_id: 3, type: "rect", area: 10, length: 5, width: 2}

All shapes have area, but the other dimensions differ based on the shape.

> db.shapes.find({radius: {$gt: 0}})

-- automatically finds only circles.

Note that this avoids the need to join for 1:many and many:many relationships, as in the relational model.
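The "query by a field only some documents have" trick behind db.shapes.find({radius: {$gt: 0}}) can be mimicked in a few lines of Python (a toy stand-in for the real query, using a list of dicts as the collection):

```python
def find(collection, field, predicate):
    """Return docs where `field` exists and satisfies predicate --
    documents lacking the field simply don't match."""
    return [d for d in collection if field in d and predicate(d[field])]

shapes = [
    {"_id": 1, "type": "circle", "area": 3.14, "radius": 1},
    {"_id": 2, "type": "square", "area": 4, "d": 2},
    {"_id": 3, "type": "rect", "area": 10, "length": 5, "width": 2},
]
circles = find(shapes, "radius", lambda r: r > 0)
print([d["type"] for d in circles])   # ['circle']
```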
That was embedding, now let's talk about linking.
- done client-side


So for instance, a one to many relationship might look like this -- in addition to the collection for posts, a collection for authors with author info:
// collection authors
{ _id: "Herge",
email: "",
karma: 14.142 }
> var p = db.posts.findOne()
> var author = db.authors.findOne({})
> print(
If it's a "contains" relationship, you want to embed.
If you need more flexibility than that, link.
Rich documents are usually easy to query


Rich documents are great for performance


Rich documents give more atomicity capability
account: "abc",
debits: 21,
credits: 11
> db.stuff.update({account:'abc'},
Caching is based on 4k pages, so if you have very small documents, that can be a problem if you are pulling from many collections.
Trees in MongoDB:
{ comments: [
   { author: "Kyle", text: "...",
     replies: [
        { author: "Fred", text: "...",
          replies: [] }
     ] }
] }
Mongo doesn't search recursively, so while this is great for display, it's not great for search.
> t = db.mytree;
> t.find()
{ "_id" : "a" }
{ "_id" : "b", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "f", "ancestors" : [ "a", "e" ], "parent" : "e" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }
> t.ensureIndex( { ancestors : 1 } )
> // find all descendents of b:
> t.find( { ancestors : 'b' })
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }
> // get all ancestors of f:
> anc = db.mytree.findOne({_id:'f'}).ancestors
[ "a", "e" ]
> db.mytree.find( { _id : { $in : anc } } )
{ "_id" : "a" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
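The two queries above translate directly into a toy Python version of the ancestor-array pattern (plain lists standing in for the collection and its index):

```python
tree = [
    {"_id": "a", "ancestors": []},
    {"_id": "b", "ancestors": ["a"], "parent": "a"},
    {"_id": "c", "ancestors": ["a", "b"], "parent": "b"},
    {"_id": "d", "ancestors": ["a", "b"], "parent": "b"},
    {"_id": "e", "ancestors": ["a"], "parent": "a"},
    {"_id": "f", "ancestors": ["a", "e"], "parent": "e"},
    {"_id": "g", "ancestors": ["a", "b", "d"], "parent": "d"},
]

# Descendants of b: any doc whose ancestors array contains "b"
descendants_of_b = [d["_id"] for d in tree if "b" in d["ancestors"]]

# Ancestors of f: read its ancestors array, then fetch those docs ($in)
anc = next(d for d in tree if d["_id"] == "f")["ancestors"]
ancestors_of_f = [d["_id"] for d in tree if d["_id"] in anc]

print(descendants_of_b)   # ['c', 'd', 'g']
print(ancestors_of_f)     # ['a', 'e']
```

Either direction is a single indexed lookup, which is why this layout beats the nested-comments layout for search.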
The limit on document size is 16 MB per document.  It was 4 MB, and will probably keep increasing, roughly along with Moore's Law.  The limit is arbitrary, just a safety measure.
BSON is not compressed due to wanting to scan quickly.
MongoDB can do queueing too: use the findAndModify method to find the highest-priority job and mark it as in-progress - see
The operation is fast, however the entire document is locked -- not ideal, but concurrency is getting better and better in MongoDB.
MongoDB tries to do updates in place; if you grow a document such that it no longer fits in its allocation unit, it has to be deleted and reinserted.  This is expensive, hence the allocation unit: an adaptive padding factor based on the collection.








Video: Henrik Ingo's "Buildbot & EC2: how to qa and package your db product"

And another video from OpenDBCamp is online....Today it is from Henrik Ingo - "Buildbot & EC2: how to qa and package your db product".

I cannot seem to find the slides Henrik used, but the video is now online at

By the way, there are plans to video *all* of the sessions at OSCon Data next week (MySQL and otherwise), which is what made me look in my "video to do" folder earlier this week and realize that I had not yet put up all the OpenDBCamp videos yet!

Video: High Performance Search with BlackRay Data Engine

I realized yesterday that I never did finish putting up the videos from this year's Open Database Camp back in May, so I'm working on finishing that in the next few weeks.

Today I put up High Performance Search with BlackRay Data Engine - Felix Schupp.  The slides are on the web at, and the video is on YouTube at

Liveblogging: A dozen databases in 45 minutes


Actually, the slide says 12 databases in 25 minutes (and 20 minutes of theory), by Eric Redmond (@inviite).  

Complex data:  A lot of data isn't really complex, it's just modeled in a complex way.

"Complexity is a symptom of confusion, not a cause" Jeff Hawkins.



  Linear Scalability

  Ability to be Distributed

  Low Latency



  Not NoSQL


ACID (transaction-based)

  Atomic - Transactions are "all or nothing"

  Consistent - system data will have integrity

  Isolated - Transactions can't see each other

  Durability - changes aren't lost


BASE (request based)

  Basically Available

  Soft state

  Eventual consistency

Redmond says: ACID is negative (avoid negative things), BASE is mostly positive, things will be good, not great.

CAP theorem - it's a fact :D

Consistent, Available, Partition-tolerant web services.  "It is impossible to reliably provide atomic, consistent data when there are partitions in the network.  It is feasible, however, to achieve

Note that "consistent" is not the same as "consistent" in ACID, it's more like Atomicity.

Strong consistency: when an update completes, subsequent access returns the new result.  [to my mind this is actually durability]

Weak consistency - eventual consistency

"Correct consistency" - is the most important part.  DNS, for example, is eventual consistency.

Common patterns:


 CouchDB has an amazing ability to do this, Mongo is also good but not as good.

Replication - copying data amongst nodes in a distributed database.  Lazy (optimistic) replication, gossip (nodes communicate to stay in sync).

- master/slave (mongo)

- master/master (riak, couch)

  - vector clocks (keep track of write order per client)

  - mvcc (mysql) 
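For the vector-clock bullet above, here is a minimal Python sketch (my own illustration, not any particular database's implementation) of how tracking write order per client detects conflicting writes:

```python
def vc_increment(clock, node):
    """Return a copy of the clock with node's counter bumped by one write."""
    c = dict(clock)
    c[node] = c.get(node, 0) + 1
    return c

def vc_descends(a, b):
    """True if clock a has seen everything clock b has (a supersedes b)."""
    return all(a.get(n, 0) >= t for n, t in b.items())

v1 = vc_increment({}, "client1")      # {"client1": 1}
v2 = vc_increment(v1, "client2")      # built on top of v1
fork = vc_increment(v1, "client3")    # diverged from v1 in parallel

print(vc_descends(v2, v1))                              # True  -- clean update
print(vc_descends(v2, fork) or vc_descends(fork, v2))   # False -- conflict
```

When neither clock descends from the other, the store knows two clients wrote concurrently and must reconcile (e.g. Riak hands both siblings back to the application).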


  N - Nodes to write to (per bucket)

  R - Nodes read from before success

  W - Nodes written to before success

Amazon Dynamo does this (Cassandra and Riak do this) - supports both CP and AP in one db (from the CAP theorem)
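The way one database supports both CP and AP comes down to how you pick R and W relative to N; a small Python sketch of that rule (the function and its labels are mine, but the R + W > N overlap argument is the standard Dynamo-style one):

```python
def quorum_mode(n, r, w):
    """Classify an N/R/W configuration, Dynamo-style.
    If R + W > N, every read quorum overlaps every write quorum,
    so a read always sees the latest successful write (CP-leaning).
    Smaller quorums trade that consistency for availability/latency."""
    if r + w > n:
        return "CP-leaning: every read overlaps the latest write"
    return "AP-leaning: reads may miss the latest write"

print(quorum_mode(3, 2, 2))   # quorums overlap -> consistent reads
print(quorum_mode(3, 1, 1))   # no overlap guarantee -> eventual consistency
```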

Consistent Hashing

Balance your servers, and when you hash your keys, if a server goes down or is added, you don't have to rebalance ALL the keys, just some percentage of them.
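A small Python sketch of that behavior, with one ring point per server for simplicity (real systems place many virtual nodes per server to even out the load; server names and key counts here are made up):

```python
import hashlib
from bisect import bisect

def h(key):
    """Hash a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def lookup(ring, key):
    """ring: sorted list of (hash, server). A key belongs to the first
    server clockwise from its own hash position (wrapping around)."""
    points = [p for p, _ in ring]
    return ring[bisect(points, h(key)) % len(ring)][1]

servers = ["s1", "s2", "s3"]
ring = sorted((h(s), s) for s in servers)
keys = [f"key{i}" for i in range(1000)]
before = {k: lookup(ring, k) for k in keys}

# Add a server: only keys whose arc s4 now owns change placement.
ring2 = sorted((h(s), s) for s in servers + ["s4"])
moved = sum(1 for k in keys if lookup(ring2, k) != before[k])
print(moved < len(keys))   # only a fraction of the keys relocate
```

With naive modulo hashing (hash(key) % num_servers), adding a server would move nearly every key; here only the keys landing on the new server's arc move.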



Relational Models:

"Nothing beats relational databases for raw queryability."  The tradeoff -- you have to structure your data and tell the system how it is structured.

PostgreSQL (full featured) -,,

MySQL (lighter) -,

Drizzle (lightest) -


Bigtable/Columnar Style


What makes it columnar?  Well, a primary key is really a row key, and then you have column families, which are columns stored together (each column's values are stored together, as opposed to the whole row being stored together).  You can set an expiry for a column family too, after which the data expires (which is why it's great for Google).

HBase - - Google's BigTable implementation, which was born of Hadoop (Java mapreduce engine).  If you want to use HBase in production, use Thrift (which Cassandra also uses).  This is CP, but configurable to AP.  Does sequential reads and column versioning, strong but flexible columnar schema.

Cassandra - hybrid.  Node architecture like Dynamo, data structure like BigTable with column families - - Good for hundreds of nodes in the same data center; if there are more than that, or different data centers, use HBase (that's what Digg and Facebook are running into).  In Cassandra you set up your schemas with an XML file, not with DDL.  Benefits - sequential reads of ordered keys, also has versioning.  It's AP, configurable to CP.


Document Datastores:

MongoDB (AP focused - master/slave) - created to be huge (huMONGous).  Made to be partitioned, distributed, needed ad hoc queries.  Wasn't built to be durable.



CouchDB - not originally made to be distributed; was meant to be very durable.  AP focused (master/master) (clustering)

MapReduce in Mongo is an ad hoc query, comfortable for relational db people.  In CouchDB, you make views and then request data from those views.  

Riak - The most "architecturally cool" database out there.  It's a dynamo implementation that is purely REST based.  It's a key-value store, but it's not descriptive enough -- it has map-reduce built in, metadata and links you can walk.  You can store ANYTHING in riak -- not just text.  example: getting a JPG file from the web and putting it as the value for the key "firefox.jpg".  Neat demo.

Riak has a ring, eventual consistency, can pull nodes in and take nodes out, without having to invalidate all the ids.  It has quorum consistency, which blows Eric's mind, but we didn't have 


Key/value stores

memcached - don't use it

Kyoto Cabinet - don't use it

Redis - use it - - it can handle lists and hashes, and can intersect the values of 2 keys (such as person and pet, to find out who owns which pet).


Graph datastores - you walk the graph instead of querying or doing mapreduce.


FlockDB - distributed; "unless you're Twitter, you don't need to use it".  It's not really possible to distribute a graph database: you can't walk it and do node traversals, you can only walk edges (you can do friends, but not friends of friends, etc.).

Slides are available at



Open Source Bridge, day 1 keynote

Hacker Lounge - open 22 hours a day (closed between 6 and 8 for cleaning).  There is an Android app, and a "Session notes" link on each page takes you to the wiki.  #osb11 is the official hashtag.  Friday is an unconference (I won't be here on Friday).  What is open source citizenship?  Nominate people for open source citizenship awards by the registration desk.

Now the keynote, "Hacking for Freedom: Join the Doocracy" by Peter Fein.  Hacker, hacktivist - names are important, names have power.  Peter uses the word "hack" to mean doing something clever with technology.  "If we cannot see, we cannot act."  Do-ocracy.  You.  Just.  Do.

Telecomix ( built mirrors to blocked sites and helps folks tweet.  They are the yin to Anonymous' yang - they build and facilitate communication rather than destroy and take down sites.  They go lower-tech when they need to as well, setting up fax and reverse-fax lines, for example.

The last 6-8 months have shown how dramatic the Internet can be for freedom fighting.  Egypt pulled fiber optic cable to censor the 'net, but in the Western world we have censorship too, and it's more subtle.  It's not just Wikileaks; there is a PROTECT-IP act going through US Congress that would allow Homeland Security to seize domain names under the guise of copyright issues.  "If we can't communicate, we can't organize.  If we can't organize, we can't resist.  If we can't resist, we become subjects [not citizens]."  We are seeing a sort of info war of nation-states vs. the internet.  "The Internet didn't cause the protests in the Middle East [North Africa, really], but it did facilitate them."  For me, though, Napster and Limewire did not cause piracy, they just facilitated piracy - and both companies no longer exist.....

Peter introduces the concept of a "disorganization" and different tactics to organize volunteers.  Radical openness - Anonymous chat rooms can be found by searching Google.  Adhocracy - put a team together to do a certain task.

At the end of the day, what matters is what you get done, and that is the core principle of the doocracy.  He points to Critical Mass as a disorganization - there's no hierarchy, no centralized leadership, people just go out and do.  He points to Burning Man as another one, and Open Source Bridge as yet another.  People show up, they set up stuff, they ask what they can do to help.  Then he plays a clip from Lawrence Lessig speaking at OSCon 2002 about copyright: "If you don't do something now...this freedom will be taken away.....If you can't fight for your freedom, you don't deserve it."

---------

Overall, the keynote was good, but for me it was not particularly informational, inspirational, nor entertaining.  I enjoyed it, and it reminded me of a lot of stuff that has been happening, but nothing I had actually forgotten.  Maybe that's because I already subscribe to the doocracy way?  I can see how it might spark a non-doer to go out and do, but that was not the effect it had on me.  An interesting start to a conference that has less in the way of MySQL content and more of the "soft" topics, which really are important to me.

SkySQL Reference Architecture and What's New in MySQL 5.5 videos

If you missed Percona Live yesterday, you missed an awesome 1-day conference.  However, to solace you, here are a few videos from Open Database Camp:

SkySQL Reference Architecture - by Kaj Arnö of SkySQL

What's New in MySQL 5.5 - by Giuseppe Maxia of Continuent - slides

Enjoy, and look forward to more information from Percona Live next week!  If you are in Europe, get ready for Percona Live in London in October, exact dates are forthcoming.

Keeping a Slave in Sync and Advanced Replication for the Masses videos

Tomorrow is Percona Live, New York City's second* MySQL Conference, at which I will be speaking at 3 pm in the Cafe room about a Nagios plugin that efficiently allows arbitrary calculations with any number of system or status variables as well as caching, like the well-known cacti templates.

To get ready for the day, I have posted two new videos from OpenDBCamp earlier this month in Sardinia, Italy:

Keeping a MySQL Slave in Sync with mk-table-checksum and mk-table-sync, presented by me (Sheeri Cabral) can be viewed on YouTube, and the PDF slides can be downloaded.

Advanced Replication for the Masses, delivered by Giuseppe Maxia, can also be viewed on YouTube.

I will also post more videos tomorrow, for those that cannot be at Percona Live....for those that will, Laine and I will see you there!

* (I corrected Percona last week that the first NYC MySQL conference was MySQL Camp in August 2007, shame on them for not changing their marketing)

Running MySQL on SSDs and Databases: The Legal Layer

The weekly MySQL podcast, conferences, plus my regular day job here at PalominoDB have kept me quite busy in the past few months.  However, for those that missed OpenDBCamp in Sardinia, I have good news:  I will be uploading videos from OpenDBCamp in the next few weeks.  I will be highlighting what I think are the most interesting videos in this blog, and the wiki for OpenDBCamp will have all the videos when they are all uploaded.

Before we get to the video, there is another link I want to throw out there.  I happen to be a member of Ravelry, an excellently designed and maintained website for knitters, crocheters and other fiber artists.  The tech part is completely run by one person, Casey Forbes, who occasionally blogs about the tech involved.  His recent article "So you want to run MySQL on SSDs" really impressed me, and I was wondering what others might think of it.  I have added Casey's blog to PlanetMySQL, so we will see his MySQL-related entries in the future.

Now on to the video - today's highlighted OpenDBCamp video is one that is particularly interesting.  It is called "Database:  The Legal Layer" and in it a lawyer discusses what we all need to think about when we are creating and maintaining databases.  You can watch the video directly on YouTube at

MySQL at Collaborate: Bittersweet Success

As the MySQL track coordinator for Collaborate and having attended and spoken at the O'Reilly MySQL conference in previous years, I have my own feedback about this year's Collaborate.  This post goes through why I planned the track the way I did, what really happened, and my recommendations for the future.

The summary comes down to this:  The MySQL track was extremely successful, considering the goal was to educate people.  The low audience turnout caused speakers to feel it was not cost-effective.  My recommendation for the future is to continue to have MySQL content at Collaborate, but until signs point otherwise, make it more cost-effective for the speakers.

For those who want ALL the details, here goes!


Planning Collaborate 11 MySQL Content

In planning for MySQL at this year's Collaborate, we had little information.  The attendance at the MySQL sessions at Collaborate in 2010 was high.  There were very few sessions (I believe it was 4 or 5 sessions total throughout the week) and the ones that were in the program guide had very high attendance, about 75-100 people.  As for other data points, Kaleidoscope 2010 had one room for the MySQL track, and had about 10-20 people in each session. Kaleidoscope is a smaller conference overall, though.  The last piece of planning information I had was that IOUG did not dictate how many rooms the MySQL track could have.

So my expectations were based mostly on my assumption that there would be 100 or more folks in the existing Collaborate community who would attend MySQL sessions.  Given that, I wanted to have a few choices each time, and as best as possible tried not to schedule similar talks at the same time -- for example, I tried not to schedule two talks on performance tuning at the same time.

We had a lot of excellent speakers, so I figured with 3 or 4 choices each time, there would be good attendance as well as happy attendees getting a good amount of information.

What Really Happened

At Collaborate this year, there were about 40 attendees overall who went to MySQL sessions.  There were an additional 20 or so speakers.  This means that that the largest possible audience was 60 people.  Most of the presentations accepted were from rock-star speakers, so of course it was disheartening for those presenters (myself included) to speak to a room of 10 people.

There was no way to predict this, unfortunately.  The good news is that the feedback from attendees is that the information was good.  The bad news is that speakers (myself included) spent a lot of money and time away from our day jobs, only to teach 40 people.

The MySQL track was extremely successful, considering the goal was to educate people.  The low audience turnout means that speakers are feeling it was not cost-effective (cost being time and money) for the number of people that were reached.

So what is the future of MySQL at Collaborate?  I believe there's an audience for MySQL at Collaborate, and that through the years that audience will grow.  For next year's Collaborate I would recommend making it more cost-effective to the speakers by having 2-3 choices per session, for 1 or 2 days of the conference instead of having all the MySQL content evenly spread over the 5 conference days.

Having the content as almost a "mini conference within a conference" means that speakers do not have to spend an entire week away from their day job, and they do not have to pay for an entire week at a hotel.


You may have noticed that I have not said that the O'Reilly MySQL Conference being the same week was a drawback.  It surely was, for speakers who chose one or the other as well as for speakers who went to both.  However, just on cost-effectiveness alone, even without another conference at the same time, Collaborate was not a success from a speaker's point of view.

Case in point:  The cost for me to attend Collaborate totalled $2,150, which comes out to having spent over $50 for each attendee I reached, plus the 5 days away from my own clients.  If Collaborate were held in October, the numbers are still not cost-effective.  However, if I had only spent 2 nights, the cost would have been around $600, which is much more cost-effective ($15 per person that was reached, and only 2 days away from work).
