At PalominoDB, we are not just another remote DBA; we are an integral part of your webops, DBA and BI teams.

Blog

More Videos from Open DB Camp

I have gotten to uploading more of the videos from Open DB camp in Sardinia, Italy back in May:

Henrik Ingo speaks about Xtrabackup Manager - video

Linas Virbalas speaks about "Flexible Replication: MySQL -> PostgreSQL, PostgreSQL to MySQL, PostgreSQL to PostgreSQL" - video - slideshare slides

MySQL to MongoDB replication (hackfest results) - video 

Robert Hodges of Continuent speaks about Multi-Master Replication: Problems, Solutions and Arguments - video

There are a few more videos from Open DB Camp to put up, then I start to put up the content from OSCon Data!

Liveblogging at OSCON Data: Drizzle, Virtualizing and Scaling MySQL for the Future

Brian Aker presents "Drizzle, Virtualizing and Scaling MySQL for the Future" at OSCon Data 2011

http://drizzle.org

irc.freenode.net #drizzle

http://blog.krow.net

@brianaker

2005 MySQL 5.0 released - web developers wanted tons of features that were not in the release (making replication better for instance)

2008 Sun buys MySQL

2008 MySQL 6.0 is forked to become Drizzle

2009 Oracle buys Sun

2010 Drizzle developers leave Oracle

2011 First GA release, Drizzle7

MySQL's Architecture - monolithic kernel, not very modular, lots of interdependence.

Drizzle has a microkernel, which includes a listener, parser, optimizer, executioner, storage system, logging/error reporting.

Drizzle can accept SQL and http blog streaming, and memcached and gearman can easily talk to Drizzle.

Drizzle has tried to have no "gotchas"

- If you make a new field with NOT NULL, MySQL makes new values NULL.  Drizzle does not do this.

- No hacky ALTER TABLE

- Real datetime (64 bit), including microseconds

- IPV6 (apparently this is a strong reason for people switching, to support IPV6)

- No updates that complete halfway

- Default character set is UTF-8, default collation is utf8-general (charset in latin1 by default in MySQL, collation is latin1_swedish_ci - "case insensitive" by default)

Replication

- In MySQL, replication is kind of hacky [this is my summary and opinion, but it's basically what Brian said]

- Drizzle is Google Protocol Buffer Based

- Replicates row transformations

- Integrates with RabbitMQ, Cassandra, Memcached, Gearman -- right now.

DML and MySQL binary logs analog:

- DML is stored transactionally by delta in Drizzle

- InnoDB is already logging, no need to add another log for the binary log.  So it just logs DML to the transaction log.

LibDrizzle

- supports Drizzle, MySQL, SQLite

- Asynchronous

- BSD, so Lawyer-free

What else?

- No cost authentication (pam, ldap, htaccess, ...)

- Table functions (new data dictionary, including performance and thread information).  INFORMATION_SCHEMA in Drizzle is *exactly* what's specified in the SQL standard.

- Data types - native type for UUID, boolean, all known types (except SET, because it's broken by design)

- Assertions are in Drizzle, you can ask what the type of the result of combining multiple data types will be.

- About 80 conflicts in the Drizzle parser as opposed to about 200 in the MySQL parser

Roadmap - Drizzle7+

- Replication - faster than MySQL and also allows multiple masters.

Virtualization:

Virtualizing a database gives you about a 40% performance hit.  How can costs be cut?  In MySQL 5.0 the Instance Manager was created to solve that but it hasn't really been worked on.  Drizzle has worked on virtualizing databases internally within Drizzle.

- So drizzle now has catalogs.  

- One catalog has its own set of users, its own schema with tables, etc.

- A catalog is its own sandbox; there is no syntax that allows you to connect from one catalog to another, so there's no security problems.  

- Cuts the 30/40% hit from virtualizing

- Single instance maintenance - only 1 OS and 1 database to configure, unlike VMs

    - Currently only one database configuration so there's one global config for shared memory such as innodb buffer pool, but that will change in the future.

- Still allows for I/O spread on SAN/NAS

 

In Drizzle 7.1 - Percona's xtrabackup supports Drizzle, and ships with drizzle.  xtrabackup supports full and partial backups, no locking, single solution for point-in-time recovery in a single solution.  Because transaction log is stored in database, replication is automatically consistent with the database.  Currently does not do incremental backups with the transaction logs but that's in the future.

DBQP:

- consolidates standard testing tasks, server/test management, reporting, REGRESSION TESTING

- extended architecture allows for complex testing scenarios

- pluggable - supports new testing tools

- randgen, sql-bench, crashme, sysbench, standard drizzle-test-run suite

- Keeping tools and testing configurations in-tree facilitates testing for everyone

- supported by SkySQL

 

Dynamic SQL/execute()

- New UTF-8 parser

- Being extended to allow for plugging in application servers.

 

>120 developers since day 1

avg 26-36 per month that commit

 

Bugs database - http://bugs.launchpad.net/drizzle

Liveblogging at OSCON Data: MongoDB Schema Design

Dwight Merriman gives "MongoDB Schema Design" at OSCon Data 2011.

@dmerr

 

RDBMS / MongoDB

relational / document-oriented

database / database

table / collection

row / JSON (BSON) document

index / index

column / field (dynamic/not predeclared)

SQL / Mongo query language (JSON)

Join / Embedding & Linking

Primary Key / _id field

 

Schema design is coupled with what you want to do:

- Dynamic queries

- Secondary indexes

- Atomic updates

- Map Reduce

Considerations:

- no joins

- atomic operations are at the single document level only

- types of queries/updates to do

- sharding

- speed

 

This is the commandline mongo interface but all this can be done in any (modern) programming language.

post = {author: "Herge"

date: new Date(),

text: "Destination Moon",

tags: ["comic", "adventure"]}

> db.posts.insert(post)

"posts" is the collection name.  Documents are analogous to rows but can be more complex.  Documents for one collection are grouped together.

> db.posts.find()

{ _id: ObjectId("4c4ba5c0672...."),

 

author: "Herge"

date: new Date(),

text: "Destination Moon",

tags: ["comic", "adventure"]}

_id must exist and must be unique -- if you don't create an _id, one will be made for you, 12 bytes BSON, shorter than a normal UUID but that's OK because you don't need it to be unique globally, just on this db cluster.

Secondary index, on "author":

> db.posts.ensureIndex({author: 1}) -- "1" means ascending, -1 is descending

> db.posts.find({author: 'Herge'}).explain() -- shows you the explain plan

 

Multi-key indexes

//build an index on the "tags" array

> db.posts.ensureIndex({tags: 1})

Arrays are exploded and every element of the array will be indexed, and added separately to the B-tree data structure of the index.

> db.posts.find){tags: 'comic'})

MongoDB assumes, when you query an array, that you mean you're looking for an array item that matches.

 

Query operators

Conditional operators:

 

$ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte

 

 

 

 

 

 

 

 

 

 

Update operators:

$set, $inc, $push, $pop, $pull, $pushAll, $pullAll

Extending the Schema:

new_comment = {Author: "Kyle",

date: new Date(),

text: "great book!"

votes: 5}

db.posts.update( text: "Destination Moon" }, -- this is the WHERE filter

{ '$push': {comments: new_comment}, -- do this

'$inc': {comments_count: 1}}) -- and do this

If you push the comments array without it being there, it will create it without a problem.

 

> db.posts.ensureIndex({"comments.author":1 })

> db.posts.find({comments.author:"Kyle"})

> db.posts.find({comments.text:"good book"})

The 'dot' operator

 

Find all posts with >50 comments,
> db.posts.findIndex({comments.votes: {$gt: 50}})
Not as robust as all the operators in SQL, but it's pretty good, and more concise than SQL.  Over time more expressions will be added.
Find all posts with >50 comments, order by author ascending
> db.posts.findIndex({comments.votes: {$gt: 50}}).sort(author: 1)
No functional indexes (indexes on functions of fields)
If you add an index to a non-existent field, it returns NULL (which is necessary because not all documents have the same fields).
From a schema design standpoint, the point of MongoDB is to make the documents rich.  He puts up an example of a sales order, with many line items, an address field that has name, street, zip, cc field that has number, exp date.
There is model inheritance, for instance if you have
> db.shapes.find()
{_id: 1, type "circle", area: 3.14, radius: 1}
{_id: 2, type "square", area: 4, d: 2}
{_id: 3, type "rect", area: 10, length: 5, width: 2}
All shapes have area, but the other dimensions are different based on the shape.
> db.shapes.find ({radius: {$gt: 0}})
-- automatically finds only circles.
Note that this avoids the need to join for 1:many and many:many relationships, as in relational model.
That was embedding, now let's talk about linking.
- done client-side

 

So for instance, a one to many relationship might look like this -- in addition to the collection for posts, a collection for authors with author info:
// collection authors
{ _id: "Herge"
email: "h@foo.com",
karma: 14.142 
}
> var p = db.posts.findOne()
> var author = db.authors.findOne({_id:p.author})
> print(author.email)
If it's a "contains" relationship you want to embed
If you need more flexibility than that, link
Rich documents are usually easy to query

 

Rich documents are great for performance

 

Rich documents give more atomicity capability
{
account: "abc",
debits: 21,
credits: 11
}
> db.stuff.update({account:'abc'},
{$inc:{debits:21},{$dec:{credits:11}})
Caching is based on 4k pages, so if you have very small documents, that can be a problem if you are pulling from many collections.
Trees in MongoDB:
{ comments: [
   { author: "Kyle", text: "...",
     replies: [
            { author: "Fred", text: "...",
                replies: []}
       ]}
]}
Mongo doesn't search recursively so while this is great for display, not great for search.
> t = db.mytree;
> t.find()
{ "_id" : "a" }
{ "_id" : "b", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "f", "ancestors" : [ "a", "e" ], "parent" : "e" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }
> t.ensureIndex( { ancestors : 1 } )
> // find all descendents of b:
> t.find( { ancestors : 'b' })
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }
> // get all ancestors of f:
> anc = db.mytree.findOne({_id:'f'}).ancestors
[ "a", "e" ]
> db.mytree.find( { _id : { $in : anc } } )
{ "_id" : "a" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
Limit of document size is 16 Mb per document.  Was 4 Mb, will keep increasing probably based on Moore's Law.  This is arbitrary just as a safety measure.
BSON is not compressed due to wanting to scan quickly.
Can do Queueing too, use the findAndModify method to find the highest priority job and mark as in-progress - see http://www.mongodb.org/display/DOCS/findAndModify+Command
The operation is fast, however the entire document is locked -- not ideal, but concurrency is getting better and better in MongoDB.
MongoDB tries to do update in place, if you are adding to the document such that it does not fit in the allocation unit, it has to be deleted and reinserted.  This is expensive, hence the allocation unit - an adaptive padding factor based on the collection unit.

 

 

 

 

 

 

 

Video: Henrik Ingo's "Buildbot & EC2: how to qa and package your db product"

And another video from OpenDBCamp is online....Today it is from Henrik Ingo - "Buildbot & EC2: how to qa and package your db product".

I cannot seem to find the slides Henrik used, but the video is now online at http://www.youtube.com/watch?v=07tsdSvR5C0.

By the way, there are plans to video *all* of the sessions at OSCon Data next week (MySQL and otherwise), which is what made me look in my "video to do" folder earlier this week and realize that I had not yet put up all the OpenDBCamp videos yet!

Video: High Performance Search with BlackRay Data Engine

I realized yesterday that I never did finish putting up the videos from this year's Open Database Camp back in May, so I'm working on finishing that in the next few weeks.

Today I put up High Performance Search with BlackRay Data Engine - Felix Schupp.  The slides are on the web at http://www.slideshare.net/fschupp/blackray-the-open-source-data-engine-2011683, and the video is on YouTube at http://www.youtube.com/watch?v=oCB3ZXfc8Rs YouTube video

Why We Need the Bugs Database to Stay Public

There are been many non-quantifiable (but valid) reasons why the bugs database at http://bugs.mysql.com should remain open and have as many bugs open to the public as possible.  However, while researching an article recently I uncovered a simple, concrete reason why the bugs database needs to stay as open as possible:

So we know when to upgrade.  Really!  I was looking at the CHANGELOG pages for MySQL 5.5 and was trying to figure out which changes were pushed into MySQL 5.5 along with being pushed into MySQL 5.1 and which were unique to MySQL 5.5.  

While it is fairly simple to figure out some of that -- for instance, all of the patches regarding semisynchronous replication are unique to MySQL 5.5 - it was not clear on most of the fixes, particularly the bug patches.  Luckily we have the bugs database so we can double-check this.

Let's say you work for an organization that is contemplating whether or not to upgrade from MySQL 5.1.50 to MySQL 5.5.  Looking at the change logs for MySQL 5.5.0, if you can manage to read the entirety of the page, you'll see a change that notes:

Appending values to an ENUM or SET definition is a metadata change for which ALTER TABLE need not rebuild the table, but it was being rebuilt anyway. (Bug #45567)

Well, that sounds like a great bugfix - making an ALTER TABLE operation that's supposed to be online, actually an online change.  However, if you look at the bug itself, you will note that it was "Pushed into 5.1.40" also.  Thus, already being on MySQL 5.1.50, you already have that change.

Without the bugs database it would be virtually impossible to know this -- you'd have to cross-reference the changelogs for all the MySQL 5.1 minor versions, or at least the ones that were being worked on in that timeframe....which might narrow it down to 2 or 3 versions, but it's still excessive.

And of course as fewer and fewer of the bugs are public, we are left trying to figure out what exactly the description on the CHANGELOG page is, without having the ability to search the bugs database.

So please, keep the bugs database as open as possible.

Not just so we don't submit duplicates, wasting Oracle engineers' (and our own) time.  

Not just so we can submit patches for fixes.

Not just so we can see that a bug exists or is being worked on.

Not just so we can know what a bugfix really entails.

But also so we can know what bugs actually affect us, and what will actually be fixed on an upgrade from one version to the next.

Liveblogging: Cassandra Internals

Cassandra Internals by Gary Dusbabek of Rackspace
Questions?
What's the best way to access data if you're running a program in the same JVM as Cassandra? -- will talk about it during StorageProxy section of the talk
Performance characteristics of using MMAP vs. not using it? - won't cover it.
When does repair happen?  will talk about it during repair part of the talk
How do Snitch and replication strategy work together? -- will discuss though there is no slide on it.
Ring services - services that go throughout the ring.  These are in a class called StorageService.
Storage services - things that happen locally.  In a class called StorageProxy.
The cassandra executable in /bin executes cassandra.in.sh, which does:
- sets $CLASSPATH
- looks for the .jar files
- sets $CASSANDRA_CONF (mandatory, where yaml file lives)

 

- sets $CASSANDRA_HOME (not mandatory)
then it looks for another file [didn't get what it was] which:
- determines heap size
- sets max heap size by default to 1/2 available memory
- sets the size for the young generation for Java GC
- sets "a whole bunch of other -X options for Java"
... then it goes to the main() class, org.Apache.Cassandra.Thrift.CassandraDaemon, which:
extends AbstractCassandraDaemon, the guts of the startup sequence.  Has a method called setup(), raises config file from a Database Descriptor class.  
"Database Descriptor is an awful class."
- loads yaml file, reads into a config object, gets all the settings.
- then calls DatabaseDescriptor.loadSchemas() and loads the schema based on the last versionID, and sets them up to store them in the system column families (in the system datadir, schema column family).
- scrubs the data directories, takes out the trash (e.g. leftovers from compaction, bits and pieces from other SS tables)
- initializes the storage (keyspaces + CFs)
- Commit log recovery: CommitLog.recover() (row mutations)
- StorageService.initServer() and StorageService.joinTokenRing -- this is where the magic of joining the ring happens
  -- starts gossip
  -- starts MessagingService
  -- Negotiates bootstrap
  -- knowledge of ring topology is in StorageService.tokenMetadata_ (btw underscore at end of a member variable means it's old facebook stuff, b/c that's their naming convention)
  -- partitioner is also here.
Configuration
- in DatabaseDescriptor, really a side effect of AbstractCassandraDaemon.setup
- reads config settings from yaml
- defines system tables
- changes regularly
It uses a static initializer, so we might end up making a change that happens when we're not ready for it.
MessagingService
- Verb handlers live here (initialized from StorageService)
 -- main event handlers, haven't changed much
- Socket listener
 -- 2 threads per ring node
- Message gateway
 -- MessagingService.sendRequestResponse()
 -- MessagingService.sendOneWay()
 -- MessagingService.receive() -- when another node contacts you, this is the method that's used to pass the message to a verb handler
- Messages are versioned starting in 0.8
 -- with IncomingTCPConnection
StageManager - fancy java ThreadPoolExecutor
- SEDA design: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
Adding the API Methods
- open up cassandra.thrift file in the interface directory, this is where you describe methods and new data structures
- regenerate files with ant gen-thrift-java gen-thrift-py
- implement stubs: o.a.c.thrift.CassandraServer
StorageProxy - where local reads and writes happen.
- Called from o.a.c.thrift.CassandraServer
- write path changed in new version b/c of counters
 -- notion of WritePerformer
- eventually to Table and others
- for reads, there's a local read path and remote read path
 -- Socket->CassandraServer.  Looks at permissions, request validation, and marshalling.  
ReadCommands created in CS.multigetSiceinternal, passed to StorageProxy -- 1 per key.
StorageProxy iterates over the ReadCommands, then runs StorageProxy.read(), .fetchRows(), determines endpoints.
Locally, StorageProxy:
- READ stage executes a LocalReadRunnable
- True read vs. digest
- Table, ColumnFamilyStore
Remotely, StorageProxy:
- serializes read command
- Response handler
- Send to remote nodes
ReadRepair happens in StorageProxy.fetchRows()
Writing -- follows similar pattern to reads -- there is a local path and remote path.
- The marshalling turns into row mutations in CS.doInsert()
- StorageProxy.sendToHintedEndpoints
- RowMutation - one key per row (several CFs), so it calls ColumnFamilyStores.apply() to update the memtables.
RowMutation is serialized into a Message.
Then he goes into the challenges of working with the code, which I won't reproduce here.

Liveblogging: Migration from 100s to 100s of Millions

Migration From 100s to 100s of Millions – Messaging to Mobile Devices with Cassandra at Urban Airship Urban Airship - provides hosting for mobile services that devs should not build themselves (e.g. push notifications, content delivery services, etc). Provide a unified API for all these services across all platforms (Andriod, iOS, RIMM, etc). Starting to have SLAs for throughput and latency. Transactional intake system at Urban Airship: API - Apache/Python/django+piston+pycassa Device Negotiation layer - Java NIO+Hector Message Delivery layer = Python, Java NIO + Hector Device Data checkins = java HTTPS endpoint Persistence - sharded postgreSQL, Cassandra 0.7, "diminishing footprint" of MongoDB 1.7 What do they use? Cassandra HBase redis (analytics internal measurements) MongoDB (phasing out) They don't use Riak. Converging on Cassandra + PostgreSQL for transactions, HBase for data warehousing. They started with PostgreSQL on EC2, but had so many writes that after 6 months they couldn't scale, so they went to MongoDB. MongoDB had: heavy disk I/O problems, non-sophisticated locking caused locking, deadlocking and replication slave lag that was just not working out for them. So they moved to Cassandra. Why? - Well-suited to data model - simple DAG's - lots of uuids and hashes which partition well - retrievals don't need ordering beyond row keys or time-series data (e.g. doesn't matter what order 10 million devices are retrieved, just need to retrieve them!) - Rolling minor version upgrades are easy in Cassandra, no downtime. - Column TTLs were huge for them (and resulting expiration) - Particularly well-suited to working around EC2 availability problems - They needed to partition across multiple availability zones, they didn't trust fault containment within one zone. - Read repair and handoff generally did the right thing when a node would flap (Ubuntu #708920) - No single point-of-failure - Ability to alter consistency levels (CL) on a per-operation basis - some things aren't important to be consistent right away, others are very important. Cassandra tips: - Know your data model - creating indexes after the fact is a PITA - design around wide rows (but be careful of I/O, Thrift, Count problems) - Favor JSON over packed binaries if possible (unless you're Twitter) - Be careful using Thrift in the stack - having other services that use Thrift that have to talk to Cassandra has some painful versioning limitations. - Don't fear the StorageProxy. - Looking at the Cassandra source code and getting your hands dirty with the Java code is a MUST. - Assume the client will fail (difference between read timeout and connection refused) - When maintaining your own indexes, try and clean up after failure. (i.e. have a good rollback strategy) - Be ready to clean up inconsistencies anyway - Verify client library assumptions and exception handling, make sure that you know what's going on when the client communicates that it couldn't do a write. Understand what the client is doing so you can figure out whether to retry now or later or what. - Embedding Cassandra for testing really helped Cassandra in EC2: - Ensure Dynamic Snitch is enabled (also make sure you check your config files during upgrades...they had Dynamic Snitch off in 0.6 due to bugs, when they upgraded to 0.7 they didn't turn it on) - Disk I/O - avoid EBS except for snapshot backups ... or use S3. Stripe ephemerals, not EBS volumes, because Cassandra is network I/O heavy (b/c EBS is a networked disk). - Avoid smaller instances all together -- i.e. avoid virtualization if you can - Don't assume that traversing a close-proximity availability zone is more expensive than in the same availability zone -- it is sometimes, often isn't. (No comment on different regions, haven't tested yet) - Balance RAM costs vs. the costs of additional hosts. Spend time with the GC logs. Java best practices: - ALL Java services, including Cassandra, are managed via the same set of scripts. For them, in most cases, operators don't treat cassandra different from HBase, one mechanism to take a thread or heap dump, all logging is consistent for GC, application, stdx for HBase and Cassandra, even init scripts use the same scripts that the operators do. - Bare metal will rock your world - configure +UseLargePages will be good to (on bare metal) - Get familiar with GC logs (-XX:+PrintGCDetails), understand what degenerate CMS collection looks like, and what promotion failures look like. Urban Airship settled at -XX:CMSInitiatingOccupancyFraction=60, lowered from the default of 75, to do CMS collection before there's a problem, to avoid promotion failures. Operations: - Understand when to compact - Understand upgrade implications foro data files - Watch hinted handoff closely - Monitor JMX religiously Looking forward: - Cassandra is a great hammer, but not everything is a nail - Co-processors would be awesome (hint hint!) - They still spend too much time worrying about GC - Glad to see the ecosystem around the product evolving, CQL, Pig, Brisk

Liveblogging: Indexing in Cassandra by Ed Anuff

Indexing in Cassandra

First, a brief history:

Cassandra 0.6

- No built-in secondary

- all indexes were custom-built like using supercolumn

Cassandra 0.7

- new users flocked to the built-in secondary indexes

pros - easy to use out of the box

cons - not the same as SQL indexes but they look similar

- reinforce data modeling that plays *against* cassandra's strengths

Present Day

- new users can get started with Cassandra w/out understanding internals, using CQL

- Veteran users are using advanced techniques like composites that aren't really documented anywhere.

- New user panic mode when they try to use the next level and find themselves in the deep end.

 

Quick review of Indexing

2 ways of finding rows - primary index and alternate indexes

primary index (row keys):

- sometimes it's meaningful (natural key)

- usually not, like a uuid

 

Get vs. find:

using row key is best to retrieve info if you've got precise and immutable 1:1 mapping

if you plan to iterate over keys, you're probably doing something wrong. (that's finding, not getting)

So search shouldn't really use primary keys, just 'get'.

 

 

alternate indexes (everything else)

Native Secondary indexes:
- easy to use, looks like SQL
- every index is stored as its own "hidden" column family (CF)
- nodes index the rows they store
- when you issue a query it gets sent to ALL nodes (no partition pushdown)
- Currently does equality ops, range get performed by memory coordinator node.
This behavior contributes to these limitations:
- Not recommended for high cardinality values (timestamps, birthdays, keywords, etc)
- Requires AT LEAST one equality comparison in a query, not efficient for less than, greater than or range queries
- Unsorted - results are in token order, not query value order
- Limited to search on data types Cassandra *natively* understands.
wide rows as lookup and grouping tables
"Why would a row need 2B columns?
- It's the basis of all indexing, organizing and relationships in Cassandra?
- if your data model has no rows with >100 columns, you're probably doing it wrong (you're thinking in relational terms!)
Inherently, wide rows work as a simple index -- 
indexes = {
"User_Keys_by_last_name": {
 "aaaa"
"aaaab"
etc...
CF as indexes
-cf column ops very fast
-column slices can be retrieved by range, are always sorted, can be reversed, etc.
-if a target key a TimeUUID you get both grouping AND sorting by timestamp.  Good for inboxes, feeds, logs, etc
- This is the best option when you need to combine groups, sort and search, such as a friends list, inbox, etc.
But...what about 2 people with the same last name?  (non-unique keys)
In the docs, the first answer you might find is SuperColumns - lets you have your col name have >1 col value.  So, 2 row keys for people with 1 last name
Use SuperColumns with caution.
- Not officially deprecated, but they're not highly recommended either.
- sorts only on the supercolumn, not the subcolumn
- some performance issues
- Cannot do more nesting, can only have 1 level of subcolumns
- Many projects have moved away from supercolumns b/c of these limitations.
So, let's revisit regular CF's -- what happens with >1 person with the same last name?  you can't have 2 cols with the same column name.  You can do a composite column name:
"User_Keys_by_last_name": {
("alden", 1): "e5d",
("adams", 1): "et3",
("anderson", 1): "e5f",
("anderson", 2): "e71..",
("doe", 1): "a4f",
("franks", 1): "f4e",
Composite column names
Comparator = "CompositeType" or "DynamicCompositeType"
 - you don't lose the sort capability - sorts by component values using each component type's sort order
2 types of composites, static and dynamic
column_families:
name: My_Composite_Index_CF
compare_with: CompositeType(UTF8Type, UUIDType)
-- note in static composite types, fixed # and order of columns in the definition
name: My_Dynamic_Composite_Index_CF
compare_with: DynamicCompositeType(s=>UTF8Type, u=>UUIDType)
-- Any # and order of types at runtime, the definitions are just for convenience and smaller serialized component names.
The main difference is whether you need to create one CF per index s, or one CF for all indexes with one row per index
How does this work?
Queries are easy - just regular column slice ops
Updates are harder - need to remove old value and insert the new value -- this is why they recommend starting with the built-in native secondary indexes. You have to know how to remove old values then insert new values, which involves a read before write.
Example - Users by Location
Use 3 CFs, not 2 for safe concurrent access
First 2 CF's are natural:
Users
Indexes
We also need a 3rd:
User_Index_Entries
Users = {
"username": "..."
"location": <location>
}
Indexes = {
Users_By_Location" : {
  {<location>, <user_key>, <ts>} : ..., ...: ..., 
  }
}
Users_Index_Entries = {
<user_key>: {
 {"location", <ts 1>}: <location 1>,
 {"location", <ts 2>}: <location 2>,
 {"location", <ts N>}: <location N>,
Allows you to read the previous index value from Users_Index_Entries CF and delete the previous one. 
Read from Users_I_E WHERE KEY=<user_key>;
DELETE FROM Users_Index_Entries
DELETE FROM Users_By_Location
UPDATE Users_Index_Entries
UPDATE Users_By_Location
B/c there's a timestamp, doing it >1 time has no consequence, so eventually consistent.
What if something goes wrong?  Repeat batch operation until it completes.
False positive?  possible, so if it's a problem, filter on the reads.
This approach is VERY common -- with some variations.  So use this *idea*, but not necessary to be an exact copy of this example.
At least now, composite indexing is now standard. 
Can do derived indexes -- create a "last_name, first_name" index from a "fullname" column.    Can also unroll a JSON object to construct deep indexes of serialized JSON structures.
- Include additional denormalized values in the index for faster lookups
- use composites for column values, too -- not just column names.
custom secondary indexes
Note: no official alternate index "way".  Everything talked about here is using an official Cassandra feature/property.
How can I learn more?
Sample using Hector:
JPA implementation for this using Hector:

 

Jira entry on this:

http://issues.apache.org/jira/browse/CASSANDRA-2231

Turning off the Query Cache works in Percona Server 5.1 , before MySQL Server 5.5

In a previous blog post I wrote that is not possible to completely turn off the Query Cache in MySQL before 5.5 .

Although, it is possible to completely turn off the Query Cache with Percona Server 5.1 .

MySQL bug #38551 reports that the change is incompatible with MySQL 5.1 . In this case the incompatibily is only related to how the server behaves: if Query Cache is disabled at startup it  cannot be enable at runtime. 

Thanks to Kenny Gryp from Percona for pointing this out and for the clarifications provided.

Website by Digital Loom