How I Find MySQL Fragmentation

Sarah Novotny recently posted InnoDB Tablespace Fragmentation - Find it and Fix it, so I thought I would share how I determine MySQL fragmentation.  This works for MyISAM and InnoDB tables, and will work whether or not innodb_file_per_table is in use.

The basic idea is to compare the size of the file(s) on disk with the size in the database metadata.  The DATA_LENGTH and INDEX_LENGTH fields of the TABLES table in the INFORMATION_SCHEMA database has size information calculated by the storage engine.  There are a few points to note:

  • Sometimes querying INFORMATION_SCHEMA can take a long time, and in rare cases has been reported to crash MySQL.  I have written about how you can tell how risky an INFORMATION_SCHEMA query might be.
  • MyISAM tables - In MyISAM, the metadata is exact, and the files on disk are split into data (.MYD) and index (.MYI).  Compare the size of the .MYD to DATA_LENGTH and compare the size of the .MYI file to INDEX_LENGTH.  You can calculate exactly how fragmented each table is, and decide which tables to defragment.  Sample query:

SELECT (DATA_LENGTH+INDEX_LENGTH)/1024/1024 AS sizeMb FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='MyISAM';

  • InnoDB tables when using a centralized file for data and indexes (aka "not using innodb_file_per_table") - Because all of the data and indexes are in one file, you have to compare the sum of the size of all InnoDB tables with the size of the ibdata file(s).  However, because InnoDB has estimated metadata, you cannot use a direct comparison.  What I do is get a percentage difference between the size on disk and the size reported by the metadata, and if it's more than about 20% difference than the table is likely fragmented enough to justify defragmentation.  Sample query:

SELECT SUM(DATA_LENGTH+INDEX_LENGTH)/1024/1024 AS sizeMb FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='InnoDB';

  • InnoDB tables that use separate files for metadata versus data and indexes (aka "using innodb_file_per_table") - As explained in the previous example, InnoDB uses estimated size for metadata, not exact numbers like MyISAM.  So again I figure out the percentage difference and if it's more than 20% then I will recommend defragmentation.  Sample query:

 

SELECT (DATA_LENGTH+INDEX_LENGTH)/1024/1024 AS sizeMb FROM INFORMATION_SCHEMA.TABLES WHERE ENGINE='InnoDB';

 

Note that every database is different.  I use the 20% as a guideline, not a hard-and-fast rule.  It really depends on how "wrong" Innodb's estimate of the size is.  

Fragmentation occurs when variable-length data is updated, and when variable-length rows are deleted.  You can probably guess which tables have the most fragmentation if you know what queries are being run.  If you want to get a better sense of how often updates and deletes are being run, I highly recommend using mk-query-digest against a sample set of binary logs.

Comments

Arun,

You can do any operation that will rebuild the table. For example, a column/index add/drop (assuming it's not an online operation).

OPTIMIZE TABLE will defragment the table.

For InnoDB you can do a "null ALTER" by doing ALTER TABLE innodb_tbl ENGINE=InnoDB;

All of these operations lock the table while it's going, and will replicate unless you SET SQL_LOG_BIN=0 for your session before you start.

Sheeri Cabral
Wed, 10/12/2011 - 16:17

and how do we defragment the table (other than dropping, recreating, and repopulating it)?

Wed, 10/12/2011 - 15:55

justlooks - What's the size after defragmentation?

Sheeri Cabral
Wed, 11/03/2010 - 09:37

I am sorry,but i can not agree with following opinionCompare the size of the .MYD to DATA_LENGTH and compare the size of the .MYI file to INDEX_LENGTH.  You can calculate exactly how fragmented each table is, and decide which tables to defragment because the size of .MYD or .MYI is the same as the DATA_LENGTH/INDEX_LENGTH which you get from information_schema.table or from query "show table status like 'yourtable' .for instance ,i have a table which index is heavily fragment. before i do defragment on it,the size of .MYI file is778334208 byte ,but i get DATA_LENGTH from information_schema.tables also is  778334208

justlooks
Wed, 10/27/2010 - 23:15

It's perhaps worth pointing out that all of those measures of supposed fragmentation are fundamentally bogus because what they are really doing is calculating free space, not fragmentation.It also misses significant fragmentation, that of the files within the filesystem. That's of particular importance with innodb_file_per_table and growing data, including on linux filesystems. Even more so when there are some table or index scans happening. Periodic table rebuilds can solve most of this problem and prevent the number of filesystem fragments growing to troublesome levels. Even if there's minimal free space it's possible to get significant performance benefits from eliminating this fragmentation, for some workloads. Periodic here means infrequent, when the data size has grown enough for the number of fragments to cause significant avoidable disk seeking.Both measures are interesting. Unused space within a tablespace or file increases the working set size and hurts caching efficiency, while filesystem fragmentation hurts when you're reading or updating the data.Those using the InnoDB plugin or MySQL 5.5 can also sometimes get significant gains from dropping and adding back non-unique indexes in InnoDB. That will cause them to be rebuilt using the fast alter table method and that produces a higher fill factor than random inserts. That reduces the working set size and improves cache efficiency.Your heuristics are OK, though. Just trying to be scrupulously correct after dealing with too many confused people. :) Well worth a try to see what the result is, then learn from that to see whether it really makes any difference for a particular application.James Day, MySQL Principal Support Engineer, Oracle UK

James Day
Mon, 10/25/2010 - 16:52

Reply

The content of this field is kept private and will not be shown publicly.
  • Web page addresses and e-mail addresses turn into links automatically.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Lines and paragraphs break automatically.

More information about formatting options

By submitting this form, you accept the Mollom privacy policy.
Website by Digital Loom