Posts Tagged ‘storage engine’

Liveblogging: Architecture of Maria By Monty Widenius

By Sheeri Cabral April 16th, 2008 at 4:58 pm
Posted in Group Blog PostsMySQL
Tags:

Architecture of Maria: A New Storage Engine with a Transactional Design

Goals of Maria:


  • ACID compliant
  • MVCC, transactional
  • default non-transactional and default transactional storage engine for MySQL
  • MyISAM replacement, including temporary table use
  • Storage engine good for data warehousing.
  • Allow any length transactions to take place
  • all indexes should have equal speed (clustered indexes are not in the plan)
  • log shipping — incremental backups just by copying the logs
  • used as a standalone library
  • fast count(*)
  • allow copying of Maria tables between different Maria servers
  • Better blob handling (than MyISAM) — no memory copying, or extra memory used for blobs on INSERT/UPDATE
  • BLOBs are allocated in big sequential blocks means that there is less fragmentation over time.
  • BLOBs are stored so Maria can be easily extended to have access to any part of a BLOB with a single fetch in the future

Maria is actually faster than MyISAM.
Will support READ COMMITTED and REPEATABLE READ (was a specific question).

“You should be able to upgrade Maria and have it just work, without ever having to backup and restore as part of the upgrade.”

Why do we need 2 storage engines, Maria and Falcon?
More storage engines = more scenarios to be able to handle. Falcon good for lots of memory, and shorter queries, Maria is to take care of long transactions and data warehousing side, and also to be an overall “good” engine for most purposes (at least as good as InnoDB).

Project plan — MySQL 5.1 and higher. Maria strives to be bug-free.

Maria 1.0 = “crash safe”, released in Jan 2008. Cacheable row format.
Maria 1.5 = “concurrent insert/select”, Apr/May 2008. Will be merged into the MySQL 6.0 release. Non-transactional.
Supports:


  • External/internal row CHECK/REPAIR/compressing
  • Different row formats — fixed size, dynamic length, compressed, page
  • Fast checksum of a table
  • maria_check

MySQL 6.0 (includes Maria)
Maria 2.0 = “Transactional and ACID” alpha, Q4 2008
Maria 3.0 = “High concurrency + Online backup” alpha, Q1 2009; GA, Q2 2009

Advantages of Maria compared to MyISAM

  • Data, index and statistics are crash safe. On crash, everything will rollback to the state of the start of the statement or last LOCK TABLES commands.
  • Can replay everything from the log. Still need to backup the .frm files though.
  • LOAD INDEX can skip index blocks for unwanted indexes
  • most parts of Maria have unit tests.
  • Supports all MyISAM formats and a new PAGE format, where data is stored in pages — this is:

    • Only crashsafe, transactional row format for Maria
    • Versioned, so you can have concurrent insert/select
    • Cached by page cache so will give noticeable speed improvement on systems with poor data caching

  • MyISAM is non-transactional only, Maria can support transactional (crash-safe) and non-transactional formats
  • Maria uses big log files by default — 1G
  • Log control file = maria_log_control
  • Log files = maria_log.?????
  • If a log file is already in use by a transaction, create a new one, otherwise, reuse it. SHOW LOGS will show which logfiles exist, which ones are in use, and you can PURGE the logs.
  • Maria uses 8K pages by default (MyISAM uses 1K pages by default. This should be faster on static size indexes but slower on variable length keys, until Maria 2.0 where there will be a directory for index pages.

Before Apr/May release of Maria 1.5

  • Statistics and indexes versioned (data is already versioned)
  • FULLTEXT (boolean mode) and R-TREE indexes need to be made crash-safe
  • Parallel recovery

Disadvantages, not likely to be fixed

  • No external locking
  • One page size for both index and data
  • only one page cache
  • index number and checksums require an extra 5 bytes per page
  • doesn’t support RAID
  • Minimum data file size for BLOCK format is 16K, with 8K pages
  • Storage of rows < 25 bytes are not efficient for the PAGE format

XDB indexes — group records, and get the min and max for each group (groups are not sorted), and save that in a separate block. This means that stats like min and max are very fast. This is something that will help data warehousing queries.

Resources:

Best quote from Monty: “You don’t steal ideas in open source. You respect people.”

What Applications Are Good For MySQL Cluster?

By Sheeri Cabral February 13th, 2008 at 9:41 am
Posted in Group Blog PostsMySQL
Tags:

Someone asked me what applications were good/bad for MySQL Cluster. As I’ve now actually had experience with a Cluster setup and a real-life application of it, and dug through the manual, I present a few characteristics of applications that will work with Cluster, and why they are so (so that if you have an application that meets some of the characteristics but not all of them, you can decide whether it’s worth it to use Cluster or not).

Firstly, I’ll state this — there’s actually a very limited application to MySQL cluster. I haven’t assessed the disk-based cluster, only the memory-based one, so I don’t know what really changes with disk-based. But after you see this list, you certainly will want to re-think your use of disk-based cluster if a lot of the inner workings don’t change.

The factors are listed below, but the “ruler” I keep in my mind is the fact that MySQL Cluster was developed for telecom applications. The basic characteristic is a lot of writes, small data in amounts that can fit into memory, and the data being transient in nature. Something like a “session” table is a great application.

So here are the characteristics of a good application for MySQL Cluster, with explanations of why: (more…)