THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

MySQL Conference 2010 – last day

The post is a little late since we came back to the hotel in the wee hours of the morning and I had to get up at 5:00am this morning.

Anyway, The morning’s keynotes again were quite good especially the one from Ubunto. I attend another packed Facebook session and one from Ronald Bradford. It was the first time I head him speak and he is very good.

I didn’t attend many of the afternoon events because of customer commitments but the remain group at the hotel had a passionate discussion on Oracle role in the MySQL community so its far from over and Oracle has one year to fix it. See you all next year

MySQL 2010 Conference – day 2

We started with the morning keynotes again today and I was a little surprised at the snipping going on while people are promoting their products it felt very counterproductive and will end harming the MySQL eco system. The Community Keynote on the other hand looked into the future for itself under the Oracle umbrella. Its still remains to be seen what form the conference will have next year.

The first session that I attended today were around MySQL partitioning in the beta releases and beyond. They were saying the right things but it remains to be seen when some of these features will be available. I would hope that the engineers responsible for MySQL partitioning have a little talk with their Oracle counterparts which have been at it for quite a while. They should sync the features up which would make it easier for us multi-disciplinary DBAs

The next set of sessions were on IO bottlenecks, Mysql data warehousing and finally Danil’s Security session. I pick up something that I didn’t know in all of them making them all worthwhile.

I spent a fair bit of time with the column store vendors to talk about my latest data warehouse project and hopefully I’ll get the opportunity to kick the tires on one of them if time permits.

Finally there was the Ignite session that just finished a few minutes ago which was quite entertaining with lots of nice rapid fire topics.

That’s all for today…Last day tomorrow

MySQL Conference 2010 – Day 1

Well the first day has come and gone and I really enjoyed my first day as a newbie. The keynote from Oracle was well received, they touched on the new Beta version of MySQL and the new mysql enterprise which to a trained oracle eye is looking more and more like Grid Control. The end result is providing more instrumentation to help the DBA but I am a little disappointed that a lot of that instrumentation is not actually in the database itself which forces you to buy the product.

The O’Reilly Keynote provided some interesting glimpse into the future.

There is a strong contingent from Facebook and those sessions are just packed with everybody wanting to know how they scale their infrastructure.

The spider storage engine presentation tweaked by interested with the promise of vertical partitioning…I’ll have to try that when I come back.

My presentation “Better Database Debugging for Shorter Downtimes” went fairly well for a first timer:). There was about 70-80 people attending and most of them stay till the end.

Book Review of Guy Harrison’s Oracle Performance Survival Guide

This is the first time I have read one of Guy’s books and I was very impressed at the breadth of topics that are addressed and especially the approach that is followed starting from the application design rather than trying to find the elusive fast=true setting that resolves every problem.

We have all looked like heroes at one time or another by tweaking a parameter that was not set adequately or adding a missing PK to a table and made an expensive full scan go away but what happens when there is not obvious outlier. If you have no plan of attack, you might get lucky or waste lots of time trying to fix a symptom and not the real issue well this is where Guy’s approach takes the guess work out of the process. You might not always be able to change a 3rd parties ERP design, although we have all seen some bad ones, but you will be embarking on your quest equipped with a logical methodology that will most likely be successful.

As you go through each chapter, you start with the optimization goal of this layer along with the concepts and tools to get the job done then you peal off that layer and move to the next one until you get the the physical hardware. Obviously each section is not as detailed as other books related to database performance as each chapter could be a book on its own.

I find it a great reference for database professionals that are not as experienced with performance tuning and those of us who have been doing it for a while as I picked up a few items that I already added to my arsenal.

The technical content in this book can be found elsewhere but the strength is that you have it all summarized in one place in the order in which they should be addressed.

It’s a good plan of attack to follow when it’s time to execute a thorough performance review.

Great job, Guy!

Simple MySQL Auditing

Simple auditing, i.e., knowing what changed recently, can save you tons of time while troubleshooting.

I know that, in the ideal world:

  • Everything is supposed to be done through configuration management.
  • Everything is documented and all changes are tracked through a VCS.
  • Every DDL or set global is trapped via MySQL Proxy and logged.

But there are always ways to bypass the gatekeepers. Changes can go in unnoticed. An hour or so later, your database performance suddenly changes for the worse, and you get that phone call.

First you check if anything caused an actual error. You look around at a few log files and nothing shows up. The next thing you ask yourself is, did someone change anything in the last little while. Of course, everybody says no. After a few hours of digging, comparing schemas, diff-ing old and current config files, you actually find what has changed, put it back the way it was and everything is back to normal. You ask the question again, did anybody change the global variable from X to Y, and someone finally admits it. But they thought that it was not relevant since they did that change on the weekend and the system performance didn’t go down the toilet until Monday morning around 9:00am. Sound familiar?

We have all gone through this many times, and it doesn’t matter what process you have in place — something always slips through the cracks.

I came up with some simple stored procedure that will compare two data sets and keep track of the changes historically. It’s loosely based on slowly-changing dimension type 2 in the data warehouse world (google Ralph Kimball if you want to get all the gory details). This method tracks only changes, so you should be able to keep historical rows forever, unless you are constantly dropping and creating whatever you are tracking.

Say you want to keep track of my.cnf or global variables changes. This one of the simplest tables to track, the information_schema — it has only two columns. Others, such as tables or routines, would make the SQL to compare a little more involved.

First you need a table to track this with. I called mine historical_global_variable. It has the two columns from the original table plus three additional ones:

Read the rest of this entry . . .

Date Dimension vs. Function-Based Indexes: Is the Date Dimension Still Relevant?

The purpose of this post is to verify if a date dimension is better in regards to performance and functionality than a series of function-based indexes on a date column in the fact table.

The following is the date dimension and fact table used for the testing:

Date Dimension

create table date_dimension(
date_id number, --1..x days
sql_date date, -- trunc(date)
day_of_week varchar2(30), -- (long version)
day_of_week_short varchar2(3), -- (short version)
day_number_in_week number, -- (1..7)
day_number_in_month number, --(1..31)
day_number_in_year number, --(1..366)
workday varchar2(10), --(either workday or holiday)
holiday varchar2(50),
  --(specify text e.g. christmas, easter...)
weekday varchar2(10), -- (specify weekday or weekend)
last_day_in_month number, -- (28,29,30 or 31)
week_number_in_year number , --1..52
week_number_in_month number, --1,5
month varchar2(10) ,--(January...)
month_short varchar2(3), -- (Jan...)
month_number_in_year number, -- (1..12)
quarter_in_year number, -- (1,2,3,4)
year number, -- (4 digits)
chinesezodiac varchar2(50),
zodiac varchar2(50) ,
event varchar2(100) -- (election day,Hurricane Hugo) text field
) ;

The are rows for every day from the year 1850 to 2050 (way after I retire). There is a primary key on date_id, a unique key on sql_date, and bitmap indexes on all other columns. The table is compressed and its stats computed.

Segment sizes:

WEEK_NUMBER_IN_YEAR_IDX	.125
MONTH_IDX	.0625
MONTH_SHORT_IDX	.0625
MONTH_NUMBER_IN_YEAR_IDX	.0625
QUARTER_IN_YEAR_IDX	.0625
YEAR_IDX	.0625
EVENT_IDX	.0625
WEEK_NUMBER_IN_MONTH_IDX	.125
CHINESEZODIAC_IDX	.0625
ZODIAC_IDX	.0625
WORKDAY_IDX	.0625
HOLIDAY_IDX	.0625
WEEKDAY_IDX	.0625
LAST_DAY_IN_MONTH_IDX	.0625
DATE_DIMENSION_SQL_DATE	2
DATE_DIMENSION_PK	2
DAY_OF_WEEK_IDX	.1875
DAY_OF_WEEK_SHORT_IDX	.1875
DAY_NUMBER_IN_WEEK_IDX	.1875
DAY_NUMBER_IN_MONTH_IDX	.1875
DAY_NUMBER_IN_YEAR_IDX	.3125

Fact Table

create table fact (id number, tx_date_dm_id number, tx_date date, amount number);

There are 1 million rows in this table, with a compressed index on the date_id and tx_date (since it’s truncated to the day), as well as the following function-based indexes on the tx_date column:

Read the rest of this entry . . .

Oracle 11g: New Pivot Table Feature

I was very pleased when I heard about Oracle adding pivot functionality in select statements. Finally — we wouldn’t have to copy the data to a spreadsheet or code a ton of sum(case when col1 = 'X' then amount else 0 end) total_X for each column we would want to display.

I am basically looking for three things in a pivot-style query:

  1. the ability to specify which column will be pivoted as one or more columns in the resulting query
  2. row subtotals
  3. column subtotals

The first item is the only one that really matters. I can work around the other two, so let’s get started.

Read the rest of this entry . . .

Implementing Many-to-many Relationships in Data Warehousing

This article will discuss how to make many-to-many relationships in data warehousing easily queried by novice SQL users using point-and-click query tools.

This is a big problem with Oracle Discoverer-like tools where the metadata layer is basically a set of pre-joined tables from which the user simply clicks on columns and hits the run button. You can create custom complex queries that they can run, but then every query is custom, which defeats the purpose of the tool in the first place.

The design goal is to create a structure that is simple for the end user and which normally translates to something as flat as possible. This article will go through the different methods of implementing many-to-many relationships, and look at their effect on query complexity, especially for someone who use a tool that hides the SQL.

Read the rest of this entry . . .

Start NowWith Pythian - database design, management and emergency handling capabilities...

Live Updates

pythian: RT @pythiansimmons: Join @pythian's #Exadata webinar Aug 11. @fielding will share tips for implementation success http://bit.ly/exadata
more



Testimonials

  • Serge Racine

    DBA, Brookfield Energy

    We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more