Author Archive

Simple MySQL Auditing

By Robert Hamel June 11th, 2008 at 3:28 pm
Posted in MySQL
Tags:

Simple auditing, i.e., knowing what changed recently, can save you tons of time while troubleshooting.

I know that, in the ideal world:

  • Everything is supposed to be done through configuration management.
  • Everything is documented and all changes are tracked through a VCS.
  • Every DDL or set global is trapped via MySQL Proxy and logged.

But there are always ways to bypass the gatekeepers. Changes can go in unnoticed. An hour or so later, your database performance suddenly changes for the worse, and you get that phone call.

First you check if anything caused an actual error. You look around at a few log files and nothing shows up. The next thing you ask yourself is, did someone change anything in the last little while. Of course, everybody says no. After a few hours of digging, comparing schemas, diff-ing old and current config files, you actually find what has changed, put it back the way it was and everything is back to normal. You ask the question again, did anybody change the global variable from X to Y, and someone finally admits it. But they thought that it was not relevant since they did that change on the weekend and the system performance didn’t go down the toilet until Monday morning around 9:00am. Sound familiar?

We have all gone through this many times, and it doesn’t matter what process you have in place — something always slips through the cracks.

I came up with some simple stored procedure that will compare two data sets and keep track of the changes historically. It’s loosely based on slowly-changing dimension type 2 in the data warehouse world (google Ralph Kimball if you want to get all the gory details). This method tracks only changes, so you should be able to keep historical rows forever, unless you are constantly dropping and creating whatever you are tracking.

Say you want to keep track of my.cnf or global variables changes. This one of the simplest tables to track, the information_schema — it has only two columns. Others, such as tables or routines, would make the SQL to compare a little more involved.

First you need a table to track this with. I called mine historical_global_variable. It has the two columns from the original table plus three additional ones:

(more…)

Date Dimension vs. Function-Based Indexes: Is the Date Dimension Still Relevant?

By Robert Hamel June 3rd, 2008 at 11:56 am
Posted in Oracle
Tags:

The purpose of this post is to verify if a date dimension is better in regards to performance and functionality than a series of function-based indexes on a date column in the fact table.

The following is the date dimension and fact table used for the testing:

Date Dimension

create table date_dimension(
date_id number, --1..x days
sql_date date, -- trunc(date)
day_of_week varchar2(30), -- (long version)
day_of_week_short varchar2(3), -- (short version)
day_number_in_week number, -- (1..7)
day_number_in_month number, --(1..31)
day_number_in_year number, --(1..366)
workday varchar2(10), --(either workday or holiday)
holiday varchar2(50),
  --(specify text e.g. christmas, easter...)
weekday varchar2(10), -- (specify weekday or weekend)
last_day_in_month number, -- (28,29,30 or 31)
week_number_in_year number , --1..52
week_number_in_month number, --1,5
month varchar2(10) ,--(January...)
month_short varchar2(3), -- (Jan...)
month_number_in_year number, -- (1..12)
quarter_in_year number, -- (1,2,3,4)
year number, -- (4 digits)
chinesezodiac varchar2(50),
zodiac varchar2(50) ,
event varchar2(100) -- (election day,Hurricane Hugo) text field
) ;

The are rows for every day from the year 1850 to 2050 (way after I retire). There is a primary key on date_id, a unique key on sql_date, and bitmap indexes on all other columns. The table is compressed and its stats computed.

Segment sizes:

WEEK_NUMBER_IN_YEAR_IDX	.125
MONTH_IDX	.0625
MONTH_SHORT_IDX	.0625
MONTH_NUMBER_IN_YEAR_IDX	.0625
QUARTER_IN_YEAR_IDX	.0625
YEAR_IDX	.0625
EVENT_IDX	.0625
WEEK_NUMBER_IN_MONTH_IDX	.125
CHINESEZODIAC_IDX	.0625
ZODIAC_IDX	.0625
WORKDAY_IDX	.0625
HOLIDAY_IDX	.0625
WEEKDAY_IDX	.0625
LAST_DAY_IN_MONTH_IDX	.0625
DATE_DIMENSION_SQL_DATE	2
DATE_DIMENSION_PK	2
DAY_OF_WEEK_IDX	.1875
DAY_OF_WEEK_SHORT_IDX	.1875
DAY_NUMBER_IN_WEEK_IDX	.1875
DAY_NUMBER_IN_MONTH_IDX	.1875
DAY_NUMBER_IN_YEAR_IDX	.3125

Fact Table

create table fact (id number, tx_date_dm_id number, tx_date date, amount number);

There are 1 million rows in this table, with a compressed index on the date_id and tx_date (since it’s truncated to the day), as well as the following function-based indexes on the tx_date column:

(more…)

Oracle 11g: New Pivot Table Feature

By Robert Hamel October 10th, 2007 at 1:00 pm
Posted in Group Blog PostsOracle
Tags:

I was very pleased when I heard about Oracle adding pivot functionality in select statements. Finally — we wouldn’t have to copy the data to a spreadsheet or code a ton of sum(case when col1 = 'X' then amount else 0 end) total_X for each column we would want to display.

I am basically looking for three things in a pivot-style query:

  1. the ability to specify which column will be pivoted as one or more columns in the resulting query
  2. row subtotals
  3. column subtotals

The first item is the only one that really matters. I can work around the other two, so let’s get started.

(more…)

Implementing Many-to-many Relationships in Data Warehousing

By Robert Hamel January 19th, 2007 at 3:24 pm
Posted in Group Blog PostsOracle

This article will discuss how to make many–to–many relationships in data warehousing easily queried by novice SQL users using point–and–click query tools.

This is a big problem with Oracle Discoverer-like tools where the metadata layer is basically a set of pre-joined tables from which the user simply clicks on columns and hits the run button. You can create custom complex queries that they can run, but then every query is custom, which defeats the purpose of the tool in the first place.

The design goal is to create a structure that is simple for the end user and which normally translates to something as flat as possible. This article will go through the different methods of implementing many-to-many relationships, and look at their effect on query complexity, especially for someone who use a tool that hides the SQL.

(more…)