THE WORLD DISCUSSES #PYTHIAN ON TWITTER. HAVE A QUESTION? USE OUR HASHTAG AND ASK AWAY.

Real Time Data Warehousing Presentation and Video

At the March Boston MySQL User Group meeting, Jacob Nikom of MIT’s Lincoln Laboratory presented “Optimizing Concurrent Storage and Retrieval Operations for Real-Time Surveillance Applications.” In the middle of the talk, Jacob said he sometimes calls what he did in this application as “real-time data warehousing”, which was so accurate I decided to give that title to this blog post.

The slides can be downloaded in PDF format (1.3 Mb) at http://www.technocation.org/files/doc/Concurrent_database_performance_02.pdf.

This talk discussed how to do real-time retrieval operations while doing concurrent high volume insertion, including:

  • How to keep up with 1.5 Mb/second per server incoming data stream
  • server hardware comparison between a multi-core AMD Opteron and a multi core Intel Xeon
  • MySQL/Postgres comparison
  • schema design
  • design of the storage/retrieval benchmark
  • tuning MySQL

Read the rest of this entry . . .

Date Dimension vs. Function-Based Indexes: Is the Date Dimension Still Relevant?

The purpose of this post is to verify if a date dimension is better in regards to performance and functionality than a series of function-based indexes on a date column in the fact table.

The following is the date dimension and fact table used for the testing:

Date Dimension

create table date_dimension(
date_id number, --1..x days
sql_date date, -- trunc(date)
day_of_week varchar2(30), -- (long version)
day_of_week_short varchar2(3), -- (short version)
day_number_in_week number, -- (1..7)
day_number_in_month number, --(1..31)
day_number_in_year number, --(1..366)
workday varchar2(10), --(either workday or holiday)
holiday varchar2(50),
  --(specify text e.g. christmas, easter...)
weekday varchar2(10), -- (specify weekday or weekend)
last_day_in_month number, -- (28,29,30 or 31)
week_number_in_year number , --1..52
week_number_in_month number, --1,5
month varchar2(10) ,--(January...)
month_short varchar2(3), -- (Jan...)
month_number_in_year number, -- (1..12)
quarter_in_year number, -- (1,2,3,4)
year number, -- (4 digits)
chinesezodiac varchar2(50),
zodiac varchar2(50) ,
event varchar2(100) -- (election day,Hurricane Hugo) text field
) ;

The are rows for every day from the year 1850 to 2050 (way after I retire). There is a primary key on date_id, a unique key on sql_date, and bitmap indexes on all other columns. The table is compressed and its stats computed.

Segment sizes:

WEEK_NUMBER_IN_YEAR_IDX	.125
MONTH_IDX	.0625
MONTH_SHORT_IDX	.0625
MONTH_NUMBER_IN_YEAR_IDX	.0625
QUARTER_IN_YEAR_IDX	.0625
YEAR_IDX	.0625
EVENT_IDX	.0625
WEEK_NUMBER_IN_MONTH_IDX	.125
CHINESEZODIAC_IDX	.0625
ZODIAC_IDX	.0625
WORKDAY_IDX	.0625
HOLIDAY_IDX	.0625
WEEKDAY_IDX	.0625
LAST_DAY_IN_MONTH_IDX	.0625
DATE_DIMENSION_SQL_DATE	2
DATE_DIMENSION_PK	2
DAY_OF_WEEK_IDX	.1875
DAY_OF_WEEK_SHORT_IDX	.1875
DAY_NUMBER_IN_WEEK_IDX	.1875
DAY_NUMBER_IN_MONTH_IDX	.1875
DAY_NUMBER_IN_YEAR_IDX	.3125

Fact Table

create table fact (id number, tx_date_dm_id number, tx_date date, amount number);

There are 1 million rows in this table, with a compressed index on the date_id and tx_date (since it’s truncated to the day), as well as the following function-based indexes on the tx_date column:

Read the rest of this entry . . .

Start NowWith Pythian - database design, management and emergency handling capabilities...

Live Updates

pythian: RT @sheeri: #confoo talk "Bending Queries to your Will with EXPLAIN" slides http://bit.ly/explainslides & handout
more



Testimonials

  • Serge Racine

    DBA, Brookfield Energy

    We are very satisfied by the service given to us by Andre and Shakir in support of our recent data quality and reorganization initiative.... more