<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Data Warehousing Best Practices:  Comparing Oracle to MySQL, part 1 (introduction and power)</title>
	<atom:link href="http://www.pythian.com/news/15157/data-warehousing-best-practices-comparing-oracle-to-mysql-part-1-introduction-and-power/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pythian.com/news/15157/data-warehousing-best-practices-comparing-oracle-to-mysql-part-1-introduction-and-power/</link>
	<description>News and views from Pythian DBAs</description>
	<lastBuildDate>Fri, 10 Feb 2012 13:01:25 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
	<item>
		<title>By: Sheeri Cabral</title>
		<link>http://www.pythian.com/news/15157/data-warehousing-best-practices-comparing-oracle-to-mysql-part-1-introduction-and-power/#comment-448195</link>
		<dc:creator>Sheeri Cabral</dc:creator>
		<pubDate>Fri, 30 Jul 2010 17:58:58 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/news/?p=15157#comment-448195</guid>
		<description>Justin, Roland -- thank you!  You have done a very good job of showing how Oracle and MySQL are different.  This is what community is about, and why I have those caveats.</description>
		<content:encoded><![CDATA[<p>Justin, Roland &#8212; thank you!  You have done a very good job of showing how Oracle and MySQL are different.  This is what community is about, and why I have those caveats.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Justin Swanhart</title>
		<link>http://www.pythian.com/news/15157/data-warehousing-best-practices-comparing-oracle-to-mysql-part-1-introduction-and-power/#comment-448177</link>
		<dc:creator>Justin Swanhart</dc:creator>
		<pubDate>Fri, 30 Jul 2010 17:31:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/news/?p=15157#comment-448177</guid>
		<description>The three Ps aren&#039;t applicable to MySQL:

Power) All queries are single threaded, so the core speed can be more important than the number of cores 

Partitioning) MySQL partitioning only does partition elimination, and it doesn&#039;t do it very well.  MySQL 5.5 is supposed to make some improvements here.

Parallelism) In a single query, there is none.
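
To make the partition elimination point concrete, here is a minimal sketch for MySQL 5.1, where EXPLAIN PARTITIONS is available. The fact_sales table and its columns are made up for illustration and are not from the post. If pruning kicks in, the partitions column of the EXPLAIN output should list only p2010 instead of all three partitions:

```sql
-- Hypothetical fact table, range-partitioned by year of sale.
CREATE TABLE fact_sales (
  sale_date DATE NOT NULL,
  product_id INT NOT NULL,
  amount DECIMAL(10,2) NOT NULL
)
PARTITION BY RANGE (YEAR(sale_date)) (
  PARTITION p2009 VALUES LESS THAN (2010),
  PARTITION p2010 VALUES LESS THAN (2011),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- The WHERE clause restricts the partitioning column to one year,
-- so the optimizer can skip the other partitions.
EXPLAIN PARTITIONS
SELECT SUM(amount)
FROM fact_sales
WHERE sale_date BETWEEN '2010-01-01' AND '2010-12-31';
```

Elimination only skips partitions; the scan of whatever remains is still single threaded.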

I never suggest using a SAN for MySQL.  SANs are great for shared-everything databases (RAC) but not for shared-nothing ones like MySQL.  RAC has OCFS, a clustered filesystem designed to work with technology like SANs.

A star/snowflake schema always has one fact table, since by definition the schema is about one factual subject (like sales).  There can be dimensions that are shared between different fact tables (conformed dimensions) like the date/time dimensions, perhaps lists of customers, etc.

A snowflake schema contains one or more normalized dimension tables, that is, the hierarchical information isn&#039;t stored redundantly in a single table.  

An example of a dimension in a star schema might be:

create table dim_product (
  product_id int,
  category_name char(10),
  sub_category_name char(10),
  base_price decimal(6,3)
);

but in a snowflake:
create table dim_product (
  product_id int, 
  base_price decimal(6,3)
);

create table dim_product_category (
  product_id int,
  category_name char(10),
  category_type enum(&#039;top&#039;,&#039;sub&#039;)
);</description>
		<content:encoded><![CDATA[<p>The three Ps aren&#8217;t applicable to MySQL:</p>
<p>Power) All queries are single threaded, so the core speed can be more important than the number of cores </p>
<p>Partitioning) MySQL partitioning only does partition elimination, and it doesn&#8217;t do it very well.  MySQL 5.5 is supposed to make some improvements here.</p>
<p>Parallelism) In a single query, there is none.</p>
<p>I never suggest using a SAN for MySQL.  SANs are great for shared-everything databases (RAC) but not for shared-nothing ones like MySQL.  RAC has OCFS, a clustered filesystem designed to work with technology like SANs.</p>
<p>A star/snowflake schema always has one fact table, since by definition the schema is about one factual subject (like sales).  There can be dimensions that are shared between different fact tables (conformed dimensions) like the date/time dimensions, perhaps lists of customers, etc.</p>
<p>A snowflake schema contains one or more normalized dimension tables, that is, the hierarchical information isn&#8217;t stored redundantly in a single table.  </p>
<p>An example of a dimension in a star schema might be:</p>
<p>create table dim_product (<br />
  product_id int,<br />
  category_name char(10),<br />
  sub_category_name char(10),<br />
  base_price decimal(6,3)<br />
)</p>
<p>but in a snowflake:<br />
create table dim_product (<br />
  product_id int,<br />
  base_price decimal(6,3)<br />
)</p>
<p>create table dim_product_category (<br />
  product_id int,<br />
  category_name char(10),<br />
  category_type enum('top','sub')<br />
)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Roland Bouman</title>
		<link>http://www.pythian.com/news/15157/data-warehousing-best-practices-comparing-oracle-to-mysql-part-1-introduction-and-power/#comment-447955</link>
		<dc:creator>Roland Bouman</dc:creator>
		<pubDate>Fri, 30 Jul 2010 00:55:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/news/?p=15157#comment-447955</guid>
		<description>Hi! noticed 2 things:

#1
&quot;A star schema is a selfish model, used by a department, because it’s already got aggregation in it.&quot;

This sounds like bollocks to me :)

While it is true that many people draw star schemas from a (more or less) 3NF normalized data warehouse as departmental data marts, aggregation is not an intrinsic property of star schemas at all. Aggregation may be added for performance reasons, but ideally the fact table has one row for the lowest possible level of interest - aggregation should be done as much as possible by the reporting and/or OLAP tool, so, at query time.

#2:
&quot;Another schema-related topic I had a hard time putting into words before this workshop was the difference between a star and a snowflake schema: compared to a star schema, in a snowflake schema, you have more than one fact table and maybe some dimensions that are not used often.&quot;

Star and snowflake schemas are both examples of a dimensional model. In a dimensional model, you have one fact table which contains columns for the metrics (aka measures), and columns that reference the dimension tables which provide the context for the metrics. While querying these dimensional models, the fact table is joined to one or more dimension tables, and the metric columns are typically aggregated, using several columns from the dimension tables for grouping. One datamart can contain multiple fact tables, and typically the fact tables in one datamart share (a number of) dimensions.
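
As a rough sketch of such a query (the table and column names here are hypothetical, chosen only for illustration), the fact table is joined to a dimension table, the metric is aggregated, and the dimension columns do the grouping:

```sql
-- Hypothetical names: fact_order_line, dim_date, quantity, order_date_key.
-- Aggregation happens at query time; the fact table itself stays
-- at the lowest level of detail (one row per order line).
SELECT d.year_number,
       d.month_number,
       SUM(f.quantity) AS total_quantity
FROM fact_order_line AS f
JOIN dim_date AS d
  ON d.date_key = f.order_date_key
GROUP BY d.year_number, d.month_number;
```

Grouping by d.year_number alone would roll the same rows up one level further, with no change to the stored data.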

The difference between star and snowflake schemas is that in a star schema, each dimension is implemented in a single dimension table. The star schema dimension table is denormalized and contains all data for all levels of all hierarchies along which the dimension can be rolled up to get aggregated results from the fact table.

In a snowflake, you still have a single fact table, which still points to the dimension tables. The only difference is that in a snowflake, dimensions are implemented using multiple tables; typically, a single dimension table is used at each level of each hierarchy of the dimension.

Example: an order line fact table will have a link to an order date dimension (among many others). The order date dimension should allow users to view the data in the fact table at different levels of time: year, quarter, month, day are examples of such levels; another example would be year, week, weekday. 

Note that Y-Q-M-D and Y-W-D are two distinct and non-overlapping hierarchies of a single date dimension - two ways to partition time.

Now in a star schema, the date dimension would have columns for all levels of all hierarchies (Y, Q, M, day-in-month, W, day-in-week). This is necessarily a denormalized table because of the functional dependencies between the levels of the hierarchies.
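
A minimal sketch of what that single star-schema table could look like (the table name, column names and types are assumptions for illustration only):

```sql
-- Hypothetical denormalized date dimension: every level of both
-- hierarchies (Y-Q-M-D and Y-W-D) is a column of the same row.
CREATE TABLE dim_date (
  date_key       INT      NOT NULL PRIMARY KEY,
  full_date      DATE     NOT NULL,
  year_number    SMALLINT NOT NULL,
  quarter_number TINYINT  NOT NULL,
  month_number   TINYINT  NOT NULL,
  day_in_month   TINYINT  NOT NULL,
  week_number    TINYINT  NOT NULL,
  day_in_week    TINYINT  NOT NULL
);
```

The redundancy is visible here: quarter_number is fully determined by month_number, yet it is stored again on every row.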

In a snowflake, we would still have the same fact table; the only difference would be that each level of each hierarchy would get its own table. So, at the lowest level of the dimension you&#039;d have one day table. The day table would point to a day-in-week table for that level of the Y-W-D hierarchy, and also to a month table for the Y-Q-M-D hierarchy.
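
One possible layout, as a sketch only: all table and column names are hypothetical, and using a week table as the parent level of the day table in the Y-W-D hierarchy is my own simplifying assumption (it also ignores weeks that straddle two years):

```sql
-- Hypothetical snowflaked date dimension: one table per hierarchy level.
CREATE TABLE dim_year (
  year_key    INT      NOT NULL PRIMARY KEY,
  year_number SMALLINT NOT NULL
);

CREATE TABLE dim_quarter (
  quarter_key    INT     NOT NULL PRIMARY KEY,
  year_key       INT     NOT NULL,  -- parent level in Y-Q-M-D
  quarter_number TINYINT NOT NULL
);

CREATE TABLE dim_month (
  month_key    INT     NOT NULL PRIMARY KEY,
  quarter_key  INT     NOT NULL,    -- parent level in Y-Q-M-D
  month_number TINYINT NOT NULL
);

CREATE TABLE dim_week (
  week_key    INT     NOT NULL PRIMARY KEY,
  year_key    INT     NOT NULL,     -- parent level in Y-W-D
  week_number TINYINT NOT NULL
);

-- Lowest level: the fact table references day_key only.
CREATE TABLE dim_day (
  day_key   INT  NOT NULL PRIMARY KEY,
  month_key INT  NOT NULL,          -- up the Y-Q-M-D hierarchy
  week_key  INT  NOT NULL,          -- up the Y-W-D hierarchy
  full_date DATE NOT NULL
);
```

Rolling up to year now takes extra joins (day to month to quarter to year), which is the price paid for the normalization.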

So basically, both are dimensional models but in a snowflake the dimension tables are normalized (at least each level gets its own table).</description>
		<content:encoded><![CDATA[<p>Hi! noticed 2 things:</p>
<p>#1<br />
&#8220;A star schema is a selfish model, used by a department, because it’s already got aggregation in it.&#8221;</p>
<p>This sounds like bollocks to me :)</p>
<p>While it is true that many people draw star schemas from a (more or less) 3NF normalized data warehouse as departmental data marts, aggregation is not an intrinsic property of star schemas at all. Aggregation may be added for performance reasons, but ideally the fact table has one row for the lowest possible level of interest &#8211; aggregation should be done as much as possible by the reporting and/or OLAP tool, so, at query time.</p>
<p>#2:<br />
&#8220;Another schema-related topic I had a hard time putting into words before this workshop was the difference between a star and a snowflake schema: compared to a star schema, in a snowflake schema, you have more than one fact table and maybe some dimensions that are not used often.&#8221;</p>
<p>Star and snowflake schemas are both examples of a dimensional model. In a dimensional model, you have one fact table which contains columns for the metrics (aka measures), and columns that reference the dimension tables which provide the context for the metrics. While querying these dimensional models, the fact table is joined to one or more dimension tables, and the metric columns are typically aggregated, using several columns from the dimension tables for grouping. One datamart can contain multiple fact tables, and typically the fact tables in one datamart share (a number of) dimensions.</p>
<p>The difference between star and snowflake schemas is that in a star schema, each dimension is implemented in a single dimension table. The star schema dimension table is denormalized and contains all data for all levels of all hierarchies along which the dimension can be rolled up to get aggregated results from the fact table.</p>
<p>In a snowflake, you still have a single fact table, which still points to the dimension tables. The only difference is that in a snowflake, dimensions are implemented using multiple tables; typically, a single dimension table is used at each level of each hierarchy of the dimension.</p>
<p>Example: an order line fact table will have a link to an order date dimension (among many others). The order date dimension should allow users to view the data in the fact table at different levels of time: year, quarter, month, day are examples of such levels; another example would be year, week, weekday. </p>
<p>Note that Y-Q-M-D and Y-W-D are two distinct and non-overlapping hierarchies of a single date dimension &#8211; two ways to partition time.</p>
<p>Now in a star schema, the date dimension would have columns for all levels of all hierarchies (Y, Q, M, day-in-month, W, day-in-week). This is necessarily a denormalized table because of the functional dependencies between the levels of the hierarchies.</p>
<p>In a snowflake, we would still have the same fact table; the only difference would be that each level of each hierarchy would get its own table. So, at the lowest level of the dimension you&#8217;d have one day table. The day table would point to a day-in-week table for that level of the Y-W-D hierarchy, and also to a month table for the Y-Q-M-D hierarchy.</p>
<p>So basically, both are dimensional models but in a snowflake the dimension tables are normalized (at least each level gets its own table).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Data Warehousing Best Practices: Comparing Oracle to MySQL, part 2 (partitioning) &#124; The Pythian Blog</title>
		<link>http://www.pythian.com/news/15157/data-warehousing-best-practices-comparing-oracle-to-mysql-part-1-introduction-and-power/#comment-447907</link>
		<dc:creator>Data Warehousing Best Practices: Comparing Oracle to MySQL, part 2 (partitioning) &#124; The Pythian Blog</dc:creator>
		<pubDate>Thu, 29 Jul 2010 21:00:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.pythian.com/news/?p=15157#comment-447907</guid>
		<description>[...] Data Warehousing Best Practices: Comparing Oracle to MySQL, part 1 (introduction and power) [...]</description>
		<content:encoded><![CDATA[<p>[...] Data Warehousing Best Practices: Comparing Oracle to MySQL, part 1 (introduction and power) [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>

