Pythian Blog: Technical Track

Column Store Indexes in SQL Server 2012

Introduction:

We will discuss Column Store Index and will try to understand the performance improvements of the same query by multiple times using column store index.

What is Column store Index:

Column Store Index is a feature introduced in SQL Server 2012 which helps us improve performance of data warehouse queries. It uses the VertiPaq, which stores data in a per column manner rather than classical row storage thus the Column Store index is stored using columnar data format whereas the traditional indexes and heaps, which store data in B-Tree structure.

How does it works:

RowStore and Column Store are two types of storage available in the SQL Server database. In RowStore, data rows are placed sequentially on a page while in Column Store it stores all the data in a column on the same page. The data search speed is increased when the query searching the Colum data instead of all the row data. Again this is query specific scenario. Below diagram is an example of Table with four columns. In case of Rowstore Index ,the Row data is stored contiguously on the same page. Whereas In case of Column Store Index the column values of each columns are clubbed in separate pages.

SQL2012_SetupFailure

It is clear from above picture that the retrieval/Insertion of the data in the columns would be faster compared to row level retrieval/update. For example from above table if I want to select column1 and column2 from the table then the sql engine will have to read all the rows in traditional Row store Index, However In case of Column store Index we will only pull the pages related to column1 and column2 into the memory thus reducing overall effort for data retrieval. Additionally Column store Index is automatically compressed and is more efficient then PAGE and ROW Compression in terms of I/O savings. The level of data compression is higher then ROW or PAGE compression and since this is handled by the engine completely so developer do not have control on level of compression here. MetaData about Columstore Index:

There are three new catalog views you can use to gather information about column store indexes:

  • sys.column_store_index_stats
  • sys.column_store_segments
  • sys.column_store_dictionaries

Few important consideration:

Every feature has its own limitations thus Column store have some limitations too. I am trying to list down some of them.

  • Column store index makes a table Read Only. (Note:SQL Server 2014 will feature updateable ColumnStore indexes)
  • Column store index cannot be created on a view or indexed view.Only one Column store Index is allowed.
  • Column store index cannot have more than 1024 columns.It cannot include Sparse Column.
  • Column store index cannot be unique or filtered and cannot be used as primary or foreign key.
  • Column store index cannot be clustered, only NonClustered ColumnStore indexes are available in SQL 2012.(Note:SQL Server 2014 will allow clustered column store index too.)
  • Column store index cannot be created using ASC/DESC or INCLUDE keywords.
  • Column store index definition cannot be changed using ALTER INDEX statment. We need to drop and re-create instead.
  • Column store index do not support the data types as mentioned below
    • binary and varbinary
    • ntext , text, and image
    • varchar(max) and nvarchar(max)
    • uniqueidentifier
    • rowversion (and timestamp)
    • sql_variant
    • decimal (and numeric) with precision greater than 18 digits
    • datetimeoffset with scale greater than 2
    • CLR types (hierarchyid and spatial types)
    • xml
    • Column store indexes cannot be combined with the following features:
      • Page and row compression, and vardecimal storage format (A columnstore index is already compressed in a different format.)
      • Replication
      • Change tracking
      • Change data capture
      • Filestream

Column store Index Demo:

Let us try to understand the benefits of Column Store Index. I will create and empty database and create two tables. One table with clustered index and another one with column store index .In this demonstration we will use table [AdventureWorksDW].dbo.[FactInternetSales] with 60398 Rows as big fact tables are good candidate.

Created empty database ColumstoreDemo and used below script to create two tables. I have added one more dummy column named DummyIdentity to avoid PK violation during data load.

USE [ColumstoreDemo]
GO CREATE TABLE [dbo].[FactInternetSalesNonClusteredIDX](
[DummyIdentity] [int] IDENTITY(1,1) NOT NULL,
[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[DueDateKey] [int] NOT NULL,
[ShipDateKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[SalesTerritoryKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[RevisionNumber] [tinyint] NULL,
[OrderQuantity] [smallint] NULL,
[UnitPrice] [money] NULL,
[ExtendedAmount] [money] NULL,
[UnitPriceDiscountPct] [float] NULL,
[DiscountAmount] [float] NULL,
[ProductStandardCost] [money] NULL,
[TotalProductCost] [money] NULL,
[SalesAmount] [money] NULL,
[TaxAmt] [money] NULL,
[Freight] [money] NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[CustomerPONumber] [nvarchar](25) NULL,
CONSTRAINT [PK_FactInternetSalesClusteredIDX_ProductKey_OrderDateKey]
PRIMARY KEY CLUSTERED
(
[DummyIdentity] ASC,
[ProductKey] ASC
)) ON [PRIMARY]
GO
USE [ColumstoreDemo]
GO
CREATE TABLE [dbo].[FactInternetSalesColumnStoreIDX](

 [DummyIdentity] [int] IDENTITY(1,1) NOT NULL,

[ProductKey] [int] NOT NULL,
[OrderDateKey] [int] NOT NULL,
[DueDateKey] [int] NOT NULL,
[ShipDateKey] [int] NOT NULL,
[CustomerKey] [int] NOT NULL,
[PromotionKey] [int] NOT NULL,
[CurrencyKey] [int] NOT NULL,
[SalesTerritoryKey] [int] NOT NULL,
[SalesOrderNumber] [nvarchar](20) NOT NULL,
[SalesOrderLineNumber] [tinyint] NOT NULL,
[RevisionNumber] [tinyint] NULL,
[OrderQuantity] [smallint] NULL,
[UnitPrice] [money] NULL,
[ExtendedAmount] [money] NULL,
[UnitPriceDiscountPct] [float] NULL,
[DiscountAmount] [float] NULL,
[ProductStandardCost] [money] NULL,
[TotalProductCost] [money] NULL,
[SalesAmount] [money] NULL,
[TaxAmt] [money] NULL,
[Freight] [money] NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[CustomerPONumber] [nvarchar](25) NULL,
CONSTRAINT [PK_FactInternetSalesColumnStoreIDX_ProductKey_OrderDateKey]
PRIMARY KEY CLUSTERED (
[DummyIdentity] ASC,
[ProductKey] ASC ))
ON [PRIMARY]


Since the tables are ready so now Load these two tables with 50 times more data than [AdventureWorksDW][dbo][ FactInternetSales]

INSERT INTO FactInternetSalesNonClusteredIDX
(
ProductKey, OrderDateKey,DueDateKey, ShipDateKey,CustomerKey,PromotionKey,CurrencyKey,SalesTerritoryKey,SalesOrderNumber,SalesOrderLineNumber,RevisionNumber,OrderQuantity,UnitPrice,ExtendedAmount,UnitPriceDiscountPct,DiscountAmount,ProductStandardCost,TotalProductCost,SalesAmount,TaxAmt,Freight, CarrierTrackingNumber,CustomerPONumber
)
SELECT
ProductKey,OrderDateKey,DueDateKey, ShipDateKey,CustomerKey,PromotionKey,CurrencyKey,SalesTerritoryKey,SalesOrderNumber,SalesOrderLineNumber,RevisionNumber,OrderQuantity,UnitPrice,ExtendedAmount,UnitPriceDiscountPct,DiscountAmount,ProductStandardCost,TotalProductCost,SalesAmount,TaxAmt,Freight,CarrierTrackingNumber,CustomerPONumber

FROM [AdventureWorksDW].dbo.[FactInternetSales]

GO 50

INSERT INTO FactInternetSalesColumnStoreIDX
(
ProductKey, OrderDateKey,DueDateKey, ShipDateKey,CustomerKey,PromotionKey,CurrencyKey,SalesTerritoryKey,SalesOrderNumber,SalesOrderLineNumber,RevisionNumber,OrderQuantity,UnitPrice,ExtendedAmount,UnitPriceDiscountPct,DiscountAmount,ProductStandardCost,TotalProductCost,SalesAmount,TaxAmt,Freight, CarrierTrackingNumber,CustomerPONumber )

SELECT
ProductKey,OrderDateKey,DueDateKey, ShipDateKey,CustomerKey,PromotionKey,CurrencyKey,SalesTerritoryKey,SalesOrderNumber,SalesOrderLineNumber,RevisionNumber,OrderQuantity,UnitPrice,ExtendedAmount,UnitPricDiscountPct,DiscountAmount,ProductStandardCost,TotalProductCost,SalesAmount,TaxAmt,Freight,CarrierTrackingNumber,CustomerPONumber

FROM [AdventureWorksDW].dbo.[FactInternetSales]

GO 50

The tables are ready with the data uploaded to it.Created indexes as mentioned below

CREATE NONCLUSTERED INDEX [NC_FactInternetSalesNonClusteredIDX_ProductKey_Salesamount]

ON FactInternetSalesNonClusteredIDX (ProductKey,Salesamount)

CREATE NONCLUSTERED COLUMNSTORE INDEX [CS_FactInternetSalesColumnStoreIDX_ProductKey_Salesamount]

ON FactInternetSalesColumnStoreIDX (ProductKey,Salesamount)

Now we have indexes in place, let us execute below query on both the tables and check the execution time.

SET STATISTICS IO ON

SET STATISTICS TIME ON

GO Select ProductKey,AVG(Salesamount)

from FactInternetSalesNonClusteredIDX

GROUP BY ProductKey

ORDER BY ProductKey

GO

SQL Server parse and compile time:
CPU time = 16 ms, elapsed time = 741 ms. (158 row(s) affected)

Table 'FactInternetSalesNonClusteredIDX'. Scan count 5, logical reads 8350, physical reads 1933, read-ahead reads 8258, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)

SQL Server Execution Times:
CPU time = 3260 ms, elapsed time = 12853 ms.

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms. SQL

Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

SET STATISTICS IO ON

SET STATISTICS TIME ON

GO

Select ProductKey,AVG(Salesamount)

from FactInternetSalesColumnStoreIDX

GROUP BY ProductKey

ORDER BY ProductKey

GO

SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 85 ms. (158 row(s) affected)

Table 'FactInternetSalesColumnStoreIDX'. Scan count 4, logical reads 57, physical reads 5, read-ahead reads 18, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)

SQL Server Execution Times: CPU time = 77 ms, elapsed time = 493 ms.

SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

From above results it is clear that this query performed far better with Column store index in terms of CPU Cycles and total execution time.

Conclusion:

Column Store Index improves performance of data warehouse queries but not OLTP workloads. This is best suited for queries scanning and aggregating large amount of data with huge fact tables in star schema. Use it when your workloads are mostly read-only and allows you to drop and re-create the index or switch out partitions regularly.

In my next article We will explore about Column store index in SQL Server 2014.

Reference: https://technet.microsoft.com/en-us/library/gg492088.aspx

No Comments Yet

Let us know what you think

Subscribe by email