SQL Server Statistics: Maintenance and Best Practices

What are Statistics?
There are multiple paths a database can use to answer a query, some of them being faster and more efficient than others. It is the job of the query optimizer to evaluate and choose the best path, or execution plan, for a given query. Using the available indexes may not always be the most efficient plan. For example, if 95% of the values for a column are the same, an index scan will probably be more efficient than using the index on that column. Statistics are SQL Server objects which contain metrics on the data count and distribution within a column or columns used by the optimizer to help it make that choice. They are used to estimate the count of rows. Index statistics: Created automatically when an index (both clustered and non-clustered) is created. These will have the same name as the index and will exist as long as the index exists. Column statistics: Created manually by the DBA using the ‘CREATE STATISTICS’ command, or automatically if the “Auto Create Statistics” option is set to “True”. Column statistics can be created, modified and dropped at will. Statistics contain two different types of information about the data; density and distribution. Density is simply the inverse of the count of distinct values for the column or columns. The distribution is a representation of the data contained in the first column of the statistic. This information is stored in a histogram; the histogram contains up to 200 steps with a lower and upper limit and contains the count of values that fall between both limits. To view this histogram, go to the details tab of the statistic’s properties or use the command DBCC SHOW_STATISTICS. The screenshot below shows the histogram of an index statistic; the RANGE_HI_KEY is the upper limit of the step, the RANGE_HI_KEY of the previous step + 1 is the lower limit, and the RANGE_ROWS is the count of rows between the limits.
Statistics Maintenance and Best Practices
When the data in the database changes the statistics become stale and outdated. When examining a query execution plan, a large discrepancy between the Actual Number of Rows and the Estimated Number of Rows is an indication of outdated stats. Outdated statistics can lead the optimizer in choosing inefficient execution plan and can dramatically affect overall performance. Steps must therefore be taken in order to keep statistics up to date.
