What are Statistics?
There are multiple paths a database can use to answer a query, some of them being faster and more efficient than others. It is the job of the query optimizer to evaluate and choose the best path, or execution plan, for a given query. Using the available indexes may not always be the most efficient plan. For example, if 95% of the values for a column are the same, an index scan will probably be more efficient than using the index on that column. Statistics are SQL Server objects which contain metrics on the data count and distribution within a column or columns used by the optimizer to help it make that choice. They are used to estimate the count of rows. Index statistics: Created automatically when an index (both clustered and non-clustered) is created. These will have the same name as the index and will exist as long as the index exists. Column statistics: Created manually by the DBA using the ‘CREATE STATISTICS’ command, or automatically if the “Auto Create Statistics” option is set to “True”. Column statistics can be created, modified and dropped at will. Statistics contain two different types of information about the data; density and distribution. Density is simply the inverse of the count of distinct values for the column or columns. The distribution is a representation of the data contained in the first column of the statistic. This information is stored in a histogram; the histogram contains up to 200 steps with a lower and upper limit and contains the count of values that fall between both limits. To view this histogram, go to the details tab of the statistic’s properties or use the command DBCC SHOW_STATISTICS. The screenshot below shows the histogram of an index statistic; the RANGE_HI_KEY is the upper limit of the step, the RANGE_HI_KEY of the previous step + 1 is the lower limit, and the RANGE_ROWS is the count of rows between the limits.![stats1](https://static.hsstatic.net/BlogImporterAssetsUI/ex/missing-image.png)
Statistics Maintenance and Best Practices
When the data in the database changes the statistics become stale and outdated. When examining a query execution plan, a large discrepancy between the Actual Number of Rows and the Estimated Number of Rows is an indication of outdated stats. Outdated statistics can lead the optimizer in choosing inefficient execution plan and can dramatically affect overall performance. Steps must therefore be taken in order to keep statistics up to date.![stats2](https://static.hsstatic.net/BlogImporterAssetsUI/ex/missing-image.png)
![stats3](https://static.hsstatic.net/BlogImporterAssetsUI/ex/missing-image.png)
Common Mistakes
Statistics update after Index Rebuild: As mentioned previously, the index rebuild (not reorg) will also update index statistics using full scan. Scheduling stats maintenance after the index maintenance will cause duplicate work. In addition, if the stats maintenance is using a small sample size, the new updated stats will overwrite the ones that were just updated with full scan, meaning their values will be less accurate. Scheduling it after an index reorg however is fine. Relying on Auto Update: As seen above, the threshold which triggers the auto update is around 20% of the total row count. This is fine for small tables, but larger tables require a lot of data changes before the update is triggered, during which the stats can become outdated. Not specifying the sample size: While updating, choosing the right sample size is important to keep statistics accurate. While the cost of using full scan is higher, in some situations it is required, especially for very large databases. Running EXEC sp_updatestats @resample = 'resample' will update all statistics using the last sample used. If you do not specify the resample, it will update them using the default sample. The default sample is determined by SQL server and is a fraction of the total row count in a table. We have recently run into an issue where a DBA executed “EXEC sp_updatestats” on a 1 terabyte database, which caused all statistics to be updated with the default sample. Due to the size of the database, the default sample is simply not enough to represent the data distribution in the database and caused all queries to use bad execution plans which caused major performance issues. Only a full scan update of the statistics provided accurate statistics for this database, but it takes a very long time to run. Luckily there was a QA server where the database was restored before the stats update and with almost identical data. We were able to script the statistics from the QA server and recreate them on production using their binary representation (see WITH STATS_STREAM). This solution is not recommended and was only used as a last resort. This incident shows the importance of statistics and implementing proper maintenance appropriate for the environment. Updating too often: Not only is there a cost in updating statistics, remember that it also causes queries to recompile. Updating statistics should be done only as required, and a schedule appropriate for your environment should be used. The frequency depends on the amount of data changes in the database, more changes require more frequent stats update.Conclusion
Statistics are a crucial element in the overall performance of a database and require proper maintenance and attention. In addition, each environment is unique and has different needs regarding statistics maintenance. For more information regarding statistics, see https://technet.microsoft.com/en-us/library/ms190397.aspx.Share this
You May Also Like
These Related Stories
Empowering Your Workforce with Oracle HCM Self-Service Modules
![](https://www.pythian.com/hubfs/Empowering%20Your%20Workforce%20with%20Oracle%20HCM%20Self-Service%20Modules-jpeg.jpeg)
Empowering Your Workforce with Oracle HCM Self-Service Modules
Nov 27, 2023
3
min read
How to Read Oracle Traces from SQL*Plus
How to Read Oracle Traces from SQL*Plus
Mar 4, 2020
2
min read
Conference review Percona Live Santa Clara 2018
Conference review Percona Live Santa Clara 2018
May 9, 2018
2
min read
No Comments Yet
Let us know what you think