How to Overcome the Top 10 SQL Server Performance Challenges
Working with consulting clients is rewarding, interesting, and challenging. In discussion with some of my colleagues, I’ve noticed the same performance challenges come up over and over again across clients with completely different hardware, industry verticals, levels of staff seniority, and more. I believe a few factors may be driving these issues including:
- How easy Microsoft makes working with SQL Server;
- The still-recent explosion of SQL Server as a fully mature enterprise-level product;
- Misguided advice on the internet;
- Or, a combination of the above.
1. You run an application that is mostly read activity (e.g. most OLTP) with insufficient indexesI had a case where a client had a large table with several hundred million records and every time a specific query ran, it would paralyze the entire system. Because this particular query had to run several times per day, this was a constant headache. We got the execution plan for this query and immediately saw it was being scanned because there wasn’t a proper index for it. When we suggested adding an index, the client was very hesitant because the table was so large. They worried about the performance degradation for the “Insert,” “Delete,” “Update” activity and the increase in disk space. The truth is, if your system is already paralyzed by this situation, it’s very unlikely that adding a few milliseconds to “Insert,” “Delete,” “Update” operations is worse than letting a query run without a proper index against a large table. We showed them how their system was read-heavy and convinced them to add the missing index. Problem solved.
2. You’re still running a lot of reporting directly on the operational copy of your databaseYou may think running OLTP queries and reporting queries on one system will save money and / or hardware, but you’ll end up with neither the OLTP nor the reporting working properly. By definition, reporting queries will usually touch many records, and by doing them in the same machine you end up backing yourself into these corners:
- You lock records; interrupting and slowing your OLTP-style queries.
- You end up putting NOLOCK hints everywhere and compromising the quality of your reporting.
- You activate snapshot isolation, and your IO system isn't ready for the extra tempdb load.
3. Your VM hosts are way overcommittedIf your VM cluster has become the SQL Server bucket where all workloads go, don’t be surprised when it doesn’t perform well. I had someone tell me their “SQL Server doesn’t virtualize well,” – of course it doesn’t; neither does any resource-intensive application when it’s running on overcommitted virtual resources. Here are the main guidelines I use:
- I don’t like overcommitting memory on the VM host. If the host has 128GB of RAM, I want all the VM’s memory to sum up to less than that amount, not more.
- Although CPU is a more flexible resource, I still don’t recommend overcommitting it by 20 to 25 percent.
- Many places have a DBA, a VM admin and a storage admin. The VM admin checks memory and CPU pools, but has a harder time figuring out if the IO limits are hit. So, make sure your storage admin is in the loop regarding your virtualization goals as well.
4. Your IO sucksThis one is pretty self-explanatory. I’ve seen environments where adding an all-flash array was like pressing the TURBO button. Suddenly the issues were about contention on spinlocks to squeeze the most out of the CPUs, instead of the same old PAGEIOLATCH from the slower storage. Create a performance monitor collection and get your average seconds per read and average seconds per write baselined. If your latency on read or write is consistently over 15 to 20 ms then your IO is slow. No ifs, no buts, even if the storage admin can’t see anything on the SAN. This will be your biggest bottleneck until you take care of it.
5. You have massive queries that are hard to optimizeWriting query optimizers is a heuristic, non-trivial effort that is simply HARD WORK. They also have to perform well most of the time to be taken seriously. As good as the SQL Server optimizer is, you, as a DBA or developer, can still help it out a lot of times. Here are some poor practices I still see too often:
- Endless nested views (a view calls a view that calls a view that calls a view ...).
- A lot of business logic and manipulation coded as single massive SQL statements. It’s easier to the optimizer if we feed it questions in more manageable chunks; don’t be afraid to break down a query and throw some intermediate results on a temp table if necessary.
- Queries that want to do everything and end up underperforming for all cases. Again, it’s OK to throw this inside a stored procedure and write different SQL statements for different branches of the procedure. With statement level compilation this is not a bad thing anymore.
6. Tempdb is not properly configuredThis one should be part of any DBA’s installation checklist.
- Add more tempdb files than the default one. I like to use a post-install rule of 0.25* logical processors up to a max of eight. Adjust up as needed if you see contention once you have an actual workload running.
- Pre-size them all, and set them all to the same size and autogrowth.
- If you’re running a version older than 2016 use these trace flags 1117 / 1118 for best performance. After 2016, the behaviors enabled by these flags are controlled through database (MIXED_PAGE_ALLOCATION = OFF) and filegroup settings (AUTOGROW_ALL_FILES = ON).
7. Instant file initialization is not set and your autogrowth settings are still at the defaultThis is something else that should be part of the installation checklist (or part of the installer, period). Instant file initialization allows SQL Server to grow the data files without having to zero them out, making file growth an *instant* metadata operation on Windows. To enable instant file initialization you need to grant the SQL Server service account the “Perform volume maintenance tasks” privilege. This option is now built into the installer after SQL 2016 so there really is no reason why it should not be turned on for 99 percent of cases. Related to this recommendation, you should also:
- Pre-grow your files to the best of your knowledge and monitor them so you can proactively grow them if necessary.
- Ensure instant file initialization is enabled so if autogrowth does end up triggering, it doesn’t suspend your session.
- Replace the awful 1MB to 10 percent defaults with a uniform increment that makes sense for your environment. For smaller databases I often use 100MB increments, for larger ones up to 4GB increments.