SQL Server: More light shed on "non-yielding scheduler" and Indexes stats
SQL server folks may find the "non-yielding scheduler" warning familiar and it is usually (if not always) associated with a stack dump. The "non-yielding scheduler" message comes with warnings like the following:
Process %ld:%ld:%ld (0x%lx) Worker 0x%p appears to be non-yielding on Scheduler %ld. Thread creation time: %time. Approx Thread CPU Used: kernel 0x ms, user 0x ms. Process Utilization 0x%. System Idle 0x%. Interval: 0x ms.
Understanding the Non-Yielding Scheduler Error
This means that there is a background thread that keeps running and making other worker threads yielding back, in time. There are many causes of the "non-yielding scheduler" error. Memory dump analysis is necessary to determine the cause. Look up the error in MS Support Advanced Search and you will find plenty of KBs dedicated to it. You may also find (not all cases) errors 17883, 17884 logged in the event viewer, containing information like this:
Event ID: 17884 Description: New queries assigned to process on Node 0x have not been picked up by a worker thread in the last 0x seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time.
Event ID: 17883 Description: Process Worker appears to be non-yielding on Scheduler . Thread creation time: . Approx Thread CPU Used: kernel ms, user ms. Process Utilization %. System Idle %. Interval: ms.
There is a must-read whitepaper describing SQL Server Scheduling and how to interpret and diagnose Errors 17883, 17884, 17887, and 17888; please look here: How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888.
Case Study: High Load and Index Usage Stats
Recently, we had a client having same issue. The client runs a busy online business with thousands of connected users; databases sometimes include more than half a million tables. We had a healthy counters and server didn't seem to suffer from memory or I/O bottlenecks. We did suspect, however, that it is something to do with SQL Server internals, as we also received fatal exceptions pertaining to SPIDs <50 (system processes).
Microsoft Support Findings
We opened a case with MS support after the dumps continued to happen and their reply was: After analyzing the dumps, we see that the thread has non-yielded while updating index usage stats. These stats are returned through DMV sys.dm_db_index_usage_stats. According to MS, the issue is a known bug that has been corrected in SQL Server 2008 but isn't expected to be resolved in SQL Server 2005.
Workaround: Implementing Trace Flag 2330
Enable trace flag 2330 to prevent the update of data in DMV in sys.dm_db_index_usage_stats. This DMV retrieves information about index usage; it does not, however, generate this information itself, but rather just returns this info from a cache inside SQL Server. The impact of disabling this job was nothing to the server operations.
Comparing Index DMVs
It is worthwhile to note that sys.dm_db_index_operational_stats can report similar statistics as sys.dm_db_index_physical_stats. Craig Freedman wrote a good blog on the differences between these views.
[Image comparing sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats data points]
Official Microsoft Guidance
Microsoft released a "FAST PUBLISH" KB about this issue which mentions the following scenario:
- In Microsoft SQL Server 2005, you frequently perform DDL operations that involve dropping and recreating of lot of tables (especially temporary tables).
- You have a large number of entries (100,000 or more) in the
sys.dm_db_index_usage_statsDMV.
This aligns with what we had with the affected server (more than half a million tables). The KB mentions these workarounds:
- Reduce the rate at which you create or drop temporary tables.
- Upgrade to SQL Server 2008.
- Configure your SQL Server to use trace flag T2330 as a startup parameter.
Final Verification and Recommendations
Note: make sure that the trace is enabled by running DBCC TRACESTATUS(-1). You can also look in the SQL Server error log and note whether the trace flags are mentioned:
Time Server -T1222Time Server -T845Time Server -T2330
If you face similar issues, you should install the latest service pack and cumulative update. You can try trace T2330 as well, and if you still get errors, open a case with Microsoft Support to analyze the dumps. Hope this helps!
SQL Server Database Consulting Services
Ready to future-proof your SQL Server investment?
Share this
Share this
More resources
Learn more about Pythian by reading the following blogs and articles.
Fix: SQL Server to PostgreSQL Linked Server Error - "Requested conversion is not supported."
Troubleshooting SSPI handshake error
Resolving an Availability Group Sync Issue - HADR_AR_CRITICAL_SECTION_ENTRY
Ready to unlock value from your data?
With Pythian, you can accomplish your data transformation goals and more.