SQL Server: More light shed on "non-yielding scheduler"

Written by Pythian Marketing | Nov 12, 2009 5:00:00 AM

SQL server folks may find the "non-yielding scheduler" warning familiar and it is usually (if not always) associated with a stack dump. The "non-yielding scheduler" message comes with warnings like the following:

Process %ld:%ld:%ld (0x%lx) Worker 0x%p appears to be non-yielding on Scheduler %ld. Thread creation time: %time. Approx Thread CPU Used: kernel 0x ms, user 0x ms. Process Utilization 0x%. System Idle 0x%. Interval: 0x ms.

Understanding the Non-Yielding Scheduler Error

This means that there is a background thread that keeps running and making other worker threads yielding back, in time. There are many causes of the "non-yielding scheduler" error. Memory dump analysis is necessary to determine the cause. Look up the error in MS Support Advanced Search and you will find plenty of KBs dedicated to it. You may also find (not all cases) errors 17883, 17884 logged in the event viewer, containing information like this:

Event ID: 17884 Description: New queries assigned to process on Node 0x have not been picked up by a worker thread in the last 0x seconds. Blocking or long-running queries can contribute to this condition, and may degrade client response time.

Event ID: 17883 Description: Process Worker appears to be non-yielding on Scheduler . Thread creation time: . Approx Thread CPU Used: kernel ms, user ms. Process Utilization %. System Idle %. Interval: ms.

There is a must-read whitepaper describing SQL Server Scheduling and how to interpret and diagnose Errors 17883, 17884, 17887, and 17888; please look here: How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888.

Case Study: High Load and Index Usage Stats

Recently, we had a client having same issue. The client runs a busy online business with thousands of connected users; databases sometimes include more than half a million tables. We had a healthy counters and server didn't seem to suffer from memory or I/O bottlenecks. We did suspect, however, that it is something to do with SQL Server internals, as we also received fatal exceptions pertaining to SPIDs <50 (system processes).

Microsoft Support Findings

We opened a case with MS support after the dumps continued to happen and their reply was: After analyzing the dumps, we see that the thread has non-yielded while updating index usage stats. These stats are returned through DMV sys.dm_db_index_usage_stats. According to MS, the issue is a known bug that has been corrected in SQL Server 2008 but isn't expected to be resolved in SQL Server 2005.

Workaround: Implementing Trace Flag 2330

Enable trace flag 2330 to prevent the update of data in DMV in sys.dm_db_index_usage_stats. This DMV retrieves information about index usage; it does not, however, generate this information itself, but rather just returns this info from a cache inside SQL Server. The impact of disabling this job was nothing to the server operations.

Comparing Index DMVs

It is worthwhile to note that sys.dm_db_index_operational_stats can report similar statistics as sys.dm_db_index_physical_stats. Craig Freedman wrote a good blog on the differences between these views.

[Image comparing sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats data points]

Official Microsoft Guidance

Microsoft released a "FAST PUBLISH" KB about this issue which mentions the following scenario:

In Microsoft SQL Server 2005, you frequently perform DDL operations that involve dropping and recreating of lot of tables (especially temporary tables).
You have a large number of entries (100,000 or more) in the sys.dm_db_index_usage_stats DMV.

This aligns with what we had with the affected server (more than half a million tables). The KB mentions these workarounds:

Reduce the rate at which you create or drop temporary tables.
Upgrade to SQL Server 2008.
Configure your SQL Server to use trace flag T2330 as a startup parameter.

Final Verification and Recommendations

Note: make sure that the trace is enabled by running DBCC TRACESTATUS(-1). You can also look in the SQL Server error log and note whether the trace flags are mentioned:

Time Server -T1222
Time Server -T845
Time Server -T2330

If you face similar issues, you should install the latest service pack and cumulative update. You can try trace T2330 as well, and if you still get errors, open a case with Microsoft Support to analyze the dumps. Hope this helps!

SQL Server Database Consulting Services

Ready to future-proof your SQL Server investment?

View full post