ORA-10173: Dynamic sampling time-out error
Recently on a heavily used and freshly upgraded 220.127.116.11 ware-house type database, we started seeing lots of ORA-10173 dumped into the alert log. The information out there on this error is somewhat sparse, and it is often linked to Tuning Advisor functionality. Since we're not running that advisor on this database, a little digging was in order. What I always do as a first step if I am confronted with an error where I am not certain why and exactly where it is raised, is to set up an error trap. In Oracle, this can be done by setting an errorstack event, like this: [code lang="sql"]alter session set events '10173 trace name errorstack level 1'; [/code] Please note: whenever possible you'd want to set this on the smallest scope possible - starting with the session level, eventually in a login trigger for multiple sessions, and only when all that fails, on the system level using the alter system statement. Once you have trapped an occurrence or two, you can disable it again by running: [code lang="sql"]alter session set events '10173 trace name errorstack off'; [/code] Upon activating this event, the next time around when this exception is raised Oracle will write a trace file to the diagnostics repository. Two two most prevalent pieces of information in the trace file are the current SQL query: [code lang="sql"]----- Current SQL Statement for this session (sql_id=anbp9r5n8ysu6) ----- SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") */ 1 AS C1 FROM (SELECT "INV"."CASE_UPC" "ITEM_1","INV"."WHS_NBR" "ITEM_2",SUBSTR("INV"."CASE_UPC",3,11) "ITEM_3" FROM "XX"."XX_WHS_INV" "INV","XX"."XX_CASE_DIM" "CD2","XX"."XX_WHS_DIM" "WD" WHERE "WD"."WHS_DESC" LIKE '%PEY%' AND "CD2"."WHS_NBR"="INV"."WHS_NBR" AND "CD2"."CASE_UPC"="INV"."CASE_UPC" AND "INV"."WHS_NBR"="WD"."WHS_NBR" GROUP BY "INV"."CASE_UPC","INV"."WHS_NBR",SUBSTR("INV"."CASE_UPC",3,11)) "VW_DIS_13") innerQuery [/code] And the stack trace, which looks something like this: [code lang="sql"]skdstdst()+29 call kgdsdst() 7FFFA592F860 ? 000000000 ? 7FFFA5912120 ? 7FFFA5912238 ? 7FFFA592FF18 ? 7FFFA592F810 ? ksedst()+112 call skdstdst() 7FFFA592F860 ? 000000000 ? 7FFFA5912120 ? 7FFFA5912238 ? 7FFFA592FF18 ? 7FFFA592F810 ? dbkedDefDump()+1153 call ksedst() 000000000 ? 000000000 ? 7FFFA5912120 ? 7FFFA5912238 ? 7FFFA592FF18 ? 7FFFA592F810 ? ksedmp()+26 call dbkedDefDump() 000000001 ? 000000000 ? 7FFFA5912120 ? 7FFFA5912238 ? 7FFFA592FF18 ? 7FFFA592F810 ? dbkdaKsdActDriver() call ksedmp() 000000001 ? 000000000 ? +880 7FFFA5912120 ? 7FFFA5912238 ? 7FFFA592FF18 ? 7FFFA592F810 ? dbgdaExecuteAction( call dbkdaKsdActDriver() 7F482FDE8618 ? 7FFFA5932830 ? )+319 7FFFA5912120 ? 7FFFA5912238 ? 7FFFA592FF18 ? 7FFFA592F810 ? ... ( rest omitted for clarity and we don't need the stack for the purpose of this post ) [/code] For my case, the interesting bit was the query that caused the error to be raised. This query provides us with some hints about what's going on: - We can see the DS_SVC comment at the beginning of the query - We can see lots of hints, the more interesting ones being dynamic_sampling(0) and result_cache(snapshot=3600) A little bit of research led to a new feature introduced in 12c: Dynamic Statistics (or Adaptive Statistics). This is an enhancement to dynamic sampling, where the optimizer can automatically choose to gather more statistics as part of the parse phase. The sampling is controlled internally with a time out, and if that time out is hit, ORA-10173 is raised to the alert log. This means that these errors are generally safe to ignore, as they are raised purely internally and your application won't see these exceptions. Your query didn't, and won't fail. However, your query may well be spending more time parsing and sampling data than what makes sense. My colleague Slava has already blogged about such a case here: Performance Problems with Dynamic Statistics in Oracle 12c and there are other documented cases. The feature can be controlled through the optimizer_dynamic_sampling parameter. If it's set to the default value of 2, the optimizer may choose to raise that temporarily to 11 which enables these new dynamic features. You can tell by looking at an execution plan of a query using dbms_xplan.display_cursor for example, and looking at the notes section:
Note ----- - dynamic statistics used: dynamic sampling (level=AUTO)As always, to determine whether or not this feature is helpful or harmful to your applications: benchmark it. You probably don't want to turn it off system-wide, but it may make a lot of sense to disable it for certain sessions, or certain queries in your application. The ORA-10173 you may see in the alert log however is no reason to panic, if your application is not negatively impacted. If you are mostly running queries that take minutes or longer to execute, spending a couple seconds gathering more statistics may not impact you. If on the other hand your queries are fast, and Oracle spends more time parsing and sampling than it actually takes to execute the query, or you are seeing loads and loads of ORA-10173, you may want to take action. Alternatively you may also resort to setting a _fix_control as mentioned in Slava's post - but as always, double-check with Oracle support if setting it has the desired effect on your exact version of Oracle. In essence, the ORA-10173 indicates wasted work, because Oracle is spending time gathering data, hits the timeout and then throws the partial data away since it's incomplete. This certainly isn't optimal, but again, your mileage may vary if this really impacts your applications. In our case, we are seeing thousands of them, and we have opened a case with Oracle, since obviously somewhere the optimizer keeps making the wrong choice. The bug is currently still with BDE but I shall keep you posted if anything develops. Another thing worth noting is that Oracle stores the results of these dynamic sampling queries in the result cache, and marks them for expiration after a certain time has elapsed. This timeout that we are seeing with the result_cache(snapshot=3600) hint, comes from the hidden parameter _optimizer_ads_result_cache_life which defaults to 3600 in 18.104.22.168. This may also help in reducing the frequency of the dynamic sampling queries by increasing the lifetime for which they remain cached. But as always, before changing any hidden parameters, please consult with your local Oracle support representative! Discover more about our expertise in the world of Oracle.