Sleuthing for Temporary Space Spendthrifts

Dec 20, 2012 / By Marc Fielding

Tags: , , ,

I was recently looking at an issue where a large database server was running out of temp space. The issue had occurred in the past, so the goal was to identify what had been causing the system to run out of temp. Although ORA-1555 messages will appear in the database alert log (with query text) when a query runs out of temporary space, they only talk about the query that was “unlucky” enough to be attempting to get space when the tablespace was full, not who was using the temp space.

A quick web search shows that most people seem to be querying current tables. Coskan Gundogar blogged about a historical monitoring script he wrote but only drilled down into a specific session and serial number.

A quick look at AWR showed that v$tempseg_usage had no direct historical view. There is dba_hist_tbspc_usage info for the temp tablespace, but it only shows aggregate usage.

However, dba_hist_active_sess_history, the AWR samples of active session history data, has a new column in 11gR2: TEMP_SPACE_ALLOCATED. From the docs:

Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken.

Having temp usage in a table like ASH is actually much more useful than a v$tempseg_usage capture since it lets us drill down not only to the query, but also to the exact SQL plan and plan step where the temp space is being used.

Querying out the data requires a bit of elbow grease though: There is no delta column like for the I/O and time model stats, so we need to use some fancy analytic functions to compute the delta values ourselves.

dba_hist_active_sess_history itself is a view composed of multiple joins on the back end, so it takes quite a while to query. I generally create a fixed version of the table with the timestamp I want to analyze, in order to speed up the queries later. And since this is an Exadata system, I’ll take advantage of hybrid columnar compression. Adjust the begin_interval_time range to match the date range you’re looking at.

FROM   dba_hist_active_sess_history
WHERE  dbid = (SELECT dbid
               FROM   v$database)
       AND snap_id IN (SELECT snap_id
                       FROM   dba_hist_snapshot
                       WHERE  dbid = (SELECT dbid
                                      FROM   v$database)
                              AND begin_interval_time BETWEEN To_date(
                                  '20-DEC-2012 05:00',
                                  'dd-mon-yyyy hh24:mi')
                                  '20-DEC-2012 10:00', 'dd-mon-yyyy hh24:mi'));

(Side note: I’m trying out an online SQL beautifier tool. Readers, do you find this format easier to read?)

Once the base table is created, I can query it:
(Thanks to Randolf Geist for pointing out a missing grouping for parallel execution)

                 over (
                   ORDER BY SUM(Nvl(temp_space_delta, 0)) DESC) position,
               Count(DISTINCT sql_exec_id)                      total_execs,
               Trunc(SUM(Nvl(temp_space_delta, 0)) / 1024 / 1024)
               ||'m'                                            temp_usage
        FROM   (SELECT sql_exec_id,
                       temp_space_allocated - Nvl(Lag(temp_space_allocated, 1)
                       over (
                         PARTITION BY sql_exec_id, sql_id, session_id
                         ORDER BY sample_id), 0)
                FROM   marc_ash)
        GROUP  BY sql_id,
WHERE  position <= 10
ORDER  BY position;

A bit of explanation is in order: The core of the query selects from marc_ash using the lag() analytic function to get the previous temp space allocation for the same execution, SQL statement, and parallel query process, and subtracts it to get the difference in temp space usage for the current sample. I then take this result, plus some information about the SQL step executing, and group it by SQL ID, plan operation, and line ID. And finally, I sort the results, keeping the top 10, and apply some formatting to the result.

And as is usual with ASH, since I’m only looking at 10-second samples of data, the results don’t reflect 100% what really happened. Though the longer the execution step runs, the more accurate the end results will be.

Here’s a sample result (with a bit of cleanup of the column headers):

------- --------------- --------------- --------------- --------------- --------------- ----------
1       a9hrk7g2ahbtk   1062456949      HASH JOIN       15              1               142607m
2       b40ahcxu8nxdq   2741224596      HASH JOIN       21              1               139959m
3       b62gn3yqyau4k   3673174370      SORT            1               4               92378m
4       5rf3btrz44d6v   3879859951      SORT            3               1               35128m
5       5rf3btrz44d6v   3879859951      SORT            6               1               34095m
6       8ucjdxhxwz93g   2599798119      HASH            15              1               10873m
7       901mxasy6gu2j   801798936       SORT            2               28              6268m
8       4b7jatzjaf4z3   270793325       SORT            3               3               5876m
9       0h5cvsbs92cyp   1935762178      SORT            1               3               4522m
10      84z212am4j987   3445415588      SORT            1               2               4182m

During this time period, there was high temp usage from the two plan steps of SQL ID 5rf3btrz44d6v. Even though it only ran once, it consumed close to 70G of temp space during this period. The benefit of using ASH data, though, is that I not only know which SQL statement is responsible, but I also have an idea of which plan steps are responsible: two sort operations.

We can plug in the SQL ID and plan hash value in the output to dbms_xplan.display_awr to see the plan itself. The results show the full query text as well as the execution pan involved.

A short snippet of the plan:

| Id  | Operation                                         | Name          | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
|   0 | SELECT STATEMENT                                  |               |       |       |       |   100K(100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR                                   |               |       |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (ORDER)                              | :TQ10009      |    76M|    18G|       |   100K  (1)| 00:20:04 |       |       |  Q1,09 | P->S | QC (ORDER) |
|   3 |    SORT ORDER BY                                  |               |    76M|    18G|    21G|   100K  (1)| 00:20:04 |       |       |  Q1,09 | PCWP |            |
|   4 |     PX RECEIVE                                    |               |    76M|    18G|       |   100K  (1)| 00:20:04 |       |       |  Q1,09 | PCWP |            |
|   5 |      PX SEND RANGE                                | :TQ10008      |    76M|    18G|       |   100K  (1)| 00:20:04 |       |       |  Q1,08 | P->P | RANGE      |
|   6 |       SORT GROUP BY                               |               |    76M|    18G|    21G|   100K  (1)| 00:20:04 |       |       |  Q1,08 | PCWP |            |
|   7 |        PX RECEIVE                                 |               |    76M|    18G|       |   100K  (1)| 00:20:04 |       |       |  Q1,08 | PCWP |            |
|   8 |         PX SEND HASH                              | :TQ10007      |    76M|    18G|       |   100K  (1)| 00:20:04 |       |       |  Q1,07 | P->P | HASH       |
|   9 |          HASH GROUP BY                            |               |    76M|    18G|    21G|   100K  (1)| 00:20:04 |       |       |  Q1,07 | PCWP |            |
|  10 |           VIEW                                    | VW_DAG_0      |    76M|    18G|       | 67638   (1)| 00:13:32 |       |       |  Q1,07 | PCWP |            |
|  11 |            HASH GROUP BY                          |               |    76M|    16G|    17G| 67638   (1)| 00:13:32 |       |       |  Q1,07 | PCWP |            |
|  12 |             HASH JOIN                             |               |    76M|    16G|       | 37552   (2)| 00:07:31 |       |       |  Q1,07 | PCWP |            |
|  13 |              BUFFER SORT                          |               |       |       |       |            |          |       |       |  Q1,07 | PCWC |            |
|  14 |               PX RECEIVE                          |               |  2709K|    67M|       |  5287   (1)| 00:01:04 |       |       |  Q1,07 | PCWP |            |
|  15 |                PX SEND HASH                       | :TQ10004      |  2709K|    67M|       |  5287   (1)| 00:01:04 |       |       |        | S->P | HASH       |
|  16 |                 TABLE ACCESS STORAGE FULL         | TAB1          |  2709K|    67M|       |  5287   (1)| 00:01:04 |       |       |        |      |            |

A very quick look at this execution plan seems to indicate that it’s returning an absolutely massive dataset to the database client and sorting it twice. It would benefit from more conditions to restrict results to what the client is actually looking for.

Any comments, readers? How are you monitoring for temp space usage? Anyone using resource management to control temp usage?

5 Responses to “Sleuthing for Temporary Space Spendthrifts”

  • David Mann says:

    I always wanted to explore an AFTER SERVERERROR trigger to see if there was anything useful I could glean when a ORA-01555 fires.

    I like your approach though, I will give it a shot.

  • Hi Marc,

    it’s good to see someone writing about this new ASH feature. A few comments:

    1. I think your query using the LAG function lacks a bit of precision: In case of a Parallel Execution there will be multiple samples per SAMPLE_ID (one per sampled Parallel Slave), so your ORDER BY SAMPLE_ID of the LAG function is not deterministic for Parallel Executions I think.

    2. This also makes the approach of building a delta between samples questionable I think, at least for Parallel Execution. After all it is based on sampling – what if the particular session being sampled doesn’t show up afterwards, or only 100 samples later having released the previous TEMP allocation but allocating TEMP for a completely different part of the execution plan? The question here is probably, what kind of problem do you want to address using your query?

    3. I don’t think that you can break down the TEMP_SPACE_ALLOCATED information from ASH on SQL plan line operation level. You can see this from your example where “PX RECEIVE” gets TEMP space allocated.

    But a “PX RECEIVE” operation doesn’t have a workarea assigned, so why should it show up?

    I believe the PGA and TEMP allocation in ASH data is not on operation level, but on session level, so represents the total aggregate for all active workareas of that SQL execution at the moment of sampling.

    This is different from Real-Time SQL Monitoring and Rowsource Statistics where you get the allocation data on workarea level.

    It looks more like the ASH data is taken from V$PROCESS for PGA and V$TEMPSEG_USAGE for TEMP.

    4. I do similar analysis in my XPLAN_ASH tool and I can show there that the TEMP space information from ASH is unreliable and quite often falls short of the actual TEMP space allocation reported by Rowsource Statistics or Real-Time SQL Monitoring, so the information has to be treated carefully and might be misleading (or the way I treat the data in XPLAN_ASH is misleading, but at present I don’t think so).


    • Hello Randolf,

      Thanks for the insightful comments. I went back to my original working version of the query, and I had a “PARTITION BY sql_id, sql_exec_id, session_id” clause in it, and somehow lost the session_id part in the final version. Like you say, the temp space counts are independent for each PX process, and I’ve corrected the query. It also now shows the usage in hashing and sorting operations, which does make a lot more sense than PX receive.

      As far as the accuracy of the data, I would expect that it would be driven off the session statistic “temp space allocated (bytes)”. A test on one (otherwise idle) 11gR2 system:

      – Temp space consumer

      select count(distinct id) from big_table where rownum < 10000000;

      -- While running, in another session

      select value/1024/1024 from v$sesstat where statistic# =
      (select statistic# from v$statname where name = 'temp space allocated (bytes)') and sid = 804;

      select temp_space_allocated/1024/1024 from v$active_session_history where session_id = 804
      and sample_time > sysdate-1/(24*60*60);

      select used_blocks*8192/1024/1024 from v$sort_segment;

      So, in my simple test at least, the numbers from ASH, v$sort_segment, and v$sesstat do line up at 121m. Are you saying that they’re all inaccurate?

      Thanks again, and happy new year!


  • Hi Marc,

    thanks for taking up my comments and Happy New Year!

    As I already tried to explain, I don’t think that the ASH TEMP and PGA information can be broken down on SQL execution plan line information, since it seems to reflect the overall PGA / TEMP allocation of the process / session at the time of sampling across all workareas active, so I still believe that your example query showing PGA/TEMP on execution plan line level is a bit misleading.

    Regarding the accuracy: I still have to come up with good, reproducible examples, but I regularly run my XPLAN_ASH tool on real-life client queries and I can see quite often that the ASH TEMP information does not correspond to the real TEMP allocation as shown in Real-Time SQL Monitoring.

    One particular issue seems to come from BUFFERED operations that are only there in case of Parallel Execution (like HASH JOIN BUFFERED or additional BUFFER SORTs specific to PX), so may be serial executions are OK, and only Parallel Executions using BUFFERED operations might be affected by the problem.

    I haven’t had the time yet to investigate further, but I’ve already witnessed numerous inconsistencies.


  • Hi Marc,

    while working on a completely different matter I came again across this PGA/TEMP ASH stuff, and I have to correct my previous statement, because I mixed things up.

    The accuraccy of the PGA/TEMP figures in the ASH data seems to be fine.

    The problem I’ve stumbled upon regarding PGA/TEMP is the fact that I tried to come up with a PGA/TEMP usage figure over time, and due to the sampling this figure for Parallel Execution isn’t that consistent, as not always every PX Slave involved gets sampled every time, so a sum(PGA/TEMP) per sample time jumps a bit up and down, depending on which PX Slaves were sampled or not.

    This doesn’t mean however that the PGA/TEMP figures shown in the existing samples are incorrect/unreliable, so my previous statement about the reliability needs to be corrected.

    However, and that is what triggered me to mention this unreliability, the delta figures regarding I/O, that are also part of the ASH data since 11.2, are actually sometimes quite incorrect and fall short of the actual number of I/Os and volume of I/O performed.

    This I can show quite easily and if I find the time I will blog about it. The same inconsistency can be seen in the Real-Time SQL Monitoring report (“I/O requests” / “I/O Throughput” in the “Metrics” tab), which uses all these figures to populate the “Metrics” tab in the “EM/Active” version of the report.

    I’ve mixed these two problems, and thought that the PGA/TEMP figures are also unreliable, but that doesn’t seem to be the case – the delta I/O figures seem to be the ones that are often unreliable.


Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>