Using a Function-Based Index to “Avoid” a Sort

Aug 22, 2007 / By Alex Fatkulin

Tags: , , ,

Just a small observation that resulted from a real-world case.

Recently, I was asked to take a look at a query processing part of a client’s application which had started to perform slowly after the application had a maintenance downtime. The idea behind the processing was quite simple. Requests were inserted into a table:

SQL> create table requests
  2  (
  3   subs_id  number,
  4   insert_time date,
  5   processed number
  6  );

Table created.

Those rows were dequeued by seven processes, each process working on its own set of data using the following SQL:

select *
	from requests
	where case processed when 0 then mod(subs_id, 7) else null end=:dequeue_process_number
	order by insert_time

That is, processes were dequeuing records with the processed flag set to zero (unprocessed), and data partitioning was done using a simple yet effective mod(subs_id, 7) function. The goal of that somewhat strange WHERE clause was to support a function-based index to allow for a quick dequeue:

SQL> create index fbi_requests_p
  2   on requests
  3   (case processed when 0 then mod(subs_id, 7) end);

Index created.

Since NULL keys are not stored inside a B*Tree index, that technique allows me to store only unprocessed records in the index and divide them between dequeue processes effectively.

We can now proceed with a test-case to demonstrate how dequeue part worked:

--generate test data
SQL> insert /*+ append */ into requests
  2   select trunc(dbms_random.value(0, 10000)),   --subs_id
  3     trunc(sysdate)+dbms_random.value(0, 1),  --insert_time
  4     case when rownum <= 1000 then 0 else 1 end --processed flag
  5    from dual
  6    connect by level <= 100000;

100000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(user, 'requests', cascade => true);

PL/SQL procedure successfully completed.

SQL> variable dequeue_process_number number;
SQL> exec :dequeue_process_number:=0;

PL/SQL procedure successfully completed.

SQL> set arraysize 100
SQL> set autot traceonly
SQL> select /*+ index(r fbi_requests_p) */ *
  2   from requests r
  3   where case processed when 0 then mod(subs_id, 7) end=:dequeue_process_number
  4   order by insert_time;

138 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2085609331

---------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT             |                | 14286 |
|   1 |  SORT ORDER BY               |                | 14286 |
|   2 |   TABLE ACCESS BY INDEX ROWID| REQUESTS       | 14286 |
|*  3 |    INDEX RANGE SCAN          | FBI_REQUESTS_P |   143 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(CASE "PROCESSED" WHEN 0 THEN MOD("SUBS_ID",7) END
           =TO_NUMBER(:DEQUEUE_PROCESS_NUMBER))

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
       3022  bytes sent via SQL*Net to client
        407  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        138  rows processed

As I have already said, the problem arose after the application’s database part had a maintenance window. Since requests were enqueued by an external devices but the database processing part was shut down, that inevitably resulted in tens of millions of unprocessed requests. A substantial delay was created before the first row could be output for processing, since all dequeued rows first had to be sorted by their insert_time.

The quick solution was to modify a function-based index to include the insert_time column, thus giving the execution engine an opportunity to use an index to bypass the SORT ORDER BY step completely:

SQL> drop index FBI_REQUESTS_P;

Index dropped.

SQL> create index fbi_requests_p
  2   on requests
  3   (case processed when 0 then mod(subs_id, 7) end,
  4   case processed when 0 then insert_time end);

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 'requests', cascade => true);

PL/SQL procedure successfully completed.

Now we can slightly modify our query and execute it:

SQL> select /*+ index(r fbi_requests_p) */ *
  2   from requests r
  3   where case processed when 0 then mod(subs_id, 7) end=:dequeue_process_number
  4   order by case processed when 0 then insert_time end;

138 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 564389291

---------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT             |                | 14286 |
|   1 |  SORT ORDER BY               |                | 14286 |
|   2 |   TABLE ACCESS BY INDEX ROWID| REQUESTS       | 14286 |
|*  3 |    INDEX RANGE SCAN          | FBI_REQUESTS_P |   143 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(CASE "PROCESSED" WHEN 0 THEN MOD("SUBS_ID",7) END
              =TO_NUMBER(:DEQUEUE_PROCESS_NUMBER))

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         96  consistent gets
          0  physical reads
          0  redo size
       3022  bytes sent via SQL*Net to client
        407  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
        138  rows processed

As you can see, the SORT ORDER BY step is still there, so we didn’t archive anything besides an increase in LIO. Driven by curiosity, I dumped the sort statistics (event 10032)…

---- Sort Statistics ------------------------------
Input records                             138
Output records                            138
Total number of comparisons performed     460
  Comparisons performed by in-memory sort 460
Total amount of memory used               6144
Uses version 2 sort
Does not use asynchronous IO
---- End of Sort Statistics -----------------------

…just to make sure that the sort is actually happening and that what we see is what we see. But the problem remained unsolved. Eventually, it turned out that what we need is to modify our query in the following way:

SQL> select /*+ index(r fbi_requests_p) */ *
  2   from requests r
  3   where case processed when 0 then mod(subs_id, 7) end=:dequeue_process_number
  4   order by case processed when 0 then mod(subs_id, 7) end,
  5    case processed when 0 then insert_time end;

138 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1059845571

--------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT            |                | 14286 |
|   1 |  TABLE ACCESS BY INDEX ROWID| REQUESTS       | 14286 |
|*  2 |   INDEX RANGE SCAN          | FBI_REQUESTS_P |   143 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(CASE "PROCESSED" WHEN 0 THEN MOD("SUBS_ID",7) END
              =TO_NUMBER(:DEQUEUE_PROCESS_NUMBER))
       filter(CASE "PROCESSED" WHEN 0 THEN MOD("SUBS_ID",7) END
              =TO_NUMBER(:DEQUEUE_PROCESS_NUMBER))

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         98  consistent gets
          0  physical reads
          0  redo size
       3022  bytes sent via SQL*Net to client
        407  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        138  rows processed

The trick was to add a (redundant) CASE PROCESSED WHEN 0 THEN MOD(SUBS_ID, 7) end expression into ORDER BY clause.

Techniques such as this make me wonder how many little refinements can still be incorporated into the optimizer.

All test were performed on 10.2.0.3. Note to self: check this against 11g.

2 Responses to “Using a Function-Based Index to “Avoid” a Sort”

  • amit poddar says:

    Hi,

    Just for curiosity I tested in 11.1.0.6. In 11g it seems that we do not have to add the redundant expression into the order by clause. See below

    SQL> create table requests ( subs_id number(10), insert_time date, processed number(3));

    Table created.

    SQL> create index fbi_requests_p
    2 on requests
    3 (case processed when 0 then mod(subs_id, 7) end);

    Index created.

    SQL> insert /*+ append */ into requests
    2 select trunc(dbms_random.value(0, 10000)),
    3 trunc(sysdate)+dbms_random.value(0, 1),
    4 case when rownum commit;

    Commit complete.

    SQL> exec dbms_stats.gather_table_stats(user, ‘requests’, cascade => true);

    PL/SQL procedure successfully completed.

    SQL> select /*+ index(r fbi_requests_p) */ *
    from requests r
    where case processed when 0 then mod(subs_id, 7) end=:dequeue_process_number
    order by insert_time
    2 3 4 5
    SQL> /

    153 rows selected.

    Execution Plan
    ———————————————————-
    Plan hash value: 564389291

    ———————————————————————————————–
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ———————————————————————————————–
    | 0 | SELECT STATEMENT | | 143 | 2145 | 6 (17)| 00:00:01 |
    | 1 | SORT ORDER BY | | 143 | 2145 | 6 (17)| 00:00:01 |
    | 2 | TABLE ACCESS BY INDEX ROWID| REQUESTS | 143 | 2145 | 5 (0)| 00:00:01 |
    |* 3 | INDEX RANGE SCAN | FBI_REQUESTS_P | 143 | | 1 (0)| 00:00:01 |
    ———————————————————————————————–

    Predicate Information (identified by operation id):
    —————————————————

    3 – access(CASE “PROCESSED” WHEN 0 THEN MOD(“SUBS_ID”,7) END
    =TO_NUMBER(:DEQUEUE_PROCESS_NUMBER))

    Statistics
    ———————————————————-
    0 recursive calls
    0 db block gets
    5 consistent gets
    0 physical reads
    0 redo size
    3278 bytes sent via SQL*Net to client
    431 bytes received via SQL*Net from client
    3 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    153 rows processed

    SQL> drop index FBI_REQUESTS_P;

    Index dropped.

    SQL> create index fbi_requests_p
    2 on requests
    3 (case processed when 0 then mod(subs_id, 7) end,
    4 case processed when 0 then insert_time end);

    Index created.

    SQL> exec dbms_stats.gather_table_stats(user, ‘requests’, cascade => true);

    PL/SQL procedure successfully completed.

    SQL> select /*+ index(r fbi_requests_p) */ *
    2 from requests r
    3 where case processed when 0 then mod(subs_id, 7) end=:dequeue_process_number
    4 order by case processed when 0 then insert_time end;

    153 rows selected.

    Execution Plan
    ———————————————————-
    Plan hash value: 1059845571

    ———————————————————————————————-
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ———————————————————————————————-
    | 0 | SELECT STATEMENT | | 143 | 2431 | 95 (0)| 00:00:02 |
    | 1 | TABLE ACCESS BY INDEX ROWID| REQUESTS | 143 | 2431 | 95 (0)| 00:00:02 |
    |* 2 | INDEX RANGE SCAN | FBI_REQUESTS_P | 143 | | 2 (0)| 00:00:01 |
    ———————————————————————————————-

    Predicate Information (identified by operation id):
    —————————————————

    2 – access(CASE “PROCESSED” WHEN 0 THEN MOD(“SUBS_ID”,7) END
    =TO_NUMBER(:DEQUEUE_PROCESS_NUMBER))

    Statistics
    ———————————————————-
    171 recursive calls
    0 db block gets
    140 consistent gets
    0 physical reads
    0 redo size
    3278 bytes sent via SQL*Net to client
    431 bytes received via SQL*Net from client
    3 SQL*Net roundtrips to/from client
    4 sorts (memory)
    0 sorts (disk)
    153 rows processed

  • Amit,

    that’s a good news indeed. And yet another demonstration why it’s always a good idea to verify things against new releases.

Leave a Reply

  • (will not be published)

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>