Using Runtime Statistics

How would you optimize the statement that produced the following runtime statistics? You can do anything to make it faster without changeing the statement. In particular you can change or add an index.


| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads     |
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:15.42 |    3540 |      3262 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:15.42 |    3540 |      3262 |
|*  2 |   TABLE ACCESS BY INDEX ROWID|   T1 |      1 |  52695 |     10 |00:00:15.42 |    3540 |      3262 |
|*  3 |    INDEX RANGE SCAN          |   I1 |      1 |     17M|   4577 |00:00:00.16 |      11 |         9 |

Predicate Information (identified by operation id):

   2 - filter(("NVL("X1_STATUS",'A')<>'U' AND NVL("X2_STATUS",'A')<>'D' 
               AND NVL("X3_STATUS",'A')<>'D' AND NVL("X4_STATUS",'A')<>'D' 
               AND NVL("X5_STATUS",'A')<>'U'))
   3 - access("T1_ID"=:B2 AND "T2_ID"=:B4 AND "T3_ID"=:B1)



  1. Hi Lothar,
    your tweet sounds like a trick question. :-)) There are several eye-catching points at the first look.

    1) Why are the CBO estimates so way off (e.g. 17M vs. 4577 for access predicates)? Are there other existing structures that would fit better, if the estimates are in the right ball park?

    2) Where are the filter predicates NVL(,’A’) != ” coming from? They don’t make sense at all, because it is always != ”, even if a NULL value is returned.

    3) Most of the time is spent on table access, but only 10 rows (out of 4577) are returned to the sort aggregate. So expanding the index with the X_STATUS columns can reduce the amount of physical / logical reads and speed up the query.

    Maybe there are already supporting structures (for point 3), but can not be used due to point 1 and 2.


    Liked by 1 person

    1. Actually Stefan it is not a trick question. I just wanted to tweet a bit. 😉
      There are no other useful indexes . One hint for you why the estimates on the index access are off: The conditions is hierarchical foreign key, You still can’t name the reason for sure now, but there is sone likelihood…


  2. Lothar, Stefan,

    For the “wrong” estimates, it looks as if there are correlations between T1_ID, T2_ID, T3_ID.
    To give the optimizer some knowledge about this fact, I’d suggest using extended statistics for these 3 columns.

    But that does not helps to “optimize the statement” as the original question was.
    As “optimize” is not specified in more detail, I translate it into “change [it] to run in less time”.
    [it] is the statement, so let’s guess how the statement might look like:
    FROM T1
    WHERE “T1_ID”=:B2 AND “T2_ID”=:B4 AND “T3_ID”=:B1
    AND ( NVL(“X1_STATUS”,’A’)’U’
    AND NVL(“X2_STATUS”,’A’)’D’
    AND NVL(“X3_STATUS”,’A’)’D’
    AND NVL(“X4_STATUS”,’A’)’D’
    AND NVL(“X5_STATUS”,’A’)’U’)

    We know there is an index I1 on T1 (T1_ID, T2_ID, T3_ID)
    (in any order).
    The index uses 11 blocks to store (slightly more than) 4577 leafs. For a 8k block it’s quite well packed.
    I don’t see anything in the statement we could to to improve this index.
    The Index access also only provides 1% of the total time.

    The table access shows 3262 Reads for 4577 rows in ~15 sec. So at about 4.5 ms / Read. Without any better knowledge I’d call this a reasonable RAM-buffered storage. (to slow for pure memory-storage, but to fast for pure disk)

    As most of the ROWIDs from the Index points to different data blocks, we know the data in T1 is NOT ordered by the columns of I1. (OR there is really much free space OR the avg.rowlenght of T1 is 3/4 the size of 1 block)
    But still there is nothing to optimize the statement.

    Lothar told us, Stefans question 2) is of no help, as there are no other useful indexes. I don’t expect the CPU time used for each “NVL(” being the main time consumption.

    As we have no supporting structures on X._STATUS, I don’t see any other chance to make the query “faster” but use a PARALLEL hint (if licensed). Even with only 15 sec overall time it will spread the sequential reads to several workers, if my assumption about the storage is right, buffers (and hopefully internal read-ahead) will deliver the same 4.5 ms for several workers at the same time.

    But there is more we can guess about the statement:
    Based on the column names I’d say it’s a kind of workflow instance tracking table:
    T1_ID, T2_ID and T3_ID are references which describe the type of workflow (grandparent, parent, child) and X._STATUS reflects different generic states of the workflow.
    So the Query will be something like “give me the count of all workflows if type A in specific state B”.

    To make the whole setup (not the statement) more performant, you can partition T1 by one of the X._STATUS columns – maybe one describes the “successful end” of the workflow. Given >99% of the workflows in T1 are finished successful, and most of the time the query asks for “workflows in any progress state” this could help a lot. Of course there are things to consider as row movements, local/global index and others. This idea follows the same basic pattern as Stefans suggestion 3).

    An interesting idea might be the usage of bitmap indexes for X._STATUS columns, but they are dangerous in OLTP like environments, as it might be here.

    Also Index Organized Tables might come as a suggestion; if created on (T1_ID, T2_ID, T3_ID). But I fear the main access pattern for T1 is the workflow-instance-ID, not the surrogate key for the workflow type?

    As a summary I don’t see reasonable chance in the query itself to optimize, but various methods to do so in changing the physical design.


    Liked by 1 person

  3. Hi Lothar,

    in addition to what Stefan and Martin said, if creating indexes and rewriting the query is allowed, maybe a solution to decrease the number of logical reads in plan’s line 2 could be to:

    If the combined predicates on Xn_STATUS columns which appear in the filter operation identify a relatively small percentage of rows in the whole table (from the A-Rows in line 2 it seems it does, though this might be only for the specific combination of [T1_ID, T2_ID, T3_ID], so it depends on the actual data), maybe a better approach than appending the Xn_STATUS columns to the I1 index in terms of index maintenance overhead, is to selectively index only the rows which satisfy this criteria, e.g.:

    create index I2 on T1(
    case when NVL(X1_STATUS, ‘A’) ‘U’
    and NVL(X2_STATUS, ‘A’) ‘D’
    and NVL(X3_STATUS, ‘A’) ‘D’
    and NVL(X4_STATUS, ‘A’) ‘D’
    and NVL(X5_STATUS, ‘A’) ‘U’ then 0
    else null

    Then the CBO could use both indexes to access the data and also avoid visiting the table altogether, if the aggregate function used doesn’t reference some column which isn’t indexed. On my model dataset I had to force this with an index_combine hint:

    select /*+gather_plan_statistics index_combine(t1 i1 i2)*/ count(*)
    from t1
    where t1_id = 10
    and t2_id = 11
    and t3_id = 12
    and case when NVL(X1_STATUS, ‘A’) ‘U’
    and NVL(X2_STATUS, ‘A’) ‘D’
    and NVL(X3_STATUS, ‘A’) ‘D’
    and NVL(X4_STATUS, ‘A’) ‘D’
    and NVL(X5_STATUS, ‘A’) ‘U’ then 0
    else null
    end = 0

    The filter operation isn’t there anymore, only access operations (sorry for not formatting as fixed-width):

    | Id | Operation | Name | Starts | E-Rows | A-Rows |
    | 0 | SELECT STATEMENT | | 1 | | 1 |
    | 1 | SORT AGGREGATE | | 1 | 1 | 1 |
    | 2 | BITMAP CONVERSION COUNT | | 1 | 1 | 1 |
    | 3 | BITMAP AND | | 1 | | 1 |
    | 4 | BITMAP CONVERSION FROM ROWIDS| | 1 | | 1 |
    |* 5 | INDEX RANGE SCAN | I1 | 1 | | 4577 |
    | 6 | BITMAP CONVERSION FROM ROWIDS| | 1 | | 1 |
    |* 7 | INDEX RANGE SCAN | I2 | 1 | | 4577 |

    Predicate Information (identified by operation id):

    5 – access(“T1_ID”=10 AND “T2_ID”=11 AND “T3_ID”=12)
    7 – access(“T1”.”SYS_NC00011$”=0)

    2) An alternative is to append the CASE expression to the existing I1 index, but we should check what does that mean in terms of index size increase and consider if the approach mentioned above is better:

    create index I1 on T1(
    case when NVL(X1_STATUS, ‘A’) ‘U’
    and NVL(X2_STATUS, ‘A’) ‘D’
    and NVL(X3_STATUS, ‘A’) ‘D’
    and NVL(X4_STATUS, ‘A’) ‘D’
    and NVL(X5_STATUS, ‘A’) ‘U’ then 0
    else null

    | Id | Operation | Name | Starts | E-Rows | A-Rows |
    | 0 | SELECT STATEMENT | | 1 | | 1 |
    | 1 | SORT AGGREGATE | | 1 | 1 | 1 |
    |* 2 | INDEX RANGE SCAN| I1 | 1 | 1 | 4577 |

    Predicate Information (identified by operation id):

    2 – access(“T1_ID”=10 AND “T2_ID”=11 AND “T3_ID”=12 AND

    Again, a precise access operation is performed instead of an inefficient filter.

    In my model dataset the second approach performed less logical reads, but again, that depends on the dataset you have. Also, creating such indexes and rewriting the SQL would probably be only sensible if this is a frequently executed SQL where the quick response time outweighs the additional index maintenance we incur when adding new indexes/add columns to existing ones.

    Jure Bratina

    Liked by 1 person

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s