Solution: Where did the downgrades come from?

As Stefan already correctly pointed out, the downgrades originated from parallel sessions that Toad had issued. Under certain circumstances the invoking client process has long ceased to exist, but the database session stays. As a result we came under increasing pressure as the number of these sessions slowly grew: fewer and fewer new PX servers could be allocated, because the pool is not infinite.
As a countermeasure, we activated a profile that kills inactive sessions after a certain time.
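Such a profile could look roughly like this (a sketch; the profile name, the 60-minute limit, and toad_user are assumptions, not the real values):

create profile idle_killer limit idle_time 60;  -- minutes of allowed inactivity
alter user toad_user profile idle_killer;
-- the limit is only enforced when resource_limit is set:
alter system set resource_limit = true;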
The second reason was that the parameter parallel_adaptive_multi_user was set to true. This ensured that only about half of the parallel server pool could be used for parallel processing. The parameter was obviously absurd in a data warehouse. However, the local DBAs insisted on not changing the default. For the first time I was faced with the absurd idea that a default is something of a taboo. Instead, they triumphantly pointed out that the parameter parallel_threads_per_cpu was set to its default of 2. However, the OS had reported every thread as a CPU, so twice as many CPUs were reported as physically existed.
As far as the downgrades were concerned, three meaningless parameter values had happily neutralized each other. Why the Resource Manager was still running, I honestly do not get. 😉
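The interplay of these settings can be inspected with a query like this (a minimal sketch against the standard v$parameter view):

select name, value
from   v$parameter
where  name in ('parallel_adaptive_multi_user',
                'parallel_threads_per_cpu',
                'parallel_max_servers',
                'cpu_count');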

Where did the downgrades come from?

In a data warehouse we faced an increasing number of downgrades of parallelism. It seemed to be getting worse by the day. Indeed, it turned out that the number of freely available parallel processes was constantly decreasing. A closer examination of the PX servers in use showed that many had Toad as their source.
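Such an examination can be done with queries along these lines (a sketch; both are standard dynamic performance views):

select statistic, value from v$pq_sysstat where statistic like 'Servers%';  -- busy/idle/started PX servers
select server_name, status, sid from v$px_process;  -- which sessions hold them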
How was it possible that the number of parallel sessions originating from Toad grew constantly?
On the other hand, there was still a comfortable number of PX servers available. How was it possible that we were still seeing downgrades? Which parameters should we check?

Solution: Navigate many shallow hierarchies in parallel

Let’s create the indexes we need. There might be better options, but these will do:


create index i1 on tree_emp(empno);
create index i2 on tree_emp(mgr);

Now we define the pipelined table function package. Of course a more efficient solution is possible, but I want to show the principle.


create or replace package parallel_access
as

TYPE R_REC IS RECORD (empno tree_emp.empno%TYPE,
sum_salaries Number); -- result record, change definition according to your needs
TYPE refcur_t IS REF CURSOR RETURN R_REC;

TYPE result_Tab is TABLE OF R_REC;

FUNCTION passData (p_ref refcur_t) RETURN result_Tab
PIPELINED
PARALLEL_ENABLE(PARTITION p_ref BY ANY); -- function will inherit parallelism from ref cursor

END parallel_access;
/

create or replace package body parallel_access
as
FUNCTION passData (p_ref refcur_t) RETURN result_Tab
PIPELINED PARALLEL_ENABLE(PARTITION p_ref BY ANY)
IS
out_rec r_rec;
BEGIN
execute immediate 'alter session set "_old_connect_by_enabled"=true'; -- hidden parameter: falls back to the pre-10g CONNECT BY implementation
LOOP -- for each parallel process
FETCH p_ref INTO out_rec;
EXIT WHEN p_ref%NOTFOUND;
SELECT sum(sal)
INTO out_rec.sum_salaries
FROM tree_emp
CONNECT BY PRIOR EMPNO = MGR
START WITH mgr = out_rec.empno;

PIPE ROW(out_rec);
END LOOP;
execute immediate 'alter session set "_old_connect_by_enabled"=false';
RETURN;
END passData;

END parallel_access;
/

We can use the package as follows. Parallel 4 is just an example; one can choose the degree of parallelism as needed.

SELECT b.* FROM
TABLE(parallel_access.passdata (CURSOR( select /*+ parallel(d 4) */ empno, null from tree_emp d where mgr is null))) b;

I tested the example with parallel 16 on my laptop (4-core CPU). The pipelined table function was about twice as fast as the standard plan.

Navigate many shallow hierarchies in parallel

Suppose you have a large number of flat hierarchies. As an example we take the emp table from the Scott schema.
Using it, we create a table tree_emp as shown here:


create table tree_emp as select * from emp;

alter table tree_emp modify (empno number(18));
alter table tree_emp modify (mgr number(18));

INSERT INTO tree_emp
( EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO
)
SELECT EMPNO+(10000*step.n) empno,
ENAME,
JOB,
MGR+(10000*step.n) mgr,
HIREDATE,
TRUNC(DBMS_RANDOM.value(low => 8, high => 50))*100 SAL,
COMM,
DEPTNO
FROM emp,
(SELECT rownum n FROM dual CONNECT BY rownum <= 1000000
) step
;

Show for all presidents (mgr is null) the sum of the salaries of all their subordinates. Write a pipelined table function that can navigate multiple trees simultaneously.
Use Oracle’s CONNECT BY syntax. You can index as you like.

Solution: Using Runtime Statistics

In my practice, runtime statistics are the most important optimization tool for non-parallel queries.
Runtime statistics can be generated quickly and usually already contain all the information necessary for an optimization.
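For instance, such statistics can be produced with the gather_plan_statistics hint and dbms_xplan (a minimal sketch, not the exact commands used for the plans shown here):

select /*+ gather_plan_statistics */ count(*) from t1;  -- t1 stands in for the real query

select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));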
First I repeat the runtime statistics of the riddle:


-----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:15.42 |    3540 |      3262 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:15.42 |    3540 |      3262 |
|*  2 |   TABLE ACCESS BY INDEX ROWID|   T1 |      1 |  52695 |     10 |00:00:15.42 |    3540 |      3262 |
|*  3 |    INDEX RANGE SCAN          |   I1 |      1 |     17M|   4577 |00:00:00.16 |      11 |         9 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("NVL("X1_STATUS",'A')<>'U' AND NVL("X2_STATUS",'A')<>'D' 
               AND NVL("X3_STATUS",'A')<>'D' AND NVL("X4_STATUS",'A')<>'D' 
               AND NVL("X5_STATUS",'A')<>'U'))
   3 - access("T1_ID"=:B2 AND "T2_ID"=:B4 AND "T3_ID"=:B1)

It is again immediately striking that the estimated rows in operation 3 are way off from the actual rows. This becomes obvious when we compare the “estimated rows” (E-Rows) with the “actual rows” (A-Rows). Since it is a very simple compound condition with an “=” comparison operator, we can assume a functional dependency as the likely reason for the estimation error.

The corresponding countermeasure would be extended statistics.
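A sketch of such extended statistics, with the column group taken from the access predicate of operation 3 above:

select dbms_stats.create_extended_stats(
         ownname   => user,
         tabname   => 'T1',
         extension => '(T1_ID,T2_ID,T3_ID)')
from   dual;

exec dbms_stats.gather_table_stats(user, 'T1', method_opt => 'for all columns size auto')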
However, most of the time is lost in operation 2. Here the 4577 actual rows from operation 3 are reduced to merely 10 rows in operation 2. This is due to the filter condition in operation 2:

2 - filter(("NVL("X1_STATUS",'A')<>'U' AND NVL("X2_STATUS",'A')<>'D' 
               AND NVL("X3_STATUS",'A')<>'D' AND NVL("X4_STATUS",'A')<>'D' 
               AND NVL("X5_STATUS",'A')<>'U'))

According to the tuning technique “throw-away”, invented by Oak Table member Martin Berg, we would need to add some extra columns to the index in order to avoid the thrown-away rows in operation 2.

I have found very clever recommendations in my comments. However, you do not have to be very sophisticated in this case. We tested the status fields and found those which increase the selectivity of the index. These status fields we simply added to the index. The developer wrongly assumed that the application would not benefit from the extra columns, because with a non-equality condition no index can be used.

This is not correct. Only the tree structure of the index cannot be used, and the tree structure is only needed to locate the first leaf block of an index range scan. From there onwards a serial search over the leaf blocks is done.

In this serial search the database can use any condition, independent of the comparison operator, to filter rows. As we can see in the runtime statistics below, that is enough to ensure good performance. The improvement factor is around 700.
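Based on the access and filter predicates visible in the improved plan below, the extended index might have looked like this (a sketch; the exact column order is an assumption):

create index pr_sale_del_03 on mm_sales_deliveries
  (mm_warehouse_id, mm_firm_id, mm_item_id, mm_from_location_id,
   mm_distribution_status, mm_direct_del_status);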

Not even the improvement of the statistics is ultimately required in this case. Although the optimizer’s estimate is wrong, it still perceives the additional columns as an advantage.

Of course, the extended statistics would still be worthwhile.


-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |      1 |        |      1 |00:00:00.02 |      12 |      4 |
|   1 |  SORT AGGREGATE              |                     |      1 |      1 |      1 |00:00:00.02 |      12 |      4 |
|*  2 |   TABLE ACCESS BY INDEX ROWID| MM_SALES_DELIVERIES |      1 |  52695 |     10 |00:00:00.02 |      12 |      4 |
|*  3 |    INDEX RANGE SCAN          | PR_SALE_DEL_03      |      1 |  52695 |     10 |00:00:00.01 |       6 |      0 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter((NVL("MM_DISTRIBUTION_DEL_STATUS",'H')<>'D' AND NVL("MM_OUTBOUND_DEL_STATUS",'H')<>'D' AND
              NVL("MM_OUTBOUND_STATUS",'H')<>'U'))
   3 - access("MM_WAREHOUSE_ID"=:B4 AND "MM_FIRM_ID"=:B1 AND "MM_ITEM_ID"=:B2 AND "MM_FROM_LOCATION_ID"=:B3)
       filter((NVL("MM_DISTRIBUTION_STATUS",'H')<>'U' AND NVL("MM_DIRECT_DEL_STATUS",'H')<>'D'))

Using Runtime Statistics

How would you optimize the statement that produced the following runtime statistics? You can do anything to make it faster without changing the statement. In particular, you can change or add an index.


-----------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:15.42 |    3540 |      3262 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:15.42 |    3540 |      3262 |
|*  2 |   TABLE ACCESS BY INDEX ROWID|   T1 |      1 |  52695 |     10 |00:00:15.42 |    3540 |      3262 |
|*  3 |    INDEX RANGE SCAN          |   I1 |      1 |     17M|   4577 |00:00:00.16 |      11 |         9 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("NVL("X1_STATUS",'A')<>'U' AND NVL("X2_STATUS",'A')<>'D' 
               AND NVL("X3_STATUS",'A')<>'D' AND NVL("X4_STATUS",'A')<>'D' 
               AND NVL("X5_STATUS",'A')<>'U'))
   3 - access("T1_ID"=:B2 AND "T2_ID"=:B4 AND "T3_ID"=:B1)

Solution: Mystery of the lost session

As speculated on Twitter, the command was connect and the activity was logon. After some research, the root cause was found: the report server attempted to log on using invalid credentials.

After we had corrected the credentials, the row locks disappeared.

The first good indication of the solution I got from the blog of Dominic Brooks: Failed Logins can cause row cache lock on dc_users. Martin Preiss, too, had found this page when searching for the solution. Stefan Koehler has documented a very similar case in great detail: Wait event “library cache lock” by (security) design and exploring system state dumps (oradebug). Franck Pachot mentioned on Twitter that the “connection management call elapsed time” event in AWR should have been a hint, had I only noticed… Of course, Martin Berger was, as always, on the right track.

The mystery of the lost session

Today’s puzzle is more challenging than the previous ones. I also admit that there might be a solution other than the one we found in the real case.
However, I just cannot think of one.
But read for yourself:
One of our customers suffered from regular, unpleasant row cache lock waits. The row cache lock was of type 7, which is dc_users. I created a system state dump, as described here: system state dumps.
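For reference, such a dump is typically created in SQL*Plus like this (a sketch; run as SYSDBA, and level 266 is just a common choice):

oradebug setmypid
oradebug unlimit
oradebug dump systemstate 266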
There was a whole series of blocked and blocking sessions in the dump.
But when I searched the dump for a blocking session, to find out which statement it was executing, I could not find that session.
I also managed to find the SIDs of blocking sessions in real time. But the session itself was never there.
Which statement was issued by the blocking sessions?

Solution: A function in the where condition

Of course, Martin Berger is correct once more.

  1. The function is obsolete and should be removed.
  2. The function is not defined as deterministic, although it is perfectly deterministic. As a consequence, one can create neither a function-based index, nor a virtual column, nor extended statistics on it.
  3. Of course, it would make sense to use the result cache. However, the RESULT_CACHE keyword is missing from the function definition, and we cannot add it without changing the code.
  4. As a final, comprehensive measure we could associate statistics with the function, as shown in the sketch after this list. How to do this is described here: Table Expressions, Cardinality, SYS_OP_ATG and KOKBF and here: setting cardinality for pipelined table functions and code. Unfortunately this is not possible in our case, because an error in the where condition enforces an implicit conversion: it is actually to_number(effective(..)) that is called, so statistics associated with the original function effective are not used and remain ineffective.
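For illustration, associating default statistics with a function looks roughly like this (a sketch; effective is the function name from the text, the cost and selectivity values are hypothetical):

associate statistics with functions effective
  default cost (1000 /* cpu */, 10 /* io */, 0 /* network */)
  default selectivity 1;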

Thus, the developer has destroyed all possibilities for an elegant solution through impressive incompetence, and we need to fix the statement using hints. This was achieved through a SQL patch.
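Such a patch can be created along these lines (a sketch; the SQL_ID and hint text are hypothetical placeholders, and before 12.2 the undocumented dbms_sqldiag_internal.i_create_patch call was used instead):

declare
  l_patch varchar2(128);
begin
  l_patch := dbms_sqldiag.create_sql_patch(
               sql_id    => 'abcd1234efgh5',  -- hypothetical
               hint_text => 'INDEX(t1 i1)',   -- hypothetical
               name      => 'fix_effective_stmt');
end;
/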