The mystery of the lost session

The Today’s puzzle is more challenging than the previous ones. I also admit that there might still be a solution other than the one we have found in the real case.
However, I just can not think of one.
But read for yourself:
One of our customers suffered from regular unpleasant row cache lock waits. The row cache lock was of type 7, which is dc_users. I created a system state dump, such as described here: system state dumps .
It was a whole series of blocked and blocking sessions in the dump.
If I wanted to find the blocking session in the dump to find out which statement was blocking, I could not find the session.
I also managed to find the SIDs of blocking sessions in real time. But the session itself was never there.
Which statement was issued by the blocking sessions?

Solution: A function in the where condition

Of course, Martin Berger is correct once more.

  1. The function is obsolete and should be removed
  2. The function is not defined as deterministic, though it is maximal deterministic. As a consequence, one can not create a function based index, create no virtual column and no Extended Statistics.
  3. of course, it would make sense to use the result cache. In the function definition for this keyword is missing. We can not add it without changing the code.
  4. As a final comprehensive measure we could assign statistics to the function. The how to is described here Table Expressions, Cardinality, SYS_OP_ATG and KOKBF and here
    setting cardinality for pipelined table functions and code. Unfortunately this is not possible because an error in the where condition enforces an implicit conversion. Statistics assigned to function effective are not used if to_number (effective ..) which is actually called. Statistics associated with the original effective function are therefore ineffective.

Thus, the developer has destroyed all possibilities for an elegant solution due impressive incompetence and we need to fix the statement using hints. This was achieved through a sql patch

Solution: The lost time

The answer is network latency.

Many thanks to my commentators who are of course totally correct. Since the SQL commands are fast enough, the problem must be elsewhere.

The supposed idle wait events sql * net message from client should stand out in the trace. A look at the trace result confirmed the hypothesis.

At a class by Craig Shallahammer I had learned that the network with a simple ping to test the latency. Our experiments showed fast that you could explain the time difference fairly well by the formular “number of database calls * time for a Ping * 2”.  I concluded that each database results in two network transfers. I suspect that the second one is an aknowledge but I have never double checked it.

To solve the problem in the short term it was necessary to reduce the network latency. At that time, only Citrix was the only possible technical solution. In the long term, it was necessary (to save the Citrix license ) to reduce the number of database calls by combing them into joins. (How it should have been done right from start.)