activity

Solution: Parallel Activity just drops

As you remember we had a hash join outer with dominant null values. Thus, the parallel process dealing with the null values got most of the work to do.
How would we deal with it? Let’s us back up a bit.
A null value is a special case. We are comparing in our join a null value versus a primary key that is defined not null. The null values will certainly not generate a match when joining. Therefore any of the parallel slaves could do the compare, as long as it is guaranteed not to generate match.
It is therefore beneficial to convert the null values into some other value, as long as we to fulfill two prerequisites:

  1. we have to convert the null values in such way that the get evenly distributed across all parallel slaves
  2. we must make sure that the converted values do not generate a match

It took me a long time to figure this. First I tried something like negative values, but they were still going to the same slave.
The issue is that our newly generated key needs to be roughly in range with the pk of the build table to be spread across all slaves.
I could have tried to figure the range in a with clause reading the data dictionary, but could not get myself to do something I considered uncool.
Instead I tried something else, which is honestly a bit of a hack, but it looks way better. I used the pk of the outer table to generate a not null widespread key.
I also took advantage of the fact that the id’s were integer and changed the join condition to:

FROM T_TAB1 T1
LEFT OUTER JOIN T_TAB2 T2
ON ( NVL(T2.X_ID, T2_id +0.5)    = T1.X_ID)

That is certainly somewhat tricky. The only thing I can say in favor is that is kind of worked.

After I applied my solution the activity tab looked like that:

Compare this to the previous picture. It is quite impressive.
Well, in version 12 the database should be able to deal with skew join distribution. But it still does not work for all cases of the as Randolf Geist states.
If someone does not like my solution he might have a look into _fix_control 6808773, but I do not warrant either solution. 😉

 

Advertisements

Parallel Activity just drops

In one of my DWH POCs for Oracle’s Real World Performance Group we did see an unexpected drop of activity. The customer wanted to know why it happened.
This is how sql monitor’s activity tab looked:

I investigated carefully and found out that the reason was in both cases an outer join that looked liked that

FROM T_TAB1 T1
LEFT OUTER JOIN T_TAB2 T2
ON T2.X_ID    = T1.X_ID

I found that 90% of all values in T1.X_ID were Null values. The distribution for the join was hash. Thus, the parallel slave dealing with null values got most of the work and was for quite some time the only one running. Table T1 was the biggest table, to broadcast it was out of question.
(You may also compare Randolf’s article on skew parallel distribution: Parallel Execution Skew – Demonstrating Skew).

Can you now re-code the join condition such that the join is done by all parallel slaves? (This was version 11G. In version 12 the optimizer should take care of it.)  Further assumption: table T2 has a one column primary key call T2_id.