In one of my DWH POCs for Oracle’s Real World Performance Group we did see an unexpected drop of activity. The customer wanted to know why it happened.
This is how sql monitor’s activity tab looked:
I investigated carefully and found out that the reason was in both cases an outer join that looked liked that
FROM T_TAB1 T1 LEFT OUTER JOIN T_TAB2 T2 ON T2.X_ID = T1.X_ID
I found that 90% of all values in T1.X_ID were Null values. The distribution for the join was hash. Thus, the parallel slave dealing with null values got most of the work and was for quite some time the only one running. Table T1 was the biggest table, to broadcast it was out of question.
(You may also compare Randolf’s article on skew parallel distribution: Parallel Execution Skew – Demonstrating Skew).
Can you now re-code the join condition such that the join is done by all parallel slaves? (This was version 11G. In version 12 the optimizer should take care of it.) Further assumption: table T2 has a one column primary key call T2_id.