Is the migration database slower?

My customer set up a database in a virtual environment to support a migration. Initial tests show that the migration database is many times slower than the production database. The tests focus on a particular query.

Here is an excerpt from an AWR report of the migration database. The relevant data for the SQL statement you should investigate comes from the section “SQL ordered by Elapsed Time”.

Elapsed Time (s)  Executions  Elapsed Time per Exec (s)  %Total  %CPU  %IO
          199.24           1                     199.24   98.16   3.93  96.72

For comparison, the same data from the production database:

Elapsed Time (s)  Executions  Elapsed Time per Exec (s)  %Total  %CPU  %IO
           11.02           1                      11.02   65.95  99.98   0.0

What makes you curious? What working hypothesis would you choose to start your investigation, and what would you check? Hint: the SQL statement we are investigating is a count that scans only one table. The execution plan is identical in both cases: a full table scan.
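As a first check (a sketch, assuming you can query V$SQL on both systems; &sql_id is a placeholder for the statement's SQL_ID taken from the AWR report), comparing logical and physical I/O for the statement would show whether the slow system is actually reading from disk:

```sql
-- Compare logical vs. physical I/O for the statement on both systems.
-- &sql_id is a placeholder; take it from "SQL ordered by Elapsed Time".
SELECT sql_id,
       executions,
       buffer_gets,                      -- logical I/Os
       disk_reads,                       -- physical I/Os
       elapsed_time / 1e6 AS elapsed_s,
       cpu_time     / 1e6 AS cpu_s
FROM   v$sql
WHERE  sql_id = '&sql_id';
```

If disk_reads is near zero on production but large on the migration database, the difference is a caching or read-path question, not a plan question.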

12 thoughts on “Is the migration database slower?”

  1. Hi Lothar,
    sounds similar to an issue one of my clients had a few months ago.

    > What makes you curious?
    The main thing is the drastic increase in %IO for the migrated database. Depending on the migration method you might even expect the table to get smaller and the FTS to be faster.

    > This sql statement we investigate is a count which scans only one table.
    OK, so we can ignore any execution plan changes, etc. I am also going to ignore special cases like NULL predicates, as the statement already ran in 11.02 secs in production.

    Now let’s form some hypotheses:

    1) As the database version is not mentioned, the client may be running into the “db file scattered read” vs. “direct path read” issue and is now by-passing the buffer cache. This can happen after database upgrades or in a specific migration scenario (+ different cache settings) when the threshold has been crossed.

    2) The database was not using direct I/O in production and relied on the file system cache. The migrated database now uses direct I/O, and the multiblock reads are no longer cached by the OS file system cache (by the way, this was the issue my client hit).

    3) I just mention this option for completeness, but I do not believe it is happening here, as the runtime increase is too drastic. Maybe the extent allocation or the multiblock read count changed after the migration, so fewer blocks can be fetched per call. However, as direct path reads are asynchronous (I/O slots) nowadays, I do not believe this explains 11 secs becoming 199 secs.

    All hypotheses assume, of course, that the underlying infrastructure (storage, etc.) has not changed 🙂

    Regards
    Stefan
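
    Hypotheses 1) and 2) above can be checked directly (a sketch; the parameter and wait event names are standard Oracle ones, but where exactly to look depends on the release):

    ```sql
    -- Hypothesis 2: is the instance using direct I/O at the OS level?
    SELECT name, value
    FROM   v$parameter
    WHERE  name = 'filesystemio_options';

    -- Hypothesis 1: which read path dominates system-wide?
    SELECT event, total_waits, time_waited_micro
    FROM   v$system_event
    WHERE  event IN ('db file scattered read', 'direct path read');
    ```

    A large share of “direct path read” waits on the migration database, compared to few or none on production, would support the buffer-cache-bypass theory.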


    1. The first thing to check is the number of logical I/Os and physical I/Os for this statement. The size in blocks of the table in the migration database and the production database (which is probably different) should be evaluated as well.

      Helmut
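
      The table-size part of this check can be sketched like this (assuming access to DBA_SEGMENTS; &owner and &table_name are placeholders):

      ```sql
      -- Size of the scanned table, to compare between the two systems.
      SELECT segment_name, blocks, bytes/1024/1024 AS size_mb
      FROM   dba_segments
      WHERE  owner = UPPER('&owner')
      AND    segment_name = UPPER('&table_name');
      ```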


    2. Hi Stefan, correct, there are no execution plan changes. But still, scattered reads are reads, and in production there are hardly any reads at all, are there? Yes, I know what you mean, and you are right, but it is a bit tricky.


  2. I saw one of these a few days ago (same execution plan, drastically different performance). Applying an Oracle patch to the slow system put everything right. We had considered whether they had a “holey data” problem with recurring direct path loads, and a number of other theories. We didn’t hypothesize about any of this, however, until we had the 10046 data in hand, which showed us that the significant difference was a single FETCH call. The story would have been much different if it had been a PARSE or an EXEC. You won’t be able to know that from an AWR report, will you?
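
    The 10046 data mentioned above can be gathered roughly like this (a sketch; level 8 includes wait events, and the tracefile identifier is just an arbitrary label):

    ```sql
    ALTER SESSION SET tracefile_identifier = 'slow_count';
    ALTER SESSION SET events '10046 trace name context forever, level 8';
    -- run the suspect count statement here
    ALTER SESSION SET events '10046 trace name context off';
    -- then inspect the trace file (e.g. with tkprof) to see whether the
    -- time is in PARSE, EXEC or FETCH and which wait events dominate
    ```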


  3. An FTS in the execution plan doesn’t necessarily mean that it had to do read calls to the OS. If the blocks were already present in the buffer cache, the query would be satisfied from the buffer cache, using CPU and causing no I/O waits.
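
    Whether the table is currently cached can be checked with a sketch like this (assuming access to V$BH and DBA_OBJECTS; &table_name is a placeholder, and V$BH.OBJD maps to DBA_OBJECTS.DATA_OBJECT_ID):

    ```sql
    -- How many blocks of the table are currently in the buffer cache?
    SELECT COUNT(*) AS cached_blocks
    FROM   v$bh b
    JOIN   dba_objects o ON o.data_object_id = b.objd
    WHERE  o.object_name = UPPER('&table_name')
    AND    b.status <> 'free';
    ```

    Comparing this count against the table's total blocks on each system would show whether production is serving the scan entirely from cache.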


  4. A variation of Stefan’s 1) could be that the table is in the buffer cache in the production system, but not in the VM. But as you probably ran the statement more than once, it’s very unlikely.
    So let me add the consequence if Stefan’s assumption about a comparable infrastructure is not true:
    CPU time: on Prod it takes ~11 sec of CPU, on the VM it’s close to 8 sec (4 % of 200 sec) – if the CPUs are the same in Prod & VM, part of the difference might be just the VM overhead. It would be very helpful to get the count of I/Os so we can estimate the time per physical I/O. If it’s in some “reasonable” range, it’s just a kind of caching issue; but if the numbers are “insane”, I’d focus on the VM setup. Probably it’s overprovisioned somewhere.
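
    The time per physical I/O asked for above can be estimated with a sketch like this (system-wide figures from V$SYSTEM_EVENT; per-statement numbers from a 10046 trace would be more precise):

    ```sql
    -- Rough average time per physical read call, system-wide.
    SELECT event,
           total_waits,
           time_waited_micro / NULLIF(total_waits, 0) AS avg_wait_us
    FROM   v$system_event
    WHERE  event IN ('db file sequential read',
                     'db file scattered read',
                     'direct path read');
    ```

    Single-digit millisecond averages would point at a caching issue; averages far beyond that would point at the VM's I/O setup.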


    1. Whatever the quality of the I/Os on the VM might be, I think the issue is that we have I/Os on the VM at all, as opposed to the prod system. You are very much on target with your first sentence.

