Is the migration database slower?

My customer set up a database in a virtual environment to support a migration. Initial tests show that the migration database is many times slower than the production database. The tests focus on a particular query.

Here is an excerpt from an AWR of the migration database. You can see the relevant data from the sql command that you should investigate as part of the list “SQL ordered by elapsed time”.

apsed Time (s)

Executions

Elapsed Time per Exec (s)

%Total

%CPU

%IO

199.24

1

199.24

98.16

3.93

96.72

For comparison, the same data from the production database:

Elapsed Time (s)

Executions

Elapsed Time per Exec (s)

%Total

%CPU

%IO

11.02

1

11.02

65.95

99.98

0.0

What makes you curious? What working hypothesis would you choose to start your investigation and what would you check? Hint: This sql statement we investigate is a count which scans only one table. The execution plan is identical in both cases, it is in each case a full table scan.

Advertisements

12 comments

  1. Hi Lothar,
    sounds similar to an issue of my client a few month ago.

    > What makes you curious?
    The main thing is the drastic “% increase” on I/O time for the migrated database. Depending on the migration method you may would expect that the table gets smaller and that the FTS would be faster.

    > This sql statement we investigate is a count which scans only one table.
    Ok, so we can ignore any execution plan changes, etc. I gonna ignore some special cases like NULL predicates as well as the statement already ran 11.02 secs in production.

    Now let’s do some hypothesis:

    1) As the database version is not mentioned the client may run into the “db file scattered read” vs. “direct path read” issue and by-passes the buffer cache now. This can happen in case of database upgrades or by a specific migration scenario (+ different cache settings) when the threshold has been crossed.

    2) The database was not using direct I/O in production and uses the file system cache. The migrated database uses direct I/O now and the multiblock reads are not catched by OS file system cache (by the way this was the issue that my client hit).

    3) I just mention this option for completeness, but i do not believe that this is happening here as the runtime increase is too drastic. Maybe the extent allocation or multiblock read count has changed after migration and so less blocks can be fetched by each call. However as direct path reads are async (I/O slots) nowadays i do not believe that this is the case here (11 secs to 199 secs).

    All hypothesis are based on that the underlying infrastructure (storage, etc.) has not changed of course 🙂

    Regards
    Stefan

    Like

    1. The first thing to check is to evaluate the number of logical IOs and physical IOs for this statement. It should be evaluated as well the size in blocks of the table in migration database and production database (which probably is different)

      Helmut

      Like

    2. Hi Stefan, correct there are no execution plan changes. But still scattered reads are reads and there are not any reads, are there not? Yes, I know what you mean and you are right, but it is a bit trickly.

      Like

  2. I saw one of these a few days ago (same execution plan, drastically different performance). Applying an Oracle patch to the slow system put everything right. We had considered whether they had a “holey data” problem with recurring direct path loads and a number of other theories. We didn’t hypothesize about any of this however, until we had the 10046 data in hand, which showed us that the significant difference was a single FETCH call. The story would have been much different if it had been a PARSE or an EXEC. You won’t be able to know that from an AWR report, will you?

    Liked by 1 person

  3. FTS in the execution plan doesn’t necessarily mean that it had to do read calls to the OS. If the blocks were already present in the buffer cache , the query would be satisfied from the buffer cache, using CPU and no I/O waits.

    Liked by 1 person

  4. A variation of Stefans 1) could if the table is in buffer cache in the production system, but not in the VM. But as you might have run the statement twice, itÄs very unlikely.
    So let me add the consequence if Stefans assumptions about a comparable infrastructure is not true:
    CPU is slightly “slower”: on Prod it takes ~11 sec CPU, but on VM it’s close to 16 sec (4 % of 200 sec) – If CPUs are the same in Prod & VM, this might be just the VM overhead. It would be very fine to get the count of IOs so we can estimate the time per physical IO. If it’s in some “reasonable” range it’s just a kind of caching issue, but if the numbers are “insane”, I’d focus on the VM setup. Probably it’s overprovisioned somewhere.

    Liked by 1 person

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s