Exadata Can’t Fix Your Temp Problem

After a recent migration to consolidate a number of databases onto one of my client's Exadatas, I found myself staring at a familiar wait event on one of them, one I run into far too often when hopes of Exadata migration performance glory are being reviewed. For all the features available to help a consolidated database perform more effectively on Exadata, temp usage just isn't one of them.

Many know that I’m a proponent of ensuring design and code are as efficient as possible so that sessions don’t overrun their PGA allocation and “swap” over to temp.  Temp is on disk and disk is slow.  The option to perform smart scans of large tables is incredibly beneficial on Exadata, but if you are performing complex sorts, especially around hash joins, performance challenges can arise as time is spent on temp reads and writes.
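If you want to see which sessions are spilling to temp right now, a query against V$TEMPSEG_USAGE is a quick check. This is a minimal sketch — the views and joins are standard Oracle dynamic views, but verify the column list against your database version:

```sql
-- Sketch: sessions currently holding temp segments, largest first.
-- The MB math assumes dba_tablespaces.block_size for the temp tablespace.
SELECT s.sid,
       s.username,
       u.sql_id,
       u.tablespace,
       ROUND(u.blocks * t.block_size / 1024 / 1024) AS temp_mb
FROM   v$tempseg_usage u
       JOIN v$session s       ON s.saddr = u.session_addr
       JOIN dba_tablespaces t ON t.tablespace_name = u.tablespace
ORDER  BY temp_mb DESC;
```

Sorting by temp_mb puts the worst offenders at the top, which is usually all you need to tie a temp problem back to a SQL_ID.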

The database in question demonstrated more-than-acceptable offload percentages for a simple consolidation, and performance gains were around 5X in its “As Is” state.  By “As Is”, I mean we had only just started to review the database’s performance; as this was a lower-usage production database, no formal project to test performance before the migration had been performed.  I did have pre-migration AWR data to review and compare, but no other steps were taken.

The initial review of the newly consolidated database showed that the main load process was consuming considerable temp, something that was already evident in the pre-migration reports.  Nothing new, and nothing Exadata could assist with unless a change was made to take advantage of the Exadata features.

Event                           Waits        Time(s)  Avg wait (ms)  % DB time  Wait Class
------------------------------  -----------  -------  -------------  ---------  ----------
DB CPU                                         4,314                      42.0
direct path read temp               545,690    3,389              6       33.0  User I/O
direct path write temp              156,464    1,296              8       12.6  User I/O

                                         %Time  Total Wait  Avg wait    Waits   % DB
Event                       Waits        -outs    Time (s)      (ms)     /txn   time
--------------------------  -----------  -----  ----------  --------  -------  -----
direct path read temp           545,690      0       3,389         6    563.7   33.0
direct path write temp          156,464      0       1,296         8    161.6   12.6
direct path read                 75,541      0         806        11     78.0    7.8


Considering the waits on temp sit at roughly 46% of DB time (33.0% for the reads plus 12.6% for the writes), we can see that this is considerable, dwarfing every other wait event.

The transactions in question were inserts performed by the main load system, which used basic DML to process records and batch-load them into the database.  The following offers a view of the elapsed times for the top statements.

        Elapsed                 Elapsed Time
       Time (s)    Executions  per Exec (s)  %Total   %CPU    %IO  SQL Id
---------------  ------------  ------------  ------  -----  -----  -------------
        4,519.6             0           N/A    44.0   43.7   57.3  3kb7qnffktkie
Module: JDBC Thin Client
insert into TABLE1...
  SELECT <columns>
    SELECT (lots of columns and sorting and grouping from SOURCE_TBL3 and SOURCE_TBL4)

        2,618.3             0           N/A    25.5   51.8   46.4  98wvn8s6wudtu
Module: JDBC Thin Client
insert into TABLE2...same as above with minimal diff

        1,356.3             1      1,356.27    13.2   34.2   65.3  jmsmi84uuf3p0
Module: JDBC Thin Client
insert into TABLE3...same as above with minimal diff

        1,208.0             1      1,207.96    11.8   24.1   76.0  2jv8n0jtklt8k
Module: JDBC Thin Client
insert into TABLE4...same as above with minimal diff

Yes, four statements were causing all of this temp usage.  The selects behind the waits were performing complex processing as part of the sub-select before each insert.  The tables used in the hash join were properly smart-scanned and returned quickly, but once the sort occurred after the hash join, the elapsed time was extensive and temp usage was considerable (between 85 and 120 GB per statement).

The solution was to introduce a roll-up table into the load process that each of the four selects could utilize as a “step-through”, so less processing occurred in the sub-selects that were part of the queries.
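A hypothetical sketch of that step-through (table and column names are illustrative, not the client's actual schema): the sort/group work that used to run inside each of the four sub-selects is done once, up front, and the loads then read the pre-aggregated result.

```sql
-- Illustrative only: pre-aggregate the expensive sort/group work once.
CREATE TABLE new_rollup_tbl AS
SELECT st3.key_col,
       SUM(st4.amount) AS total_amount,
       COUNT(*)        AS row_cnt
FROM   source_tbl3 st3
       JOIN source_tbl4 st4 ON st4.key_col = st3.key_col
GROUP  BY st3.key_col;

-- Each of the four loads then reads the roll-up instead of re-sorting:
INSERT INTO table1
SELECT key_col, total_amount, row_cnt
FROM   new_rollup_tbl;
```

The aggregation cost is paid once instead of four times, and the per-insert sorts that were spilling to temp largely disappear.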

insert into TABLE1
select <columns, case statements, sums, Oh My!>
from NEW_ROLLUP_TBL
(The second sub-select was no longer required, as that work had already been done when populating NEW_ROLLUP_TBL.)

This decreased the temp reads and increased offloading (smart scans), as NEW_ROLLUP_TBL offered a double benefit:  less processing had to be done at the PGA level (spilling over to temp, which was slow), and, since the roll-up table itself could be offloaded, the offload percentage increased.
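To confirm the offload improvement per statement, V$SQL exposes the cell I/O columns. A sketch — the percentage formula here is a common approximation of offload efficiency, not an official Oracle metric:

```sql
-- Sketch: rough offload efficiency per SQL_ID from v$sql.
SELECT sql_id,
       ROUND(io_cell_offload_eligible_bytes / 1024 / 1024) AS eligible_mb,
       ROUND(io_interconnect_bytes / 1024 / 1024)          AS interconnect_mb,
       ROUND(100 * (io_cell_offload_eligible_bytes - io_interconnect_bytes)
                 / NULLIF(io_cell_offload_eligible_bytes, 0), 1) AS offload_pct
FROM   v$sql
WHERE  io_cell_offload_eligible_bytes > 0
ORDER  BY eligible_mb DESC;
```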


Event                           Waits   Time(s)  Avg wait (ms)  % DB time  Wait Class
------------------------------  ------  -------  -------------  ---------  ----------
DB CPU                                   50,812                     50.62
cell multiblock physical read    5,879      288              3       7.22  User I/O
direct path read temp            3,081      239              4       6.38  User I/O







Author: dbakevlar


