When Parallel is Wrong

February 1, 2010 Kellyn Gorman

The Lead Developer and I have been working very, very hard on tuning the main process that produces much of the revenue for our company. The tuning exercise has been very successful but we’ve run into some challenging on the pre-processing that occurs for each week to build the mart that the newly tuned process runs on afterwards. Due to this, I’ve been granted some time to monitor the build process a little closer than previously and although I have a lot of tuning to do, some of the largest challenges comes in the form of queries that the analysts have to execute as part of the final checks before we toggle over to the newly created mart.

These statements all have the same where clause, two different tbl_keys, (12 or 48), pulling the entire table, (each table between 90-380GB) and then ordering the data by the i_id. They all have the same parallel hints, just different tables substituted into the statements. The reason for the sort is that the data is then pulled into an application to finish the final step. The application is incapable of dealing with this type of large sorts, so they decided to do it internal to the database. The final data is used to verify that the correct data is present in the mart of the members that were fed into the primary database that the mart is based off of for the timeline.

select /*+ full(t) parallel (t 8) */ t.*
from tbl_summary t WHERE (tbl_key=12 and usr_id=0)

order by i_id ;

The above statement is going to result in 16 parallel slaves, pulling the entire data set for user class=0 vs. a specific id, for the last 12 months of data, then sort it by the usr_id. Note the first hint- FULL. Realize, as this data is across multiple partitions, we will not see a “partition list single” in the explain plan, but “px iterator” for the database to correctly identify the individuals that meet the criteria for each partition. Once it completes this step, then the challenge comes in sorting all that data and the problem with the second hint- PARALLEL.

I love parallel. I am thrilled with the performance gains I can attain when hashing large tables when accessing multiple partitions. I do not love parallel when someone decides to put a group by or order by into the statement. If any sorting has to occur on the statement, I’ve been seeing some odd behavior in about 80% of the executions. It appears that when sorting, it’s not Oracle sorting the data all at once, but sorting one parallel slave’s data set at a time. This creates an escalation in temp space usage that can become insanely difficult for Oracle to process efficiently and performance degrades immensely.

This is what occurred with the statements for the analysts.
With Parallel:
select /*+ full(t) parallel (t 8) */ t.*

from tbl_summary t WHERE (tbl_key=12 and usr_id=0)
order by i_id ;

Elapsed Time Breakdown
SQL Time (sec) 47,951.17
PL/SQL Time (sec) 0.00
Java Time (sec) 0.00

Activity By Time
Elapsed Time (sec) 47,951.17
CPU Time (sec) 9,734.81
Wait Time (sec) 38,216.35

Activity By Waits
Remaining Waits(20.9%)
User I/O Waits(58.8%)
CPU(20.3%)

I decided to query the temp usage, as I knew the above query would need to sort the ib_id and would impact the performance of the parallel execution. The first and last sql_id in the table are the queries in question with parallel, which the one owned by kpedersen is my run at it without parallel:

SID	OSUSER	PROCESS	SQL_ID	SEGTYPE	MB	TABLESPACE
571	usr1	26584	029y9k9qvgtp2	SORT	61.5	TEMP
772	usr1	26580	029y9k9qvgtp2	SORT	90	TEMP
673	usr1	26582	029y9k9qvgtp2	SORT	131.75	TEMP
626	usr1	26590	029y9k9qvgtp2	SORT	4370.25	TEMP
837	usr1	26592	029y9k9qvgtp2	SORT	41248.75	TEMP	Part 48	with parallel
767	usr1	6434	94bu54yp2ay2v	SORT	107.25	TEMP
703	usr1	6436	94bu54yp2ay2v	SORT	73.5	TEMP
568	usr1	6438	94bu54yp2ay2v	SORT	278.5	TEMP
635	usr1	6442	94bu54yp2ay2v	SORT	1248.25	TEMP
811	usr1	6440	94bu54yp2ay2v	SORT	3679.75	TEMP	Create table for Cognitive data
655	Mine	5976:4628	arz8vhqzt80y3	SORT	2584.25	TEMP	Mine Without Parallel
600	usr1	26656	bazjcx55wky5u	SORT	2517.25	TEMP
827	usr1	26658	bazjcx55wky5u	SORT	29089.5	TEMP	Part 12	with parallel

Now after seeing the waits for temp, the long operations, (441 minutes left on the sort in temp…) I decide to say enough it enough and attempt the execution without the hints- just plain ol’ query of this huge table, nothing more, (see above in the table, showing my temp usage for the single process, as “Mine”.) Fifteen minutes, yes, that’s right, fifteen minutes and I had the first rows back on my screen:

SQL Text-
select t.* from tbl_summary t
WHERE (tbl_key=12 and usr_id=0) order by i_id;

DETAILS on Without Parallel:
Elapsed Time Breakdown
SQL Time (sec) 945.25
PL/SQL Time (sec) 0.00
Java Time (sec) 0.00

Activity By Time
Elapsed Time (sec) 945.25
CPU Time (sec) 370.67
Wait Time (sec) 574.58

Activity By Waits
Remaining Waits(1.1%)
User I/O Waits(59.7%)
CPU(39.2%)

Moral:
Carefully test any parallel processing when sorting is involved. You may not receive performance benefits as expected.

8 thoughts on “When Parallel is Wrong”

Asif Momen

February 3, 2010 at 1:08 am

Nice post.

Even I have experienced performance degradation in past when PARALLEL and ORDER BY were used together.
iso775533eac

February 3, 2010 at 3:44 am

看看blog調整心情，又要來繼續工作，大家加油.........................
Randolf

March 22, 2010 at 6:28 am

Kellyn,

just an idea: I think in another post you've mentioned that you might have some code that uses manual PGA settings (workarea_size_policy = manual).

What I could imagine is that you encounter an odd combination of the SAS bug (the SORT_AREA_SIZE parameter won't be propagated to the parallel slaves, so they might be using 65535 bytes (!, yes 64K, the default if the value is not specified in the instance parameters) and a skewed spread of the work across the parallel slaves, eventually leading to the situation that you end up with a few parallel slaves with a SORT_AREA_SIZE of 64K doing all the sorts…

Just an idea – I might be totally wrong.

I would like to get a better understanding what oddities exactly you observe with your parallel execution, this sounds interesting…

Randolf
Uwe Hesse

March 28, 2013 at 2:47 am

I suppose you tried using a Temporary Tablespace Group, giving multiple Temp tablespaces to the parallel processes? They have been there in 2010 at least http://docs.oracle.com/cd/B19306_01/server.102/b14231/tspaces.htm#sthref1197

Kind regards
Uwe
Babak Tourani

June 3, 2014 at 12:51 am

Hi,

Thanks for the post.

Two questions though:

1. If the sorting is done one parallel server at a time (serially?) why should the Temp usage increase? I understand if the performance drops (allocate/deallocate temp, loose time coordinating,…) but wouldn’t the temp space used by one parallel server be reused by the next?

2. (a more silly question!): Would it help if one rewrites this [or perhaps the similar ones] query as a “select-order by” on an inline view which is running in parallel? Is it possible in a way that query is not re-written by CBO so that the inline view is run in full first?

Thanks
Pingback: AWR Warehouse and SYSDBA - Oracle - Oracle - Toad World

Comments are closed.

You May Also Like

The DBSNMP Account and User

GDPR Solutions in the Day of Automation

How to Apply an EBS Patch

8 thoughts on “When Parallel is Wrong”