*****After much ado, I’ve finally returned to this post to rewrite what concerned me and befuddled others… :)*****

I’ve worked with some great developers.  Developers who can write code in their sleep to perform the most complex functions and aggregations, (makes me quite envious, as I actually need to think about how to write code… :))  Most of these developers are still tripped up by the Oracle Cost Based Optimizer, (CBO), and honestly, I think they should be.  Just as I have to think about how to code, it should be expected that they need to think about how the CBO works.  It is one of those fundamental differences between DBAs’ and developers’ mindsets that makes each group special.

*****This is where I’m going to rewrite this post.  Reason?  I rewrote the original query to disguise my company’s SQL, as I’m always requested to do, but I disguised it so well that it no longer had the same challenges as the original, or addressed something that I really wanted to discuss.  I’m still going to change the column and table names, but I’m going to use the original query, as this is a topic I really want to discuss honestly, (forgive me boss, for I have sinned… :))  We worked through a number of different options with this query, and with the sort on this wide table, over a couple of days.  It’s a new design that is part of a new project, and it was time to really think outside the box, (which I will get into at the end of this post.)*****

Our example for the evening is a query that is performing poorly.  The (REAL) query is a select on ONE table, hinted parallel to gather a performance gain from temp tablespace groups, and it includes an order by:

select /*+ PARALLEL(t1, 4) */  t1.* from TBL1 t1 
order by t1.MAIN_ID;

(Now let’s admit to the real performance challenge here…)  The query isn’t returning after a full day; the long ops entry for the process states it has 4138 hrs left and climbing.  But you notice that if you simply take out the order by, the query returns ALL OF THE DATA in a matter of minutes.  Why, when you put in the order by, does it refuse to return anything?
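If you want to see that runaway time-remaining estimate for yourself, v$session_longops is where it lives.  A quick sketch, (the filter on time_remaining is just to surface active long operations; time_remaining is Oracle's own estimate, in seconds, and as we saw, it can be wildly wrong):

```sql
-- Sketch: surface long-running operations and Oracle's own estimate
-- of time remaining, converted from seconds to hours.
select sl.sid, sl.opname, sl.sofar, sl.totalwork,
       round(sl.time_remaining/3600, 1) "Hrs Left",
       round(sl.elapsed_seconds/3600, 1) "Hrs Elapsed"
from v$session_longops sl
where sl.time_remaining > 0
order by sl.time_remaining desc;
```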

As the sort is the source of the latency, a quick look at the most common work area for a sort this large is in order:  TEMP usage.

select vst.sql_text, swa.sid, swa.tablespace
, swa.operation_type
, trunc(swa.work_area_size/1024/1024) "PGA MB"
, trunc(swa.max_mem_used/1024/1024) "Mem MB"
, trunc(swa.tempseg_size/1024/1024) "Temp MB"
from v$sql_workarea_active swa, v$session vs, v$sqltext vst
where swa.sid=vs.sid
and swa.sql_id=vs.sql_id
and vs.sql_id=vst.sql_id
and vst.piece=0;

SQL_TEXT                         SID  TABLESPACE  OPERATION  PGA MB  Mem MB  Temp MB
select /*+ PARALLEL(t1, 4) */    234  TEMP_A      SORT (v2)      47    1024    96179
select /*+ PARALLEL(t1, 4) */    377  TEMP_B      SORT (v2)      58    1024   116256
select /*+ PARALLEL(t1, 4) */    260  TEMP_C      SORT (v2)      76    1024   146017
select /*+ PARALLEL(t1, 4) */    304  TEMP_D      SORT (v2)      47    1024    96269

Note the difference in the parallel slaves’ temp usage.  This is a clear indicator that Oracle is doing what it can with what it knows about these objects, but something is amiss.

What are we working with here?
select segment_name, sum(bytes)/1024/1024/1024 GB from dba_segments
where segment_name='TBL1'
group by segment_name;
TBL1 = 243GB
Rest of the info:
TBL1 = 558 million rows, 238 columns, last analyzed 2 hrs ago.
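Those row and column counts came straight from the dictionary.  A sketch of the lookup, (OWNER1 is the disguised schema owner used throughout this post; num_rows and last_analyzed are what the optimizer itself sees):

```sql
-- Sketch: row count estimate, staleness, and column count from the dictionary.
select t.table_name, t.num_rows, t.last_analyzed,
       (select count(*) from dba_tab_columns c
        where c.owner = t.owner and c.table_name = t.table_name) num_cols
from dba_tables t
where t.owner = 'OWNER1'
and t.table_name = 'TBL1';
```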
Now that we know what we are dealing with:  why are the sorts out of whack on this table?  The table has been analyzed.
Oracle can only sort data well if it knows what it is dealing with.  What is required to sort data correctly?  Accurate information, and since a sort happens at a column level, the statistics must exist at the COLUMN level.  The question the DBA should ask is not "Have statistics been gathered?" but "Have statistics been gathered correctly for how the data is being utilized?"
For the DBA, this is where the DBA_TAB_COLUMNS view can come in very handy.  It can quickly show you whether the info the CBO requires for the sort or where clause is present.
select table_name, column_name, num_distinct, low_value, high_value, num_nulls,
last_analyzed, global_stats, user_stats, histogram from dba_tab_columns
where table_name in ('TBL1')
and column_name in ('MAIN_ID');

The column statistics come back empty.  Yeah, I think the database is trying to tell us something… :)

How can the database know how to best sort the data if it doesn’t know anything about the columns it is supposed to join or sort on??

Table statistics were collected, but no column information was collected.  The most likely instigator is method_opt=>'FOR ALL INDEXED COLUMNS', and since this table has no indexes, no column data was collected.  Recollect the statistics with method_opt=>'FOR ALL COLUMNS SIZE 1'.
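You can confirm what method_opt a plain gather will default to before recollecting.  A sketch, (dbms_stats.get_prefs is available in 11g and up; on 10g the equivalent is dbms_stats.get_param, which takes only the parameter name):

```sql
-- Sketch: what METHOD_OPT will dbms_stats use for this table by default?
select dbms_stats.get_prefs('METHOD_OPT', 'OWNER1', 'TBL1') method_opt
from dual;
```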

exec dbms_stats.gather_table_stats(ownname=>'OWNER1', tabname=>'TBL1', estimate_percent=>.00001, method_opt=>'FOR ALL COLUMNS SIZE 1', cascade=>TRUE, degree=>8);
Now, the CBO will know a bit more about what it's working with!
select table_name, column_name, num_distinct, low_value, high_value, num_nulls,
last_analyzed, global_stats, user_stats, histogram from dba_tab_columns
where table_name in ('TBL1')
and column_name in ('MAIN_ID');
TABLE_NAME  COLUMN_NAME  NUM_DISTINCT  LOW_VALUE   HIGH_VALUE    NUM_NULLS  LAST_ANALYZED          GLOBAL_STATS  USER_STATS  HISTOGRAM
TBL1        MAIN_ID      586368000     C40D1A2050  C50A4736594A  0          10/25/2010 8:41:53 PM  YES           NO          NONE
Returning to run the original query, now that the table has column stats: 
select /*+ PARALLEL(t1, 4) */ t1.* from TBL1 t1
order by t1.MAIN_ID;

The final performance and usage?  The query above was able to write the data out to file in 5 hrs, and the TEMP usage was in line with the amount of data being sorted.  There is still a delay in pulling the data to the application layer, but that is a fight for another day.  The final answer for this query was simply the statistics, but the challenge for the people involved still revolves around the wide table and how to ease some of the strain the sort puts on the database:


SID  TABLESPACE  OPERATION  PGA MB  Mem MB  Temp MB
307  TEMP_C      SORT (v2)    1029    1024     1739
247  TEMP_A      SORT (v2)    1024    1024     1755
286  TEMP_D      SORT (v2)    1022    1024     1783
354  TEMP_B      SORT (v2)    1024    1024     1755
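To keep an eye on temp consumption across the tablespaces in the group while a sort like this runs, v$tempseg_usage is another handy view.  A sketch, (in older releases this view is v$sort_usage; the 8192 is an assumed 8K block size for the temp tablespaces, so adjust to your own):

```sql
-- Sketch: temp segment usage summed per tablespace and segment type.
-- Assumes an 8K block size; multiply blocks by your actual block size.
select su.tablespace, su.segtype, count(*) sessions,
       sum(su.blocks) * 8192 / 1024 / 1024 "Temp MB"
from v$tempseg_usage su
group by su.tablespace, su.segtype
order by 4 desc;
```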

****Coskan asked why, in the original version of this post, I threw in an index and forced the query with a hint.  This was an attempt to “dimensionalize” the wide table.  Oracle will not utilize an index the way a dimension table would be utilized in a star schema, performing a hash join, unless you use a hint.  The test was: how long would it take, and how efficient would it be, to sort on the indexed column vs. the table?

There were multiple tests.  The first attempted to use the index, but due to the sort, Oracle refused the parallelism that was required to get through the 558 million rows.  The second choice was to perform the following, which, I will admit, worked great for Oracle writing out to a file, but did not work well when the application had to utilize it and perform the same! :(

CREATE table MID_DATA tablespace USR_DATA1 as select MAIN_ID from TBL1;

This took only 42 minutes to create.  The new table was then used for the join on MAIN_ID and the order by.

SELECT t1.<each column but the MAIN_ID>, md.MAIN_ID
from TBL1 t1, MID_DATA md
where t1.MAIN_ID = md.MAIN_ID
order by md.MAIN_ID;

I was able to query AND extract the data to an output file on a remote backup server, simulating what the application would do, in 5 hrs.  I thought I had a winner here, but unfortunately, the application still sorted the entire table and temp usage went through the roof.  (We run into that challenge often: a solution that works in Oracle does not solve the problem at the application SQL level, due to incompatibilities in how the application presents the SQL, may rewrite the query, etc.)

There is a reason we have star schemas, and when Oracle is unable to profit from one for one reason or another, it can be an incredible battle for the DBAs and developers who work in these environments.