Right Deep, Left Deep and Bushy Joins in SQL

What is right deep verses left deep? Good question. In join trees (not VST) the object on the left is acted upon first then the object on the right.  Below are left deep and right deep examples of the same query, showing

  • query text
  • join tree
  • join tree  modified to more clearly show actions
  • VST showing the same actions

left_right_deep copy

All  of this boils down to the point that a right deep HJ can return rows earlier than a left deep HJ. A left deep HJ has to wait for each join to finished completely so that the result set can be hashed before  the next step can probe it. On the other hand, in a right deep HJ, it’s not the result sets that are being hashed, but a table at each level, thus each table can be hashed without waiting for intermediary results and once these hashes are complete  a probed row can flow all the way through the join tree,  from bottom to top, similar to how a nested loop can start giving results early.  The Left Deep HJs only have two open work areas at a time where as the Right Deep can have multiple work areas open.  One of the key thing to keep in mind is how much data we have to hash. If the intermediate result sets are large (and/or growing each level) then that represents more work each step of the way.

Normally I don’t think in left deep verses right deep because it doesn’t change the the path through the join tree. On the other hand it does change whether we are hashing a result set or we are hashing a table. If hashing a table then the results can be returned, in theory, more quickly.

For NL joins there are only left deep join. The object on the left always probes the object on the right.  The object on the left is always accessed first ( there is no need to modify the object on the left  first and probe from the right with NL).

 download (1)

Besides left deep and right deep, there are  also bushy joins which Oracle doesn’t do unless forces to through sub-queries or views as in the following:

download (2)

Finally,  the above VST diagrams were modified to be more easily compared to the join tree diagrams. Below are the VST diagrams as displayed by default in Embarcadero’s DB Optimizer.   DB Optimizer shows the tables all in a line because the tables are one to one relationships.  Keeping the VST diagram layout constant, it is easier to see the differences in execution paths:

vst_bushy_copy

Note on hinting and order of tables in explain plan:

For NESTED LOOPS and HASH JOINS the hint takes one argument, which is the second table that shows up in the explain plan.

For NESTED LOOPS the second table is the table probed into i.e. the second in order of access.

For HASH Joins the second table is the table doing the probing into the hash result set. Hashing is the first operation which creates the hash result set that the second table probes into.

For NESTED LOOPS  if order is “LEADING(X,Y) then  nested loops hint can only be on Yie USE_NL(Y)

For HASH JOINS if the order is “LEADING(X,Y) then the hash join hint can only be on Y, ie USE_HASH(Y)

Screen Shot 2014-05-06 at 6.51.02 PM

A couple of good references on USE_NL and USE_HASH

For more information see:

Print Friendly
May 5th, 2014 by

facebook comments:

  • Vishal Desai

    Great Explanation. Thanks Kyle.

  • khailey

    PS for all you might want to know (or close to all) about hashing order and swapping inputs see Jonathan Lewis’ brilliant little quiz night problem and solution at http://jonathanlewis.wordpress.com/2010/12/10/quiz-night-10/

  • Dave Brinkman

    For the HASH JOIN, if it is:

    HJ
    X
    Y

    then X is the inner table, i.e. the one hashed into memory. So, I’m thinking it should be LEADING(X,Y) USE_HASH(X) not LEADING(X,Y) USE_HASH(Y). Can you confirm?

    Thanks,

    Dave

  • khailey

    Change the test a bit. As I understand it the use_nl and use_hash always take the second table for an argument where the second table is the second table to show up in the explain plan. The problem for me is people often say the hint takes the inner table, but for me the inner a hash_join is the first table operated on, so I don’t think the term “inner” is appropriate for the second table in the hash join. The second table in a hash_join is the one doing the probing which for me is the outer table.

  • Dave Brinkman

    Hi Kyle,

    Thank you so much for the quick response. I apologize for being so obtuse. But I’d really like to understand the USE_HASH hint along with the LEADING hint against the appearance of the execution plan in the EXPLAIN PLAN output. I have struggled for some time on this subject and the documentation is so weak in this area. I believe you are the one that can finally clear my confusion once and for all.

    This is my understanding:

    HASH JOIN
    X (table hashed into memory)
    Y (table doing the probing)

    LEADING(first-table-to-be-accessed, second-table-to-be-accessed, etc.)
    USE_HASH(table hashed into memory)

    Based upon this understanding, I have LEADING(X,Y) USE_HASH(X). X is the first table in the LEADING hint since it is accessed first when it is hashed into memory.

    Please tell me where I am wrong.

    Thank you for your knowledge and your patience.

    Dave

  • khailey

    From testing, it looks like the second table the explain plan should always be the one in the use_nl or use_hash hint. Let me know if you have an cases that show otherwise. The confusion comes from the fact that documentation around the web says to put both table1 and table2 in the hint parenthesis which is incorrect or that the hint takes the inner table which is correct for use_nl but confusing at best or wrong for use_hash where the inner table is actually the first table in the explain plan.
    It is easier to remember that that both hints take the second table as they appear in the explain plan order

  • khailey

    actually as Jonthan Lewis points out the use_hash hint alone is not sufficient, thus my explanation is lacking. See
    http://jonathanlewis.wordpress.com/2013/09/07/hash-joins/
    The correct hint should be
    leading(table_1 table_2) use_hash(table_2) no_swap_join_inputs(table_2)
    to describe what I was trying to so above.
    But one can make table_2 the probing table by doing
    /*+ leading(table_1 table_2) use_hash(table_2) swap_join_inputs(table_2) */

  • Facebook
  • Google+
  • LinkedIn
  • Twitter