Postgres

SQL Identity and Plan Fingerprints

I know, I’m back down this rabbit hole again…

As I’ve stated in other blog posts, I’m fascinated by similarities and differences in database platforms and when it comes to performance data, no two platforms are more similar, while being completely different than Oracle and PostgreSQL.  No matter how you view those differences- technical or philosophical, it’s worth researching regularly to see how close and how far from each other they continue as the platforms evolve.

The truth is, when you’re troubleshooting performance, building baselines, or explaining why the same query behaves differently today, you eventually end up asking a deceptively simple question:

“How does the database identify this query and its execution plan?”

Both Oracle and PostgreSQL answer this question, but I find they do it in very different ways, reflecting fundamentally different design philosophies around optimization, observability, and stability.  As I dive into this rabbit hole once again, I’m going to reflect on how Oracle’s SQL_ID differs from the query_id in PostgreSQL and how two terms that sound so similar (PLAN_HASH_VALUE and query_hash) could be generated so differently, as well as misinterpreted.  I’m guilty of it myself, so it’s a good place to spend some time.

The Core Concept: Query Text vs. Execution Plan

At a high level, every relational database needs to track two things:

  1. The query itself, i.e. what is the logical intent.
  2. How it was executed vs. the physical plan that was used.

Oracle treats these as separate first-class identities.  I was trained to call an explain plan proposed by the optimizer as an estimate and the actual plan used the execution plan.  I do view them as two, different entities in the database, where one is a theory and the other is reality.
PostgreSQL treats the query as primary, with the plan as largely ephemeral, which is difficult to put together with my Oracle view of the world.

I think the most interesting thing is the difference drives everything forward in this post.

Oracle: SQL_ID and PLAN_HASH_VALUE

The SQL_ID is the Logical SQL Identity

A SQL_ID is Oracle’s fingerprint for a SQL statement’s normalized text.  The SQL_ID remains a constant for a query across databases, database versions and even cloud or engineered systems.  This identifier can offer tracking across the Oracle ecosystem.

Key properties to understand about a SQL_ID:

  • It’s generated from a canonicalized version of the SQL text
  • Literal values are normalized
  • Whitespace and formatting differences are ignored
  • Same logical SQL results in the same SQL_ID

Take the following query below:

SELECT * FROM orders WHERE customer_id = :1;

No matter what value is passed, the SQL_ID remains the same in each database it’s executed in.

You’ll see and can track this same SQL_ID in Oracle’s management views, performance tools and reporting, including:

  • V$SQL
  • V$SQLAREA
  • V$ACTIVE_SESSION_HISTORY
  • DBA_HIST*
  • AWR reports
  • Oracle Enterprise Manager

Oracle assumes SQL identity matters long-term and will retain it for that purpose.

The Execution Plan Fingerprint

To identify the plan that’s used with a specific SQL_ID, we require the PLAN_HASH_VALUE.  The PLAN_HASH_VALUE represents the shape of the execution plan.

The PLAN_HASH_VALUE is calculated from:

  • The operations in the plan
  • The join order
  • Access paths, including any scans, hashes, sorts, etc.
  • Plan structure, but not cost or cardinality

If the plan shape changes, the hash changes, even if the SQL text does not.  This means that you can have a single SQL_ID with many PLAN_HASH_VALUEs attached to it.

SELECT SQL_ID, PLAN_HASH_VALUE FROM V$SQL
WHERE SQL_ID=’8g1k2m3n4p’;

SQL_ID         PLAN_HASH_VALUE
------------- ----------------
8g1k2m3n4p     1234567890
8g1k2m3n4p     9876543210

This isn’t an accident, but an intentional way to track the plan and performance changes in an Oracle database over time.  Yes, Oracle expects multiple plans per SQL and not just that plans will evolve over time, but that active plan management will occur with features such as baselines, SQL Plan management, profiles, etc.

How Oracle Presents This Data

Oracle makes this relationship very explicit and with it’s extensive features around performance data, offers a wide range of ways the SQL_ID and PLAN_HASH_VALUE will be used with these features to provide performance analysis.

ConceptOracle Artifact
Query identitySQL_ID
Plan identityPLAN_HASH_VALUE
HistoryAWR
Runtime visibilityASH
Plan controlSQL Plan Management

Oracle assumes DBAs will want to know about plan data and any changes that happen over time.

PostgreSQL: query_id and query_hash

PostgreSQL approaches this from almost the opposite direction.  Some of this comes from the very history of the product in the hands of developers and the other is the notion that with PostgreSQL, if you need something, you simply build it. 

A Logical Query Fingerprint

The query_id does appear in the following:

  • pg_stat_statements
  • auto_explain
  • log_min_duration_statement

It is a hash of the normalized parse tree, not the raw SQL text, which is different from how Oracle has approached the SQL_ID.

Key properties of the query_id are:

  • Literals are normalized
  • Whitespace is irrelevant
  • Same logical query results in the same query_id
  • It requires compute_query_id = on

So for the following statement, one very similar to our earlier one in Oracle:

SELECT * FROM orders WHERE customer_id = $1;

We end up with a query_id, but with differences showing up at the plan level.

Not All Query Hash Data is Alike

PostgreSQL, unlike Oracle, does not have a stable, persistent equivalent to PLAN_HASH_VALUE. 

Instead, there’s a plan generated at execution time.  Unless you decide to use prepared statements, the plan may not persist, where in Oracle, it’s far less common to not have statements found in the repository, (although it does happen for some short running statements.)  The plans in PostgreSQL are also not treated as a durable identity.

As PostgreSQL becomes more widely adopted, and managed by DBAs who came from other database platforms, some extensions and tooling expose more information, such as:

  • A derived plan hash
  • This plan is often calculated from EXPLAIN output
  • It’s typically used for comparison, not governance

This is a crucial distinction and something I have to remind myself of often:

PostgreSQL does not assume execution plans are long-lived artifacts.

How PostgreSQL Presents This Data

PostgreSQL focuses more on aggregate behavior, vs. on plan lineage:

ConceptPostgreSQL Artifact
Query identityquery_id
Plan identityEphemeral / derived
Historypg_stat_statements
Runtime visibilityauto_explain, logs
Plan controlLimited, as hints are non-core

The philosophy is to fix the query, fix the schema, and let the optimizer do its job.

Side-by-Side Comparison

As reminders are always good, I thought I would build out a side-by-side comparison, as it goes a long way to drive the important points home.

FeatureOraclePostgreSQL
Query identifierSQL_IDquery_id
NormalizationYesYes
Plan identifierPLAN_HASH_VALUENot native
Multiple plans per queryExpectedRarely tracked
Plan persistenceStrongWeak
Built-in plan governanceYes, with SPMNo
Typical tuning approachPlan-focusedQuery/schema-focused

Why This Matters in Real Life

Common Oracle to PostgreSQL Migration Pitfall

I still have to catch myself when I’m in a PostgreSQL database and I start down the rabbit hole of:

“Where is the plan hash history?”

The honest answer is that PostgreSQL doesn’t think that way and nor should it out of the box.  How many of us install Oracle with all the bells and whistles and wonder how much we’re going to have to maintain just to run a simple data repository in an age that may not always need a well of database features?

If you need Oracle-style plan tracking in PostgreSQL, you’re going to need to build it, but it’s not difficult.  It’s just not automated out of the box and with the big price tag that Oracle offers.

  • Capture EXPLAIN (ANALYZE, BUFFERS)
  • Hash plan output
  • Store snapshots yourself
  • Correlate to the query_id when needed

As I’ve said before, this is not an oversight, but a design choice.

Simplicity is Power with PostgreSQL’s Model

PostgreSQL may seem to be in its infancy when compared against Oracle’s 40 years of investment, but the truth is, it excels when:

  • Query patterns are simpler
  • Schema design is strong
  • Plan instability is rare
  • Developers own performance fixes

It avoids the operational overhead of plan governance and with it the cost of maintaining that overhead.

Oracle’s Model is Robust

Oracle is unmatched when viewing it through a microscope, especially from the POV of someone with deep Oracle experience.  That someone is going to expect:

  • Fine-grain details when workloads are highly dynamic
  • Expansive options for SQL when it can be easily rewritten
  • Plan regressions are unacceptable
  • Performance stability is a business requirement

In Summary

Oracle and PostgreSQL solve the same problem from opposite ends.  While Oracle treats SQL and plans as long-lived and as managed assets, PostgreSQL, at least currently, treats plans as implementation details.   Neither approach is “better”, per say, but assuming they are equivalent leads to bad tuning decisions, broken monitoring, and painful migrations.

Understanding SQL_ID vs query_id, and why PLAN_HASH_VALUE has no true peer in PostgreSQL is one of the most important mental shifts engineers must make when working across both platforms.

http://about.me/dbakevlar