Simple Reporting Without Materialized Views

A common initial resolution to reporting requirements without a reporting database is to utilize materialized views.  Unfortunately, the cost to refresh these views can be extremely expensive on a database, especially if it’s an OLTP that rarely has a lull in activity.

payday loans lenders online

A lesser known feature of Oracle’s is the Change Data Capture, (a.k.a CDC
http://download.oracle.com/docs/cd/B10500_01/server.920/a96520/cdc.htm )
The CDC process utilizes a materizalized view log on a source table with a destination copy of the source table that can be kept insync with the original in any interval chosen.  The result is often a much lighter footprint than an mview refresh, no matter if you’ve employed complete or fast refreshes. 

I’ve had added success by pulling sections of the CDC underlying code to syncronize destination objects and replicating just the pieces I need to get the results desired.  CDC has a lot of features and can be over-whelming the first time a DBA or developer delves into it.  For anyone who would like to reproduce the simple source-to-reporting object syncronization with an mview log, here are the high level steps:

1.  Create a materialized view log on the source table using a primary key or rowid.
2.  Create a stage table that will hold two columns of information:
            1st Column:  The type of row, Insert, Update or Delete, (‘I’,'U’,'D’).
            2nd Column: The primary key or rowid.
3.  Create two stored procs, either in a package or separately, where the 1st one will populate the stage table and the second will take the info from the stage table and process the rows to the destination table.

In depth, the process and code will appear similar to the following example:

Create the materizlied view log on the source table, (SRC_TBL):
CREATE MATERIALIZED VIEW LOG ON SCOTT.SRC_TBL;

Create the staging table:
CREATE TABLE STG_PK_IDS
(OPT CHAR(1),
PK_ID NUMBER not null)
tablespace USERS;

Create the stored procedure to load the stage table, (STG_TBL):
CREATE OR REPLACE PROCEDURE UPD_PKID_FOR_RPT IS
/*********************************************************************
–Get Records for both insert and update
CURSOR cur IS
SELECT *
FROM (SELECT ‘I’ opt, pk_id
FROM scott.mlog$_src_tbl where DMLTYPE$$=’I')  –Query the mview log on the source
UNION ALL
(SELECT ‘U’ opt, pk_id
FROM scott.mlog$_src_tbl where DMLTYPE$$=’U')
UNION ALL
(SELECT ‘D’ opt, pk_id
FROM scott.mlog$_src_tbl where DMLTYPE$$=’D');


BEGIN


– Insert records for transactions into staging table, all of them
FOR rec IN cur LOOP

IF rec.opt in (‘I’,'U’,'D’) THEN
INSERT INTO scott.STG_TBL (opt, pk_id) VALUES (rec.opt, rec.pk_id);
END IF;
END LOOP;
COMMIT;


–Purge processed rows from the mview log, using the reporting table to tell us which rows were added to it.
DELETE FROM scott.mlog$_src_tbl
WHERE OPT in (‘I’,'U’)
AND PK_ID in (select PK_ID from scott.RPT_TBL);


COMMIT;


END UPD_PKID_FOR_RPT;

Create a second stored prodedure to populate the data to the reporting table, (RPT_TBL)
CREATE OR REPLACE PROCEDURE UPD_RPTTBL_SP AS

/*********************************************************************

CURSOR cur IS
SELECT *
FROM (SELECT stg.opt,
stg.pk_id,
src.user_id,
src.phn_nbr,
src.login_id,
src.usr_type_cd,
src.usr_dt,
src.email_add
FROM scott.SRC_TBL src, scott.STG_TBL stg
WHERE src.pk_id = stg.pk_id
AND stg.opt=’I’

UNION ALL
SELECT stg.opt,
stg.pk_id,
src.user_id,
src.phn_nbr,
src.login_id,
src.usr_type_cd,
src.usr_dt,
src.email_add
FROM scott.SRC_TBL src, scott.STG_TBL stg
WHERE src.pk_id = stg.pk_id
AND stg.opt=’U’

UNION ALL
SELECT stg.opt,
stg.pk_id,
src.user_id,
src.phn_nbr,
src.login_id,
src.usr_type_cd,
src.usr_dt,
src.email_add
FROM scott.SRC_TBL src, scott.STG_TBL stg
WHERE src.pk_id = stg.pk_id
AND stg.opt=’D’
);

v_bgn_cnt number :=0;
v_step_name varchar2(200);
v_rec_ins number := 0;
v_start_dt date := sysdate;
v_error_msg varchar2(500);


BEGIN
v_step_name := ‘Process inserts and updates into scott.RPT_TBL’;


SELECT count(*)
into v_bgn_cnt
FROM scott.STG_TBL;


IF v_bgn_cnt >0 THEN


– Process each change


FOR rec IN cur LOOP


IF rec.opt = ‘I’ THEN
INSERT INTO scott.RPT_TBL
VALUES
( rec.pk_id,
rec.user_id,
rec.phn_nbr,
rec.login_id,
rec.usr_type_cd,
rec.usr_dt,
rec.email_add);
END IF;

IF rec.opt = ‘U’
THEN
UPDATE scott.RPT_TBL
set USER_ID = rec.user_id,
PHN_NBR = rec.phn_nbr,
login_id = rec.login_id,
user_type_cd = rec.user_type_cd,
usr_dt = rec.usr_dt,
email_add = rec.email_add
WHERE pk_ID = rec.pk_ID;
END IF;

IF rec.opt = ‘D’
THEN
DELETE from scott.RPT_TBL
WHERE pk_ID = rec.pk_ID;
END IF;


COMMIT;
END LOOP;
END IF;

–Insert Error handling here, I’m going to skip this in the example…
–Add a lookup table to track the pk_id from previous runs, etc.
–Truncate the stage table for the next run once success had been reported

END UPD_RPTTBL_SP;

We now can use the objects and procedures to maintain a copy of the SRC_TBL.  The DBA or developer simply needs to choose a functional interval to refresh and execute the procedures view a scheduled job or through a shell script.
exec UPD_PKID_FOR_RPT;
exec UPD_RPTTBL_SP;

Ta-da!  A simple process to syncronize data from a transactional table to a reporting copy.  This process can be scaled to load and commit in many different ways, depending on the demand and size of the targets.  It’s simple and can support many levels of loading intervals without the impact that often occurs with a materialized view refresh.

 

Transactional and Reporting

In my years as an Oracle and SQL Server DBA and the many technical environments I’ve worked in, there’s almost always one database in each place that is a hybrid of OLTP and OLAP.  A highly transactional database that also has reporting requirements, but no separate reporting database. 

This type of database is pretty much a guaranteed challenge for any DBA.   On one hand, the DBA has demands of the application demanding micro-second response time for completion of transactions.  On the other, he or she has users wanting information about the data resulting from all those transactions. 

A DBA’s best course for successfully taming these types of hybrid environments is working closely with developers and applications specialists. 

Steps I recommend taking to tune a hybrid environment:
1.  Identify top 10 processes that are impacting the database at a time-  five that are impacting users and five identified by the database group, introducing new statements as previous ones are tuned and removed from the list. 
2.  Group statements that are impacted by each other together and address as mini-projects.  Hybrid databases can become an over-whelming task- breaking the challenges down into smaller projects help.
3.  Lock down defined performance goals, (report “a” must respond to users desktops in 20 seconds,  transaction “2″ must complete in .02 seconds, etc.)

4.  Challenging the queries for reporting- 
 - What queries simply need an index.  Ensure the index is cost-effective, (i.e. the cost of supporting the inserts and updates on the index are worth the savings to the query…)
 - Which queries are aggregating/summarizing transactional data?  These reports may require a reporting table or a materialized view.
 -  If a materialized view is seen as a resolution for the reporting query, inspect the impact of a materialized view log on the transactional table.
 -  If materialized view refreshes are the issue and not the materialized view log, research to see if a CDC process can be utilized to create a reporting table, (CDC= Oracle’s Change Data Capture)
 -  Inspect the queries-  ensure the SQL is optimized as efficiently as possible. 
 -  Ask questions and request justifications if a query doesn’t make sense.  Often a developer or application specialist will be just as surprised to find out an application or code is impacting, (or sometimes that reporting queries are still being utilized!)

5.  For transactional processing challenges-
 -  Inspect not only concurrency and waits of transactions, but logic/path of processes.  Many times performance waits can be a complex combination of issues and not one query/statement.
 -  Inspect the number of executions vs. the elapsed time per execution.
 -  Are the indexes being used effectively?  Are there range or skip scans that could be unique scans?
 -  Full table scans that could be effectively removed with a cost-effective index, (I know, I demand that cost-effective part… :) )
 -  Ensure the application is querying the database efficiently-  look for oddities, such as single row results that have an order by or returning data to set a pre-cursor in an application that could be done without a database call.

Test thoroughly, but also expect a set back from time to time-  hybrid environments are a unique challenge for any DBA.

Finally, DON’T give up, be persistent and continue to identify the bottlenecks that are holding the hybrid database back.   These databases are worth the time and effort!

RMOUG 2011 Abstract Started

I’ve started working on my abstract to submit for RMOUG 2011.  I want to present on some of the CBO challenges I’ve taken on in the last year with large data sets.   There are a number of tricks that could be very useful to others in how to manipulate the CBO to work with tables over 100G in size.  Many of these tricks were requirements in areas where there have been small design flaws that “confused” the CBO and are very challenging for any DBA.

I would also like to go into the steps I take to ensure that I am completely and intuitively aware of how each of my databases live and breathe on a daily basis.  I believe it is essential to know your data to best manage your database.   This should be an interesting translation to paper, as I am an oddly, intuitive DBA, knowing often by one look at an OEM grid, TOP or even iostat result that something is amiss in one of my environments.

As it’s a busy time at my company, I’m glad to know the abstract will not take too much time to prepare.  I can’t say the same for the presentation, but work and life will slow down soon and I’ll be able to apply myself to the task fully.

The truth be known, I wasn’t very satisfied with my first RMOUG presentation.   The presentation definitely was much better when I gave it in front of the mirror!  On the other hand, I was very satisfied with the reviews-  folks were much, more positive of my presentation than I was and gave excellent feedback that should help me be a stronger presenter this next year.  Most feedback involved my problem with pacing and a stationary microphone.  Many were unable to hear me at times in the back of the room as I moved away from the microphone repeatedly.  I will definitely be wearing a clip-on microphone this upcoming year, (and maybe try to control my nervous pacing I’m so famous for during presentations… :) )

I also should find ways to ‘de-sensitize’ myself from the chaotic distractions of the RMOUG conference.  I’ve been very successful at creating a calm work and home environment that allows me to easily cope with my ADHD.  I forgot how impacting all the people, noise, displays and this new environment is for me.  I felt I wasn’t able to think or speak clearly either day of the conference, nor was I able to find a solid, controlled distraction to help me cope.  There always seemed to be some, odd distraction that most others wouldn’t even notice, but for me, it might as well been someone raking their fingernails down a chalkboard-  ADHD is like that and when a distraction exists.  I have to escape from it or find a more dominant, controlled distraction to downplay it with.  Without one of these options,  I’m fighting to stay on track every second, unable to focus on my presentation.

I really hope the lessons I learned from this last year allow me to give a more solid presentation this next year.  Onto my abstract-  wish me luck!