How many performance specialists have experienced projects where reporting was horrid in the beginning, maybe didn’t even complete? The focus of the project may have sat with data loads or due to the data architect’s design specialty, etc., the reporting aspect was just the last thing on everyone’s mind until it was too late. I come into this conversation with companies all the time and it is a difficult one, as there are so many database myths.
“Can’t you just do something to make the database go faster? I don’t know, isn’t there a parameter that can be updated, maybe some partitioning or compression that will fix our problem?”
I feel there pain, I do understand…
When a project to design a warehouse or mart is undertaken, the groups involved may be quite distracted by following “best practice”, i.e. “my fact tables must be just so and the dim table design must follow the same format as I used at company XYZ that was so successful….” You may find that the folks doing the actual design may not know as much about marts and warehouses and are learning as they go along. You may have someone who is unable to move away from an OLTP design in the mart, “everything must be relational and I must follow normal form, so let me add primary keys, foreign keys, constraints everywhere!” (Hey, it happens, I’ve seen it…:))
Never the less, as a development group learns, there is no place I’d rather be. No project is a lost cause and working with the group to create proof of concepts, showing them to correct way to design a warehouse or mart, along with star schemas, the power of bitmap indexes, etc. is a valuable time for any database administrator to spend. Ensuring you know what is causing the pain for an ETL load process, i.e. where time is consumed is essential, so that you are able to provide as many “wins” when making enhancements is pertinent, so you can get onto the reporting aspect of the project.
So why do we need to keep reporting on the horizon at all times? Never lose sight of this important feature of the warehouse or mart as this will most likely be the user interface to the environment. This is how the business will judge if its successful or not. If they can’t view the data in an acceptable amount of time, it doesn’t matter how quickly it loads at night or how beautiful your star schema is, as far as they are concerned, it was all a waste of money and time.
With that said, where do I see the most common challenges for reporting?
Storing data in a very different format than the way reporting requires it to be presented in.
I’m not talking about how a star schema is designed here. I’m talking about the data having to be constantly cased, then summed then sorted, hashed, sorted again, so on and so forth, to present the data the way the customer requires vs. having materialized views, reporting and rollup tables that can lessen the performance hit on the database.
Why is it a performance hit? I have another draft in my blog posts that I need to get out, but it’s a perfect example. The business needs to see the data presented consistently in a way that is no way near the way its stored. As the reporting queries perform well with simple hashes, bypassing nested loops, etc., there is no way that the amount of case statements, to sum, to sort and sort again, to fit within the allocation of PGA per process for it, even in parallel. This results in huge uses of temp tablespace. That means the process is now doing it’s work on disk and disk is just plain slow.
Why am I a temp tablespace nazi? This above is why.
I can move to temp tablespace groups and I can put my temp on SSD, but really, the best optimization enhancement will come from an materialized view or rollup table that will limit the amount of work that has to be done in PGA. Do not say, “Well, can’t you just raise the PGA?” These are on servers that the amount of PGA required is larger than the memory on the box and the temp tablespace is often the largest tablespace in the database, (remember, I work in VLDB’s and for someone to brag that they used 2TB of temp is not uncommon… :))
No hardware will offer the same kind of performance gain as a small design change along with a few tweaks to some reporting SQL to do less in a report to produce a report. No hardware will offer as long-term performance enhancements as the latter- plain and simple.
I wish I did have a silver bullet to make the database “go faster”, but that’s not how it works if you really want the problem fixed vs. having someone offer you some duct tape. Duct tape will last you a month, maybe a year if you are lucky, but the problem will continue to return until we fix it right.
OK, enough ranting for my first day as a ACE Director… 🙂