~ The least questioned assumptions are often the most questionable ~ Paul Broca
I’ve always found assumptions to be one of the most common causes of failure in software releases and maintenance windows.
If I had a dime for every time I’ve heard, “I thought you were going to take care of/do that…” or “I didn’t involve you earlier because I thought the steps you were responsible for were so easy…”, well, you know the saying: I’d be rich.
Assumptions that bring the DBA into a project, release or maintenance window too late are widespread enough when you are onsite, but for a remote DBA the problem can take on a whole new dimension. Being in the office might let you overhear a conversation or learn of a meeting you realize you should be part of; working off-site can really set you up to miss the prep time you need to offer the best level of support.
It’s nothing new, it’s not rare, and it’s most obvious in hindsight, after the release or maintenance has completed. As paranoid as I think I am, which causes me to involve myself pretty well (I’m not a fan of surprises… :)), I did experience it this last weekend as a new DBA for a client. Acclimating to a client, and having them become comfortable with you and involve you as their new DBA, takes time. That was something we just didn’t have the opportunity to do much of, and it was no one’s fault. Enkitec was unaware of this year-end maintenance, so they assumed I would easily take over ownership of the client from the previous DBA.
On Friday they sent an email with the tasks they needed my time allocated for that night, my “real” first time on the client’s systems. After that email, some concern arose that there might be a disk space issue for the backup required once the special, once-a-year release completed.
I did some quick research after this was discovered and offered an option, but the client’s Unix admin cleared off some disk space and assumed the space would be enough. Now, the estimated space usage for the release was not that big, definitely not large when you consider what I’m used to. We are talking gigabytes, not terabytes. Having been in the system for only one day, I made my first mistake: I assumed the problem was taken care of, proceeded to perform the duties I had been assigned for that evening, and let them work through the weekend.
The client had assumed the tasks were quite simple for a DBA- the previous DBA had been there the entire time they had been clients and the database growth had been of minimal concern. It was taken into consideration that I may require a little more time to find my way around the environment, become familiar with the specifics of design, jobs and backup scenarios, etc., but I had no issues with the pre-release work, so why would “reversing” the process for the post work result in any difficulties?
After the weekend work, they contacted me and asked me to start the tasks that follow year-end processing. Everything worked out well until close to midnight, when the backup failed. We didn’t have enough space.
The backup strategy is not RMAN backup files but image copies with level 1 incrementals, and the size of the database after the release, ALONG with additional UNDO, etc., made it too large to fit on the volume. Different strategies hadn’t helped; even splitting across multiple channels to multiple volumes was not enough, and now I was having a formatting issue on the apply of the incremental. It did not like the change one bit and yes, it was after midnight (have we discussed when a DBA’s horse-drawn carriage turns back into a pumpkin yet? :)).
The unique configuration and my newness to the environment meant that it did take me a bit longer to work through and investigate issues (although I do think this may be the best way to retain knowledge about an environment, it’s just more painful for everyone involved!). I finally had the answers I needed in the wee morning hours-
– how to retain the existing pre-release backup, which sat on the same volume I needed more space on.
– how to change from an image copy incremental to a level 0, RMAN compressed backup.
– what parallelism was acceptable during business hours.
– what jobs had to be stopped to not impact production while all this occurred.
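
In hindsight, the one check that would have saved the night is embarrassingly small: compare the free space on the backup volume against the estimated backup size before the work starts, instead of assuming it will fit. Here is a minimal sketch in Python. The function names, the 20% headroom and the sample path and size are my own illustrative assumptions, not anything from the client’s environment.

```python
import shutil

GIB = 1024 ** 3


def required_bytes(estimated_backup_bytes: int, headroom_pct: int = 20) -> int:
    """Estimated backup size plus a safety margin (the 20% default is an assumption)."""
    return estimated_backup_bytes * (100 + headroom_pct) // 100


def has_room(free_bytes: int, estimated_backup_bytes: int, headroom_pct: int = 20) -> bool:
    """True when the free space covers the estimate plus headroom."""
    return free_bytes >= required_bytes(estimated_backup_bytes, headroom_pct)


def check_volume(path: str, estimated_backup_bytes: int, headroom_pct: int = 20) -> bool:
    """Read the volume's current free space and compare it with what the backup needs."""
    free = shutil.disk_usage(path).free
    return has_room(free, estimated_backup_bytes, headroom_pct)


if __name__ == "__main__":
    # Hypothetical numbers: a 100 GiB backup landing on the current directory's volume.
    backup_dir = "."
    estimate = 100 * GIB
    if check_volume(backup_dir, estimate):
        print("Enough space for the backup, with headroom.")
    else:
        needed = required_bytes(estimate) / GIB
        print(f"NOT enough space: need about {needed:.0f} GiB free on {backup_dir}")
```

The estimate has to include everything the release adds (for us, the extra UNDO), plus any existing backup that must stay on the same volume. A check like this is only as good as the number you feed it.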
Now the second level of the problem- I had been up two of the last three nights, had been ill on Saturday, and I was starting to feel my IQ slip away like the free space on the backup volume. Enkitec believes in giving their clients valuable resources who are able to give them their best, and I was in no way close to that. I really appreciate that Miranda, my secondary DBA for this client, was so helpful and so willing to take what I, as the brand new DBA, had and put the plan into action (and make it sound so easy to me, who had so little IQ left at that point! :)). After I transitioned the information to her, I promptly notified the client and then collapsed into a deep slumber.
Now we come to the moral of our story.
This should have been simple work for the DBA. It was assumed to be so by BOTH parties: the DBA and the client. This was our downfall. We really should make only one assumption when it comes to maintenance and releases- if we assume, especially on the simple stuff, that assumption will most likely be what causes our biggest challenges. The DBA should be involved from the beginning of any project, maintenance or release; only then, once both sides have inspected the tasks and their level of difficulty, can they decide together that the DBA is not required to be heavily involved. Even then, an open invitation should be available to the DBA to return if any “red flags” arise, and all communication should still include the DBA to ensure that there are as few surprises as possible.