Archaic Data Transfer: A DBA Rant
There was a great post by Noel Yuhanna on how he estimates the number of DBAs required for a database environment based on its size and the number of databases. That staffing challenge has created a situation where data platform vendors are searching for ways to remove the roadblock entirely and eliminate the skills needed to manage the database tier.
I’ve been a DBA for almost two decades now ("officially," as the date I first held the title and my years of experience differ by a couple of years). When I started, we commonly used import and export utilities to move data from one data source to another. Tim Hall just wrote up a quick review of the enhancements to Data Pump in Oracle 12.2, but I felt dread as I read it, realizing that these utilities continue to hold DBAs back against the challenge of data gravity.
Data movement utilities may go through updates over the years, and they do have their purpose, but I don't feel the common challenge they're asked to undertake is the right one. Taking my two primary database platforms as examples: Oracle went from Import/Export to SQL*Loader to Data Pump, and SQL Server went from BCP (Bulk Copy Program) to BULK INSERT to a preference for the SQL Server Import and Export Wizard.
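For anyone who hasn't touched the older generation in a while, the round trip looked something like this. This is a minimal sketch; the connect strings, schema, table, server, and file names are all placeholders:

    # Oracle legacy export/import (exp/imp), since superseded by Data Pump
    exp scott/tiger FILE=emp.dmp TABLES=emp
    imp scott/tiger FILE=emp.dmp TABLES=emp

    # SQL Server: bcp dumps a table to a native-format file from the command line...
    bcp SalesDB.dbo.Orders out orders.dat -n -S oldserver -T

    -- ...and BULK INSERT (T-SQL, run on the target) loads the file back in
    BULK INSERT dbo.Orders FROM 'C:\loads\orders.dat' WITH (DATAFILETYPE = 'native');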
Enhancements?
Each of these utilities has introduced GUI interfaces (either within the vendor's own tooling or through third-party products), the ability to pipe output, parallel processing, and other enhancements. They've added advanced error handling and restartable jobs. Oracle introduced the powerful transportable tablespaces, and SQL Server went after filegroup moves (very similar concepts: grouping objects by logical and physical naming to ease management).
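To be fair, the enhancements are real. Here's a rough sketch of what a modern parallel Data Pump export looks like; the schema, directory object, and job name below are assumptions for illustration:

    # Parallel export: four workers, %U expands to a per-file sequence number
    expdp system SCHEMAS=hr DIRECTORY=dp_dir DUMPFILE=hr_%U.dmp PARALLEL=4 LOGFILE=hr_exp.log

    # A stopped or failed job can be re-attached by name and restarted
    # (START_JOB is issued from the interactive prompt after attaching)
    expdp system ATTACH=SYS_EXPORT_SCHEMA_01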
Now, even with the few enhancements that have been added to data movement utilities, I want you to consider this data reported by Forbes:
- Roughly 1.7 MB of data is generated per person, per second in the world today (quick math below).
- All of that data has to be stored somewhere.
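Taken at face value, and assuming a world population of roughly 7.5 billion, the back-of-envelope math is sobering:

    1.7 MB/person/second × 7,500,000,000 people ≈ 12.75 PB per second
    12.75 PB/second × 86,400 seconds/day ≈ 1.1 ZB per day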
Whether it's relational, big data, or another type of data store, SOME of that data is going to land in the two RDBMS platforms I used in my example. The natural life of a database is GROWTH. The enhancements to these archaic data movement utilities haven't kept up with the demands of data growth, and they never will. So why are we still advocating their use to migrate from one database to another? Why are we promoting them for cloud migrations?
This Is(n’t) How We Do It
I'm seeing this recommendation all too often in product documentation and best-practice guides. Oracle's migration steps for moving 11g to the cloud demonstrate it (a sketch of the transportable tablespace path follows the list):
- Data Pump with conventional export/import
- Data Pump transportable tablespace
- RMAN transportable tablespace
- RMAN CONVERT transportable tablespace with Data Pump
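To ground the second option, here's a minimal transportable tablespace flow, assuming a tablespace named USERS, a directory object named dp_dir, and placeholder datafile paths:

    -- 1. On the source: make the tablespace read-only so the datafiles are consistent
    ALTER TABLESPACE users READ ONLY;

    # 2. Export just the metadata for the tablespace
    expdp system DIRECTORY=dp_dir DUMPFILE=tts_users.dmp TRANSPORT_TABLESPACES=users TRANSPORT_FULL_CHECK=y

    # 3. Copy the dump file AND the datafiles to the target (scp, object storage, etc.)

    # 4. On the target: plug the datafiles in
    impdp system DIRECTORY=dp_dir DUMPFILE=tts_users.dmp TRANSPORT_DATAFILES='/u02/oradata/users01.dbf'

    -- 5. Return the tablespace to read/write
    ALTER TABLESPACE users READ WRITE;

Step 3 is where data gravity bites: the metadata export is tiny, but the datafiles themselves still have to cross the wire.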
These tools have been around for quite some time and yes, they've been trusted sidekicks that we know will save the day, but we face new challenges when going to the cloud: along with data gravity, we have network latency and network connectivity issues.
Depending on the size (the weight) of the data that has to be transferred, Database as a Service can turn into a service for no one. Failures that require a 10046 trace to diagnose a failed Data Pump job, with the weight of data gravity behind them, can delay projects and cause scope creep in a way that many in IT aren't willing to wait for, and the role of the DBA reaches a critical threshold again.
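For reference, the session-level form of that trace is enabled like this (level 12, which adds bind and wait data, is my assumption here; tracing Data Pump's background worker processes takes additional plumbing, such as the utility's own TRACE parameter):

    -- Enable extended SQL trace (event 10046) for the current session
    ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
    -- ... reproduce the failure ...
    ALTER SESSION SET EVENTS '10046 trace name context off';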
I'm not asking DBAs to go bleeding edge, but I am asking you to embrace tools that other areas of IT have already recognized as game changers. Virtualize, containerize, and for the database, that means Data Pods. Migrate faster, moving all data sources, applications, etc. as one, and then deliver a virtual environment while buying yourself the time to "rehydrate" to physical without other resources waiting on you, the DBA, so often viewed as the roadblock. Be part of the answer, not part of the problem that archaic import/export tools introduce because they aren't the right tool for the job.
I had never heard the term data gravity before; I like it.
https://readwrite.com/2012/05/07/what-data-gravity-means-to-your-data/
Yay! Glad to hear it, Chet… Dominic Delmolino and I will be doing a podcast on this topic very soon- stay tuned! 🙂