Database

Wading Through Database Platforms

As a heterogenous company, I get a lot of questions on what database platforms are important, why and if we should invest in them.  When looking at some of the lists and some of the marketing, it can be a bit overwhelming.

Now I’m not going to promise to take all of the confusion out of the myriad of database platforms out there, but I’m hoping to help with what’s relevant.

Ranking

So let’s take a look at what is in the list of top databases engines in the industry per db-engines.com.  The reason I use this site as my main “go-to” for information on database popularity has a lot to do with it’s method for collecting its information.  There are other ways of calculating popularity, but I’ve found that if you do by dollar amount, number of installations or instances, there is a bias that impacts and I find db-engines is just more reliable and telling of trends.

Categories of Database Platforms

The first thing you’ll notice is that [currently], RDBMS, (relational databases) rule the world.  Features, scalability and maturity have given relational a strong base and it continues today.  There are a number of databases that fall into the “big data” arena, including Cassandra, Hive, Solr and others, that bridge different platforms.  I still agree with the way that db-engines has classified them and it should be noted that they’ve looked at the architecture and not the use of the database.

RDBMS

Oracle had a large lead in the field, but as you can see, it’s decreasing on a consistent basis.  Most of us in the Oracle realm are hoping that the cloud shift will start to show the success that they are betting on.  Just as they did the search from (I)nternet to (C)loud, we have faith in this giant.   In contrast, MySQL is consistently strengthening their #2 spot.  They still are viewed as “open source” by many and with the power behind their advanced features, including clusters with version 5.7, folks are finding them to be a strong alternative to Oracle while still staying with the power of the Oracle company.

SQL Server is #3 and with their Azure cloud offering, they are currently the second strongest cloud behind Amazon.  I’ve worked in SQL Server as long as Oracle, having started with MSSQL 6.0-7.0 back in 2000.  I was very passionate about database technology and didn’t feel a strong kinship to any one platform, so was happy to work in Oracle, SQL Server, Sybase or others that came along in my early years of database administration.  Microsoft has invested heavily in business intelligence and data visualization with 2016 and I’m enthralled with some of the newest features.

Document Stores

MongoDB is the only document store that’s in the top 20.  It’s a love/hate relationship for those that use it and for those that came from the RDBMS world, they had a negative view of the platform.  There are a number of web developers that are loyal to the database platform and have propagated it out to the technical world.

A document store is quite the “anti-RDBMS”, considering it violates so many of the rules of an RDBMS, with different columns per row and no schemas.  Amazon attempts to have their own version of any database and for MongoDB, it’s DynamoDB.  Other competitors include:

Wide Column Stores

Cassandra and Hbase, both via Apache, are the top choices.  These have similarities to DynamoDB and other NoSQL databases.  These databases are highly scalable with their ability to be spanned on bare metal hardware and require significant technical knowledge of hardware, software and development to architect an environment that functions seamlessly and efficiently.  Having a strong architect is essential and having a strong, collaborative team is important to this type of technology.

Key-Column Stores

Rounding up the top ten in the list is Redis, which is a key-column store database engine.  Redis is an in-memory data store that supports a wide range of development languages and has a strong open source community.  One of the interesting features in this database engine is its ability to house geospatial indexes and use radius queries with them.  I worked with spatial data for four years when I first became a DBA and was always fascinated with this type of data.  Redis has something called GEORADIUS that makes working with this data much easier.

Search Engines

When I was at Pass Summit, Elasticsearch was  something that was mentioned often.  I also noticed that it was one of the platforms mentioned in Brent Ozar’s salary data, too.  Its pushed for AWS, but I’m seeing it more commonly on Azure and it has numerous followers due to its ability to do deep log analysis then build custom dashboards with Kibana.

I was also surprised, as I think a few will be, that Splunk is listed.  It is a search engine that offers insight into log data.  As logs can accumulate considerable storage in a short time, they’ve branched wisely into a cloud offering.   Just as tracing and logs are the key to database administrators, the value in log data can benefit anyone in Information Technology.

Not Listed

Now there are a few others that fall outside of what’s in the top categories and database platforms that I’d like to mention.

Snowflake, (169) and Redshift, (69) are starting to gain momentum.  Why?  With migration to the cloud, less is more.  The ability to virtualize your data warehouses are a great way to make it an easier shift, but also to lessen your cloud footprint, which means less storage usage.

This is a key to what Delphix is educating customers on as well, as we also virtualize all of non-production databases, many of the platforms seen in the db-engine list, making it easier to migrate to the cloud and lessen the storage footprint.  This is an area that technology should be emphasizing more.  I believe, as with most monthly subscriptions, the smaller bill looks great at first, but when you add it up over the long haul, you realize how much you’re paying for the cloud and it’s essential to recognize the opportunities to save on storage, compute and resources.

Control

The second area I want to focus on is control.  Control over our data, control over what we can do with our data and control over support when something goes wrong with our data.  We choose database platforms to have some control over what features and product support we have.  The dynamic has shifted as many companies now look to open source to control over what they produce by creating the features and product more on their own.  They feel limited by large technical companies and want more control over how much they can do and direction.

No matter what you decide, no matter what your favorite database platform is or what you use in your company, databases make the world go round.  The world continues to spin and databases continue to be the mechanism to deliver the data that we use everyday to answer questions, build trends and analysis from and make the decisions that the world relies on.  We Database Administrators and Developers are the ones that are lucky enough to be those to work in this exciting arena in technology.

Kellyn

http://about.me/dbakevlar