Subscribe to Blog via Email
As a heterogenous company, I get a lot of questions on what database platforms are important, why and if we should invest in them. When looking at some of the lists and some of the marketing, it can be a bit overwhelming.
Now I’m not going to promise to take all of the confusion out of the myriad of database platforms out there, but I’m hoping to help with what’s relevant.
So let’s take a look at what is in the list of top databases engines in the industry per db-engines.com. The reason I use this site as my main “go-to” for information on database popularity has a lot to do with it’s method for collecting its information. There are other ways of calculating popularity, but I’ve found that if you do by dollar amount, number of installations or instances, there is a bias that impacts and I find db-engines is just more reliable and telling of trends.
The first thing you’ll notice is that [currently], RDBMS, (relational databases) rule the world. Features, scalability and maturity have given relational a strong base and it continues today. There are a number of databases that fall into the “big data” arena, including Cassandra, Hive, Solr and others, that bridge different platforms. I still agree with the way that db-engines has classified them and it should be noted that they’ve looked at the architecture and not the use of the database.
Oracle had a large lead in the field, but as you can see, it’s decreasing on a consistent basis. Most of us in the Oracle realm are hoping that the cloud shift will start to show the success that they are betting on. Just as they did the search from (I)nternet to (C)loud, we have faith in this giant. In contrast, MySQL is consistently strengthening their #2 spot. They still are viewed as “open source” by many and with the power behind their advanced features, including clusters with version 5.7, folks are finding them to be a strong alternative to Oracle while still staying with the power of the Oracle company.
SQL Server is #3 and with their Azure cloud offering, they are currently the second strongest cloud behind Amazon. I’ve worked in SQL Server as long as Oracle, having started with MSSQL 6.0-7.0 back in 2000. I was very passionate about database technology and didn’t feel a strong kinship to any one platform, so was happy to work in Oracle, SQL Server, Sybase or others that came along in my early years of database administration. Microsoft has invested heavily in business intelligence and data visualization with 2016 and I’m enthralled with some of the newest features.
MongoDB is the only document store that’s in the top 20. It’s a love/hate relationship for those that use it and for those that came from the RDBMS world, they had a negative view of the platform. There are a number of web developers that are loyal to the database platform and have propagated it out to the technical world.
A document store is quite the “anti-RDBMS”, considering it violates so many of the rules of an RDBMS, with different columns per row and no schemas. Amazon attempts to have their own version of any database and for MongoDB, it’s DynamoDB. Other competitors include:
Cassandra and Hbase, both via Apache, are the top choices. These have similarities to DynamoDB and other NoSQL databases. These databases are highly scalable with their ability to be spanned on bare metal hardware and require significant technical knowledge of hardware, software and development to architect an environment that functions seamlessly and efficiently. Having a strong architect is essential and having a strong, collaborative team is important to this type of technology.
Rounding up the top ten in the list is Redis, which is a key-column store database engine. Redis is an in-memory data store that supports a wide range of development languages and has a strong open source community. One of the interesting features in this database engine is its ability to house geospatial indexes and use radius queries with them. I worked with spatial data for four years when I first became a DBA and was always fascinated with this type of data. Redis has something called GEORADIUS that makes working with this data much easier.
When I was at Pass Summit, Elasticsearch was something that was mentioned often. I also noticed that it was one of the platforms mentioned in Brent Ozar’s salary data, too. Its pushed for AWS, but I’m seeing it more commonly on Azure and it has numerous followers due to its ability to do deep log analysis then build custom dashboards with Kibana.
I was also surprised, as I think a few will be, that Splunk is listed. It is a search engine that offers insight into log data. As logs can accumulate considerable storage in a short time, they’ve branched wisely into a cloud offering. Just as tracing and logs are the key to database administrators, the value in log data can benefit anyone in Information Technology.
Now there are a few others that fall outside of what’s in the top categories and database platforms that I’d like to mention.
Snowflake, (169) and Redshift, (69) are starting to gain momentum. Why? With migration to the cloud, less is more. The ability to virtualize your data warehouses are a great way to make it an easier shift, but also to lessen your cloud footprint, which means less storage usage.
This is a key to what Delphix is educating customers on as well, as we also virtualize all of non-production databases, many of the platforms seen in the db-engine list, making it easier to migrate to the cloud and lessen the storage footprint. This is an area that technology should be emphasizing more. I believe, as with most monthly subscriptions, the smaller bill looks great at first, but when you add it up over the long haul, you realize how much you’re paying for the cloud and it’s essential to recognize the opportunities to save on storage, compute and resources.
The second area I want to focus on is control. Control over our data, control over what we can do with our data and control over support when something goes wrong with our data. We choose database platforms to have some control over what features and product support we have. The dynamic has shifted as many companies now look to open source to control over what they produce by creating the features and product more on their own. They feel limited by large technical companies and want more control over how much they can do and direction.
No matter what you decide, no matter what your favorite database platform is or what you use in your company, databases make the world go round. The world continues to spin and databases continue to be the mechanism to deliver the data that we use everyday to answer questions, build trends and analysis from and make the decisions that the world relies on. We Database Administrators and Developers are the ones that are lucky enough to be those to work in this exciting arena in technology.
This is a living document that I will continue to update and will add new database platforms to as I go along in my career. I spend a lot of time translating database platforms I’ve worked in for all tiers of the technical business. It just seems natural that we might need a location on the web where that information is kept.
I’ll add some diagrams at a later date, but we’ll start with a simple spreadsheet of common terms and questions and how each of the terms or tasks in Oracle, (the current #1 RDBMS) translates in other platforms. If there isn’t a similar term or task, I’ve marked not applicable, (N/A). If you have any additions or suggestions on how to improve the list, please feel free to comment or email me at DBAKevlar at Gmail.
As I’m still working on the best viewable format, keep in mind, you can export the table into a number of formats or even print it!
Oracle MSSQL MySQL SAP Hana Cassandra Instance: Start of the SGA and one or more background processes
Instance: A Windows Service and following dbs: Master, model, tempdb, msdb, resource
MySQL Instance is backgroun processes
Hana Instance, Multi-tenant does exist
Storage Engines: Innodb, ISAM, NDB, (cluster), Marta, Falcon, etc.
SGA, (System Global Area) memory allocated to Oracle
Conventional memory mgmt, AWE, pages
Hana is an in-memory database
Java Heap Memory
Query Cache, Key Cache, (storage engine can determine some of this)
Memory Pool, (allocated to in-memory)
sys.tables WHERE name = 'TransactionHistoryArchive'
Persistence Layer, (some of this)
Parition Index Summary Cache
SQL Server Windows Service, MSSQL executable process
Connection and Sesssion Manager
SQLCmd and Powershell
MySQL Workbench and mysql cli
SAP Hana Studio
TNS, (Transparent Network Substrate), Bequeath, EZConnect
ADO.net, OLEDB, ODBC, etc.
MySQL connectors (ODBC, JDBC, .NET, etc)
Performance Management Views, (i.e. V$ views)
Dynamic Management Views/functions
HANA_SQL* and HOST_* views
DB Management Memory Pool
Persistance layer for data store
Index, (one clustered index per object)
Index Management Layer
Index for partition key
Partition Key for row level
Partition key for row level
Partition key for row level
Partition Key as column level to store relevant rows
Compaction of SSTABLES
sql_trace = on
SQL Profiler and for version 2012+ Extended Events
C and ANSI SQL
Java and CQL
Database and DBOwner
Logins and Users
database backup/storage snapshot
incremental/incremental with redo only
part of transaction log, transactional commit
part of transaction log
Part of Transaction Logging and session manager
Temp Database per SQL Server
Data Actually Stores in Order no need of temp
Network /Disk Heartbeat
Part of Connection Manager
Only one heartbeat network heartbeat in messages
Master / Slave Nodes
Node Clustering, managed by Calculation Engine
Peer Nodes (no master) indeed every node act as coordinator
Shared Storage, (Voting Disk, too)
Local Storage to each node
Tokens: Data Stripes using token range at node level
Replication Factor: Data Mirrors across nodes using RF=ONE, ALL, N..
init.ora or spfile.ora
cassandra.yaml in /softwarelocation/conf/
redo log sizes & location: v$log, v$redolog
logging module manages this
redo log flush
backup transaction logs
transaction log volumes
Log located at /cassandrasoftware/clustername/nodename/logs/system.log
crsctl / srvctl
Rebalance of Data in diskgroups
nodetool repair or nodetool repair -st -et
private network or private ips
no private ip's
Microsoft Failover cluster, (MSFC) and Voting Disk
MySQL clusters, sharding
scn_to_timestamp or rowscn functions
Log Sequence Number, (LSN) to timestamp
alter session set schema
alter database set user
ALTER SYSTEM ALTER SESSION
v$asm_operation, rebalance operation
nodetool netstats or nodetool tpstats
cssd.log (disk heartbeat and network heartbeat)
size of table: bytes in dba_tables
calculate space used(total), bytes output from nodetool cfstats keyspace.tablename
number of rows in table: dba_tables.num_rows
calculate number of keys(estimate)
Size of Tablespace:sum(bytes) from dba_segments where tablespace_name=
Size of each schema- SELECT table_schema "Data Base Name", sum( data_length + index_length) / 1024 / 1024
"Data Base Size in MB" FROM information_schema.TABLES GROUP BY table_schema ;
No Undo tablespace in MSSQL
N/A dependent on storage engine, too.
Written to redo
gc_grace_seconds for holding tombstones
result cache , keep pools
memory tables, heap tables
Shutdown instance, srvctl stop instance
NET STOP MSSQLSERVER, NET STOP MSSQL$instancename
sudo /usr/local/mysql/bin/mysqld stop
rman>backup tablespace tag 'today'
backup database dbname filegroup = 'filegroup' to disk = 'path\name'
After read lock, backup table files
on each node, backup keyspace -t today
rman> backup incremental tag 'incr'
Backups database 'db' with differential
enable incremental backups
enable incremental backups
rebuild index or reorg table
rebuild index keyspace tablename idx1, idx2
rman> backup database
backup database, snapshot
logical backup at tablename
rman> recover block 57;
mysqlbinlog- use logs to create statement sql to recover transactions to PIT.
scrub [keyspace] [tablename] [-s|--skip-corrupted]
rman> recover datafile
RESTORE DATABASE adb FILEGROUP='filegroup'
Depends on storage engine- mysqldump, mysqlbackup mysqlndb, etc.
RECOVER DATA USING FILE ('
N/A SSTABLELOADER from snapshot
restricted mode: alter database restricted mode
set global read_only=1
Boot OS into single user mode
cost based statistics, (CBO or optimizer)
cost based statistics
SQL and MDX, (Multidimensional Expressions)