Category: Data Masking

August 23rd, 2017 by dbakevlar

Even though my social media profiles on Twitter and LinkedIn are pretty public, I’m significantly more conservative with other personal and financial data online.  The reversal of the Internet Privacy Rule (I’ve linked to a Fox News article, as there was so much negative news on this one…) had everyone pretty frustrated, but it also means we need to look at the security of personal information, especially financial data.  As we can see from the security breaches so far in 2017, we all have reason to be concerned.

The EU has taken the opposite approach with the right to be forgotten, along with the General Data Protection Regulation (GDPR).  Where we seem to be taking a lesser, bizarre path to security, the rest of the world is tightening it up.

For the database engineer, we are

Responsible for the data, the data access and all of the database, so help me God.

As the gatekeepers for the company’s data, security had better be high on our list throughout our careers.  There are a lot of documents and articles telling us to protect our environment, but often when we go to the business, the high cost of security products can make them hesitate to invest in them.

My Example

I’m going to use just one of the top 15 security breaches of all time as an example, but seriously, the Sony PlayStation Network breach was pretty terrifying and an excellent example of why we need to think deeper about data security.

Date of Discovery: April, 2011
How many Users Impacted: 77 million PlayStation Network individual accounts were hacked.

How it went down:  The Sony PlayStation Network breach is viewed as the worst gaming community data breach in history. Hackers were able to make off with 12 million unencrypted credit card numbers as part of the data they accessed. They also retrieved account users’ full names, passwords, e-mails and home addresses, along with their purchase history and PSN/Qriocity logins and passwords.  There was an estimated loss of $171 million in revenue while the site was down for over a month.

I know my kids always wonder why, as a customer, I limit where I submit my data online.  So often companies offer me the option to store my credit card information in their system and I won’t.  The above is a great example of why I don’t.  The convenience isn’t worth the high cost of lacking or unknown security measures.

John Linkous of eIQnetworks stated, “It’s enough to make every good security person wonder, ‘If this is what it’s like at Sony, what’s it like at every other multi-national company that’s sitting on millions of user data records?’”

As it was only certain environments that weren’t protected, and specific ones that didn’t use encryption, the breach reminds those in IT security to identify and apply security controls consistently across environments and organizations.

How to Protect Data

There are some pretty clear rules of thumb when protecting data:

  • Roles, Privileges and Grants

Utilize the database’s and application’s full security features to ensure that only the least privileged access is granted to each user.  As automation and advanced features free up more of your time for the important topic of security, build out a strong security foundation to ensure you’ve protected your data to the highest degree.

  • Audit regularly

There are full auditing features to ensure compliance and verify who has what access and privileges.  You should know who has access to what, whether any privileges change and whether changes are made by users other than the appropriate ones.

  • Encrypt production

Use powerful encryption methods to secure your production system.  Encryption changes the data to an unreadable format until a key is submitted to return the data to its original, readable format.  Encryption can be reversed, but strong encryption methods can offer advanced security against breaches.  Auditing should also show who is accessing the data and alert upon a suspected breach.

  • Mask Non-production

Often 80% of our data is non-production copies.  Most users, stakeholders and developers may not recognize the risk to the company as they would with the production environment.  Remove the responsibility and unintentional risk by masking the data with a masking tool that contains a full auto-discovery process and templates to make it easily repeatable and dynamic.
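To make the masking idea a bit more concrete, here’s a minimal Python sketch of repeatable masking for a credit card number.  This is only an illustration under my own assumptions, not how any particular masking product works:  the function name `mask_card_number` and the hard-coded `MASKING_KEY` are hypothetical, and a real key would live in a key store rather than in code.

```python
import hashlib
import hmac

# Hypothetical key used only to make the masking repeatable; in practice it
# would come from a key store, never from source code.
MASKING_KEY = b"example-only-key"


def mask_card_number(card_number: str, key: bytes = MASKING_KEY) -> str:
    """Replace all but the last four digits with digits derived from an HMAC.

    The same input always produces the same masked value, so masked copies
    stay consistent with each other, but the original digits are not kept.
    """
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) <= 4:
        return card_number  # nothing to hide beyond the last four digits

    digest = hmac.new(key, "".join(digits).encode(), hashlib.sha256).hexdigest()
    # Turn the hex digest into a stream of decimal digits.
    replacement = [str(int(h, 16) % 10) for h in digest]

    keep_from = len(digits) - 4
    masked = []
    seen = 0
    for c in card_number:
        if c.isdigit():
            if seen < keep_from:
                masked.append(replacement[seen % len(replacement)])
            else:
                masked.append(c)  # keep the last four digits readable
            seen += 1
        else:
            masked.append(c)  # keep separators such as spaces or dashes
    return "".join(masked)


print(mask_card_number("4111-1111-1111-1111"))
```

Unlike encryption, there is no key that turns the masked value back into the original, which is the point for non-production copies.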

In 2014, Sony agreed to a preliminary $15 million settlement in a class action lawsuit over the breach, which, added to the estimated $171 million in lost revenue, brings the grand total to just over $186 million in losses from the Sony PlayStation Network breach.

If you think encryption and masking products are expensive, recognize how expensive a breach is.

Posted in Data Masking, Delphix, Oracle

May 26th, 2017 by dbakevlar

I’m in sunny Phoenix this week at the Data Platforms 2017 Conference and looking forward to a break in the heat when I return to Colorado this evening.

As this event is for big data, I expected to present on how big data could benefit from virtualization, but was surprised to find that I didn’t have a lot of luck finding customers utilizing us for this reason (yet).  As I’ve discussed in previous presentations, I’m aware of what a “Swiss Army knife” virtualization is, resolving numerous issues across a myriad of environments, yet often going unidentified.

The Use Case

To find my use case, I went out to the web and found a great article, “The Case for Flat Files in Big Data Projects”, by Time.com interactive graphics editor, Chris Wilson.  The discussion surrounds the use of data created as part of the ACA and used for another article, “How Much Money Does Your Doctor Get From Medical Companies”.  The data in the interactive graphs that are part of the article is publicly available from cms.gov, and Chris discusses the challenges created by it and how they overcame them.

Upon reading it, there was a clear explanation of why Chris’ team did what they did to consume the data in a way that complemented their skill set.  It will resonate with anyone who works in any IT shop and knows how we end up with technical choices that we’re left to justify later on.  While observing conversations at the conference this week, I lost count of how often I was reminded that there was no such thing as a “Hadoop” shop or a “Hive” shop.  Everyone had a group of varied solutions that made up their environment, and if you didn’t have a given tool, you couldn’t count it out:  Pig, Python, Kafka or others could show up tomorrow.

This results in a more open and accepting technical landscape, which I, a “legacy data store” technologist, welcomed.  When I explained my upcoming talk and my experience, no one turned up their nose at any technology I admitted to having an interest in.

Along with the use case, the data was also available online.  As part of the policies in the ACA, the cms.gov site (the Centers for Medicare & Medicaid Services) gives you access to all of this valuable data, and it can offer incredible insight into these valuable programs.  The Time article only focuses on the payments to doctors from medical companies, but the data that is collected, separated out by area and then zipped up by year, is quite extensive.  As noted by a third article, as anticipated as the data was, it was cumbersome and difficult to use.

The Requirements

I proceeded to take this use case and imagine it as part of an agile environment, with this data as part of the pipeline to produce revenue by providing information to the consumer.  What would be required and how could virtualization not only enhance what Chris Wilson’s team had built, but how could the entire pipeline benefit from Delphix’s “swiss army knife” approach?

  1. I can’t assume this is the main data store.  These flat files are a supplement to legacy data stores.
  2. There would be a standard development environment:  development and testing would need their own environments, not just a production copy of these files, applications, etc.
  3. If it’s providing data to a consumer and data is in perpetual motion in the age of the internet, an agile development method would need to be in place, which means a short development cycle with many, small, “scrum like” development groups from different departments working on tasks.
  4. Automation and seamless deployment would assist in less human intervention and resource demands, along with more successful deployments.

Solution

There were four areas that I focused on to solve and eliminate bottlenecks that I had either experienced or foresaw an organization experiencing when having this data as part of their environment.

  1. Eliminate the need for multiple copies of the files and the slow, manual process of propagating files to targets for development, test, etc. with Delphix’s vFile option.  This would include any applications or other non-relational database tiers included in the scenario.
  2. Eliminate copies and refreshes of any legacy data stores that big data was dependent on and create VDBs for all development, test and reporting.
  3. Protect all non-production environments by masking non-production databases and flat files.
  4. Containerize environments for easy deployment, delivery, testing and cloud migrations.

vFiles

Each of the files was just over 500 MB compressed and 15-18 GB uncompressed.  Each took just over 4 minutes to transfer to a host, and the copies could add up to considerable space.

I used Delphix vFile to virtualize the files.  This means that there is a single “gold copy” of the files hosted at the Delphix engine, and an NFS mount then “projects” file access to each target, which can be used to provide unique copies to as many development, test and reporting environments as needed.

Fig. 1- Creating vFiles from dSource that flat files are sourced on.

If a refresh is required, then the parent is refreshed and an automated refresh to all the “children” can be performed.  Changes can be made at the child level and if catastrophic, Delphix would allow for easy recovery, allowing for data version control, not just code version control throughout the development and testing process.

Fig. 2- Demonstration of target vFile, showing NFS Mount, files available, (created in less than 10 seconds) and how easily disabled and proven to be “Projection” of files.

It’s a pretty cool feature and one that is very valuable to the big data arena.  I heard countless stories of how often, due to lack of storage, data scientists and developers were taking subsets of data to test and then, once in production, found out that their code wouldn’t complete or would fail when run against the full data.  Having the ability to have the FULL files without taking up more space for multiple environments would be incredibly beneficial and shorten development cycles.
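As a rough back-of-the-envelope sketch of why full physical copies add up, the small Python calculation below compares one full copy per environment against a single shared copy.  The file count (3) and environment count (5) are hypothetical numbers I picked for illustration; only the 15-18 GB per-file size comes from the files above.  It also ignores whatever space each environment’s own changes would take.

```python
def full_copy_storage_gb(file_count, gb_per_file, environments):
    """Storage needed when every environment gets its own full copy."""
    return file_count * gb_per_file * environments


def shared_copy_storage_gb(file_count, gb_per_file):
    """Storage for one shared gold copy (per-environment changes ignored)."""
    return file_count * gb_per_file


# Hypothetical scenario: 3 files, 5 environments, low and high file sizes.
for gb_per_file in (15, 18):
    full = full_copy_storage_gb(3, gb_per_file, 5)
    shared = shared_copy_storage_gb(3, gb_per_file)
    print(f"{gb_per_file} GB files: {full} GB in full copies vs {shared} GB shared")
```

With those assumed numbers, five full copies come to 225-270 GB, against 45-54 GB for a single shared copy.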

Virtualize

Most big data shops are still dependent on legacy data stores.  These are legendary roadblocks due to their size, complexity and demands for refreshes.  I proposed that those be virtualized so that each developer could have a copy and an instant refresh without the storage demands, again easing development deadline pressures and allowing full access to the data that development needs to succeed.

Protect

Most people know we mask relational databases, but did you know we have Agile Data Masking for flat files?  If these files are going to be pushed to non-production systems, especially with as much as we’re now starting to hear in the US about the EU’s GDPR (General Data Protection Regulation), shouldn’t we mask outside of the database?

What kind of files can be masked?

  • Multi-record
  • CSV
  • XML
  • Word
  • Excel
  • PowerPoint
  • Unstructured
  • EDI

That’s a pretty considerable and cool list.  The ability to go in and mask data from flat files is a HUGE benefit to big data folks.  Many of them were looking at file security from the permissions and encryption level, so the ability to render the data benign to risk is a fantastic improvement.
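For a feel of what masking a flat file involves at its simplest, here’s a minimal Python sketch that rewrites a CSV with selected columns redacted.  It’s my own illustration, not the Delphix implementation:  the helper names `redact` and `mask_csv` and the sample column names are hypothetical, and a real tool would handle far more formats and masking algorithms.

```python
import csv
import io


def redact(value: str, visible: int = 0) -> str:
    """Replace every character except the last `visible` ones with 'X'."""
    if visible <= 0:
        return "X" * len(value)
    hidden = max(len(value) - visible, 0)
    return "X" * hidden + value[hidden:]


def mask_csv(source, target, rules):
    """Copy CSV rows from source to target, masking the columns named in rules.

    `rules` maps a column name to a function that takes and returns a string.
    """
    reader = csv.DictReader(source)
    if reader.fieldnames is None:
        return  # empty input, nothing to write
    writer = csv.DictWriter(
        target,
        fieldnames=reader.fieldnames,
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        for column, mask in rules.items():
            if row.get(column) is not None:
                row[column] = mask(row[column])
        writer.writerow(row)


# Hypothetical sample data; the column names are made up for the example.
sample = io.StringIO("name,email,amount\nJane Doe,jane@example.com,120.50\n")
masked = io.StringIO()
mask_csv(sample, masked, {"name": redact, "email": redact})
print(masked.getvalue())
```

The non-sensitive `amount` column passes through untouched, so the masked file still behaves like the original for testing purposes.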

Containerize

The last step is simple recognition that big data is complex and consists of a ton of moving parts.  We should expect much of it to be home built and open source, made up of legacy data stores, flat files, applications and other dependent tiers.

Fig. 3- A Container, created on-prem, then moved to the cloud and to as many environments as required for the development cycle to meet the business needs.

Delphix has the ability to create templates of our sources (aka dSources), which is nothing more than creating a container.  In my use case enhancement, I took all of these legacy data stores, applications (including any Ajax code) and flat files and then created a template from them for simple refreshes and deployments via Jenkins, Chef jobs or other DevOps automation.  The ability to then take these templates and deploy them to the cloud would make a migration from on-prem to the cloud, or from one cloud vendor to another, a simpler process.

Fig. 4- A look at the full scenario-  Delphix engines masking files, databases, creating containers and deploying it all on-prem and to the cloud.

The end story is that this use case could be any big data shop or start up in the world today.  So many of these companies are hindered by data and Delphix virtualization could easily let their data move at the speed of business.

I want to thank Data Platforms 2017 and all the people who were so receptive to my talk.  If you’d like access to the slide deck, it’s been uploaded to Slideshare. I had a great time in Phoenix and hope I can come back soon!

Posted in big data, Cloud, Data Masking, Oracle

August 15th, 2016 by dbakevlar

I’ve been involved in two data masking projects in my time as a database administrator.  One was to mask and secure credit card numbers and the other was to protect personally identifiable information, (PII) for a demographics company.  I remember the pain, but it was better than what could have happened if we hadn’t protected customer data….

Times have changed and now, as part of a company that has a serious market focus on data masking, my role has time allocated to research on data protection, data masking and understanding the technical requirements.

Reasons to Mask

The percentage of companies that contain data that SHOULD be masked is much higher than most would think.

The amount of data that should be masked vs. the amount that is masked can be quite different.  There was a great study done by the Ponemon Institute (that says Ponemon, you Pokemon Go freaks…:)) that showed 23% of data was masked to some level and 45% of data was significantly masked by 2014.  This still left over 30% of data at risk.

The Mindset Around Securing Data

We also don’t think very clearly about how and what to protect.  We often silo our security:  The network administrators secure the network.  The server administrators secure the host, but don’t concern themselves with the application or the database.  The DBA may be securing the database, but the application that’s accessing it may be open to accessing data that shouldn’t be available to those involved.  We won’t even start on what George in accounting is doing.

We need to change from thinking just of disk encryption and start thinking about data encryption and application encryption, with key data stores that protect all of the data, which is the goal of the entire project.  It’s not like we’re going to see people running out of a building with a server, but seriously, it doesn’t just happen in the movies:  people have stolen drives, jump drives or even printouts of spreadsheets with incredibly important data residing on them.

As I’ve been learning what is essential to masking data properly, one thing that makes our product superior is that it identifies potential data that should be masked and provides ongoing audits to ensure that data doesn’t become vulnerable over time.

This can be the largest consumption of resources in any data masking project, so I was really impressed with this area of Delphix data masking.  It’s really easy to use, so if you don’t understand the ins and outs of DBMS_CRYPTO or are unfamiliar with the java.util.Random syntax, no worries:  the Delphix product makes it really easy to mask data and has a centralized key store to manage everything.

It doesn’t matter if the environment is on-premise or in the cloud.  Delphix, like a number of companies these days, understands that hybrid management is a requirement, so efficient masking and ensuring that sensitive data is at no point at risk are essential.

The Shift

How many data breaches do we need to hear about to make us all pay more attention to this?  Security topics at conferences have diminished compared to when I started attending less than a decade ago.  It wasn’t that long ago that security appeared to be more important to us, and yet it seems to be a more important issue now.

Research was also performed that found only 7-19% of companies actually knew where all their sensitive data was located.  That leaves over 80% of companies with sensitive data potentially vulnerable to a breach.  I don’t know about the rest of you, but upon finishing up that little bit of research, I understood why many feel better about not knowing, and why it’s better just to accept this and address masking needs to ensure we’re not one of the vulnerable ones.

Automated solutions to discover vulnerable data can significantly reduce risks and reduce the demands on those that often manage the data, but don’t know what the data is for.  I’ve always said that the best DBAs know the data, but how much can we really understand it and do our jobs?  It’s often the users that understand it, but they may not comprehend the technical requirements to safeguard it.  Automated solutions remove that skill requirement from having to exist in human form, allowing us all to do our jobs better.  I thought it was really cool that our data masking tool considers this and takes this pressure off of us, letting the tool do the heavy lifting.
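To give a sense of what automated discovery means at its most basic, here’s a small Python sketch that scans rows of data and flags columns that look like they hold card numbers or email addresses.  This is my own simplified illustration and not the product’s discovery logic:  the patterns, the function names and the sample rows are all assumptions, and real profiling covers many more data types.

```python
import re

# Simplified patterns; real discovery tools use far richer rule sets.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def luhn_valid(number: str) -> bool:
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str) -> set:
    """Return the kinds of sensitive data that appear in a single value."""
    found = set()
    if EMAIL_PATTERN.search(value):
        found.add("email")
    for match in CARD_PATTERN.finditer(value):
        if luhn_valid(match.group()):
            found.add("card_number")
    return found


def profile_columns(rows) -> dict:
    """Map each column name to the sensitive data types detected in it."""
    report = {}
    for row in rows:
        for column, value in row.items():
            hits = classify(str(value))
            if hits:
                report.setdefault(column, set()).update(hits)
    return report


# Hypothetical sample rows for illustration only.
rows = [
    {"id": "1", "notes": "card 4111 1111 1111 1111 on file", "contact": "jane@example.com"},
    {"id": "2", "notes": "order 1234567890123456", "contact": "n/a"},
]
print(profile_columns(rows))
```

The Luhn check is what keeps the second row’s 16-digit order number from being flagged as a card, which cuts down on false positives compared with pattern matching alone.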

Along with a myriad of database platforms, we also know that people are bound and determined to export data to Excel, MS Access and other flat file formats, resulting in more vulnerabilities that seem out of our control.  The Delphix data masking tool considers this and supports many of these applications, as well.  George, the new smarty-pants in accounting, wrote out his own XML pull of customers and credit card numbers?  No problem, we got you covered… 🙂
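As a hedged illustration of what masking an export like George’s could look like by hand, this Python sketch blanks out the text of selected elements in an XML document.  The element names in `SENSITIVE_TAGS` and the sample document are hypothetical, and this is not how the Delphix tool is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names that should never leave production unmasked.
SENSITIVE_TAGS = frozenset({"card_number", "email"})


def mask_xml(xml_text: str, sensitive_tags=SENSITIVE_TAGS) -> str:
    """Return the XML with the text of sensitive elements replaced by X's."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.tag in sensitive_tags and element.text:
            element.text = "X" * len(element.text)
    return ET.tostring(root, encoding="unicode")


document = (
    "<customers><customer><name>George</name>"
    "<card_number>4111111111111111</card_number></customer></customers>"
)
print(mask_xml(document))
```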

So now, along with telling you how to automate a script to email George to change his password from “1234” in production, I can make recommendations on how to keep him from having the ability to print out a spreadsheet with all the customers’ credit card numbers on it and leave it on the printer…:)

Happy Monday, everyone!

Posted in Data Masking, Delphix, Oracle
