What is Data Masking?
Data masking is the process of replacing sensitive data with modified, realistic values. It’s also known as data obfuscation.
Both small and large enterprises use it to protect personal information such as social security numbers, bank account details, and credit card numbers. Data masking doesn’t merely conceal data with asterisks or blanks. Instead, it substitutes a plausible alternative value rendered in the same format. The fictitious content remains useful for training, demonstrations, and testing.
How it Works
Data masking, or data scrubbing, uses various algorithms to hide and replace personal information. The displayed values retain the properties of the originals, such as format and data type, so the information stays structurally identical.
There are many different types, or methods, of data masking. Each approach secures sensitive information, but they differ in where and when the masking is applied. The most common method is static data masking.
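The idea of keeping masked values structurally identical can be illustrated with a minimal Python sketch. It is only one possible approach, not a reference implementation: the function name `mask_preserving_format` is hypothetical, and the sketch assumes ASCII input and simple random replacement, which is neither deterministic nor aware of checksums such as card-number check digits.

```python
import secrets
import string

def mask_preserving_format(value: str) -> str:
    """Replace each ASCII digit with a random digit and each ASCII letter
    with a random letter of the same case. Separators such as '-' or ' '
    are kept, so the masked value has the same layout as the original."""
    masked = []
    for ch in value:
        if ch in string.digits:
            masked.append(secrets.choice(string.digits))
        elif ch in string.ascii_uppercase:
            masked.append(secrets.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            masked.append(secrets.choice(string.ascii_lowercase))
        else:
            # Limitation of this sketch: punctuation, whitespace, and
            # non-ASCII characters pass through unchanged.
            masked.append(ch)
    return "".join(masked)

# The output differs on every run, but always has the shape NNN-NN-NNNN.
print(mask_preserving_format("123-45-6789"))
```

Because each character is replaced independently, the result keeps its length and separators, which is what lets downstream validation and UI code continue to work on masked data.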
- Static Data Masking – Masking is applied to a duplicate of the golden database. The copy appears identical to the original, except that the sensitive data is altered. In most cases, this method involves loading the copy into a separate environment, removing unnecessary data (a technique called “subsetting”), and then masking the data while it is at rest. The masked version can then be pushed to a target environment.
- Dynamic Data Masking – As the name suggests, masking takes place dynamically, at runtime, so there is no need for a second data source. Authorized users see the real data, while unauthorized users see fake values. This method is typically used to apply role-based security to applications or databases. It is also limited to read-only scenarios, so that masked values are not written back to the source.
- On-the-Fly Data Masking – Sensitive data is masked as it is transferred between environments, such as from production to test. This approach suits organizations that deploy continuously or have heavily integrated applications, where it can be challenging to keep a constant backup of masked data.
- Deterministic Data Masking – This method maps two sets of data that hold the same type of information, so that a given value is always replaced with the same substitute. For example, the name ‘Sarah Johnson’ is always replaced with ‘Karen Smith’ wherever it appears in the database.
- Statistical Data Obfuscation – Production data can contain various figures, also referred to as statistics. Statistical data obfuscation disguises these figures, allowing you to share information about patterns in a data set without revealing the actual values.
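Of the methods above, deterministic masking is the easiest to show in a few lines. The sketch below is one hedged way to do it, using a keyed hash (HMAC) to pick a replacement from a pool. The names `FAKE_NAMES` and `mask_name`, the pool contents, and the key value are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical pool of replacement names; a real pool would be far larger.
FAKE_NAMES = ["Karen Smith", "Luis Ortega", "Mei Tanaka", "Omar Haddad", "Priya Nair"]

def mask_name(real_name: str, secret_key: bytes) -> str:
    """Map a real name to a fake one. The same name and key always give
    the same result, which is what makes the masking deterministic."""
    digest = hmac.new(secret_key, real_name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]

# Assumption: in practice the key would come from a secret store, not source code.
key = b"example-key-for-illustration-only"

first = mask_name("Sarah Johnson", key)
second = mask_name("Sarah Johnson", key)
print(first == second)  # True: repeated occurrences mask to the same name
```

With a pool this small, different real names will often map to the same fake name. Whether that matters depends on whether the masked column has to stay unique.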
Organizations can protect sensitive data using a variety of techniques or tools. One of the most common and effective ways to apply data masking is through substitution.
- Substitution – Sensitive information is protected by replacing the authentic data with fake but realistic values. It is regarded as one of the most effective techniques because it preserves the original look and feel of the data, so an unauthorized user would have no reason to question the validity of the information.
- Shuffling – Another popular way to apply data masking is to shuffle values within a column. The technique is similar to substitution, but the replacement values are drawn from the same column that is being masked and reassigned to other rows. Shuffling is not ideal for highly sensitive data, as the algorithm can be reverse-engineered.
- Scrambling – As the name suggests, scrambling rearranges the order of characters and numbers within a value. It’s a simple technique that can aid in data protection. However, the original value can still be deciphered by anyone who knows the scrambling pattern.
- Number and Date Variance – The variance technique is used when masking financial and transactional information. Each displayed value is shifted by a random amount within a defined range of the original. Applying a variance of +/- 10% to numeric values, or +/- 120 days to dates, is often considered to leave a meaningful data set.
- Encryption – One of the most complex masking techniques is encryption. Viewing the encrypted data requires the user to supply the encryption key. The data remains protected as long as only authorized users can access the key.
- Nulling Out (Deletion) – Applying a null value is one of the simplest forms of data masking. Nulling prevents the visibility of the data element. However, this practice reduces data integrity and can be a challenge during testing or development.
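Several of the techniques above are simple enough to sketch directly. The Python below illustrates shuffling, number and date variance, and nulling out on a small in-memory column. The function names and sample values are hypothetical, and the default ranges simply reuse the +/- 10% and +/- 120 days figures mentioned above.

```python
import random
from datetime import date, timedelta

def shuffle_column(values, rng):
    """Shuffling: return the same values in a random row order.
    By chance, the order can occasionally match the original."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def vary_number(amount, rng, pct=0.10):
    """Numeric variance: scale a value by a random factor within +/- pct."""
    return round(amount * (1 + rng.uniform(-pct, pct)), 2)

def vary_date(day, rng, max_days=120):
    """Date variance: shift a date by a random offset within +/- max_days."""
    return day + timedelta(days=rng.randint(-max_days, max_days))

def null_out(values):
    """Nulling out: replace every value with None."""
    return [None for _ in values]

rng = random.Random()  # unseeded, so results differ between runs
salaries = [52000.0, 61000.0, 75000.0, 48000.0]
hire_date = date(2021, 3, 15)

print(shuffle_column(salaries, rng))
print([vary_number(s, rng) for s in salaries])
print(vary_date(hire_date, rng))
print(null_out(salaries))
```

Shuffling keeps column-level aggregates such as the total and average intact, while variance keeps each row close to its original value. Which property matters more depends on how the masked data will be used.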
Best Practice Strategies
The first step in securing and managing sensitive data is identifying what information needs to be protected. This also includes knowing where the data resides, what applications use it, and which users are authorized to access it.
To adhere to compliance regulations, organizations should always mask the following types of data:
- Personally identifiable information (PII)
- Payment card information (PCI DSS)
- Protected health information (PHI)
- Intellectual property (IP)
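Identifying which fields hold sensitive data is often partly automated by scanning sample values for recognizable patterns. The following Python sketch shows the idea for two kinds of value only: US social security numbers and payment card numbers. The patterns, function names, and sample row are assumptions for illustration; real discovery tools cover far more data types and produce fewer false positives.

```python
import re
from typing import Optional

# Assumed patterns: a US SSN written as NNN-NN-NNNN, and 13 to 19 digits
# optionally separated by single spaces or hyphens for card numbers.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CARD_PATTERN = re.compile(r"^\d(?:[ -]?\d){12,18}$")

def passes_luhn(number: str) -> bool:
    """Luhn checksum, used to filter out digit strings that are not card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def classify(value: str) -> Optional[str]:
    """Return a label if the value looks sensitive, otherwise None."""
    if SSN_PATTERN.match(value):
        return "ssn"
    if CARD_PATTERN.match(value) and passes_luhn(value):
        return "card_number"
    return None

def find_sensitive_columns(rows):
    """Return {column: label} for columns where any sampled value looks sensitive."""
    flagged = {}
    for row in rows:
        for column, value in row.items():
            label = classify(str(value))
            if label and column not in flagged:
                flagged[column] = label
    return flagged

sample = [{"name": "Sarah Johnson", "tax_id": "123-45-6789",
           "payment": "4111 1111 1111 1111", "note": "call back"}]
print(find_sensitive_columns(sample))
# {'tax_id': 'ssn', 'payment': 'card_number'}
```

The scan looks at values rather than column names, so it still flags `tax_id` and `payment` even though neither name mentions SSNs or cards. Names and free-text fields need other detection methods.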
Once you have established this, ensure that your protection policy follows these best practices:
- Preserves referential integrity – In large enterprises, using a single data masking method isn’t always practical. However, the various techniques applied must remain in sync. Be sure to apply the same technique consistently to the same type of data, so that related records still match after masking.
- Irreversible – Effective masking techniques prevent reverse engineering. It should not be possible to convert masked data back to its original state.
- Returns realistic values – Replacing actual data with realistic false values helps testing and development proceed accurately. It also adds a layer of security, because masked data would be of no value to anyone who compromised it.
- Repeatable – The best data masking solutions are quick, repeatable, and ideally automated. The job of securing personal information is never finished. When requirements or the sensitive data itself change, it’s ideal to have a process that is simple to implement.
- Flexible and customizable – Environments and data sources are constantly spinning up and down. An effective data obfuscation process should easily evolve and adapt to the changes of your ERP system.
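Referential integrity and repeatability can both be illustrated by applying a single masking policy to related tables. The Python sketch below assumes a keyed hash (HMAC) as the masking function and a policy expressed as a plain dictionary. The table layouts, column names, key, and helper names are hypothetical.

```python
import hashlib
import hmac

def pseudonymize(value: str, secret_key: bytes, length: int = 12) -> str:
    """Keyed one-way token: the same input and key always give the same token."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:length]

def build_policy(secret_key: bytes):
    """Hypothetical policy: column name -> masking function."""
    return {
        "customer_id": lambda v: pseudonymize(v, secret_key),
        "email": lambda v: pseudonymize(v, secret_key) + "@example.com",
    }

def mask_rows(rows, policy):
    """Apply the policy to every row; columns without a rule pass through unchanged."""
    return [
        {col: policy[col](val) if col in policy else val for col, val in row.items()}
        for row in rows
    ]

customers = [{"customer_id": "C-1001", "email": "sarah@corp.test"}]
orders = [{"order_id": 1, "customer_id": "C-1001"},
          {"order_id": 2, "customer_id": "C-1001"}]

policy = build_policy(b"example-key-for-illustration-only")
masked_customers = mask_rows(customers, policy)
masked_orders = mask_rows(orders, policy)

# The join key still matches across both tables after masking.
print(all(o["customer_id"] == masked_customers[0]["customer_id"] for o in masked_orders))  # True
```

Because the policy is plain data, rerunning it after a refresh gives the same tokens, and a new column can be covered by adding one entry. Truncating the hash keeps tokens short but raises the chance of collisions in very large tables, and anyone holding the key can recompute tokens for guessed inputs, so the key needs the same protection as the original data.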