Data Obfuscation & Synthesis



Obfuscating to Save Billions

The use of data obfuscation and synthesis to mitigate risk in the app development process will save businesses billions of dollars in 2022.

To deliver the best application possible, developers and testers need the most realistic data for each use case. Using real data for development in 2021 carried with it (on average) a $4.2 million risk. That cost has increased 10% year on year since 2019 and it continues to grow.

Data Breaches are Disastrous for Business

Reputational impact aside, data breaches cripple dev teams through the time they take to fix: on average, 287 days to contain the breach and get applications back on track. In the case of some businesses, the impact can be terminal.

The additional complexities and challenges of the last two years range from cloud migration for businesses everywhere and highly motivated hackers to the risk of dev teams working from home throughout the global pandemic.

In 2022, companies remain largely unprepared to manage a data breach, let alone prevent one. Worryingly, over $4 million in costs and a year of entirely wasted development time can start with something as simple as human error.

Data used in non-production environments needs to look and act like real data. In 2022, it’s far too great a risk for it to be real.

Simple encryption isn’t enough of a deterrent to the sophisticated hacker. Data obfuscation helps mitigate that risk in 2022: it is the simple process of replacing sensitive information with values that appear real for development and testing purposes, yet are entirely useless to any hacker.

The solution is to use the best and most realistic fake data available.

What is Data Obfuscation?

Data obfuscation is a pseudonymization technique carried out in non-production environments as part of test data management (TDM). It involves masking direct identifiers in data entities to meet analysis and privacy compliance requirements. This ensures controls are applied to meet the project’s data threshold of identifiability.

This technique masks sensitive fields in collected data, such as organization names, geographies and addresses, so that small-number cell outputs can reveal variations without revealing identifiable information about data entities.
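As a rough illustration of masking direct identifiers, the sketch below replaces each sensitive field with a stable token derived from a keyed hash, so the same input always maps to the same token and records can still be grouped and counted. The field names, the key and the helper functions are hypothetical and chosen only for this example; they do not describe how any particular tool implements masking.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the test environment.
SECRET_KEY = b"example-key-not-for-real-use"

# Assumed set of direct identifiers for this illustration.
DIRECT_IDENTIFIERS = {"organization", "address"}


def pseudonymize(value: str, field: str) -> str:
    """Replace a value with a stable token derived from a keyed hash."""
    message = f"{field}:{value}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256)
    return f"{field}_{digest.hexdigest()[:10]}"


def mask_record(record: dict) -> dict:
    """Mask direct identifiers and leave every other field untouched."""
    return {
        key: pseudonymize(str(val), key) if key in DIRECT_IDENTIFIERS else val
        for key, val in record.items()
    }


record = {"organization": "Acme Health", "address": "1 High Street", "visits": 12}
masked = mask_record(record)
print(masked)
```

Because the tokens are consistent, two records from the same organization still share a token after masking, which keeps counts and joins meaningful without exposing the name.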

Transformations and masking should be created and classified to reduce the need to query indirect identifiers and avoid subconsciously making inferences about an individual or organization’s true details.

100% Data Assurance for a Healthcare Organization

Learn how iData performed rapid implementation of data obfuscation to eliminate risk and reduce the cost of delivering test data at scale.

Rapid Implementation of Data Obfuscation

Data Obfuscation & Synthesis Explained

The terms “data obfuscation” and “data synthesis” are often used interchangeably.

However, they are not the same thing.

The simplest explanation is that data obfuscation involves subsetting and masking data, changing critical details to obscure the real data that needs to be replicated for testing, either in a pre-production or production environment.

Data synthesis is a sub-category of data obfuscation. Instead of masking existing data to prevent a data breach, data synthesis generates brand new synthetic or fake data that looks and acts like real data but contains absolutely no identifying characteristics and is immune to human error.
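To make the distinction concrete, here is a minimal sketch of data synthesis: records are generated from scratch rather than derived by masking real rows. The value pools, the schema and the function name are assumptions made up for this example; a real generator would typically be driven by a profile of the production data.

```python
import random

# Hypothetical value pools used only for this illustration.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor"]
LAST_NAMES = ["Reed", "Hale", "Stone", "Marsh"]
CITIES = ["Northport", "Eastfield", "Westbridge"]


def synthesize_customers(count: int, seed: int = 0) -> list:
    """Generate brand new records that follow the schema but copy no real row."""
    rng = random.Random(seed)  # seeded so that test runs are repeatable
    return [
        {
            "customer_id": 1000 + i,
            "first_name": rng.choice(FIRST_NAMES),
            "last_name": rng.choice(LAST_NAMES),
            "city": rng.choice(CITIES),
            "age": rng.randint(18, 90),
        }
        for i in range(count)
    ]


customers = synthesize_customers(5)
for customer in customers:
    print(customer)
```

No real record is read at any point, so there is nothing in the output to trace back to a person.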

Data Obfuscation Using iData 2.0

iData 2.0 is an intuitive end-to-end data management tool, incorporating data quality; data transformation & migration; and test data management.

iData’s test data management solution includes data subsetting and masking as well as synthetic data generation. Through these methods, it provides a standardized set of data that covers all possible tests, remains up-to-date, can be provisioned on-demand, and contains no sensitive data.

In the process of creating this dataset, it will also profile your production data, creating a view of what data exists and where, while exposing and visualizing any relationships that exist within your data.

The entire data quality, migration and test data management process can be accessed via a single interface, making the toolkit easy and intuitive to use.


Why Obfuscate Data?

Generating synthetic values in data entities avoids any risk of identification. By mixing up elements of a data set or creating new values based on the original data, data synthesis prevents data users from maliciously or non-maliciously identifying a natural person and committing a data breach.
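One simple way to picture "mixing up elements of a data set" is column shuffling, sketched below under assumed field names. The values in one column are randomly reassigned across rows, so the overall distribution of that column is preserved while the link between a value and its original row is broken. The sample rows and the helper are hypothetical.

```python
import random


def shuffle_column(rows: list, column: str, seed: int = 0) -> list:
    """Return copies of the rows with one column's values randomly reassigned."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    # Build new dictionaries so the input rows are never modified.
    return [{**row, column: value} for row, value in zip(rows, values)]


patients = [
    {"id": 1, "postcode": "AB1 2CD", "diagnosis": "asthma"},
    {"id": 2, "postcode": "EF3 4GH", "diagnosis": "diabetes"},
    {"id": 3, "postcode": "IJ5 6KL", "diagnosis": "asthma"},
    {"id": 4, "postcode": "MN7 8OP", "diagnosis": "migraine"},
]
shuffled = shuffle_column(patients, "postcode")
for row in shuffled:
    print(row)
```

A shuffle can leave some values in their original rows by chance, so it is usually combined with masking or synthesis rather than relied on alone.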

Organizations must ensure their data is accurate and fit for purpose as well as private. They must balance the risk of identification and data breaches against the risk of loss of data utility and accuracy that may occur when using anonymized person-level data.

Organizations handling sensitive data must also have a lawful basis for the way they use their datasets.

Industry legislation demands organizations use personal data in a manner which does not breach privacy laws. All organizations must adhere to the Common Law Duty of Confidentiality.

If information is given where a duty of confidence is expected, that information cannot be disclosed without the data subject’s consent.

Implied consent is required to process confidential data for specific primary purposes. For analysis purposes, common law is set aside when confidential data is de-identified.


Operating Under ISO Accreditations

IDS’ ideal customer includes any business that is reliant on data either because they are in a regulated industry or are seeking to create competitive advantage and drive up NPS and customer service.

Our ideal client is any business where data is critical for forecasting, or that is affected by ever-changing data regulation. It may be a business that acquires information from multiple different sources, or one where decision-making is informed by customer data. The sources of that data could include a CRM or HR system, website, e-commerce platform, call centre, market data, supply chain data, etc. Typically our customers work with data from a variety of different systems that need to talk to one another.

Ideally, IDS engages with its customers when they are undergoing a large-scale digital transformation, whether an intrinsic or extrinsic change. These changes bring immense resource requirements, and the risk of delays can impact businesses financially. Further, IDS delivers certainty to our clients, especially where data sovereignty is business critical and any security breach would have serious financial and reputational implications.

Our clients understand that data constantly changes and evolves and needs to be 100% assured. Our ideal customers rely on IDS to deliver data certainty throughout their digital transformation.



A Simplified Process

iData’s ‘Validation & Clean’ stage type automaps entities and obfuscates data through editable transforms. Scripts can be edited to mask the entire value with new characters or symbols, or to mask only part of it so that some of the original value is still shown.
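The difference between full and partial masking can be sketched in a few lines. This is plain Python written for illustration, not iData's transform script syntax; the function names, the default mask symbol and the sample value are all assumptions.

```python
def mask_full(value: str, symbol: str = "*") -> str:
    """Replace every character of the value with the mask symbol."""
    return symbol * len(value)


def mask_partial(value: str, visible: int = 4, symbol: str = "*") -> str:
    """Mask all but the last `visible` characters of the value."""
    if visible <= 0:
        return mask_full(value, symbol)
    hidden = max(len(value) - visible, 0)
    return symbol * hidden + value[hidden:]


card = "4111111111111111"  # a well-known test card number, not a real account
fully_masked = mask_full(card)
partly_masked = mask_partial(card)
print(fully_masked)   # ****************
print(partly_masked)  # ************1111
```

Partial masking keeps enough of the value for a tester to recognize a record, at the cost of leaving part of the original visible, so the choice depends on how sensitive the field is.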

Once an obfuscation report has been run, iData presents users with the original values against the synthetic tagged values in different tables.

All data fields work together to create obfuscated values. A synthetic first and second name can be generated for a contact, and a synthetic email address is then generated automatically to reflect those synthetic values.
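The idea of fields working together can be illustrated with a short sketch in which the email address is derived from the synthetic names rather than generated independently. The name pools, the address pattern and the function are hypothetical and are not taken from iData itself.

```python
import random

# Hypothetical name pools used only for this illustration.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor"]
LAST_NAMES = ["Reed", "Hale", "Stone", "Marsh"]


def synthetic_contact(rng: random.Random) -> dict:
    """Create a synthetic contact whose email is consistent with its name."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # example.com is reserved for documentation, so it cannot reach a real inbox.
    email = f"{first}.{last}@example.com".lower()
    return {"first_name": first, "last_name": last, "email": email}


contact = synthetic_contact(random.Random(42))
print(contact)
```

Deriving the email from the generated names keeps the record internally consistent, which matters for tests that validate one field against another.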

This is just one of the toolkit’s many capabilities. Want to know more?