Introduction
This article discusses how modern data anonymization techniques and Synthetic Data can significantly reduce privacy risk while still providing excellent data quality and accuracy for Analytics.
Why are we not more secure?
We have the tools, and budget is not a significant issue. Still, according to a 2021 study by (ISC)², lack of employee awareness and lack of skills are the top barriers to establishing effective defenses. How can we better enable dialog about security across the organization?
Enable Dialog between different stakeholders
A responsible approach to Data Privacy and Security can be based on a common language and framework that enables dialog between the different stakeholders and teams involved. This dialog can help find the right balance between the risk of breaches and compliance with regulations. It can also help find new business opportunities that require secure sharing of sensitive data. These are examples of frameworks that can enable this dialog:
- The NIST Cybersecurity Framework provides guidance on managing cybersecurity risk; ransomware risk specifically is covered in the Cybersecurity Framework Profile for Ransomware Risk Management (Draft NISTIR 8374).
- The U.S. Department of Defense's Cybersecurity Maturity Model Certification (CMMC) program is becoming important for a broader range of organizations. It is now aligning more closely with NIST SP 800-171.
Assume that an attacker is already inside
Some organizations took Data Privacy and Security more seriously after experiencing security-related incidents. They learned to build short- and long-term plans for better security, with essential core components for Data Privacy and Security that can be applied to new use cases and platforms when a new need arises. Other organizations learned from incidents at other companies in their industry, before experiencing a major security incident themselves, how to implement best practices with granular Data Privacy and Security to achieve a defensible security posture. They assumed that an attacker might soon target their business or might already be in their systems.
Breaches, data leaks, and Security Spending
Ransomware attacks
In early 2021, ransomware struck “COLONIAL PIPELINE, QUANTA, NATIONAL BASKETBALL ASSOCIATION (NBA), BRENNTAG, ACER, JBS FOODS, AXA, and other victims,” according to Illinois.touro.edu. Payments of up to US$50 million were demanded in several of these attacks. The healthcare sector is a frequently targeted vertical for ransomware, and the ransomware attacks seen in the first half of 2021 showed an increase compared with 2020's totals.
Healthcare data breach costs increased from an average total cost of $7.13 million in 2020 to $9.23 million in 2021, a 29.5% increase, according to [1].
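The 29.5% figure can be checked directly from the two averages quoted above; a quick calculation:

```python
# Average total cost of a healthcare data breach, in millions of US dollars (as quoted).
cost_2020 = 7.13
cost_2021 = 9.23

# Relative increase = (new - old) / old, expressed in percent.
pct_increase = (cost_2021 - cost_2020) / cost_2020 * 100
print(f"{pct_increase:.1f}%")  # prints 29.5%
```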
The Healthcare sector may need more investment in data protection to reduce the loss of sensitive data.
Is there a correlation between the number of breaches and data leaks and the security spending budget? Entertainment, consumer products, technology (software and internet services), and the public sector are in the upper part of the chart, and finance is in the bottom part.
For the number of data leaks compared with the security spending budget, Manufacturing, Retail, Healthcare, Technology (software and internet services), and Financial Services are in the upper parts of the chart.
In the top dotted box are three industries spending much less than 6% on IT Security. Manufacturing companies may leak IP and other secret product information. Retail and Healthcare companies may hold large amounts of sensitive customer data, some of which can be used for identity theft.
In the right dotted box are two industries spending much more than 6% on IT Security. Technology (software and internet services) companies may collect large amounts of data in cloud environments, but they may also have cloud platforms that support better security configuration processes and more data security transparency.
Financial Services companies may collect data that is attractive to criminals, but they typically have a good security posture.
PCI DSS (Payment Card Industry Data Security Standard) provides a good model for protecting Payment Card Data, and it could also provide a model and baseline for safeguarding Personally Identifiable Information (PII). Enforcement and Audit of data protection implementations may also help to better address PII leakage.
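One concrete PCI DSS control is display masking: PCI DSS v3.2.1 Requirement 3.3 allows at most the first six and last four digits of a primary account number (PAN) to be shown. The minimal sketch below applies that rule to a card number and, as an assumption about how the same model might carry over to PII, to a US Social Security number. The function names `mask_pan` and `mask_ssn` are hypothetical, and display masking is only one control; stored data needs its own protection.

```python
def mask_pan(pan: str, keep_first: int = 6, keep_last: int = 4, mask_char: str = "*") -> str:
    """Mask a card number for display, keeping at most the first six and last four digits."""
    digits = [c for c in pan if c.isdigit()]
    if len(digits) <= keep_first + keep_last:
        # Too short to reveal both ends safely: mask everything.
        return mask_char * len(digits)
    hidden = len(digits) - keep_first - keep_last
    return "".join(digits[:keep_first]) + mask_char * hidden + "".join(digits[-keep_last:])


def mask_ssn(ssn: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Hypothetical extension of the same idea to a PII field (US Social Security number)."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) <= keep_last:
        return mask_char * len(digits)
    return mask_char * (len(digits) - keep_last) + "".join(digits[-keep_last:])


print(mask_pan("4111 1111 1111 1111"))  # 411111******1111
print(mask_ssn("123-45-6789"))          # *****6789
```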
Use Cases
Reducing Risk with Financial Data
Anonymization minimized the risk of re-identification in credit card approval transactions at a bank. The bank reduced the privacy risk from 26 percent to 8 percent while still achieving 98 percent accuracy compared with the original Machine Learning model used in the analytics.
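The article does not say how the bank measured privacy risk. One common approach, assumed here, groups records by their quasi-identifiers (attributes such as an age or a ZIP-code prefix that could be linked with outside data). Under a prosecutor-style attack model, each record's re-identification risk is 1 divided by the size of its group, and the dataset is k-anonymous when the smallest group has k records. Generalizing a quasi-identifier, for example replacing an exact age with a 10-year band, enlarges the groups and lowers the risk. The records and field names below are hypothetical.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """k is the size of the smallest equivalence class."""
    return min(equivalence_classes(records, quasi_identifiers).values())


def average_reidentification_risk(records, quasi_identifiers):
    """Prosecutor-model risk: each record's risk is 1 / size of its class; return the mean."""
    classes = equivalence_classes(records, quasi_identifiers)
    risks = [1 / classes[tuple(r[q] for q in quasi_identifiers)] for r in records]
    return sum(risks) / len(risks)


def generalize_age(record, band=10):
    """Replace an exact age with a band such as '30-39' (one simple generalization step)."""
    low = (record["age"] // band) * band
    return {**record, "age": f"{low}-{low + band - 1}"}


# Hypothetical toy records; only the quasi-identifiers matter for the risk calculation.
records = [
    {"age": 31, "zip3": "606", "approved": True},
    {"age": 34, "zip3": "606", "approved": False},
    {"age": 37, "zip3": "606", "approved": True},
    {"age": 52, "zip3": "604", "approved": True},
    {"age": 58, "zip3": "604", "approved": False},
]
qi = ["age", "zip3"]

# Exact ages make every record unique: k = 1 and average risk 1.0.
print(k_anonymity(records, qi), average_reidentification_risk(records, qi))

# After banding ages the groups have sizes 3 and 2: k = 2 and average risk 0.4.
generalized = [generalize_age(r) for r in records]
print(k_anonymity(generalized, qi), average_reidentification_risk(generalized, qi))
```

Generalization trades precision for privacy, which is why the utility of the anonymized data (here, model accuracy) has to be measured alongside the risk.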
Anonymization suits advanced, data-intensive business applications such as analytics and can be based on techniques like differential privacy or k-anonymity. Pseudonymization, by contrast, is a reversible approach that can be based on encryption or tokenization.
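A minimal sketch of differential privacy, assuming the common Laplace mechanism applied to a counting query (the article does not say which mechanism was used). Because adding or removing one person changes a count by at most 1, noise drawn from Laplace(0, 1/epsilon) gives epsilon-differential privacy; a smaller epsilon means more noise and stronger privacy. The function names and the approval data are hypothetical, and production systems should use a vetted differential privacy library, since naive floating-point noise sampling has known weaknesses.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    The sensitivity of a count is 1, so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(42)  # seeded only so the example is repeatable
approvals = [True, False, True, True, False, True]  # hypothetical approval outcomes
noisy = dp_count(approvals, lambda approved: approved, epsilon=1.0, rng=rng)
print(round(noisy, 2))  # the true count is 4; the released value is 4 plus noise
```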
Securing Voting Data
In this example, homomorphic encryption is used by Microsoft's ElectionGuard software to provide confidentiality and integrity for election voting data.
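ElectionGuard's published specification builds on an exponential form of ElGamal encryption, in which multiplying ciphertexts adds the underlying votes, so ballots can be tallied without decrypting any individual ballot. The sketch below is a toy illustration of that additive property only: the group parameters are far too small to be secure, it omits the zero-knowledge proofs that ElectionGuard relies on for integrity, and none of the names are ElectionGuard's API.

```python
import secrets

# Toy parameters (insecure, for illustration): P = 2*Q + 1 is a safe prime and
# G = 4 generates the subgroup of prime order Q.
P = 2039
Q = 1019
G = 4


def keygen():
    """Return (secret x, public h = g^x mod p)."""
    x = 1 + secrets.randbelow(Q - 1)
    return x, pow(G, x, P)


def encrypt(h, m):
    """Exponential ElGamal: Enc(m) = (g^r, g^m * h^r) with fresh randomness r."""
    r = 1 + secrets.randbelow(Q - 1)
    return pow(G, r, P), (pow(G, m, P) * pow(h, r, P)) % P


def add(c1, c2):
    """Component-wise multiplication of ciphertexts adds the plaintexts."""
    return (c1[0] * c2[0]) % P, (c1[1] * c2[1]) % P


def decrypt(x, c, max_m=1000):
    """Recover g^m = b * a^(-x), then find m by brute force (fine for small tallies)."""
    a, b = c
    gm = (b * pow(a, P - 1 - x, P)) % P  # a^(p-1-x) equals a^(-x) mod p
    for m in range(max_m + 1):
        if pow(G, m, P) == gm:
            return m
    raise ValueError("tally out of range")


secret, public = keygen()
votes = [1, 0, 1, 1, 0]  # hypothetical ballots: 1 = yes, 0 = no
tally = encrypt(public, 0)
for v in votes:
    tally = add(tally, encrypt(public, v))  # tallying never decrypts a single ballot
print(decrypt(secret, tally))  # 3
```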
Protecting Medical Data in the Cloud
Due to legal regulations, a medical center that owns patients’ health records cannot outsource its data to a cloud that is vulnerable to attacks. One way to overcome the confidentiality problem is to encrypt data on the local premises before outsourcing it to the cloud.
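A minimal sketch of encrypting records on premises before they are outsourced, assuming the widely used third-party Python `cryptography` package and AES-GCM (authenticated encryption, so tampering in the cloud is detected at decryption). The helper names and the `upload_to_cloud` step are hypothetical; the essential point is that the key is generated and kept locally, so the cloud only ever stores ciphertext.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the size recommended for AES-GCM


def encrypt_for_upload(key: bytes, record: bytes, record_id: str) -> bytes:
    """Encrypt on premises; only the returned blob (nonce + ciphertext + tag) leaves."""
    nonce = os.urandom(NONCE_LEN)  # must never repeat for the same key
    # record_id is authenticated (bound to the ciphertext) but not encrypted.
    return nonce + AESGCM(key).encrypt(nonce, record, record_id.encode())


def decrypt_after_download(key: bytes, blob: bytes, record_id: str) -> bytes:
    """Decrypt on premises; raises cryptography.exceptions.InvalidTag if the blob was altered."""
    nonce, ciphertext = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return AESGCM(key).decrypt(nonce, ciphertext, record_id.encode())


# The key is generated and stored on premises (for example in a local HSM); never upload it.
key = AESGCM.generate_key(bit_length=256)
record = b'{"patient_id": "P-001", "note": "example"}'  # hypothetical health record
blob = encrypt_for_upload(key, record, "P-001")
# upload_to_cloud(blob)  # hypothetical upload step: the cloud only stores ciphertext
print(decrypt_after_download(key, blob, "P-001") == record)  # True
```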
What is needed to stay vigilant?
Start small with easy Data Protection initiatives
Risk Management is a good starting point before taking the following steps and implementing processes and tools. It helps to know the Threat Landscape and to review the applicable Privacy regulations, which then define the privacy rules that we need to implement and enforce via Security Controls.
One implementation approach is to start small with easy Data Protection initiatives for your organization’s most urgent use cases and data.
- Start with Data discovery to find your most sensitive data assets across different environments.
- Continue the implementation with granular access control, data masking, and granular data protection for the most sensitive data in your organization.
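The data discovery step above can be sketched as a pattern scan over text. This is a minimal illustration under stated assumptions: the detection rules in `PATTERNS` are hypothetical and far from complete, and a Luhn checksum is used to cut false positives on card-number-like digit runs. Commercial discovery tools add many more detectors, context, and support for structured sources.

```python
import re

# Hypothetical detection rules; real discovery tools use many more patterns plus context.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right and check the sum mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def discover(text: str) -> dict:
    """Return each sensitive-data type found in the text with its matches."""
    findings = {}
    for kind, pattern in PATTERNS.items():
        matches = [m.group().strip() for m in pattern.finditer(text)]
        if kind == "card_number":
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            findings[kind] = matches
    return findings


sample = "Contact jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
print(discover(sample))
```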
Finding the right Data Protection for different use cases and data in your organization requires a good understanding of the problem you want to solve. With the right Data Protection approaches, your data can be protected before it is exposed in the cloud, on mobile devices, and in other distributed environments.
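Granular, field-level protection applied before data leaves for the cloud can be sketched as a per-role policy. Everything here is an assumption for illustration: the roles, field names, and actions in `POLICY`, and the use of keyed hashing (HMAC-SHA256), which is one-way and lets analysts join on a field without seeing it. Reversible pseudonymization would instead need encryption or a token vault. Fields without a rule are dropped (default deny).

```python
import hashlib
import hmac

# Hypothetical policy: the protection each role gets for each field.
POLICY = {
    "analyst": {"name": "keyed_hash", "email": "keyed_hash", "ssn": "redact", "balance": "clear"},
    "support": {"name": "clear", "email": "mask", "ssn": "mask", "balance": "redact"},
}

# Assumption: in practice this key comes from a key manager kept outside the cloud.
SECRET = b"replace-with-a-key-from-your-key-manager"


def keyed_hash(value: str) -> str:
    """Deterministic one-way token: the same input always gives the same token."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()[:16]


def mask(value: str, keep_last: int = 4) -> str:
    """Show only the last few characters."""
    if len(value) <= keep_last:
        return "*" * len(value)
    return "*" * (len(value) - keep_last) + value[-keep_last:]


def protect(record: dict, role: str) -> dict:
    """Apply the role's field-level rules; 'redact' and unknown fields are omitted."""
    rules = POLICY[role]
    out = {}
    for field, value in record.items():
        action = rules.get(field)
        if action == "clear":
            out[field] = value
        elif action == "mask":
            out[field] = mask(str(value))
        elif action == "keyed_hash":
            out[field] = keyed_hash(str(value))
    return out


record = {"name": "Jane Doe", "email": "jane.doe@example.com", "ssn": "123-45-6789", "balance": 1520.75}
print(protect(record, "support"))
print(protect(record, "analyst"))
```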
Synthetic Data
When historical data is not available, or when the available information is insufficient because of a lack of quality or diversity, companies rely on synthetic data to build models. The utility of synthetic data varies with the analyst’s degree of knowledge about a specific data environment. You can generate a random sample from any distribution, such as Normal, Exponential, Chi-square, t, lognormal, and Uniform. To generate synthetic data, actual data can be fitted to a known distribution, and the Monte Carlo method can then draw synthetic samples from that fitted distribution.
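A minimal sketch of the fit-then-sample approach described above. It assumes a lognormal distribution (one of those listed), fitted to positive-valued data such as transaction amounts by estimating the mean and standard deviation of the logarithms, and then uses Monte Carlo sampling to draw synthetic values. The amounts are hypothetical. In practice, the distribution would be chosen with a goodness-of-fit check, and real generators also preserve correlations between columns, which this single-column sketch does not.

```python
import math
import random
import statistics


def fit_lognormal(values):
    """Fit a lognormal: estimate mean and std dev of log(values); values must be > 0."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)


def synthesize(values, n, seed=None):
    """Monte Carlo step: draw n synthetic values from the fitted distribution."""
    mu, sigma = fit_lognormal(values)
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Hypothetical "real" transaction amounts; in practice these come from the source system.
real_amounts = [12.5, 40.0, 8.75, 150.0, 23.1, 67.8, 5.4, 310.0, 19.9, 44.2]
synthetic = synthesize(real_amounts, n=1000, seed=7)
print(statistics.median(real_amounts), statistics.median(synthetic))
```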
Conclusion
We discussed starting small with easy Data Protection initiatives to stay vigilant. Each organization needs to start with Risk Management before taking steps and implementing processes and tools, based on how the Threat Landscape and Privacy regulations affect it.
Modern Anonymization techniques can significantly reduce the privacy risk and still provide excellent data quality and accuracy for analytics. When historical data is not available or when the available information is insufficient, companies can rely on synthetic data for analytics and test/development environments.
PCI DSS could provide a model and baseline for protecting Personally Identifiable Information (PII). Enforcement and Audit of data protection implementations may also help better address PII leakage.
Notes
- A Responsible Approach to Data Privacy and Security, https://enterpriseviewpoint.com/a-responsible-approach-to-data-privacy-and-security-what-we-learned-from-failures/.