As software testers, we may think that GDPR (General Data Protection Regulation) has nothing to do with us. However, the key word here is data. The problem is that to do our testing, we typically need data. You can’t do realistic end-user acceptance testing without realistic data. Even before GDPR, getting good data for testing was a problem.
This is especially true with complex business rules: when a location dictates that certain rules are enacted, when a transaction changes different data fields as it is processed, or when data becomes valid, invalid, or expired depending on different criteria or on the results of the previous step. An example of this would be testing order processing for a retail transaction. If you don’t have a valid customer ID and credit card, you need to use simulated data.
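As a rough illustration of simulated data for the order processing example, the sketch below builds a fake card number that passes the Luhn checksum, which is the basic format check many payment forms apply. The function names, the "4" prefix, and the customer ID format are assumptions made for this example, and the numbers it produces are not real cards.

```python
import random

def luhn_check_digit(partial: str) -> str:
    """Compute the Luhn check digit for a string of digits."""
    total = 0
    # Walk right to left; the rightmost digit of the partial number is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)

def is_luhn_valid(number: str) -> bool:
    """Return True if the last digit is the correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == number[-1]

def fake_card_number(rng: random.Random, prefix: str = "4", length: int = 16) -> str:
    """Build a random, Luhn-valid number of the requested length."""
    body_len = length - len(prefix) - 1
    body = prefix + "".join(str(rng.randint(0, 9)) for _ in range(body_len))
    return body + luhn_check_digit(body)

rng = random.Random(7)  # fixed seed so a failing test can be reproduced
order = {
    "customer_id": f"TEST-{rng.randint(100000, 999999)}",  # hypothetical ID format
    "card_number": fake_card_number(rng),
}
print(order, is_luhn_valid(order["card_number"]))
```

Seeding the generator keeps the data repeatable between runs, which helps when a defect has to be reproduced with the same order.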
Data can come from a few sources. You can use real data and then mask it, or you can use synthetic or ‘fake’ data. When you use real data and mask it, there is always the risk that your masking routines are insufficient or are not applied consistently across all data sources. Synthesized data, or data that you make up, may not adequately support testing the complex business scenarios noted above, which require timing or step-by-step transactions.
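One way to reduce the risk of inconsistent masking across data sources is to derive the masked value deterministically from the original, so the same person receives the same token everywhere. The sketch below is a minimal example under assumptions: the key, the field names, and the two sample rows are hypothetical. A keyed token like this is pseudonymization rather than full anonymization, so the key itself has to be protected.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be kept outside the test environment.
MASKING_KEY = b"example-masking-key"

def mask_value(value: str, field: str) -> str:
    """Return a stable token for a personal data value."""
    normalized = value.strip().lower()
    message = f"{field}:{normalized}".encode("utf-8")
    digest = hmac.new(MASKING_KEY, message, hashlib.sha256).hexdigest()
    return f"{field}_{digest[:12]}"

# Two hypothetical sources holding the same customer with different formatting.
crm_row = {"email": "Jane.Doe@example.com", "city": "Leeds"}
billing_row = {"email": "jane.doe@example.com ", "amount": 42.50}

masked_crm = {**crm_row, "email": mask_value(crm_row["email"], "email")}
masked_billing = {**billing_row, "email": mask_value(billing_row["email"], "email")}

# The tokens match, so joins between the two masked sources still work.
print(masked_crm["email"] == masked_billing["email"])
```

Because the token is stable, multi-step scenarios that follow one customer across systems keep working after masking.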
When considering GDPR compliance, it’s important to understand the concept of personal data: any data or information related to a person that can be used to directly or indirectly identify that person. This could include address, name, email, bank accounts, social network posts, social network ID, medical information, IP address and many other data items. So if the data you are using can be used to identify anyone, then you need to consider masking in the context of GDPR compliance.
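To decide which fields need masking, a first pass can scan a sample of the test data for values that look like personal data. The sketch below checks only two of the items listed above (email and IPv4 address) with deliberately simple patterns. The column names, the sample table, and the 0.5 threshold are assumptions for illustration, and a scan like this supplements a manual review rather than replacing it.

```python
import re
from collections import Counter

# Deliberately simple patterns; real detection would need more types and stricter rules.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "ipv4": re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$"),
}

def profile_column(values):
    """Return the share of non-empty values matching each pattern."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return {}
    hits = Counter()
    for value in non_empty:
        for name, pattern in PATTERNS.items():
            if pattern.match(value):
                hits[name] += 1
    return {name: hits[name] / len(non_empty) for name in PATTERNS}

def flag_personal_columns(table, threshold=0.5):
    """Map each column to the pattern that at least `threshold` of its values match."""
    flagged = {}
    for column, values in table.items():
        for name, share in profile_column(values).items():
            if share >= threshold:
                flagged[column] = name
    return flagged

# Hypothetical extract from a test database.
sample = {
    "contact": ["a@example.com", "b@example.org", "", "n/a"],
    "last_login_from": ["192.0.2.10", "198.51.100.7", "203.0.113.5"],
    "order_total": ["19.99", "5.00", "120.50"],
}
print(flag_personal_columns(sample))
```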
Testing for Big Data – For any Big Data testing, data profiling is a key component of a comprehensive effort. As such, when identifying data anomalies and detailing the frequency, distribution and characteristics of the data sets, GDPR compliance can be built into the process, but you need to know what personal data characteristics to highlight.
Testing for Financial and Accounting Software – For any financial software testing, GDPR compliance should be built in by masking any personal data that could identify a particular person or entity. All account numbers should be randomized and ‘fake’, along with addresses, names, etc.
Testing for Healthcare – Any software testing in the healthcare domain must be compliant with HIPAA, meaning Protected Health Information (PHI) such as demographic information, medical history, test or lab results, insurance information, and other data that is collected to identify an individual must be made anonymous when used for testing, so it’s likely that many GDPR requirements are already addressed.
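For the financial case above, randomizing account numbers, names and addresses while leaving non-personal fields alone might look like the following sketch. The field names, lookup lists, and 10-digit account format are assumptions made for this example and do not follow any real banking format.

```python
import random
import string

# Hypothetical lookup lists; a real suite would use larger ones.
NAMES = ["Alex Morgan", "Sam Patel", "Chris Novak", "Jordan Lee"]
STREETS = ["High Street", "Mill Lane", "Station Road", "Park Avenue"]

def synthesize_account(record: dict, rng: random.Random) -> dict:
    """Replace personal fields with random values and keep all other fields."""
    fake = {
        "account_number": "".join(rng.choice(string.digits) for _ in range(10)),
        "name": rng.choice(NAMES),
        "address": f"{rng.randint(1, 199)} {rng.choice(STREETS)}",
    }
    # Only keys present in `fake` are overwritten; balances and the like pass through.
    return {key: fake[key] if key in fake else value for key, value in record.items()}

# Hypothetical production-like record.
source = {
    "account_number": "GB-REAL-000123",
    "name": "Real Person",
    "address": "1 Real Road",
    "balance": 1520.75,
    "currency": "EUR",
}
test_record = synthesize_account(source, random.Random(2024))
print(test_record)
```

Keeping the balance and currency unchanged means calculations can still be tested against realistic amounts, while the identifying fields no longer point to a real customer.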
Current Landscape of GDPR
GDPR is EU-driven legislation based on the premise that privacy is a fundamental human right. In the USA, we are not quite there in how we view privacy. On one hand, privacy is not even mentioned in the Constitution. On the other, we do have privacy-related legislation in certain domains like healthcare (HIPAA). Currently, “privacy”, for the most part, is an agreement between an individual and the company collecting his or her data. This may change in the future as more Americans become aware of and concerned about privacy issues, or if there is another big data breach similar to those of Target and Cambridge Analytica that leads US citizens to demand legislation from their elected representatives.
As software testers, regardless of what happens in the halls of Congress, we need to remain diligent in ensuring that the data we use for testing is anonymous. More importantly, being creative and thoughtful in developing routines and methods for producing synthetic data that can satisfy complex test scenarios and transactions will keep you on the right side of QA best practices.