Software is a collection of computer programs and related data. But when we talk about software testing, we rarely pay enough attention to the data. There are many methods for testing the programs: black-box or white-box testing, functional or performance testing, and so on. What about the data? You might say data-driven testing. Yes, data-driven testing does help, but we need much more than that.

In the software industry, it’s common to store data separately, and a popular solution is a third-party database. But most of the time, the database does much more than just store data. For instance, some software uses the database’s programming features, such as triggers and stored procedures, to perform background calculations. Databases can also serve as an interface for communicating with other software. And sometimes software uses the database’s powerful utilities as part of its own functionality. In these cases, testing the data is important, but data-driven testing alone can’t fully verify it, because it’s almost impossible to define the expected output in advance.
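
When a trigger or stored procedure computes derived data, one way around the "expected output" problem is to recompute the value independently from the source rows and compare. Below is a minimal sketch of that idea, assuming SQLite through Python's built-in sqlite3 module as a stand-in for the real database; the tables `order_lines` and `order_totals` and the trigger `trg_line_insert` are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory database whose trigger keeps order totals up to date.

    The schema and trigger are hypothetical stand-ins for logic that lives
    inside a production database rather than in the application code.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE order_lines (order_id INTEGER, qty INTEGER, unit_price REAL);
        CREATE TABLE order_totals (order_id INTEGER PRIMARY KEY, total REAL NOT NULL);

        -- Background calculation performed by the database itself.
        CREATE TRIGGER trg_line_insert AFTER INSERT ON order_lines
        BEGIN
            INSERT OR IGNORE INTO order_totals (order_id, total) VALUES (NEW.order_id, 0);
            UPDATE order_totals
               SET total = total + NEW.qty * NEW.unit_price
             WHERE order_id = NEW.order_id;
        END;
        """
    )
    return conn


def mismatched_totals(conn: sqlite3.Connection):
    """Return (order_id, expected, actual) for every order whose stored total
    differs from a total recomputed independently from the raw lines."""
    expected = dict(conn.execute(
        "SELECT order_id, SUM(qty * unit_price) FROM order_lines GROUP BY order_id"))
    actual = dict(conn.execute("SELECT order_id, total FROM order_totals"))
    bad = []
    for order_id in sorted(expected.keys() | actual.keys()):
        exp, act = expected.get(order_id), actual.get(order_id)
        # A missing row on either side, or a differing value, is a mismatch.
        if exp is None or act is None or abs(exp - act) > 1e-9:
            bad.append((order_id, exp, act))
    return bad


if __name__ == "__main__":
    conn = make_db()
    conn.executemany("INSERT INTO order_lines VALUES (?, ?, ?)",
                     [(1, 2, 9.5), (1, 1, 3.0), (2, 4, 1.25)])
    print(mismatched_totals(conn))  # []
```

The check never hard-codes the totals; it only relies on the rule the trigger is supposed to implement, so new input rows need no new expected values.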

How do we test and verify data in these situations? Let’s divide data verification testing into 3 phases, each with its own purpose and method:

  1. Data collection: In this phase, programs collect data from end users and insert it into the database. Most commercial software treats this data as “raw data”, saved in “raw tables” and never modified. Data-driven testing works well in this phase, because the same conditions or criteria always retrieve the same raw data from a table.
  2. Data calculation: In this phase, methods perform calculations on the raw data and save the results to result tables. Sometimes these methods are stored procedures in the database, or they are provided by third-party software. Data testing in this situation happens entirely outside the software program, typically within the database. Be careful: the resulting data often carry timestamps, which should be taken into account during data verification. Verification is more complex when third-party programs are involved, because they bring their own issues. Environment issues can also come into play. For example, the system may call a web service to request data for a calculation, but the call fails because the web service is down, or it times out due to poor network conditions. As a tip, I always ping the third-party programs to make sure they are available before starting data verification.
  3. Temporary data: This kind of result data is generated on demand and used only once. For instance, many applications provide reporting functions, such as weekly, monthly, departmental, or company-wide reports. After the query criteria are finalized, the system generates the temporary data, and the reports are built from it. Finally, the reports are rendered by a third-party report program. In this case, we must not only test the temporary data to make sure it is accurate, but also test the report program to ensure the data is reported correctly.
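
For the data collection phase, a data-driven test can feed a table of inputs through the collection step and then query the raw table with fixed criteria. This is a minimal sketch assuming SQLite via Python's sqlite3 module; `collect_entry` is a hypothetical stand-in for the real program under test, and the `raw_entries` table and its columns are assumptions.

```python
import sqlite3


def collect_entry(conn: sqlite3.Connection, user: str, amount_text: str) -> None:
    """Hypothetical collection program: trims the user name and stores the
    amount as integer cents in the raw table."""
    cents = round(float(amount_text) * 100)
    conn.execute("INSERT INTO raw_entries (user, amount_cents) VALUES (?, ?)",
                 (user.strip(), cents))


# Data-driven cases: raw user input, and the exact row expected in the raw table.
CASES = [
    (("alice", "12.50"), ("alice", 1250)),
    ((" bob ", "3"), ("bob", 300)),
    (("carol", "0.10"), ("carol", 10)),
]


def run_cases(cases):
    """Run every case against a fresh raw table and return the failures."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_entries "
                 "(id INTEGER PRIMARY KEY, user TEXT, amount_cents INTEGER)")
    failures = []
    for (user, amount_text), expected in cases:
        collect_entry(conn, user, amount_text)
        # Raw rows are never modified, so the same criteria return the same row.
        row = conn.execute(
            "SELECT user, amount_cents FROM raw_entries WHERE user = ?",
            (expected[0],)).fetchone()
        if row != expected:
            failures.append((user, amount_text, expected, row))
    return failures


if __name__ == "__main__":
    print(run_cases(CASES))  # []
```

Adding a new scenario means adding a row to `CASES`, not writing new test code, which is the point of the data-driven style.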
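
For the data calculation phase, timestamps change on every run, so comparing whole rows fails even when the calculation is correct. One hedged approach, sketched below, is to leave the timestamp column out of the value comparison and instead check that it is no older than the start of the test run, which catches stale results from an earlier calculation. Rows are plain dicts here; in practice they would come from queries against the result table, and the column names (`account`, `balance`, `calculated_at`) and the five-second clock-skew allowance are assumptions.

```python
from datetime import datetime, timedelta, timezone


def verify_results(actual_rows, expected_rows, key, timestamp_col, run_started,
                   skew=timedelta(seconds=5)):
    """Compare result-table rows with expected rows, treating timestamps specially.

    The timestamp column is excluded from the value comparison; instead each
    row's timestamp must not predate run_started minus a small skew allowance.
    """
    problems = []
    actual_by_key = {row[key]: row for row in actual_rows}
    for exp in expected_rows:
        act = actual_by_key.get(exp[key])
        if act is None:
            problems.append(f"{exp[key]}: missing from result table")
            continue
        for col, value in exp.items():
            if col != timestamp_col and act.get(col) != value:
                problems.append(f"{exp[key]}: {col} is {act.get(col)!r}, expected {value!r}")
        stamp = act.get(timestamp_col)
        if stamp is None or stamp < run_started - skew:
            problems.append(f"{exp[key]}: stale or missing {timestamp_col}")
    # Rows the calculation produced that nobody expected are also suspicious.
    expected_keys = {exp[key] for exp in expected_rows}
    for extra in sorted(actual_by_key.keys() - expected_keys):
        problems.append(f"{extra}: unexpected row")
    return problems


if __name__ == "__main__":
    started = datetime.now(timezone.utc)
    actual = [{"account": "A1", "balance": 150, "calculated_at": datetime.now(timezone.utc)}]
    expected = [{"account": "A1", "balance": 150}]
    print(verify_results(actual, expected, "account", "calculated_at", started))  # []
```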
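
The "ping first" tip can be automated as a pre-check. A true ICMP ping usually needs elevated privileges, so this sketch swaps in a plain TCP connection attempt with Python's `socket.create_connection`, which only shows that something is listening on the port, not that the service answers correctly. The helper names are hypothetical.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def unavailable(dependencies, timeout: float = 3.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in dependencies
            if not is_reachable(host, port, timeout)]
```

A test run could call `unavailable(...)` with its list of third-party endpoints and skip, rather than fail, the data verification when the list is non-empty, so that an environment problem is not reported as a data defect.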
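
For temporary report data, the two checks described above can be chained: first compare the temporary data with an aggregate recomputed independently from the raw rows, then compare what the report program actually outputs with the temporary data. This minimal sketch assumes the temporary data are (department, month, total) rows with integer amounts and that the report program can export CSV with `department`, `month`, and `total` columns; all of these shapes are assumptions.

```python
import csv
import io
from collections import defaultdict


def expected_totals(raw_rows):
    """Independently aggregate raw (department, month, amount) rows."""
    totals = defaultdict(int)
    for dept, month, amount in raw_rows:
        totals[(dept, month)] += amount
    return dict(totals)


def check_report(raw_rows, temp_rows, report_csv):
    """Return a list of (stage, expected, actual) problems.

    Stage 1 checks the temporary data against the independent aggregate;
    stage 2 checks the report program's CSV output against the temporary data.
    """
    problems = []
    expected = expected_totals(raw_rows)
    temp = {(dept, month): total for dept, month, total in temp_rows}
    if temp != expected:
        problems.append(("temporary data", expected, temp))
    reported = {
        (row["department"], row["month"]): int(row["total"])
        for row in csv.DictReader(io.StringIO(report_csv))
    }
    if reported != temp:
        problems.append(("report output", temp, reported))
    return problems
```

Keeping the two stages separate tells you whether a wrong number came from the query that built the temporary data or from the report program that displayed it.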

I’ve just covered a few tips from our experience with data verification testing. There is much more to cover in the Extract, Transform, and Load (ETL) process, but that topic is much deeper in scope. Perhaps we can go into more detail in another blog post. Feel free to write back and let me know what you’d like to discuss.