
Test Data Management & Data Privacy
1. Why Test Data Management (TDM):
Reliability of enterprise applications is a must for business processes and decisions.
Application development and testing are data-intensive, tedious, and time-consuming
processes. The quality of test data determines the outcome of testing, and ready
availability of accurate, right-sized data can reduce testing time, control costs, and
improve productivity. A TDM strategy therefore has to be defined for end-to-end testing.
Test data requirements change from scenario to scenario and are project specific.
The definition of right-sized data varies with the type of testing being done, so each
situation has to be evaluated and its data requirements defined. This reduces the
time spent identifying the required data from the available data and shortens test
cycle execution. Increasing the database size in test environments to match
production, without processing power comparable to production, is just a quick fix;
it leads to reduced testing performance and increased resource acquisition and
maintenance costs.

1.1 Test Data Management (TDM) approach and benefits:

The general process flow for a TDM approach is given below:

- Get data from production
- Obfuscate the sensitive data in the production data; the result can be called
  privatized data
- Use the privatized data to subset the data based on the testing requirements
- Continue the testing process
- After the testing process is done, compare the data in the test environment with
  the privatized subset
- Refresh the test environment with the privatized subset for the next cycles as
  required

Sub-setting: the process of selecting the required test data based on the testing
scenario.
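As a rough, hypothetical illustration (not Optim itself), sub-setting amounts to extracting only the rows that match a scenario predicate while keeping the related child rows; all table and column names below are invented for the example:

```python
import sqlite3

# Hypothetical schema: insurance customers and their policies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE policies (id INTEGER PRIMARY KEY, customer_id INTEGER,
                           FOREIGN KEY (customer_id) REFERENCES customers(id));
    INSERT INTO customers VALUES (1, 'NY'), (2, 'CA'), (3, 'NY');
    INSERT INTO policies VALUES (10, 1), (11, 2), (12, 3);
""")

def subset_customers(conn, state):
    """Select only the customers (and their policies) a scenario needs."""
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM customers WHERE state = ?", (state,))]
    marks = ",".join("?" * len(ids))
    policies = conn.execute(
        f"SELECT id FROM policies WHERE customer_id IN ({marks})", ids)
    return ids, [p[0] for p in policies]

customer_ids, policy_ids = subset_customers(conn, "NY")
print(customer_ids, policy_ids)  # [1, 3] [10, 12]
```

Carrying the child rows along with the selected parents is what keeps the subset usable: a test database with policies but no matching customers would fail referential checks.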

1.2 Benefits of TDM approach:

A well-defined TDM approach will lead to:

- Reduced testing cycle execution time
- Reduced testing database size
- Reduced acquisition and maintenance costs for test environments
- Increased testing performance
- Ready availability of test data (by sub-setting and/or masking)
- A reusable dataset for refreshing after each test cycle

2. Why Data Privacy:

Data is an asset to the organization. It should be protected like any other asset
and should have a defined authorization process to protect the data and prevent
data theft/loss. A loss of data may cost a company business and dent its corporate
brand in the market.
Customer data in an insurance company is confidential. Various statutory
regulations govern the protection of customer information in the US, and the cost
of losing confidential data is high in terms of lost business, regulatory fines, and
loss of customer trust.
A production environment is governed by all the required access restrictions and is
well maintained, but data in test environments does not follow the same level of
access restriction, given the purpose of the environment. If data is moved from
production to test environments, the security of production is indirectly
compromised: data governance and security then rely on the users of the test
environments, increasing the threat to data security.
Not all data is of the same value, so defining sensitive data elements helps in
identifying the data to be masked and de-identified, and insures against data
loss/theft.

2.1 What is sensitive data?

Any data whose compromise with respect to confidentiality, integrity, and/or
availability could have a material adverse effect can be defined as sensitive data.
The degree of sensitivity is defined by the impact of the compromise. Each process
has a degree of sensitivity based on the data it holds; some data may hold a lot of
value to one process but not to another, so each project/process has to define
these requirements. In the insurance industry, any customer details can also be
defined as sensitive data.

2.2 Commonly accepted sensitive data:

The fields below are defined as sensitive data irrespective of the project/process.
Data security for most of them is governed by statutory rules.

- Social Security Number
- Income tax number (TIN)
- Driver's license number
- Bank account number
- Bank routing number
- Credit card number

3. What IBM Optim offers

Optim is an IBM tool that provides archiving, data growth management, Test Data
Management (TDM), and data masking.
Optim TDM helps to:
- Create and populate appropriately sized development and test environments
- Automatically or manually edit the data in test environments to match test
  cases
- Easily refresh the test environments
- Compare test results with the original data
Its data privacy features help to:
- Prevent misuse of information by masking, obfuscating, and privatizing
  personal information that is propagated outside the production environment

Optim is designed to work with recent versions of the most popular databases
currently on the market, such as Oracle 8i/9i/10.x/11g, DB2 LUW 7.x/8.x/9.x, DB2
z/OS 8.x/9.x/10.1, SQL Server 2005/2008, Informix Dynamic Server
7.3.1/10.0/11.0/11.5, Sybase 11.x/12.x/15.x, VSAM, and Teradata. This wide
compatibility makes it popular in the market. It can also be used with Siebel,
Oracle E-Business Suite, JD Edwards, PeopleSoft, and various other popular
packaged applications.
It offers the capability to maintain the relational integrity of the data by
preserving the entity relationships of the data while implementing solutions for
TDM and data masking. It allows business rules to be incorporated as Optim
relationships when creating the data model, so not just database-level constraints
but data-level constraints can also be considered. Predefined data is available for
masking sensitive data, which can be used without building complex logic.
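A sketch of why consistent masking preserves relational integrity: if the same deterministic transform is applied to a key wherever it appears, parent-child joins still line up after masking. The Python below is purely illustrative (not Optim's algorithm), and the table contents are invented:

```python
import hashlib

def mask_key(value: str, digits: int = 9) -> str:
    """Deterministically map a key to a same-length numeric surrogate.
    The same input always yields the same output, so a foreign key
    masked in a child table still matches the masked parent key."""
    h = hashlib.sha256(value.encode()).hexdigest()
    return str(int(h, 16) % 10**digits).zfill(digits)

# Hypothetical parent and child tables sharing the SSN as a key.
customers = [{"ssn": "123-45-6789", "name": "A"}]
claims = [{"ssn": "123-45-6789", "claim": 42}]

masked_customers = [{**c, "ssn": mask_key(c["ssn"])} for c in customers]
masked_claims = [{**c, "ssn": mask_key(c["ssn"])} for c in claims]

# The join key still matches after masking, and the original is gone.
assert masked_customers[0]["ssn"] == masked_claims[0]["ssn"]
assert masked_customers[0]["ssn"] != "123-45-6789"
```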

3.1 Optim Data masking features

Optim provides built-in masking functionality and reference tables for widely
accepted sensitive data types, which can be used in building data masking rules.

The Data Transformation Library functions are built-in masking functions for Social
Security numbers, credit card numbers, email IDs, dates, currency, and character
and numeric data, which can be used directly for masking.
I. TRANS SSN Function: The TRANS SSN function can be used to generate a
valid and unique U.S. Social Security Number (SSN). It can also generate
random data if the source does not have SSN values.
II. TRANS CCN Function: The TRANS CCN function can be used to generate a
valid and unique credit card number (CCN). TRANS CCN algorithmically
generates a consistently altered CCN based on the source CCN.
III. TRANS EML Function: The TRANS EML function can be used to generate a
random email address. An email address consists of two parts, a user name
followed by a domain name, separated by '@'. The domain names and user
names can be provided randomly.
IV. TRANS COL Function: The TRANS COL function can mask data while
maintaining the format and character type of the source data at the
destination. It masks all alphanumeric values in the source column.
V. Age Function: Ages the dates in the source file, either by a certain number
of years or to a specific date.
VI. Currency Function: The Currency function can be used to convert a currency
value in a source column from one currency to another.
VII. Lookup Function: This function can be used to select values from a lookup
table (or reference table) that are used to populate the destination table; the
Hash Lookup function can be used to select values based on the source value.
VIII. Shuffle Function: This function replaces a source value with another value
from the same column, which is then inserted in a destination column.
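The functions above are Optim built-ins; as a rough illustration of what such transforms do (these are simplified Python analogues, not Optim's actual algorithms), a few of them can be sketched as:

```python
import random

def age_date(year: int, month: int, day: int, years: int):
    """Age a date by a number of years (cf. the Age function)."""
    return (year + years, month, day)

def shuffle_column(values, seed=0):
    """Replace each value with another drawn from the same column
    (cf. the Shuffle function)."""
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled

def random_email(user_pool, domain_pool, seed=0):
    """Compose a random user@domain address (cf. TRANS EML)."""
    rng = random.Random(seed)
    return f"{rng.choice(user_pool)}@{rng.choice(domain_pool)}"

print(age_date(2010, 6, 15, 5))   # (2015, 6, 15)
print(shuffle_column(["a", "b", "c"]))
print(random_email(["jdoe"], ["example.com"]))
```

Note that shuffling keeps the column's value distribution intact (every original value still appears once), which is why it is useful when realistic-looking aggregates matter.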

Optim provides various expressions, such as substring, sequence, random,
concatenate, and numeric, which can be used in combination to create simple
custom masking rules. For complex masking rules, Optim allows the creation of
Column Map procedures. These options provide flexibility in building masking
rules in Optim.
