Sei sulla pagina 1di 5

Complex ETL Testing:

A Strategic Approach
Jeffrey R. Bocarsly, Ph.D
Director Functional Test Automation Group
2011 Real-Time Technology Solutions
(212) 240-9050 www.rtts.com info@rttsweb.com
360 Lexington Ave. Fl 9, New York, NY 10017


Data Warehouse Testing
A data warehouse is a repository of transactional data that has been extracted from original electronic sources
and transformed so that query, analysis and reporting on trends within historic data are possible and efficient.
The analyses provided by data warehouses may involve strategic planning, decision support, and monitoring
the outcomes of a chosen strategy. Typically, data that is loaded into a data warehouse is derived from diverse
sources of operational data, which may consist of data from databases, feeds, application files or flat files. The
data must be extracted from these diverse sources, transformed to a common format, and loaded into the
data warehouse. Typically, it is further aggregated into a data mart for efficient reporting. The ETL (Extract,
transform and load) process is a critical step in any data warehouse implementation, and continues to be an
area of major significance whenever the ETL code is updated. Once the data warehouse and data marts are
populated, business intelligence applications facilitate querying, analysis and reporting. The business
intelligence tools may provide simple presentations of data based on queries, or may support sophisticated
statistical analysis options. Data warehouses may have multiple front-end applications, depending on the
needs of the user community. Figure 1 shows a simplified data warehouse scheme.
An effective data warehouse testing strategy focuses on the main
structures within the data warehouse architecture:
1. The ETL layer
2. The data warehouse itself
3. Associated data marts
4. The front-end business intelligence/reporting applications
Each of these units must be treated separately and in combination, and
since there may be multiple components in each (multiple feeds to ETL,
multiple databases or data repositories that constitute the warehouse,
multiple data marts, and multiple front-end applications), each of these
subsystems must be individually validated.
One issue specific to data warehousing in the Pharmaceutical industry is the variety and types of data sources.
In addition to application databases with varied data models, data sources may utilize Pharma-specific
information exchange formats (e.g., HL7, CDISC). Organizations may have multiple proprietary internal data
formats, which may have been acquired in the process of industry consolidation. Thus, pharmaceutical data
warehouses may have to support a broader range of input formats to their ETL layers than data warehouses in
other industries.



2011 Real-Time Technology Solutions
(212) 240-9050 www.rtts.com info@rttsweb.com
360 Lexington Ave. Fl 9, New York, NY 10017


Data Verification
The recommended pre-deployment strategy is to build test automation (both functional and performance) for
every test entry point in the system (feeds, databases, internal messaging, front-end transactions). The goal
of the strategy is to provide automated tools for rapid localization of issues between test entry points (see
Figure 1).







Figure 1. A simplified Data Warehouse scheme
In Figure 1, data from a variety of sources is transformed by the ETL into the Data Warehouse. A second ETL
leg aggregates data from the Data Warehouse into Datamart tables for efficient reporting. Front-end
applications and Business Intelligence applications access the Datamart in order to provide historical and
statistical analyses of company data.












Figure 2. Data comparison points in the Data Warehouse scheme
As suggested by Figure 1, there are four major points of data comparison in the scheme. These are indicated
in Figure 2 and are found between the:
Source(s) and Data Warehouse
Data Warehouse and Datamart
Datamart and BI Tool
Source(s) and Datamart



2011 Real-Time Technology Solutions
(212) 240-9050 www.rtts.com info@rttsweb.com
360 Lexington Ave. Fl 9, New York, NY 10017
HL7
CDISC
XML
File
ETL
ETL
ETL
ETL
ETL
Data
Warehouse
Data
Mart(s)
BI Tools
ETL
Data Comparison
Data Comparison
Data Comparison
HL7
CDISC
XML
File
ETL
ETL
ETL
ETL
ETL
Data
Warehouse
Data
Mart(s)
ETL
BI Tools
Data Comparison



In automated Data Warehouse testing, the emphasis is the validation of data integrity between all points of
comparison to ensure the proper implementation of the ETL mappings and transformations across the
architecture. Using test automation, data can be tracked from the source layer, through the ETL processing
and through the Data Warehouse and Datamart components, to the front-end applications. If corrupt data is
found in a front-end application, the execution of automated tests can quickly determine whether the problem
is located in the data source, an ETL process, in a data warehouse database, a datamart or in the front-
end/Business Intelligence tool. Rapid determination of the application tier in which an issue is located can
dramatically lower turnaround times for remediation, as well as enhance quality.

Performance and Scalability








Figure 3. RTTS Data Warehouse Post-Deployment Testing Strategy
On the performance side, the test entry points analogous to the data comparison points are used, to focus on
characterizing subsystem response under load. These are indicated in Figure 3. Using industry standard
performance tools, the performance of each of the components in the ETL processes can be evaluated. Often,
a major focus of performance and scalability testing is the Business Intelligence tool client interfaces because
these are the user-facing components of the Data Warehouse architecture. Perceived poor performance by the
user community is frequently a significant concern, and performance measurements across a series of releases
can often reduce project tensions around this issue.

Overall Strategy
The emphasis on rapid localization of either data or performance problems in complex data warehouse
architectures provides a key tool for:
Promoting efficiencies in the development cycle
Shortening build cycles and meeting release targets
Delivering high quality Data Warehouse architectures.



HL7
CDISC
XML
File
ETL
ETL
ETL
ETL
ETL
Data
Warehouse
Data
Mart(s)
BI Tools
ETL
Performance
Performance
Performance
2011 Real-Time Technology Solutions
(212) 240-9050 www.rtts.com info@rttsweb.com
360 Lexington Ave. Fl 9, New York, NY 10017


Bibliography
1. Inmon, W. H., Building the Data Warehouse, John Wiley & Sons; 3rd edition, 2002
2. Kimball, R.; Reeves, L.; Ross, M.; Thornthwaite, W., The Data Warehouse Lifecycle Toolkit : Expert
Methods for Designing, Developing, and Deploying Data Warehouses, John Wiley & Sons, 1998
About the Author
Jeffrey R. Bocarsly, Ph.D., a Vice President and functional testing division manager with RTTS, has
implemented many automated software testing projects at Fortune 500 firms. His experience includes projects
in various sectors including brokerage, utility, media, pharmaceutical, software, banking, and insurance.
Jeff is a co-author of Software Test Engineering with IBM Rational Functional Tester: The Definitive
Resource, IBM Press, 2010. He holds a BS (UCLA) and Masters and PhD degrees (Columbia University).
About RTTS
RTTS offers the most comprehensive suite of quality assurance services to help
organizations drive positive results from their critical software projects.
Headquartered in New York, NY, our expert team has worked closely with over 400
clients to improve their testing processes, tool knowledge, and application
deployment outcomes. RTTS was founded in 1996, and has forged partnerships
with the worlds leading test tool vendors. Our satellite locations are in
Philadelphia, Atlanta, and Phoenix, and many of our consulting and education
services are offered through the cloud. No matter where you are, RTTS will ensure
application functionality, performance, scalability, and security for your
organization.
About QuerySurge
RTTS team of test experts developed QuerySurge to address the unique testing
needs in the data warehousing space. It has been implemented on projects
ranging from data warehousing and ETL processes to data migrations and
database upgrades. QuerySurge can verify as much as 100% of all data from
source systems, through the ETL process, to the target data warehouse. The tool
has increased test coverage and reduced test cycle time for several organizations,
helping them to mitigate risk and meet business requirements.
QuerySurge is offered exclusively by RTTS, as are any accompanying Data Warehouse Testing services. If
you are interested in learning more, or to schedule a private demonstration, please contact us by e-mail here:
info@rttsweb.com. If you would like to speak with our Sales team, please call (212) 240-9050.

2011 Real-Time Technology Solutions
(212) 240-9050 www.rtts.com info@rttsweb.com
360 Lexington Ave. Fl 9, New York, NY 10017

Potrebbero piacerti anche