Almost all IT companies today depend heavily on data flow, as large amounts of
information are made available for access and everything required can be retrieved.
This is where the concepts of ETL and ETL Testing come into the picture. Basically,
ETL stands for Extraction, Transformation, and Loading. At present, ETL Testing is
often performed using SQL scripting or spreadsheets, which can be a time-consuming
and error-prone approach.
In this article, we will discuss several concepts in detail, viz. ETL, the ETL
process, ETL testing, and the different approaches used for it, along with the most
popular ETL testing tools.
I would also like to compare ETL Testing with Database Testing, but before that let us
have a look at the types of ETL Testing with respect to Database Testing.
Given below are the Types of ETL Testing with respect to Database Testing:
1) Constraint Testing:
Testers should test whether the data is mapped accurately from source to destination.
While checking this, testers need to focus on some key checks (constraints).
They are:
NOT NULL
UNIQUE
Primary Key
Foreign Key
Check
NULL
Default
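Such constraint checks are usually expressed as simple SQL queries. Below is a minimal sketch, using Python's built-in sqlite3 module and an invented customers table, of how NOT NULL and UNIQUE/Primary Key violations can be detected:

```python
import sqlite3

# Hypothetical target table used only for illustration; the column names
# (cust_id, email) are assumptions, not taken from a real ETL project.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (2, "b@x.com")])

# NOT NULL check: count rows where a mandatory column is empty
null_rows = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL").fetchone()[0]

# UNIQUE / Primary Key check: key values that occur more than once
dup_keys = conn.execute(
    "SELECT cust_id FROM customers GROUP BY cust_id HAVING COUNT(*) > 1"
).fetchall()

print(null_rows)   # number of NOT NULL violations
print(dup_keys)    # key values violating uniqueness
```

In a real project the same queries would run against the actual source and target schemas, with one query per constraint in the mapping sheet.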
2) Duplicate Check Testing:
Source and target tables contain huge amounts of data with frequently repeated values.
In such cases, testers use database queries to find such duplication.
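A typical duplicate check groups on every column and flags any group occurring more than once. Here is a hedged sketch using Python's sqlite3 with an invented orders table:

```python
import sqlite3

# Illustrative table and data; real duplicate checks would list the actual
# columns of the target table in the GROUP BY clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "pen", 2), (1, "pen", 2), (2, "ink", 1)])

# Full-row duplicate detection: group on every column and keep groups > 1
dups = conn.execute("""
    SELECT order_id, product, qty, COUNT(*) AS occurrences
    FROM orders
    GROUP BY order_id, product, qty
    HAVING COUNT(*) > 1
""").fetchall()
print(dups)
```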
3) Navigation Testing:
Navigation concerns the GUI of an application. A user finds an application friendly
when navigation is easy and relevant throughout the entire system. The tester
must focus on avoiding irrelevant navigation from the user's point of view.
4) Initialization Testing:
Initialization Testing is performed to check the combination of hardware and software
requirements along with the platform on which the application is installed.
From the above listing, one may consider that ETL Testing is quite similar to Database
Testing, but the fact is that ETL Testing is concerned with Data Warehouse Testing, not
Database Testing.
There are several other facts due to which ETL Testing differs from Database Testing.
#1) ETL Mapping Sheets: This document contains information about the source and
destination tables and their references. A mapping sheet helps in creating the big SQL
queries used while performing ETL Testing.
#2) Database schema for Source and Destination tables: These should be kept updated in
the mapping sheet so that data validation can be performed against the database schema.
A few ETL Testing automation tools can be used to perform ETL Testing more effectively
and rapidly.
#1) Informatica Data Validation
Key Features:
Informatica Data Validation is a comprehensive ETL Testing tool which does not
require any programming skill.
It provides automation during ETL testing, which ensures that the data is delivered
correctly and in the expected format to the destination system.
It helps to complete data validation and reconciliation in the testing and
production environments.
It reduces the risk of introducing errors during transformation and prevents bad data
from being loaded into the destination system.
Informatica Data Validation is useful in Development, Testing, and Production
environments where it is necessary to validate data integrity before moving
into the production system.
50 to 90% of cost and effort can be saved using the Informatica Data Validation tool.
Informatica Data Validation provides a complete solution for data validation along
with data integrity.
Reduces programming efforts and business risks due to an intuitive user
interface and built-in operators.
Identifies and prevents data quality issues and provides greater business
productivity.
It offers a free trial as well as a paid service, which reduces the time and cost
required for data validation.
Visit official site here: Informatica Data Validation
#2) QuerySurge
The QuerySurge tool is specifically built for testing Big Data and Data Warehouses. It
ensures that the data extracted from the source system and loaded into the destination
system is correct and in the expected format. Any issues or differences are
identified very quickly by QuerySurge.
Key Features:
QuerySurge is an automated tool for Big Data Testing and ETL Testing.
It improves the data quality and accelerates testing cycles.
It validates data using Query Wizard.
It saves time and cost by automating manual effort and scheduling tests for a
specific time.
QuerySurge supports ETL Testing across various platforms like IBM, Oracle,
Microsoft, and SAP.
It helps to build test scenarios and test suites, along with configurable reports,
without specific knowledge of SQL.
It generates email reports through an automated process.
It provides reusable query snippets for generating reusable code.
It provides a collaborative view of data health.
QuerySurge can be integrated with HP ALM, TFS, IBM Rational Quality Manager.
Verifies, converts, and upgrades data through the ETL process.
It is a commercial tool that connects source and target data and also supports
real-time progress of test scenarios.
Visit the official site here: QuerySurge
#3) iCEDQ
iCEDQ is an automated ETL Testing tool specifically designed for the issues faced in
data-centric projects like data warehouses, data migration, etc. iCEDQ performs
verification, validation, and reconciliation between the source and destination systems.
It ensures that the data is intact after migration and prevents bad data from being
loaded into the target system.
Key Features:
iCEDQ is a unique ETL Testing tool which compares millions of rows across
databases or files.
It helps to identify the exact rows and columns which contain data issues.
It sends alerts and notifications to the subscribed users after execution.
It supports regression testing.
iCEDQ supports various databases and can read data from any database.
iCEDQ connects with relational databases, any JDBC-compliant database, flat
files, etc.
Based on unique columns in the database, iCEDQ compares the data in memory.
It can be integrated with HP ALM – Test Management Tool.
iCEDQ is designed for ETL Testing, Data Migration Testing and Data Quality
Verification.
Identifies data integration errors without any custom code.
Supports rule engine for ETL process, collaborative efforts and organized QA
process.
It is a commercial tool with 30 days trial and provides custom reports with alerts
and notifications.
The iCEDQ Big Data Edition now uses the power of a Hadoop cluster.
It supports BI Report Testing and Dashboard Testing.
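The in-memory, key-based comparison that tools like iCEDQ perform can be sketched in plain Python; the datasets and key values below are invented for illustration:

```python
# Source and target rows keyed on a unique column (e.g. a customer ID).
# Comparing the two dictionaries finds dropped rows and changed values.
source = {101: ("Alice", 500), 102: ("Bob", 300), 103: ("Carol", 250)}
target = {101: ("Alice", 500), 102: ("Bob", 999)}  # 102 mismatched, 103 missing

# Keys present in the source but absent from the target: rows lost in the load
missing_in_target = sorted(set(source) - set(target))

# Keys present in both but with differing values: rows altered unexpectedly
mismatched = sorted(k for k in source.keys() & target.keys()
                    if source[k] != target[k])

print(missing_in_target)
print(mismatched)
```

Real tools stream millions of rows through the same kind of key-based comparison rather than holding full dictionaries in memory.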
Visit the official site here: iCEDQ
#4) Datagaps ETL Validator
The ETL Validator tool is designed for ETL Testing and Big Data Testing. It is a
solution for data integration projects. The testing of such projects includes various
data types, huge volumes, and various source platforms. ETL Validator helps to
overcome these challenges using automation, which further helps to reduce cost and
minimize effort.
Key Features:
ETL Validator has an inbuilt ETL engine which compares millions of records from
various databases or flat files.
ETL Validator is a data testing tool specifically designed for automated data
warehouse testing.
Visual Test Case Builder with drag and drop capability.
ETL Validator has a Query Builder feature which writes the test cases without
manually typing any queries.
It compares aggregate data such as count, sum, distinct count, etc.
It simplifies the comparison of database schemas across various environments,
including data type, index, length, etc.
ETL Validator supports various platforms such as Hadoop, XML, Flat files etc.
It supports email notification, web reporting etc.
It can be integrated with HP ALM which results in sharing of test results across
various platforms.
ETL Validator is used to check Data Validity, Data Accuracy and also to perform
Metadata Testing.
Checks Referential Integrity, Data Integrity, Data Completeness and Data
Transformation.
It is a commercial tool with 30 days trial and requires zero custom programming
and improves business productivity.
Visit the official site here: Datagaps ETL Validator
#5) QualiDI
QualiDI is an automated testing platform which offers end-to-end testing and ETL
Testing. It automates ETL Testing and improves its effectiveness. It also
reduces the testing cycle and improves data quality. QualiDI identifies bad data and
non-compliant data very easily. QualiDI reduces the regression cycle and data
validation effort.
Key Features:
QualiDI creates automated test cases and it also provides support for automated
data comparison.
It offers data traceability and test case traceability.
It has a centralized repository for requirements, test cases, and test results.
It can be integrated with HPQC, Hadoop etc.
QualiDI identifies defects at an early stage, which in turn reduces cost.
It supports email notifications.
It supports continuous integration process.
It supports Agile development and rapid delivery of sprints.
QualiDI manages complex BI Testing cycles, eliminates human error, and maintains
data quality.
Visit the official site: QualiDi
#6) Talend Open Studio for Data Integration
Talend Open Studio for Data Integration is an open source tool which makes ETL
Testing easier. It includes all ETL Testing functionality plus an additional continuous
delivery mechanism. With the help of the Talend Data Integration tool, a user can run
ETL jobs on remote servers across a variety of operating systems.
ETL Testing ensures that data is transformed from the source system to the target
without any data loss and thereby adhering to transformation rules.
Key Features:
Talend Data Integration supports any type of relational database, flat files, etc.
Its integrated GUI simplifies the design and development of ETL processes.
Talend Data Integration has inbuilt data connectors with more than 900
components.
It detects business ambiguity and inconsistency in transformation rule quickly.
It supports remote job execution.
Identifies defects at an early stage to reduce cost.
It provides quantitative and qualitative metrics based on the ETL best practices.
Context switching is possible between the ETL development, ETL testing, and ETL
production environments.
Real-time data flow tracking along with the detailed execution statistics.
Visit the official site here: Talend ETL Testing
#7) Codoid's ETL Testing Services
Codoid’s ETL and data warehouse testing service includes data migration and data
validation from the source to the target system. ETL Testing ensures that there is no
data error, bad data, or data loss while loading data from the source to the target
system. It quickly identifies any data errors or other general errors that occur during
the ETL process.
Key Features:
Codoid’s ETL Testing service ensures data quality in the data warehouse and
data completeness validation from the source to the target system.
ETL Testing and data validation ensure that the business information transformed
from source to target system is accurate and reliable.
The automated testing process performs data validation during and post data
migration and prevents any data corruption.
Data validation includes count, aggregates and spot checks between the target
and actual data.
The automated testing process verifies whether data types, data lengths, and indexes
are accurately transformed and loaded into the target system.
Data quality Testing prevents data errors, bad data or any syntax issues.
Visit the official site here: Codoid’s ETL Testing
#8) Data-Centric Testing
The Data-Centric Testing tool performs robust data validation to avoid glitches such as
data loss or data inconsistency during data transformation. It compares data between
systems and ensures that the data loaded into the target system exactly matches the
source system in terms of data volume, data type, format, etc.
Key Features:
Data-Centric Testing is built to perform ETL Testing and data warehouse testing.
It is one of the largest and oldest testing practices.
It offers ETL Testing, data migration and reconciliation.
It supports various relational databases, flat files, etc.
It provides efficient data validation with 100% data coverage.
Data-Centric Testing also supports comprehensive reporting.
The automated process of data validation generates SQL queries, which results in
a reduction of cost and effort.
It offers a comparison between heterogeneous databases like Oracle & SQL
Server and ensures that the data in both systems is in the correct format.
Visit the official site here: Data-Centric Testing
#9) SSISTester
SSISTester is a framework which helps in the unit and integration testing of SSIS
packages. It also helps to create ETL processes in a test-driven environment, which
in turn helps to identify errors in the development process. A number of
packages are created while implementing ETL processes, and these need to be tested
during unit testing. Integration tests are also run as "live tests".
Key Features:
Unit tests create and verify tests, and once execution is complete they perform a
clean-up job.
Integration tests verify that all packages are satisfied after execution of the unit
tests.
Tests are created in a simple way as the user creates it in Visual Studio.
Real-time debugging of a test is possible using SSISTester.
Monitoring of test execution with user-friendly GUI.
Test results are exported in HTML format.
It removes external dependencies by using fake source and destination
addresses.
For the creation of tests, it supports any .NET language.
Visit the official site here: SSISTester
#10) TestBench
TestBench is a database management and verification tool. It is a unique solution which
addresses all issues related to the database. User-managed data rollback improves
testing productivity and accuracy. It also helps to reduce environment downtime.
TestBench reports all inserted, updated and deleted transactions which are performed in
a test environment and captures the status of the data before and after the transaction.
Key Features:
It always maintains data confidentiality to protect data.
It has a restoration point for an application so a user can return to a
specific point.
It improves decision making knowledge.
It customizes data sets to improve test efficiency.
It helps to maximize test coverage and reduce time and money.
Data privacy rule ensures that the live data is not available in the test
environment.
Results are compared with various databases. Results include differences in
tables & operation performed on tables.
TestBench analyzes the relationship between the tables and maintains the
referential integrity between tables.
Visit the official site here: TestBench
Some more to the list:
#11) GTL QAceGen
QAceGen is specifically designed to generate complex test data, automate ETL
regression suites, and validate the business logic of applications. QAceGen generates
test data based on the business rules defined in the ETL specification. It creates
each scenario, including data generation and data validation statements.
This tutorial will present you with a complete idea about ETL testing and what we
do to test ETL process.
Complete List Tutorials in this series:
Tutorial #1: ETL Testing Data Warehouse Testing Introduction guide
Tutorial #2: ETL Testing Using Informatica PowerCenter Tool
Tutorial #3: ETL vs. DB Testing
Tutorial #4: Business Intelligence (BI) Testing: How to Test Business Data
Tutorial #5: Top 10 ETL Testing Tools
It has been observed that Independent Verification and Validation is gaining huge
market potential and many companies now see it as a prospective business gain.
Customers are offered a different range of products in terms of service offerings,
distributed across many areas based on technology, process, and solutions. ETL or data
warehouse testing is one of the offerings which is developing rapidly and successfully.
Through ETL process, data is fetched from the source systems, transformed as per
business rules and finally loaded to the target system (data warehouse). A data
warehouse is an enterprise-wide store which contains integrated data that aids in the
business decision-making process. It is a part of business intelligence.
Having said that, data is the most important part of any organization, be it everyday
data or historical data. Data is the backbone of any report, and reports are the
baseline on which all vital management decisions are taken.
Most of the companies are taking a step forward for constructing their data warehouse to
store and monitor real-time data as well as historical data. Crafting an efficient data
warehouse is not an easy job. Many organizations have distributed departments with
different applications running on distributed technology.
An ETL tool is employed in order to make flawless integration between different data
sources from different departments. The ETL tool works as an integrator, extracting
data from different sources, transforming it into the preferred format based on the
business transformation rules, and loading it into a cohesive database known as the
Data Warehouse.
Well planned, well defined and effective testing scope guarantees smooth
conversion of the project to the production. A business gains the real buoyancy once
the ETL processes are verified and validated by an independent group of experts to
make sure that data warehouse is concrete and robust.
ETL or Data warehouse testing is categorized into four different
engagements irrespective of technology or ETL tools used:
New Data Warehouse Testing – New DW is built and verified from scratch. Data
input is taken from customer requirements and different data sources and new
data warehouse is built and verified with the help of ETL tools.
Migration Testing – In this type of project, the customer has an existing DW and
ETL performing the job, but they are looking to adopt a new tool in order to
improve efficiency.
Change Request – In this type of project, new data is added to an existing DW
from different sources. There might also be a condition where the customer
needs to change their existing business rules or integrate new rules.
Report Testing – Reports are the end result of any Data Warehouse and the basic
purpose for which the DW is built. A report must be tested by validating its layout,
the data in the report, and the calculations.
ETL Process
Requirement understanding
Validating
Test Estimation based on a number of tables, the complexity of rules, data
volume and performance of a job.
Test planning based on the inputs from test estimation and business requirements.
Here we need to identify what is in scope and what is out of scope. We also
look for dependencies, risks, and mitigation plans in this phase.
Designing test cases and test scenarios from all the available inputs. We also
need to design mapping document and SQL scripts.
Once all the test cases are ready and approved, the testing team proceeds to
perform pre-execution checks and test data preparation for testing.
Lastly, execution is performed till exit criteria are met. So, execution phase
includes running ETL jobs, monitoring job runs, SQL script execution, defect
logging, defect retesting and regression testing.
Upon successful completion, a summary report is prepared and the closure process
is done. In this phase, sign-off is given to promote the job or code to the next
phase.
The first two phases i.e. requirement understanding and validation can be regarded as
pre-steps of ETL test process.
Below is the list of objects that are treated as essential for validation in this
testing:
Verify that data transformation from source to destination works as expected
Verify that expected data is added to the target system
Verify that all DB fields and field data is loaded without any truncation
Verify data checksum for record count match
Verify that for rejected data proper error logs are generated with all details
Verify NULL value fields
Verify that duplicate data is not loaded
Verify data integrity
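Several of the checks above (record count matches and data checksums) reduce to comparing simple aggregates between the source and target. A minimal sketch using Python's sqlite3, with invented tables and data:

```python
import sqlite3

# Hedged sketch of count and checksum reconciliation between source and
# target tables; the schema and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, amount INTEGER);
    CREATE TABLE tgt (id INTEGER, amount INTEGER);
    INSERT INTO src VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO tgt VALUES (1, 100), (2, 200), (3, 300);
""")

def profile(table):
    # COUNT(*) catches dropped rows; SUM over a numeric column is a cheap
    # checksum that catches silently altered values.
    return conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}").fetchone()

assert profile("src") == profile("tgt")  # counts and checksums must agree
print(profile("src"))
```

In practice the same profile query is run on both the source and target systems and the result tuples are diffed, one pair per table in scope.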
ETL Testing Challenges
This testing is quite different from conventional testing. There are many challenges
to be faced while performing data warehouse testing.
Hopefully, these tips will help ensure that your ETL process is accurate and that the
data warehouse built by it is a competitive advantage for your business.
ETL Vs. DB Testing – A Closer Look At ETL
Testing Need, Planning And ETL Tools
Software testing has a variety of areas to concentrate on. The major varieties are
functional and non-functional testing. Functional testing is the procedural way to
ensure that the functionality developed works as expected. Non-functional testing is
the approach by which non-functional aspects, like performance, can be ensured at an
acceptable level.
There is another flavour of testing called DB testing. Data is organized in the database
in the form of tables. For business, there can be flows where the data from the multiple
tables can be merged or processed on to a single table and vice versa.
ETL testing is another kind of testing, preferred in business cases where a
reporting need is sought by the clients. The reporting is sought in order to
analyze demand, need, and supply so that clients, the business, and the end users
are well served and benefited.
What will you learn in this tutorial?
In this tutorial, you will learn what database testing is, what ETL testing is, the
difference between DB testing and ETL testing, and more details about the ETL testing
need, process, and planning with real examples.
We have also covered ETL testing in more detail on the page below. Have a look at it as well.
DB testing:
DB Testing is used extensively in business flows where multiple data flows occur in
the application from multiple data sources onto a single table. A data source can be
a table, flat file, application, or anything else that can yield some output data.
In turn, the output data obtained can still be used as input for a sequential
business flow. Hence, when we perform DB testing, the most important thing to
capture is the way the data gets transformed from the source along with how it
gets saved in the destination location.
Synchronization is one of the major and essential things to be considered when
performing DB testing. Due to the positioning of the application in the architectural
flow, there might be a few issues with data or DB synchronization. While performing
the testing, this has to be taken care of, as it can prevent potential invalid
defects or bugs.
Example #1:
Project “A” has an integrated architecture where the application makes use of
data from several other heterogeneous data sources. Hence, the integrity of this data
with the destination location has to be verified, along with validations for the following:
Primary foreign key validation
Column values integrity
Null values for any columns
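The primary/foreign key validation listed above is commonly done with a LEFT JOIN that surfaces orphaned child rows. A small sketch using Python's sqlite3 (the table and column names are assumptions for illustration):

```python
import sqlite3

# Illustrative parent/child tables: every emp.dept_id should exist in dept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER);
    INSERT INTO dept VALUES (10), (20);
    INSERT INTO emp VALUES (1, 10), (2, 20), (3, 30);
""")
# Note: emp row 3 references dept 30, which does not exist.

# LEFT JOIN keeps every child row; a NULL parent key marks an orphan.
orphans = conn.execute("""
    SELECT e.emp_id, e.dept_id
    FROM emp e
    LEFT JOIN dept d ON d.dept_id = e.dept_id
    WHERE d.dept_id IS NULL
""").fetchall()
print(orphans)
```

A NULL-value check for any column follows the same pattern with a simple `WHERE column IS NULL` filter.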
What is ETL Testing?
ETL testing is a special type of testing that the client wants done for the
forecasting and analysis of their business. It is mostly used for reporting
purposes. For instance, if the clients need a report on the customers who use
their product, based on the day they purchase, they have to make use of ETL
reports.
Example #2:
We will consider a group “A” doing retail customer business through a shopping market
where customers can purchase any household items required for their day-to-day
needs. Here, every visiting customer is provided with a unique membership ID with
which they can gain points every time they purchase things from the shopping
market. The regulations provided by the group say that the points gained expire every
year. Depending upon their usage, a membership can be either upgraded to a
higher grade or downgraded to a lower grade relative to the current grade. Five years
after the shopping market's establishment, management is now looking to scale up the
business along with revenue.
Hence they require a few business reports so that they can run promotions for their customers.
Following is a tabular form that describes the basic behaviour of both testing formats:

Applicable place: DB Testing applies in the functional system where the business flow occurs; ETL Testing is external to the business flow environment and works on the historical business data.
Business impact: In DB Testing, impacts can be severe, as it covers the integrated architecture of the business flows; in ETL Testing, impacts are potential, arising when the clients want forecasting and analysis to be done.
Data Nature: Normalized data is used in DB Testing; denormalized data is used in ETL Testing.
The most significant thing in ETL is identifying the essential data and tables
from the source. The next essential step is mapping the tables from the source to the
ETL environment.
Following is an example of how the mapping between tables from the various
environments can be related for ETL purposes.
The above mapping takes the data from the source table to the staging table, from
there to the tables in the EDW, and then to OLAP, which is the final reporting
environment. Hence, at any point in time, data synchronization is very important for
the ETL's sake.
Critical ETL Needs
As we understand, ETL is needed for forecasting, reporting, and analysing the
business in order to capture customer needs more effectively. This enables the
business to meet higher demands than in the past.
Here are few of the critical needs without which ETL testing cannot be achieved:
1. Data and tables identification – This is important, as there can be much
irrelevant and unnecessary data of little importance when forecasting and
analysing customer needs. Hence, the relevant data and tables have to
be selected before starting the ETL work.
2. Mapping sheet – This is one of the critical needs while doing ETL work.
Mapping the right table from the source to the destination is mandatory, and
any problems or incorrect data in this sheet might impact the whole ETL
deliverable.
3. Table designs and data, column type – This is the next major step when
considering the mapping of source tables onto the destination tables. The
column types have to match between the tables at both places.
4. Database access – The main thing is access to the database where the ETL
runs. Any restrictions on this access will have an equivalent impact.
Example #3:
A company which manufactures silk fabric wanted to analyse its annual sales. On
reviewing their annual sales using the report they generated, they found a tremendous
fall in sales during the months of August and September. Hence they decided to roll
out promotional offers like exchanges, discounts, etc., which enhanced their sales.
Basic Issues In ETL Testing
There can be a number of issues while performing ETL testing like the following:
1. Access to the source tables or views may not be valid.
2. The column names and data types from the source to the next layer might not
match.
3. The number of records in the source table and the destination table might not
match.
And there might be many more.
Following is a sample mapping sheet with columns like VIEW_NAME,
COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE, and
TRANSFORMATION LOGIC.
The first 3 columns represent the details of the source database, and the next 3 are
the details of the immediately following (destination) database. The last column is very important.
Transformation logic is the way the data from the source is read and stored in the
destination database. This depends on the business and the ETL needs.
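A mapping sheet like this can drive automated checks directly. The sketch below, with invented view, column, and rule names, flags mapping rows whose source and target data types differ, so the tester knows an explicit conversion must be validated:

```python
# Each mapping-sheet row pairs a source view/column with its target column
# and the transformation rule. All names and rules here are hypothetical.
mapping_sheet = [
    {"view": "V_CUST", "src_col": "CUST_NM", "tgt_col": "CUSTOMER_NAME",
     "src_type": "VARCHAR(50)", "tgt_type": "VARCHAR(50)",
     "rule": "TRIM + UPPER"},
    {"view": "V_CUST", "src_col": "DOB", "tgt_col": "BIRTH_DATE",
     "src_type": "VARCHAR(10)", "tgt_type": "DATE",
     "rule": "CAST to DATE"},
]

def type_mismatches(sheet):
    """Return target columns whose declared source and target types differ,
    i.e. the places where a data-type conversion needs explicit testing."""
    return [row["tgt_col"] for row in sheet if row["src_type"] != row["tgt_type"]]

print(type_mismatches(mapping_sheet))
```

The same sheet can also be used to generate the source-to-target comparison queries mechanically, one per mapped column.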
Following are few of the points to be taken care while ETL test planning and
execution:
#1: Data is being extracted from the heterogeneous data sources
#2: ETL process handling in an integrated environment that has different:
DBMS
OS
Hardware
Communication protocols
#3: Necessity in having a logical data mapping sheet before the physical data can be
transformed
#4: Understanding and examining of the data sources
#5: Initial load and the incremental load
#6: Audit columns
#7: Loading the facts and the dimensions
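Points #5 and #6 above go together: an audit timestamp column lets the test pick out only the delta since the last successful load. A hedged sketch using Python's sqlite3 with an invented schema:

```python
import sqlite3

# Illustrative source table with an audit column (load_ts); in an incremental
# load, only rows newer than the last successful load should be picked up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, load_ts TEXT);
    INSERT INTO src VALUES (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-03');
""")
last_load = "2024-01-01"  # timestamp recorded by the previous ETL run

# The incremental extract: only rows created after the previous load
delta = conn.execute(
    "SELECT id FROM src WHERE load_ts > ? ORDER BY id", (last_load,)).fetchall()
print(delta)
```

The test then verifies that exactly these delta rows, and no earlier ones, arrive in the target during the incremental run.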
ETL Tools And Their Significant Usage
ETL tools are basically used to build and execute transformation logic, taking data
from the source and applying the transformation logic to it. You can also map
the schemas from the source to the destination in unique ways, transform
and clean up data before it is moved to the destination, and load it at the
destination in an efficient manner.
This can significantly reduce manual effort, as the mapping created can be used
for almost all of the ETL validation and verification.
ETL tools:
1. Informatica – PowerCenter – one of the most popular ETL tools, introduced
by the Informatica Corporation. It has a very good customer base covering
wide areas. The major components of the tool are its client tools, repository
tools, and servers.
2. IBM – Infosphere Information Server – IBM, a market leader in computer
technology, developed the Infosphere Information Server for information
integration and management in the year 2008.
3. Oracle – Data Integrator – Oracle Corporation developed its ETL tool under
the name Oracle Data Integrator. Growing customer demand has made them
update their ETL tool across various versions.
More examples of the usage of ETL testing:
Consider an airline which wants to roll out promotions and offers to attract
customers strategically. First, they will try to understand the demands and needs from
the customers' specifications. To achieve this, they will require historical data,
preferably the previous 2 years' data. Using that data, they will analyze and prepare
reports that will be helpful in understanding the customers' needs.
The reports can be of the following kinds:
Analyzing these reports will help the clients identify the kinds of promotions and
offers that will benefit the customers and, at the same time, benefit the business,
making it a win-win situation. This can be easily achieved with ETL testing
and reports.
In parallel, suppose the IT segment faces a serious DB issue that has stopped multiple
services and, in turn, has the potential to impact the business. On investigation, it
is identified that some invalid data has corrupted a few databases, which need to be
corrected manually.
In the former case, ETL reports and testing are required, whereas the latter case is
where DB testing has to be done properly to overcome issues with invalid data.
Conclusion:
Hopefully, the above tutorial has provided a simple and clear overview of what ETL
testing is and why it has to be done, along with the business impacts and benefits it
yields. It does not stop here; it can extend to providing foresight into business growth.
Let me take you through a tour on how to perform ETL testing specific to Informatica.
The main aspects which should be essentially covered in Informatica ETL testing
are:
Testing the functionality of the Informatica workflow and its components, and all
the transformations used in the underlying mappings.
Checking data completeness (i.e. ensuring that the projected data is loaded to
the target without any truncation or data loss).
Verifying that the data is loaded to the target within the estimated time limits
(i.e. evaluating the performance of the workflow).
Ensuring that the workflow does not allow any invalid or unwanted data to be
loaded into the target.
Classification Of ETL Testing In Informatica:
For better understanding and ease of the tester, ETL testing in Informatica can be
divided into two main parts –
Coming to the next part, i.e. detailed testing in Informatica, you will go in depth
to validate whether the logic implemented in Informatica works as expected in terms
of its results and performance.
Validate the output data at field level, which confirms that each transformation is
operating correctly.
Verify the record count at each level of processing, and confirm that the final
count in the target is as expected.
Monitor thoroughly elements like the source qualifier and target in the session's
source/target statistics.
Ensure that the run duration of the Informatica workflow is in line with the
estimated run time.
To sum up, we can say that the detailed testing includes a rigorous end-to-end validation
of the Informatica workflow and the related flow of data.
Based on my requirements stated above, my database table (Target) should look like
this:
Now, say, we have developed an Informatica workflow to get the solution for my ETL
requirements.
The underlying Informatica mapping will read data from the flat file and pass the data
through a router transformation that discards rows which either have the product
category 'C' or a past expiry date. Then I will use a Sequence Generator transformation
to create unique primary key values for the Prod_ID column in the Product table.
Finally, the records will be loaded to Product table which is the target for my Informatica
mapping.
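The routing and sequence-generation logic described above can be sketched in plain Python. This is a hypothetical illustration only, not Informatica code: the expiry rule, the field names, and the `route_and_load` helper are assumptions.

```python
from datetime import date
from itertools import count

def route_and_load(records, today=date(2023, 1, 1)):
    """Mimic the mapping: discard rows with category 'C' or a past
    expiry date, then assign sequential Prod_ID primary keys."""
    seq = count(start=1)          # stands in for the Sequence Generator
    loaded = []
    for rec in records:
        if rec["Prod_category"] == "C" or rec["Prod_expiry_date"] < today:
            continue              # router: send to the discard group
        loaded.append({"Prod_ID": next(seq), **rec})
    return loaded

rows = [
    {"Product_name": "Soap",  "Prod_category": "A", "Prod_expiry_date": date(2024, 6, 1)},
    {"Product_name": "Milk",  "Prod_category": "C", "Prod_expiry_date": date(2024, 6, 1)},
    {"Product_name": "Bread", "Prod_category": "B", "Prod_expiry_date": date(2022, 1, 1)},
]
print(route_and_load(rows))   # only 'Soap' survives, with Prod_ID 1
```

A test case for this mapping would feed rows like these through the workflow and verify that only the 'Soap' row lands in the Product table.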
Examples:
Below are the sample test cases for the scenario explained above.
You can use these test cases as a template in your Informatica testing project and
add/remove similar test cases depending upon the functionality of your workflow.
#2) Test Case ID: T002
Test Case Purpose: To ensure if the workflow is running successfully
Test Procedure:
Go to workflow manager
Open workflow
Right click in workflow designer and select Start workflow
Check status in Workflow Monitor
Input Value/Test Data: Same as test data for T001
Expected Results: Message in the output window in Workflow manager: Task Update:
[workflow_name] (Succeeded)
Actual Results: Message in the output window in Workflow manager: Task Update:
[workflow_name] (Succeeded)
Remarks: Pass
Tester Comments: Workflow succeeded
Note: You can easily see the workflow run status (failed/succeeded) in the Workflow
Monitor, as shown in the example below. Once the workflow completes, the status is
reflected automatically in the Workflow Monitor.
In the above screenshot, you can see the start time and end time of workflow as well as
the status as succeeded.
Actual Results:
1 row returned, with columns Prod_ID (Primary Key), Product_name, Prod_description,
Prod_category, Prod_expiry_date, Prod
Remarks: Pass
Tester Comments: Considering the test as 'Pass' if the actual run duration is within
+/- 10% of the expected run duration.
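That pass criterion is easy to express as a small check. A sketch only: the function name and the 10% default tolerance are assumptions.

```python
def duration_within_tolerance(actual_secs, expected_secs, tolerance=0.10):
    """Pass the test if the actual run time is within +/-10% of expected."""
    return abs(actual_secs - expected_secs) <= tolerance * expected_secs

print(duration_within_tolerance(108, 100))  # True  (8% over)
print(duration_within_tolerance(115, 100))  # False (15% over)
```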
Benefits Of Using Informatica As An ETL Tool:
Informatica is a popular and successful ETL tool because:
As I mentioned earlier, you can add/remove/modify these test cases depending on the
scenario you have in your project.
There is another flavour of testing called DB testing. Data is organized in a database
in the form of tables. In a business flow, data from multiple tables can be merged or
processed into a single table, and vice versa.
ETL testing is another kind of testing, preferred for business cases where clients need
some form of reporting. The reporting is sought in order to analyze demand, need, and
supply so that the clients, the business, and the end users are well served and
benefited.
What will you learn in this tutorial?
In this tutorial, you will learn what database testing is, what ETL testing is, the
difference between DB testing and ETL testing, and more details about the need for ETL
testing, its process, and planning, with real examples.
DB testing:
DB testing is used extensively in business flows where multiple data flows occur in the
application from multiple data sources onto a single table. A data source can be a table,
a flat file, an application, or anything else that yields output data. In turn, the output
data obtained can be used as input for the next step in the business flow. Hence, when
performing DB testing, the most important thing to capture is the way the data gets
transformed from the source, along with how it gets saved at the destination location.
Synchronization is a major and essential consideration when performing DB testing.
Due to the positioning of the application in the architectural flow, there might be issues
with data or DB synchronization. This has to be taken care of while testing, as it
prevents raising potentially invalid defects or bugs.
Example #1:
Project “A” has an integrated architecture where the application makes use of data from
several heterogeneous data sources. Hence the integrity of this data at the destination
location has to be verified, along with validations for the following:
Primary key and foreign key validation
Column values integrity
Null values for any columns
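These validations are typically written as SQL queries. Below is a minimal sketch using Python's built-in sqlite3 module; the table names and the data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 3, 90.0), (12, 2, NULL);
""")

# Foreign key validation: orders pointing at a non-existent customer.
orphans = conn.execute("""
    SELECT o.order_id FROM orders o
    LEFT JOIN customers c ON o.cust_id = c.cust_id
    WHERE c.cust_id IS NULL
""").fetchall()

# Null-value validation: NULLs in a column the spec says is mandatory.
null_amounts = conn.execute(
    "SELECT order_id FROM orders WHERE amount IS NULL").fetchall()

print(orphans)       # [(11,)]
print(null_amounts)  # [(12,)]
```

In a real project the same queries would run against the actual destination database rather than an in-memory SQLite instance.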
What is ETL Testing?
ETL testing is a special type of testing that the client wants done for forecasting and
analysis of their business. It is mostly used for reporting purposes. For instance, if the
clients need a report on the customers who use or go for their product, based on the day
they purchase, they have to make use of ETL reports.
Example #2:
We will consider a group “A” doing retail customer business through a shopping market
where customers can purchase any household items required for their day-to-day needs.
Here, every visiting customer is provided with a unique membership ID with which they
gain points every time they purchase things from the shopping market. Per the
regulations set by the group, the points gained expire every year. And depending on
their usage, a membership can be either upgraded to a higher grade or downgraded to a
lower grade relative to the current one. Five years after the market's establishment,
management is now looking at scaling up their business along with revenue.
Hence they require a few business reports so that they can run promotions for their
customers.
Following is a tabular form that describes the basic behaviour of both the testing
formats:
Applicable place: DB testing occurs in the functional system where the business flow
occurs; ETL testing is external to the business flow environment, on the historical
business data.
Business impact: DB testing failures can lead to severe impacts, as it is part of the
integrated architecture of the business flows; ETL testing has potential impacts when
the clients will have the forecasting and analysis to be done.
Data nature: DB testing works on normalized data; ETL testing works on denormalized
data.
Here the requirements are nothing but a mapping sheet that maps data between
different databases. As ETL testing occurs at multiple levels, various mappings are
needed to validate it.
Most of the time, data is not captured directly from the source databases. Instead,
views are built over the source tables, and the data is consumed from those views.
The most significant thing in ETL is identifying the essential data and tables from the
source. The next essential step is mapping the tables from the source to the ETL
environment.
Following is an example of how the mapping between tables from the various
environments can be related for the ETL purpose.
The above mapping takes the data from the source table to the staging table, from
there to the tables in the EDW, and then to OLAP, which is the final reporting
environment. Hence, at any point of time, data synchronization is very important for the
ETL's sake.
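A simple synchronization check counts rows at each layer and flags any gap. A minimal sketch with sqlite3; the table names standing in for the source, staging, and EDW layers are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_sales (id INTEGER);
    CREATE TABLE stg_sales (id INTEGER);
    CREATE TABLE edw_sales (id INTEGER);
    INSERT INTO src_sales VALUES (1), (2), (3);
    INSERT INTO stg_sales VALUES (1), (2), (3);
    INSERT INTO edw_sales VALUES (1), (2);
""")

def layer_counts(conn, tables):
    """Row count per layer; a mismatch flags a synchronization gap."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in tables}

counts = layer_counts(conn, ["src_sales", "stg_sales", "edw_sales"])
print(counts)    # {'src_sales': 3, 'stg_sales': 3, 'edw_sales': 2}
in_sync = len(set(counts.values())) == 1
print(in_sync)   # False: the EDW layer lost a row somewhere
```

Counts alone do not prove the layers hold the same rows, but they are the cheapest first signal that synchronization broke between two hops.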
Critical ETL Needs
As we understand, ETL addresses the need for forecasting, reporting, and analysing the
business in order to capture customer needs more successfully. This enables the
business to meet higher demands than in the past.
Here are a few of the critical needs without which ETL testing cannot be achieved:
1. Data and tables identification – This is important, as there can be much
irrelevant and unnecessary data that is of little importance when forecasting
and analysing customer needs. The relevant data and tables have to be
selected before starting the ETL work.
2. Mapping sheet – This is one of the critical needs while doing ETL work.
Mapping the right table from the source to the destination is mandatory, and
any problem or incorrect data in this sheet can impact the whole ETL
deliverable.
3. Table designs and data, column type – This is the next major step when
mapping the source tables into the destination tables. The column types have
to match between the tables at both places.
4. Database access – The main thing is access to the database where the ETL
goes on. Any restriction on access will have an equivalent impact.
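Point 3 can be automated by walking the mapping sheet and comparing declared types. A hedged sketch: the sheet structure, the column names, and the `type_mismatches` helper are invented for illustration.

```python
# A toy mapping sheet: source column/type vs destination column/type.
mapping_sheet = [
    {"src_col": "CUST_NAME", "src_type": "VARCHAR",
     "dst_col": "customer_name", "dst_type": "VARCHAR"},
    {"src_col": "JOIN_DATE", "src_type": "DATE",
     "dst_col": "join_date", "dst_type": "VARCHAR"},   # type drift!
]

def type_mismatches(sheet):
    """Return mapping rows whose source and destination types differ."""
    return [row for row in sheet if row["src_type"] != row["dst_type"]]

for row in type_mismatches(mapping_sheet):
    print(f"{row['src_col']} -> {row['dst_col']}: "
          f"{row['src_type']} != {row['dst_type']}")
```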
Example #3:
A company that manufactures silk fabric wanted to analyse its annual sales. On
reviewing the report they generated, they found that there was a tremendous fall in
sales during the months of August and September. Hence they decided to roll out
promotional offers like exchanges, discounts, etc., which enhanced their sales.
Basic Issues In ETL Testing
There can be a number of issues while performing ETL testing like the following:
1. Access to the source tables or views may not be valid.
2. The column name and data type from the source to the next layer might not
match.
3. The number of records in the source table might not match the number in the
destination table.
And there can be many more.
Following is a sample of a mapping sheet with columns like VIEW_NAME,
COLUMN_NAME, DATA_TYPE, TABLE_NAME, COLUMN_NAME, DATA_TYPE, and
TRANSFORMATION LOGIC.
The first 3 columns represent the details of the source database and the next 3 the
details of the destination database. The last column is very important: the
transformation logic is the way the data from the source is read and stored in the
destination database. This depends on the business and the ETL needs.
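One way to test that transformation logic column is to re-apply each rule to the source data and compare with what actually landed in the target. A hypothetical sketch: the rules, the column names, and the 83.0 USD-to-INR rate are assumptions.

```python
# Mapping-sheet rows with the transformation rule expressed as a
# Python callable instead of free text.
rules = {
    "customer_name": lambda src: src["CUST_NAME"].strip().upper(),
    "amount_inr":    lambda src: round(src["AMOUNT_USD"] * 83.0, 2),
}

source_row = {"CUST_NAME": "  asha  ", "AMOUNT_USD": 10.5}
target_row = {"customer_name": "ASHA", "amount_inr": 871.5}

def verify_transformations(src, tgt, rules):
    """Re-apply each rule to the source and compare with the target."""
    return {col: rule(src) == tgt[col] for col, rule in rules.items()}

print(verify_transformations(source_row, target_row, rules))
# {'customer_name': True, 'amount_inr': True}
```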
Following are a few of the points to be taken care of during ETL test planning and
execution:
#1: Data is being extracted from the heterogeneous data sources
#2: Handling the ETL process in an integrated environment whose components have different:
DBMS
OS
Hardware
Communication protocols
#3: The necessity of having a logical data mapping sheet before the physical data can be
transformed
#4: Understanding and examining the data sources
#5: Initial load and the incremental load
#6: Audit columns
#7: Loading the facts and the dimensions
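Points #5 and #6 go together: an audit column such as a last-updated timestamp is what makes an incremental load possible. A minimal sketch; the field names and the `incremental_extract` helper are assumptions.

```python
from datetime import datetime

# Source rows carry an audit column (last_updated) so an incremental
# load can pick up only what changed since the previous run.
source = [
    {"id": 1, "last_updated": datetime(2023, 1, 1, 9, 0)},
    {"id": 2, "last_updated": datetime(2023, 1, 5, 9, 0)},
    {"id": 3, "last_updated": datetime(2023, 1, 9, 9, 0)},
]

def incremental_extract(rows, last_run):
    """Initial load: last_run is None, so take everything.
    Incremental load: only rows updated after the previous run."""
    if last_run is None:
        return rows
    return [r for r in rows if r["last_updated"] > last_run]

print(len(incremental_extract(source, None)))                  # 3 (initial)
print(len(incremental_extract(source, datetime(2023, 1, 4))))  # 2 (delta)
```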
ETL Tools And Their Significant Usage
ETL tools are basically used to build the transformation logic and convert data by taking
it from the source and applying that logic. You can also map schemas from the source to
the destination in unique ways, transform and clean up data before it is moved to the
destination, and load it at the destination in an efficient manner.
This can significantly reduce manual effort, as the mapping, once done, is used for
almost all of the ETL validation and verification.
ETL tools:
1. Informatica – PowerCenter – is one of the popular ETL tools, introduced by the
Informatica Corporation. It has a very good customer base covering wide areas.
The major components of the tool are its client tools, repository tools, and
servers.
2. IBM – Infosphere Information Server – IBM, a market leader in computer
technology, developed the Infosphere Information Server, used for information
integration and management, in the year 2008.
3. Oracle – Data Integrator – Oracle Corporation has developed its ETL tool under
the name Oracle Data Integrator. Their growing customer base has led them to
update the tool across various versions.
More examples of the usage of ETL testing:
Consider some airlines which want to roll out promotions and offers to attract
customers strategically. First, they will try to understand the customers' demands and
needs. To achieve this, they will require historical data, preferably the previous 2 years'
data. Using that data, they will analyze and prepare some reports that will be helpful in
understanding the customers' needs.
The reports can be of the following kind:
Analyzing these reports will help the clients in identifying the kind of promotions and
offers that will benefit the customers and at the same time can benefit business where
this can become a Win-Win situation. This can be easily achieved by the ETL testing
and reports.
In parallel, suppose the IT segment faces a serious DB issue that has stopped multiple
services and, in turn, has the potential to impact the business. On investigation, it was
identified that some invalid data had corrupted a few databases, which needed to be
corrected manually.
In the former case, it is ETL reports and testing that will be required, whereas the latter
case is where DB testing has to be done properly to overcome the issues with invalid
data.
Conclusion:
Hope the above tutorial has provided a simple and clear overview of what ETL testing is
and why it has to be done, along with the business impacts and benefits it yields. It
does not stop here; it can extend to setting foresight for growth in the business.
Business Intelligence testing initiatives help companies gain deeper and better insights
so they can manage or make decisions based on hard facts or data.
The way this is done has changed considerably in the current day’s market. What used
to be offline reports and such is now live business integration.
For example, your credit card might not work at a new location because BI alerts the
application that it is an unusual transaction. This has happened to me once. I was at an
art exhibition with artisans from different parts of the US. I used my credit card to buy a
few things, but it would not go through because the seller was registered in a part of the
US where my credit card had never been used. This is an example of BI integration to
prevent fraud.
Recommended products on Amazon and other retail sites, related videos on video sites,
etc. are other examples of business integration of BI.
From the above flow, it is also apparent that ETL and storage systems are important to a
successful BI implementation. This is why BI testing is never an independent event: it
involves ETL and data warehouse testing as integral elements. And as testers, it is
important for us to understand and know more about how to test these.
STH has you covered there. We have articles that talk about these concepts. I will
provide the links below so we can get those out of the way and focus on BI alone.
Let us say a student's details are sent from a source for subsequent processing and
storage. Make sure the details are correct right at this point. If the GPA shows as 7, it is
clearly above the 5-point scale, so such data can be discarded or corrected right here,
without taking it further for processing.
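That GPA rule can be expressed as a small screening step at extraction time. A sketch only: the row layout and the `screen_students` helper are assumptions.

```python
def screen_students(rows, max_gpa=5.0):
    """Split extracted rows into clean ones and rejects: a GPA above
    the 5-point scale is invalid and should not travel further."""
    clean, rejects = [], []
    for row in rows:
        (clean if 0 <= row["gpa"] <= max_gpa else rejects).append(row)
    return clean, rejects

students = [{"name": "Mina", "gpa": 4.2}, {"name": "Joel", "gpa": 7.0}]
clean, rejects = screen_students(students)
print([r["name"] for r in rejects])  # ['Joel'] - GPA 7 on a 5-point scale
```

Rejected rows would typically be routed to an error table for correction rather than silently dropped.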
The source and destination data types should match. E.g.: You can’t store the
date as text.
Primary key, foreign key, null, default value constraints, etc. should be intact.
The ACID properties of source and destination should be validated, etc.
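The data-type rule ("you can't store the date as text") can be checked with a query. A sqlite3 sketch with invented table and column names; since SQLite itself stores dates as text, the check validates the declared ISO format instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrol (student_id INTEGER, join_date TEXT)")
conn.execute("INSERT INTO enrol VALUES (1, '2023-01-15'), (2, '15/01/23')")

# A destination 'date' column should hold ISO yyyy-mm-dd values only;
# SQLite's date() returns NULL for anything it cannot parse.
bad_dates = conn.execute("""
    SELECT student_id FROM enrol
    WHERE join_date IS NOT NULL
      AND date(join_date) IS NULL
""").fetchall()

print(bad_dates)  # [(2,)] - '15/01/23' is not a valid ISO date
```

On a database with real DATE columns the equivalent check is usually a failed-cast query or a load-time type error report.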
#3) Check the data Loading
(Into a data warehouse or Data mart or anywhere it is going to be permanently located):
The actual scripts that load the data, and tests for them, would definitely be included in
your ETL testing. The data storage system, however, has to be validated for the
following:
This is what is considered Business Intelligence. But, as you can see from the above,
the reports are never going to be correct, consistent and fast if your preceding layers
were malfunctioning.
Simply put, a BI testing project is a testing project too. That means the typical stages of
testing apply here as well, whether you are testing performance or doing functional
end-to-end testing:
Test planning
Test strategy
Test design (Your test cases will be query-intensive rather than plain-text based.
This is the ONE major difference between your typical test projects and an
ETL/Data Warehouse/BI testing project.)
Test execution (Once again, you are going to need some querying interface such
as TOAD to run your queries)
Defect reporting, closure etc.
Conclusion:
BI is an integral element of all business areas. E-Commerce, Health Care, Education,
Entertainment and every other business relies on BI to know their business better and to
provide a killer experience to their users.
We hope this article gave you the necessary information to explore Business Intelligence
testing area much further.