IBM WebSphere QualityStage includes a set of stages, a Match Designer, and related files that provide a
development environment within the WebSphere DataStage and QualityStage Designer for building jobs to
cleanse data. This environment lets you test your matching and blocking strategies before running match jobs,
and lets you manage and edit rules.
The WebSphere QualityStage functionality is available either as a stand-alone subset of WebSphere
DataStage or as an add-on to WebSphere DataStage. This functionality offers the full power and flexibility of
the WebSphere DataStage parallel execution framework and connectivity stages.
The WebSphere QualityStage components include the Match Designer, for designing and testing match
specifications and associated match passes, and the following WebSphere QualityStage stage types:
Investigate
Standardize
Match Frequency
Reference Match
Unduplicate Match
Survive
Phase One. Allows you to understand business goals by translating high-level directives into specific
data cleansing assignments and to make assumptions about the requirements and structure of the
target data.
Phase Two. Helps you identify errors and validate the contents of columns in a data file. You then
use the results to refine your business practices.
Phase Three. Allows you to condition the source data, match the data for duplicates or cross-references to other files, and determine the surviving record.
Phase Four. Uses the results to evaluate how your organization manages its data and to
ensure that corporate data supports the company's goals.
An understanding of the mission that satisfies the business goals of your company can help you define the
requirements and structure of the target data. This knowledge also helps you determine the level of data quality
that your data needs to meet. This insight provides a context to help you make the appropriate decisions about
the data throughout the workflow.
The codes used in columns should be the same in both the data source and the reference source.
For example, if the Gender column in the data source uses M and F as gender codes, the
corresponding column in the reference source should also use M and F as gender codes (not, for
example, 1 or 0 as gender codes).
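As an illustration of aligning code sets before matching, the following Python sketch checks whether a column uses only an allowed set of codes and recodes a reference column from 1/0 to M/F. It is not QualityStage functionality; the names `REFERENCE_GENDER_MAP`, `recode_gender`, and `uses_codes` are hypothetical, and which number stands for which gender is an assumption you would confirm against your own data dictionary.

```python
# Hypothetical recode table: maps a reference source's numeric codes to the
# data source's M/F convention. Which number means which gender is an
# assumption; confirm it against your own data dictionary.
REFERENCE_GENDER_MAP = {"1": "M", "0": "F"}

# Codes used by the data source (from the example above).
ALLOWED_GENDER_CODES = {"M", "F"}


def uses_codes(values, allowed):
    """True when every value in the column belongs to the allowed code set."""
    return set(values) <= set(allowed)


def recode_gender(values, mapping):
    """Translate each value through mapping; raise on any unmapped code."""
    recoded = []
    for value in values:
        if value not in mapping:
            raise ValueError(f"unmapped gender code: {value!r}")
        recoded.append(mapping[value])
    return recoded


reference_gender = ["1", "0", "0"]
print(uses_codes(reference_gender, ALLOWED_GENDER_CODES))  # False
reference_gender = recode_gender(reference_gender, REFERENCE_GENDER_MAP)
print(reference_gender)  # ['M', 'F', 'F']
print(uses_codes(reference_gender, ALLOWED_GENDER_CODES))  # True
```

Raising on an unmapped code, rather than passing it through, keeps an unexpected value from silently reaching the match step.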
Any value that you use to indicate missing data (for example, spaces or 99999) must be converted to
null in advance. You can do this with the WebSphere DataStage Transformer stage.
If you are extracting data from a database, make sure that nulls are not converted to spaces.
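The conversion rule can be sketched in Python as a per-column step that turns blank strings and sentinel codes into `None`, Python's stand-in for null. This only illustrates the logic; in a job you would do it in the Transformer stage. The sentinel set and the helper names `to_null` and `null_missing` are hypothetical.

```python
# Sentinel values that this hypothetical feed uses to mean "missing".
# The set is an assumption for illustration; use your own conventions.
MISSING_SENTINELS = {"99999"}


def to_null(value, sentinels=MISSING_SENTINELS):
    """Return None for null, all-space, or sentinel values; else the value."""
    if value is None:
        return None
    text = str(value).strip()  # also catches the integer 99999
    if text == "" or text in sentinels:
        return None
    return value


def null_missing(record, columns):
    """Return a copy of record with missing values in columns set to None."""
    cleaned = dict(record)
    for column in columns:
        if column in cleaned:
            cleaned[column] = to_null(cleaned[column])
    return cleaned


record = {"zip": "99999", "name": "   ", "city": "Boston"}
print(null_missing(record, ["zip", "name", "city"]))
# {'zip': None, 'name': None, 'city': 'Boston'}
```

Leaving genuine values untouched, and returning a copy instead of mutating the input, makes it easy to compare the record before and after cleaning.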
Use the Standardize stage to standardize individual names or postal addresses. Complex conditions can be
handled by creating new columns before matching begins.
For example, a death indicator could be created by examining the disposition status of the patient. When
matching automobile crash records to hospital data, you can examine the E codes on the hospital record
to see whether a motor vehicle accident is involved. If it is, set a new variable (MVA) to one; for all other
statuses, set it to zero. On the crash file, generate a column that is always one (since all crashes are motor
vehicle accidents). If both files report a motor vehicle accident, the columns match (one to one). Otherwise, the
columns do not match (one to zero).
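The derived-column approach above can be sketched in Python. This sketch assumes, for illustration only, that E codes E810 through E819 identify motor vehicle accidents and that the disposition label `EXPIRED` marks a death; check both against the coding scheme your hospital data actually uses. All function names are hypothetical.

```python
# Assumed range of E codes treated as motor vehicle accidents (E810-E819).
# Illustrative assumption: confirm against your hospital data's coding scheme.
MVA_E_CODE_RANGE = range(810, 820)

# Hypothetical disposition label that marks a patient death.
DEATH_DISPOSITIONS = {"EXPIRED"}


def is_mva_e_code(code):
    """True if an E code such as 'E812.0' falls in the assumed MVA range."""
    code = code.strip().upper()
    if not code.startswith("E") or len(code) < 4 or not code[1:4].isdigit():
        return False
    return int(code[1:4]) in MVA_E_CODE_RANGE


def hospital_mva(e_codes):
    """MVA column for a hospital record: 1 if any E code is an MVA, else 0."""
    return 1 if any(is_mva_e_code(c) for c in e_codes) else 0


def crash_mva():
    """MVA column for a crash record: always 1, since every crash is an MVA."""
    return 1


def mva_agrees(hospital_e_codes):
    """True when the hospital and crash MVA columns match (one to one)."""
    return hospital_mva(hospital_e_codes) == crash_mva()


def death_indicator(disposition):
    """1 if the disposition status indicates a death, else 0."""
    return 1 if disposition.strip().upper() in DEATH_DISPOSITIONS else 0


print(hospital_mva(["E880.9", "E812.0"]))  # 1
print(mva_agrees(["E880.9"]))  # False (one to zero)
```

Because both files carry an MVA column with the same 0/1 coding, the match step can compare them like any other column.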