Sei sulla pagina 1di 71

DataStage XML and Web services packs overview

Ascential DataStage, acquired by IBM in 2005, is now known as IBM WebSphere DataStage.
It is a convenient, GUI-based extract, transform, and load (ETL) tool that you can use to
integrate organizational data of different structures and formats and effectively support
customer relationship management (CRM) analysis, decision support systems (DSS), and ebusiness.
DataStage is based on a client-server infrastructure and includes designer, manager, director,
and administrator components, providing robust functionality. DataStage can dramatically
simplify tasks from development to maintenance.
The XML pack and Web Services pack of DataStage are the focal points of this article. With
these two packs, XML and Web services can be easily integrated into DataStage solutions.
The advantages of utilizing these two packs with DataStage are:
1. Reduced development time
2. Fast user response
3. User-friendly development, management, and maintenance GUI environment
DataStage XML (XML pack of DataStage) is composed of XML digesting, XML publishing,
and transforming functions. The XML section of this article introduces one of the major
conversions provided by the XML pack: transformation between XML documents and
tabular data.
The Web Services pack enables DataStage to call remote services and treat those services as
simple data sources, data targets, or interactive data interface. The Web Services Transformer,
which provides the interactive data interface, is discussed in the Web services part of this
article.
This article include four samples of developing data solutions with DataStage (Version.
7.5.1A, with XML and Web services packs installed), along with analysis and notes for those
samples. They are based on the following typical scenarios:

Publishing XML document from tabular data


Parsing XML document into tabular data
Accessing a Web service with input and output data

Prerequisites

WebSphere DataStage V7.5.1A with DB2 pack, XML pack, and Web services pack
should be installed.
IBM DB2 for Linux, UNIX, and Windows V8.2 (or higher) should be
installed.
The audience is assumed to have some practical experience of XML, Web services,
and DB2, as well as basic knowledge of DataStage.

Part I. Publishing XML documents from table data


Publishing XML documents based on the existing table data is a common scenario.
Sometimes, relational tables or sequential files are required to be transformed to XML
hierarchical structures, such as XML documents or XML chunks. In this case, the XML
output stage can be used to generate XML output. It uses XPath expressions to map input
table fields to certain positions in the output documents.
Sample 1. Generate XML files based on two tables using XML output stage
Figure 1. Job diagram of XML publishing

Overview of sample 1
In sample 1, customer data and contact data has been extracted from two corresponding DB2
tables respectively, as shown in Figure 1. The transformer is used to replace the complex
SQL, integrate the data, and feed the joined data to the XML output stage through DSLink6.
XML output stage then generates the XML results and saves them to the file system. Figure 1
briefly describes the whole application demo.
The general steps for deployment are:
1.
2.
3.
4.
5.
6.

Define and deploy DB2 tables


Prepare XML structure
Import DB2 table and XML table definitions
Set up DB2 stages with a transformer to provide joined data
Set up XML output stage to generate XML document
Compile and run

Let's examine these steps in detail:


Step 1. Define and deploy DB2 tables
Deploy these tables into the sample database using the DDL shown in Listing 1 on the local
DB2 server, and load some sample data, as shown in figures 2.1 and 2.2:
Listing 1. DDL for customer and contact table

--customer table
CREATE TABLE S_CUST
(
CUST_NUM CHARACTER(10) PRIMARY KEY,
CUST_NAME VARCHAR(100) NOT NULL
);
--contact table
CREATE TABLE S_CONTACT
(
CNT_NUM INTEGER
GENERATED ALWAYS AS IDENTITY( START WITH 1, INCREMENT BY 1 )
PRIMARY KEY,
CUST_NUM CHARACTER(10) NOT NULL,
F_NAME VARCHAR(50) NOT NULL,
L_NAME VARCHAR(50) NOT NULL,
EMAIL VARCHAR(100) NOT NULL
);

Figure 2.1. Sample data of customer table

Figure 2.2. Sample data of contact table

Step 2. Prepare XML structure


The target XML structure can be defined either through an XSD schema file or XML sample
document. Listing 2 is an XML sample document to be imported into DataStage:
Listing 2. target.xml - sample XML file
<CustomerInfo xmlns="http://www.ibm.com/schemas/SimpleXMLDemo"
xmlns:cnt="http://www.ibm.com/schemas/ContactDemo"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerID>0000000001</CustomerID>
<CompanyName>MindLight Corp.</CompanyName>
<Contacts>
<cnt:Contact>
<cnt:FirstName>Lee</cnt:FirstName>
<cnt:LastName>Chen</cnt:LastName>
<cnt:Email>LeeChen@gmail.com</cnt:Email>
</cnt:Contact>L
</Contacts>

</CustomerInfo>

As shown Listing 2, there are five main fields in the XML file: CustomerID, CompanyName,
FirstName, LastName, and Email. Each of them has a counterpart column in the database
tables.
Step 3. Import DB2 table and XML table definitions
To enter the data source information, you need to either input the table definitions manually
or import the definitions directly from source data in DataStage. DataStage table definitions
include metadata of many kinds of data sources in tabular forms (refer to Figure 3), including
the name and location of the tables or files holding the data and a definition of the columns
they contain. Information is stored in table definitions in the DataStage repository.
In this sample, two database tables are imported from DB2, and the XML structure is
extracted from the sample file.
Note: DB2 and XML plug-in modules should be installed to enable these features.
Figure 3. Table definition structure

Import DB2 table definitions


Select the Plug-in Meta Data Definitions (in red circle, as shown in Figure 4) to import a
DB2 table.
Figure 4. Plug-in metadata definitions

Select DB2 plug-in from the pop up window, and click OK. On the first page of the "Import
DSDB2 Meta Data" window (refer to Figure 5), select the SAMPLE database with proper
username and password as the source to import the metadata. Make sure you select the
Tables and Fully Qualified Table Names check boxes so the table names display with
schema name in the next window. Click the Next button.
Figure 5. Import DSDB2 metadata 1

DB2 table definitions can be stored in category "PlugIn\sample," as shown in Figure 6:


Figure 6. Import DSDB2 metadata 2

Import XML table definitions


Right-click on Table definitions in the repository view, and choose Import > XML Table
Definition (similar to the action shown in Figure 4). After doing so, the XML Meta Data
Importer will be activated (refer to Figure 7).

Figure 7. XML Meta Data importer

The XML Meta Data Importer can import XML structure from either XML sample or XSD
files. Open the target.xml file in the XML Meta Data Importer. Once opened, the XML file is
mapped into tabular structure. As depicted in Figure 7, select the TEXT fields of the five
data elements (marked in red circle) as the columns of table definition to extract text content
of the XML elements. The "Description" field (marked in red box of Figure 7) contains the
XPath, which indicates the corresponding position of a column in the XML document. In this
sample, it is assumed that every contact record has a unique e-mail address that is used as the
"Key" column. Save the table definition in the "PlugIn\sample" category, named as
"target_xml".
As of now, three table definitions (in red circle of Figure 8) have been imported into the
category successfully.
Figure 8. Imported table definitions

Step 4. Set up DB2 stages with a transformer to provide joined data


With the table definitions ready in repository, it is time to set up the data source for XML
output. There are two options to extract the data:
1. Use the SQL, as shown in Listing 3, in one DB2 stage
2. Use the transformer to join the data from multiple stages
This sample follows the second option:
Listing 3. Query for the data extraction
SELECT cnt.CUST_NUM,c.CUST_NAME,cnt.F_NAME,cnt.L_NAME,cnt.EMAIL
FROM S_CONTACT cnt
JOIN S_CUST c ON cnt.CUST_NUM=c.CUST_NUM ;

Figure 9. Results of query from Listing 3

To achieve the same output shown in Figure 9 in DataStage, two DB2/UDB API stages (in
Database category of the Palette) and a Transformer stage (in Processing category of the
Palette) are adopted. These stages are added to a new job diagram and named as follows:

The two DB2 stages: "CONTACT" and "CUST"


The Transformer: "JOIN_CUST_CNT"

Connect "CONTACT" and "CUST" to the "JOIN_CUST_CNT" using DSLink3 and


DSLink4 respectively. (See Figure 10.)
Figure 10. Job diagram

As a database developer, you usually join the customer table to the contact table and treat the
customer as the main table. But in this sample, the contact table is the main one, indicated by
the solid arrow DSLink3. The transformer carries out a lookup action in non-primary tables
for records of the primary table. For a single record of the primary table, only one record will
be retrieved from the non-primary tables.
Note: Only ODBC and UniVerse stages support multi-row lookup.
"CONTACT" stage configuration
Double-click on the "CONTACT" stage to configure its properties. On the "Stage" page, fill
in the DB2 connection information: database name, username, and password. Switch to the
"Output" page, and click on the columns tab. The column information includes column
name, derivation, type, and so on. Columns can be edited directly or loaded from an existing
table definition. Here, use the imported CHENLI.S_CONTACT table definition to load the
columns for this stage. Click the Load button, select CHENLI.S_CONTACT in the pop-up
table definition window, press OK, then select all the columns of the table definition in the
successive column selection window. When the columns are loaded, the result looks like
Figure 11:

Figure 11. Loaded columns

Set up the SQL in the "CONTACT" stage to extract the data from DB2. Select the SQL tab,
and click SQL Builder. The SQL Builder, a GUI tool, is activated by default to assist you in
completing the SQL query (refer to Figure 12). Figure 12. SQL builder

Click OK when the SQL is ready. The "SQL" tab should look like Figure 13. You can now
select the "View Data" button to check the "CONTACT" table contents.
Figure 13. SQL tab

As of now, the configuration of the "CONTACT" stage is complete. You can follow the same
steps to set up the "CUST" stage. Afterwards, two tables' icons will be displayed on both
links.
"JOIN_CUST_CNT" stage configuration
Now you can configure the transformer "JOIN_CUST_CNT" to aggregate the two input DB2
stages together and produce the joined data. Drag CUST_NUM from DSLink 3 to the
CUST_NUM of DSLink4, as shown in Figure 14:
Figure 14. Transformer stage

Step 5. Setting up the XML output stage to generate XML document


Add the XML output stage to the diagram from the "Real Time" category of "Palette," named
"Gen_Cust_XML," and add the link from the "JOIN_CUST_CNT" transformer stage to it, as
shown in Figure 15:
Figure 15. Add XML Output to the job diagram

Double-click on the Gen_Cust_XML stage to set its properties. On the "Stage" page, you
only need to set the "Namespace Declarations" in the "Document Settings" tab (see Figure
16) and the file path of the output XML file in the "Options" tab (see Figure. 17) for current
solution. Loading namespaces declaration is similar to loading columns from a table
definition -- it loads namespaces from the XML table definition.
Figure 16. Load namespaces declaration

Figure 17. Set output file path

On the "Input" page, you need fill out the columns tab. The process is similar to loading
output columns for DB2 stages. After loading the XML table definition "target_xml," the
XML output stage is shown as illustrated Figure 18.
Figure 18. Load columns from XML table definition

Note: Each customer can own multiple contacts in the target XML structure, so you need to
aggregate the contact information for each customer to reduce redundancy. For this purpose,
set the "transformation settings" as shown in Figure 19:
Figure 19. Transformation settings

Using the trigger column "CustomerID," the output will generate a new record containing a
single XML document only when the "CustomerID" field is changed to a new value. The data
from Figure 9 will be aggregated into a single XML document.
Go back to the "JOIN_CUST_CNT" stage and link the columns on the left side of input to
the output on the right side by dragging those fields. (Refer to Figure 20.)

Figure 20. Transformer stage revisited

Click OK. The XML publishing solution is completed so far.


Step 6. Compile and run
Compile and run this job in the designer. If there is no error, you can add it to the job
schedule as a routine task. When running the job in the designer, the links with relevant
statistics info will turn green if no errors are found (see Figure 21). If an error occurs, the
links will be red. You have to check the logs for debugging. In Figure 21, this job has
successfully generated a target XML file: SampleOutput1.xml.

Figure 21. Running the job of sample 1

Back to top
Part II.
Parsing XML
document into
tabular data
This scenario
is a reverse
process of
publishing
XML data -XML data
comes in, you
parse it into
tabular
records, then
insert the
records into
database
tables. Two
samples will
be
demonstrated
to show how
to parse XML
document into

tabular data.
sample 2 is
the reverse
process of the
former XML
publishing
sample,
loading XML
documents to
the customer
table and the
contact table.
Sample 3
contains a
more complex
XML input as
well as db2
outputs (three
tables).
Different
deployment
approaches
will be used in
these two
cases.
Sample 2.
Extract
customer and
contact
information
from XML
document
using XML
input stage
Overview of
sample 2
In sample 2,
use the table
definitions
from sample 1
(refer to
Figure 8) and
prepare input
date using an
XML file with
a new
company and
two
contacts(see

input_cust_cnt
.xml). In this
case, you use
a folder stage
to read the
XML file,
pass the
content to the
XML input
stage to parse
out tabular
records, and
then use the
transformer to
make the
columns
diverge into
different
tables. The
job diagram is
shown in
Figure 22:
Figure 22.
Sample 2 job
diagram

The expected
outputs of
"XML_CUST
_CNT" stage
will be similar
to the rows
shown in
Figure 9.
Note: There
will be
duplicate info
on columns

CUST_NUM
and
CUST_NAM
E, which will
be passed to
the customer
table and will
cause primary
key collision.
In order to
handle this
issue, you can
use an
aggregator to
consolidate
such duplicate
records into
one record for
each
customer.
General steps
to develop this
sample
solution
include:
1. Set up
folder
stage
to read
XML
file(s)
2. Set up
XML
input
stage
3. Set up
DB2
stages
to
receive
data
4. Config
ure the
aggreg
ator
5. Set up
the
transfo
rmer
stage

for
distrib
uting
data
6. Compi
le and
run
Let's examine
these steps in
detail.
For this
sample, you
can add all the
stages and
links shown in
Figure 22 to
the new job
first, then
work on the
settings of
each stage
step by step
after.
Step 1. Set up
folder stage
to read XML
file(s)
Double-click
the
XML_FOLD
ER stag. Set
the folder
pathname
property in the
"Stage" page.
Fill the wild
card property
(filename
wildcard*.txt,
for example;
we use *.xml
for this case)
in the
"Output"
page.
Then you can
load columns
for the

"XML_FOLD
ER" stage,
adopting the
build-in table
definition
"Folder," as
shown in
Figure 23.
The "Folder"
definition
contains two
fields:
"FileName"
and "Record."
The "Record"
column will
contain the
contents of the
input files
from the
source folder.
Figure 23.
Load
columns
from
"Folder"
table
definition

Step 2. Set up
XML input
stage
Double-click

XML_CUST
_CNT to open
its properties
window.
Specify
"Record" as
the XML
source
column, from
which to get
input XML
documents.
Then set
"Column
content" to
"XML
document"
accordingly.
Figure 24.
Set up XML
source

On the
"Output"
page, as
shown in
Figure 25, fill
in the
"Transformati
on Settings"
tab and the
"Columns"
tab to enable
the
"XML_CUST
_CNT" stage
to parse the
input XML

documents
and generate
output
records. First,
select
Repetition
element
required, as
the input
documents
contain
multiple
contact
elements.
Second, load
namespaces
from the XML
table
definition
(similar to that
shown in
Figure 16).
Note:
Namespaces
declaration is
a must when
the incoming
document
contains
multiple
namespaces
and prefixes.
Figure 25.
Transformati
on settings of
"XML_CUS
T_CNT"
stage

Generally
there is no
need to load
column
definitions for
the XML
input stage
manually
because
loading
namespaces
will trigger
loading
columns of
the XML table
definition into
the
"Columns"
tab at the
same time
(refer to
Figure 26).
Note: In the
XML input
Stage,
namespaces
and columns
are loaded at
the same time,
unlike the
XML output
Stage where
they have to
be loaded
separately.

Figure 26.
Output
columns of
"XML_CUS
T_CNT"
stage

Each column
definition has
an XPath
expression in
its description
field, mapping
a certain
element in the
XML
document to a
corresponding
output data
column. These
XPaths are the
most critical
settings for
parsing XML.
In addition,
the key field
specifies
repetition
elements.
Here, the email element
is chosen to
be the key
since it is
unique for a
contact.

Now setup for


"XML_CUST
_CNT" stage
is complete.
Step 3. Set up
DB2 stages to
receive data
Server
information
for the two
DB2 stages
"CONTACT"
and "CUST"
is configured
first.
But unlike
previous DB2
stages sample,
you now work
on the "Input"
page of DB2
stage, as input
links are used
instead of
output links.
Select the
target table
(see Figure
27), and load
columns for
each DB2
stages.
Note: Autogenerated
columns
should not be
included.
Figure 27.
Select target
table

For contact
table, column
"CNT_NUM"
is not selected
because it is
an auto
incremental id
column and its
value will be
generated
automatically
when the new
record is
inserted. On
the other
hand, all the
columns of
the customer
table are
selected, as
they are all
from the XML
document.
Step 4.
Configure
the
aggregator
Because the
XML input
stage will
generate
multiple rows
for the same
customer, an

aggregator
stage is
needed to
consolidate
redundant
information of
customer for
the "CUST"
stage.
The input data
received by
the aggregator
should have
the same
structure as
the output
link, which is
defined by the
"CUST"
stage.
(Essentially, it
is defined by
the DB2 table
"CHENLI.S_
CUST".) So
load the table
definition
"CHENLI.S_
CUST" as
input of this
aggregator.
On the outputs
page of the
aggregator,
the output
columns are
"CUST_NUM
" and
"CUST_NAM
E". You need
to define the
derivation of
the two output
columns to
eliminate
duplicate
records, as
shown in
Figure 28.

Figure 28.
Aggregator
stage

Double-click
on the cell
that's
highlighted in
Figure 28
under the
derivation
column. A
"Derivation"
dialog box
will appear for
configuration.
Figure 29
depicts a
detailed
configuration
of derivation
with the
aggregate
function
"grouped."

Figure 29.
Derivation
dialog box

Besides
"grouped,"
you can use
functions like
Max, Min,
Count,
Average,
Sum, and
more for other
cases in the
Aggregate
function
selection box.
Step 5. Set up
the
transformer
stage for
distributing
data
Link the
source to the
targets in the
transformer
properties
window (see
Figure 30).
Figure 30.
Distributing
data in the
transformer

Press OK
when finished.
Sample 2 is
ready to be
run.
Step 6.
Compile and
run
If everything
is ok, a view
like that
shown in
Figure 31 can
be seen in the
designer when
the loading is
done.
Figure 31.
Successfully
load a XML
document

Check the
tables in DB2
using the SQL
in Listing 4,
and the result
is shown in
Figure 32:
Listing 4.
Check the
tables in DB2
SELECT * FROM S_CUST cust WHERE cust.CUST_NUM='0000000002';
SELECT * FROM S_CONTACT cnt WHERE cnt.CUST_NUM='0000000002';

Figure 32.
Results of
SQLs in
Listing 4

Sample 3.
Extract data
from the XML
document into
three tables
using a bunch
of XML input
stages
Overview of

sample 3
In sample 2,
the XML
document
(input_cust_c
nt.xml)
represents a
typical
scenario that
contains data
distributed
into only two
tables, one
referencing
the other. This
is a more
complicated
case: The
XML
document has
multiple
bunches of
repetition
elements (see
Listing 5) or
nested
repetition
elements (see
Listing 6),
which
distribute data
into more than
two tables.
The
corresponding
relational
models can be
described as
follows:
1. Three
or
more
tables
are
referen
ce to a
header
table
2. Three
or

more
tables
nested
into a
tree
structu
re
3. A
mixtur
e of 1
and 2
Although this
sort of
documents
can hardly be
handled by a
single XML
input, they are
the most
common cases
since various
related data
can be
encapsulated
into a single
document at
the same time.
Listing 5.
XML
document
has multiple
bunches of
repetition
elements
<A>
<A_DATA/>
<B>

</B>
<C>
</C>
</A>

Listing 6.
XML
document
has multiple
bunches of
nested
repetition
elements
<A>
<B>

</B>
<B>

</B>
</A>

In order to
handle XML
documents as
those in
Listings 5 and
6, a simple
solution is to
use a bunch of
XML input
stages linked
by
transformer
stages to
receive and
handle
different
branches of
the document
separately. In
sample 3, the
element
"Orders" is
added to the
XML
structure from
sample 2 (see
input_cust_cnt
.xml) to build
a new
document,
input_cust_cnt
_ord.xml (also
shown in
Listing 7
below). Per
the structure
in Listing 5, a
new table
"S_ORDER"
(see Listing 8)
is created
accordingly.
Listing 7.
Input_cust_c
nt_ord.xml
<?xml version="1.0" encoding="UTF-8"?>
<CustomerInfo xmlns:cnt="http://www.ibm.com/schemas/ContactDemo"
xmlns="http://www.ibm.com/schemas/SimpleXMLDemo"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CustomerID>0000000003</CustomerID>
<CompanyName>Eastway software Inc.</CompanyName>
<Orders>

</Orders>
<Contacts>

</Contacts>
</CustomerInfo>

Listing 8.
DDL of order
table
CREATE TABLE S_ORDER
(
ORDER_NUM CHARACTER(5) PRIMARY KEY,
CUST_NUM CHARACTER(10) NOT NULL,
PART_ID CHARACTER(3) NOT NULL,
QTY
DECIMAL(10,3)
);

In this
solution, two
layers of

XML input
stages are
used to
process the
XML
document in
two phases. In
the first phase,
an XML input
stage is used
to parse the
customer
information
and get the
XML chunks
of contacts
and orders.
You then pass
the two
chunks to two
separate XML
input stages of
the next phase
to extract data
for the table
"S_CONTAC
T" and
"S_ORDER".
The job
diagram is
shown in
Figure 33:
Figure 33.
Sample 3 job
diagram

General steps

to develop this
sample
solution
include:
1. Set up
the
folder
stage
2. Config
ure the
three
XML
input
stages
3. Set up
the
transfo
rmer
to link
related
stages
4. Set up
DB2
output
stage
5. Compi
le and
run
Let's examine
these steps in
detail.
The methods
of setting up
each stage are
already
covered in the
previous
section.
Therefore, this
section omits
these details
and focuses
on the tricky
parts: the
"Columns"
settings of
these XML
input stages.
To start,

import the
XML table
definitions
through the
XML Meta
Data Importer
(refer to
Figure 7).
Secondly,
modify the
column lists
and XPaths of
the definitions
when
applying
them.
The first XML
input stage is
named
"XML_Cust."
Set up its
output link
columns to
extract the
fields
CustomerID
and
CompanyNam
e for table
"S_CUST"
and XML
chunks for
<Orders/> and
<Contacts/>,
as shown in
Figure 34:
Figure 34.
Output
Columns tab
of
"XML_Cust
"

Note: In this
particular
case, in order
to extract
XML
chunks,/text()
should not be
appended to
the end of the
XPath
expression,
"/ns1:Custom
erInfo/ns1:Or
ders," for
example.
Accordingly,
the column
will return the
whole XML
structure of
that element
and can easily
be further
processed by
other stages.
In order to
receive XML
chunks and
produce
records with
references to
the customer
table, you
have to set the
other two
XML input

stages
manually to
get
CUST_NUM
field and
XML chunk
field in the
input
columns, like
Figure 35.
Figure 35.
Input
Columns tab
of
"XML_Cont
act"

When the
input columns
of
"XML_Conta
ct" stage and
"XML_Order"
stage are
ready, use the
transfomer
"Div_branche
s" to connect
them.
Figure 36.

Distributing
data in
"Div_branch
es" stage

Then the
output
columns of
"XML_Conta
ct" and
"XML_Order"
stages are
configured
respectively.
Note: There is
no
transformer
between the
XML input
stage and the
DB2 stage to
map the
output fields
to DB2
columns. So
you have to
make sure the
names of the
output
columns in the
XML input
stage
("XML_Conta
ct" and
"XML_Order"
) are exactly
the same with

the names in
the database
tables. Once
they are all
set, the input
columns in the
DB2 stages
will be loaded
automatically.
DO NOT
override the
input columns
in DB2 stages
by reloading
table
definitions,
otherwise the
XPath in the
foregoing
XML input
stage will be
lost.
Figure 37.
Output
columns of
"XML_Cont
act" stage

Figure 38.

Output
columns of
"XML_Orde
r" stage

Note:
"CUST_NUM
" field exists
in both input
and output
columns of
the two XML
Input stages.
This field is a
"passthrough" field
that is directly
passed from
the input link
to the output
link.
Run the job
after setting
up all the
stages, and
you should be
able to see a
successful
view like
Figure 39.

Figure 39.
Successfully
load an XML
file into three
tables

Check the
data in DB2.
Listing 9.
SQL for
checking
table data
SELECT * FROM S_CUST cust WHERE cust.CUST_NUM='0000000003';
SELECT * FROM S_CONTACT cnt WHERE cnt.CUST_NUM='0000000003';
SELECT * FROM S_ORDER ord WHERE ord.CUST_NUM='0000000003';

Figure 40.
Results of
SQLs in
Listing 9

Part III.
Accessing
Web services
with input and
output data
With Web
services pack,
DataStage can
easily access
Web service
operations
within a
DataStage
server job.
The Web
service pack
will add two
real-time
stages to
DataStage:
Web services
client stage
and Web
services
transformer
stage. When
the Web
service is
supposed to
act as either a
data source or
a data target,
the Web
services client
stage is used.
In this case,
only one link,
input or
output is
required in a
single Web
service
operation.
However, the
Web services
transformer
stage has both
input and
output links
and can
interact with

remote
services in a
sense of data
transaction.
The Web
services pack
will also
enable
DataStage to
support Web
service
routines,
which are
used as
transformer
functions in
transformer
stages. This
part of the
article only
uses and
discusses Web
services
transformer in
the sample, as
it is more
typical and
flexible for
adopting Web
services in
DataStage.
The Web
services client
stage and Web
service
routines are
simpler and
well
documented
in the "Web
services Pack
Designer
Guide" (see
Resources).
To integrate a
Web services
transformer
into a
solution, you
need to import
the WSDL

Web services
description
file, add a
Web services
transformer to
your solution,
and choose
the target
service in this
Web services
transformer.
Then feed
data to it and
handle the
output in
proper ways,
like extracting
and saving
data to files or
databases. The
following
sample is
intended to
demonstrate
this feature in
a practical
case.
Sample 4.
Query Google
with
keywords
from a
database table
and store the
responses in
another table
Overview of
sample 4
GoogleSearch
service is
selected as the
target service
since it is a
typical and
common
scenario: a
query is
submitted,
then records

are returned.
Besides, the
Web services
is already
available and
free to be
used.
GoogleSearch
service is in
the Google
SOAP Search
API project
(see
Resources),
including
three
operations:
doGoogleSear
ch,
doGetCached
Page and
doSpellingSug
gestion. In
this example,
the
doGoogleSear
ch operation is
used. Before
accessing the
service, you
have to sign
up for a free
account (a
Web page
address) in
order to get a
license key
that is sent in
the requested
messages to
identify the
requester. In
the transaction
of
doGoogleSear
ch operation,
a query is sent
to Google
with the
license key,
and a response
message

returns
containing
several hits in
a repetition
structure.
Note: There is
a limitation
that Web
services
transformer in
DataStage
7.5.1A cannot
parse
repetition
elements in
the response
into multiple
rows of
records. Each
input row can
only generate
one row of
output data.
Fortunately,
Web services
transformer
uses XPath to
extract data
from the
response
document and
supports XML
chunks output.
Thus, you can
extract the
XML chunks
containing the
repetition
structure from
an original
message and
then pass
them to
following
XML input
stage(s). In
this way, the
limitation can
be bypassed
and even more
complex

structure can
be processed.
The target
solution of
sample 4 will
extract query
keywords
from a DB2
table and feed
them to the
Web services
transformer.
Web services
transformer
will invoke
the Web
services and
output the
response
messages
directly to a
following
XML input
stage. Finally,
you put the
results into
another DB2
table. The job
diagram is
shown in
Figure 41:
Figure 41.
The expected
job diagram
of sample 4

In this sample,
several steps
will be taken
to set up the
job, which
include:

1. Import
the
target
service
s from
WSDL
file
2. Set up
Web
service
s
transfo
rmer
3. Set up
the
XML
Input
stage
4. Create
DB2
tables
and set
up
databa
se
stages
for
input
and
output
5. Compi
le and
run
Let's examine
these steps in
detail.
Step 1.
Import the
target service
from WSDL
file
Right-click on
Table
Definition in
the repository
view, and
select Import
> Web
Services
WSDL

Definitions
(refer to
Figure 4).The
Web Services
Meta Data
Importer is
activated (see
Figure 42).
Use it to load
the WSDL
file. Select the
target
operations,
then click the
Import button
to save the
definition.
Figure 42.
Web Services
Meta Data
Importer

Several table
definitions
will be

generated for
an operation
imported.
doGoogleSear
ch is not an
except; the
following
table
definitions are
supposed to
be created
automatically:

Info_
WS:
Web
Servic
es
Meta
Data
like
URI,
port
name
and
etc.
doGoo
gleSea
rch_IN
: input
param
eters
doGoo
gleSea
rch_M
SGIN:
input
messa
ge
meta
data
doGoo
gleSea
rch_M
SGOU
T:
output
messa
ge
meta
data
doGoo

gleSea
rch_O
P:
operati
on
meta
data
doGoo
gleSea
rch_O
UT:
output
param
eters

Step 2. Set up
Web services
transformer
Web services
transformer is
the core
component of
the solution
since data
structure of
input and
output
depends on it
fully.
Therefore,
Web services
transformer is
to be settled
with high
priority. In the
stage page of
Web services
transformer,
only the Web
services
operation
needs to be
configured to
access an
open public
service like
GoogleSearch
with basic
function,
which is
discussed in
this sample.

Note:
Moreover, the
stage page
provides
many more
options such
as setting
proxy, HTTP
AUTH,
HTTPs, Web
services
parameters,
and so on for
more
advanced
application.
To configure
the target
operation for
Web services
transformer,
click Select
Web Service
Operation in
the "Stage"
page, and
double-click
the target
operation in
the Web
services
browser, as
shown in
Figure 43:
Figure 43.
Select Web
Service
Operation

Switch to the
"Input
Message" tab
on the "Input"
page(see
Figure 44).
Figure 44.
Load input
message
metadata

Click Load
Message
Information
to load
metadata,
namespaces,
and columns
automatically
from the
selected Web

services
operation.
Figure 45.
Input
columns are
also loaded
when
metadata is
loaded

The columns
shown in
Figure 45 are
the actual
parameters
that need to be
fed to Google
in the
doGoogleSear
ch operation,
such as
license key
(key), query
keyword (q),
and many
other
configuration
parameters.
The settings
for output link
is similar to

the input link


-- you load
metadata in its
"Output
Message" tab
of the Output
page. As
mentioned in
step 1 of
sample 4, an
XML input
stage is used
to handle the
repetition
element in
responses.
The output
link of this
Web services
transformer
has to be
customized to
output the
whole SOAP
document of
the response.
In Figure 46,
the column
"RS_XML"
contains the
response in
the form of a
valid XML
document:
Figure 46.
customized
output
columns of
"GoogleWST
" stage

Step 3.
Setting up
the XML
input stage
Configuring
the XML
input stage is
already
covered by
previous
samples (refer
to step 2 of
sample 2).
The only
difference
here is the
table
definition for
output link,
which is not
from an
imported
XML table
definition but
from the Web
services
output table
definition,
named
"doGoogleSea
rch_OUT".
There are lots
of columns in
this table
definition,
representing
every data

element
defined in the
response
message. This
sample only
uses a portion
of them: URL,
snippet, title,
cached size,
related
information
present, host
name, and the
query
keyword itself
for test
purpose.
Figure 47.
Output
columns of
the
"RS_parsing
" XML input
stage

Note: The
URL column
is set to be the

"Key" column
because it can
distinguish
each search
hit returned.
Step 4.
Create DB2
tables and set
up database
stages for
input and
output
Create a
simple query
table
"S_GOOGLE
_Q" to contain
only id and
keyword field
in it and a
result table
named
"S_GOOGLE
_RS" covering
all the
response
columns
picked out
previously:
Listing 10.
DDLs for
query table
and response
table
CREATE TABLE S_GOOGLE_Q
(
Q_ID
INTEGER
GENERATED ALWAYS AS IDENTITY( START WITH 1, INCREMENT BY 1)
PRIMARY KEY,
KEYWORD VARCHAR(255)
);
CREATE TABLE S_GOOGLE_RS
(
R_ID
INTEGER
GENERATED ALWAYS AS IDENTITY( START WITH 1, INCREMENT BY 1)
PRIMARY KEY,

URL
VARCHAR(255) NOT NULL,
CACHEDSIZE
VARCHAR(255) NOT NULL,
HOSTNAME
VARCHAR(255) NOT NULL,
RELATEDINFORMATIONPRESENT VARCHAR(255) NOT NULL,
SNIPPET
VARCHAR(255) NOT NULL,
TITLE
VARCHAR(255) NOT NULL,
SEARCHQUERY
VARCHAR(255) NOT NULL
);

Set up two
DB2 stages,
"REQUESTS"
and
"RESULTS,"
for the two
DB2 tables. In
the
"REQUESTS"
DB2 stage,
columns
should include
all the fields
required by
the
doGoogleSear
ch operation.
The table
definition is
also loaded
from the Web
services table
definition
"doGoogleSea
rch_IN".
Note: If SQL
type column is
"unknown," as
shown in
Figure 45, you
have to assign
a regular type
like
"VarChar" to
them.
Figure 48.
Output
columns of

"REQUEST"
DB2 stage

Select Enter
custom SQL
statement in
the SQL tab,
and fill in a
query to
extract
keyword from
"S_GOOGLE
_Q" table,
including
constant
columns for
the other input
parameters of
the Web
services
operation (see
Figure 49).

Figure 49.
Use custom
SQL
statements to
prepare data

A transformer
has to be used
to connect the
"REQUEST"
stage to
"GoogleWST"
stage.
Otherwise the
job won't start
even if the
column names
exactly match
in the two
stages. This is
possibly
caused by the
"UnKnown"
SQL type in
the Web
services table
definition.
Figure 50.
"COL_mapp
ing"
transformer
setting

Now
configure the
"RESULTS"
DB2 stage.
You only need
to configure
the server info
and select the
target table.
The column
names in the
table
definition of
the XML
input stage,
"RS_parsing"
(refer to
Figure 41), is
exactly the
same with the
names in the
database table.
DB2 stage can
insert these
columns into
the DB2 table
automatically
since it's using
input column
names to
create insert
statements.

Step 5.
Compile and
run
The job is
available to be
compiled and
run, as the
above settings
are complete.
But before
accessing the
online service,
some
keywords
should be
prepared in
the query
table. For
instances,
three
keywords,
"Beijing,"
"space
shuttle," and
"iguana," are
used. (See
Figure 51.)

Figure 51.
Done google
search
successfully

From Figure
51, you can
see that 30
records have
been inserted
into the result

table -- 10 for
each keyword.
Check the
"S_GOOGLE
_RS" table to
examine the
data.
Listing 11.
SQL to check
results
returned
from google
SELECT searchquery, url, hostname FROM S_GOOGLE_RS;

Figure 52.
Results of
SQL in
Listing 11

Additional:
Use userdefined
message to
invoke Web
services
Sometimes,
the input
message is
very complex,
or the
message body
is already

available. In
such cases,
using table
data as input
may not be
the best
solution.
The Web
services
transformer
supports userdefined
messages and
accepts XML
documents as
message body.
This feature
provides
additional
flexibility.
You can
choose the
appropriate
method to
prepare the
request
message, such
as manually
typing out a
message
document, or
using a thirdparty program
to generate
complex
XML
documents
from the
database.
To show the
usage of userdefined
message, you
can slightly
change the job
in sample 4,
using a folder
stage to feed
an XML file
(googleReq.x

ml) to the
Web services
transformer.
The new job
is shown in
Figure 53:
Figure 53.
Web services
job using
user-defined
message

Figure 54.
User-defined
message
settings of
Web services
transformer
stage

Conclusion
The XML and
Web services
packs are
powerful and
convenient
toolkits. This
article
discussed how
to publish an
XML
document
from DB2
tables, how to
extract data
from the XML
document, and
how to access
the Web
services using
four samples,
including key
procedures as
well as
analysis.
DataStage
provides
many other
useful
components
and intuitive
ways of
combining

them to build
very flexible
integration
solutions.
In your own
practice, you
are sure to
find more
interesting
applications
of DataStage's
XML and
Web services
functionalities
and of other
DataStage
components
as well.

Potrebbero piacerti anche