Notable New Features in DataStage 8.5

It's Fast!

DataStage 8.5 is considerably faster than its previous version (8.1). Tasks like saving, renaming and compiling are faster by nearly 40%. The run-time performance of jobs has also improved.

The parallel engine

The parallel engine has been tuned to improve performance, and its resource usage is reduced by 5% compared to DataStage 8.1.

XML data

DataStage has historically been inefficient at handling XML files, but in 8.5 IBM has given us a great XML processing package. DataStage 8.5 can now process large XML files (over 30 GB) with ease, and we can now process XML data in parallel. The new XML transform stage can combine data from multiple sources into a single XML output stream. If you think that is cool, it can also work the other way around, i.e., merge multiple XML inputs into a single output stream. It can also convert data from one XML format to another.
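As a simple illustration of composing XML (the column and element names here are hypothetical, just to show the shape of the transformation), input rows such as

   custid, name
   1, Anna
   2, Raj

could be assembled by the XML stage into a single output document along the lines of

   <customers>
     <customer><custid>1</custid><name>Anna</name></customer>
     <customer><custid>2</custid><name>Raj</name></customer>
   </customers>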

Transformer Stage

It is one of the most used and most important stages in DataStage, and it just got better in 8.5.

a. Transformer Looping: Over the years DataStage programmers have been using workarounds to implement this concept; now IBM has included it directly in the Transformer stage. There are two types of looping available (see the sketch after this list):
   - Output looping: we can produce multiple output rows for a single input row. This is achieved using the new system variable @ITERATION.
   - Input looping: we can aggregate input records within the Transformer and assign the aggregated data to the original input rows while sending them to the output.

b. Transformer change detection:
   - SaveInputRecord(): saves a record to be used for later transformations within the job.
   - GetInputRecord(): retrieves the saved record when it is required for comparisons.

c. System variables and functions:
   i. @ITERATION: used in the looping mechanism.
   ii. LastRow(): indicates the last row in the job.
   iii. LastRowInGroup(): returns the last row in a group, based on the key column.

d. New NULL handling features: In DataStage 8.5 we need not explicitly handle NULL values. Record dropping is prevented if the target column is nullable, we need not handle NULLs explicitly when using functions over columns that contain NULL values, and stage variables are now nullable by default. The environment variable APT_TRANSFORM_COMPILE_OLD_NULL_HANDLING is provided to support backward compatibility.

e. New date functions: There are a host of new date functions incorporated into DataStage 8.5. I personally found the functions below most useful:
   DateFromComponents(years, month, dayofmonth)
   Ex: DateFromComponents(2012, 07, 20) will output 2012-07-20
   DateOffsetByComponents(basedate, year offset, month offset, dayofmonth offset)
   Ex: DateOffsetByComponents(2012-07-20, 2, 1, 1) will output 2014-08-21
   Ex: DateOffsetByComponents(2012-07-20, -4, 0, 0) will output 2008-07-20
   I will write another detailed blog on the new date functions shortly.
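Here is a minimal sketch of output looping in the Transformer (the link name In, the column names, and the choice of three iterations are hypothetical, just to show the shape of the derivations):

   Loop While condition:       @ITERATION <= 3
   Output column CopyNumber:   @ITERATION
   Output column Amount:       In.Amount

Each input row now produces three output rows, numbered 1 to 3 by @ITERATION. For input looping, the usual pattern is to queue incoming rows with SaveInputRecord(), watch for the group boundary with LastRowInGroup() on the key column, and then loop with @ITERATION and GetInputRecord() to replay each saved row together with the aggregate computed for the group.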

Parallel Debugger:

DataStage 8.5 now has built-in debugger functionality. We can set breakpoints on the links in our jobs; when a job is run in debug mode, it stops when it encounters a breakpoint, and from there we can step to the next action on that link or skip to the next row of data.

Functionality Enhancements:

- Mask encryption for before- and after-job subroutines
- Ability to copy permissions from one project to a new project
- Improvements in the Multi-Client Manager
- New audit tracing and an enhanced exception dialog
- Enhanced project creation failure details

Vertical Pivoting:

At long last, vertical pivoting has been added.
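As a quick sketch of what vertical pivoting does (the column names are hypothetical), several input rows per key are pivoted up into columns on a single output row:

   id, quarter, sales         id, q1_sales, q2_sales, q3_sales
   1,  Q1,      10
   1,  Q2,      20      ->    1,  10,       20,       30
   1,  Q3,      30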

Integration with CVS:

DataStage 8.5 now integrates directly with version control systems like CVS. We can check in and check out jobs directly from DataStage.

Information Architecture Diagramming Tool:

Solution architects can now draw detailed integration solution plans for data warehouses from within DataStage.

Balanced Optimizer:

As you all know, DataStage is an ETL tool. But now, with the Balanced Optimizer integrated directly, we also get ELT (Extract, Load and Transform): we can extract the data, load it, and perform the transformations inside the database engine.
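As a rough illustration of the idea (the table names are hypothetical, and the exact SQL the optimizer generates will differ), a job that reads a source table, upper-cases a name column in a Transformer and writes a target table could be pushed down so the database does the work, along the lines of

   Ex: INSERT INTO target (id, name) SELECT id, UPPER(name) FROM source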

DataStage Interview Questions:

How did you handle reject data?
Ans: Typically a reject link is defined and the rejected data is loaded back into the data warehouse, so a reject link has to be defined for every output link from which you wish to collect rejected data. Rejected data is typically bad data, such as duplicated primary keys or null rows where data is expected.

If you worked with DS 6.0 and later versions, what are Link Partitioner and Link Collector used for?
Ans: Link Partitioner is used for partitioning the data; Link Collector is used for collecting the partitioned data.

What are routines, where/how are they written, and have you written any routines before?
Ans: Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The different types of routines are: 1) transform functions, 2) before/after job subroutines, 3) job control routines.

What are the OConv() and IConv() functions and where are they used?
Ans: IConv() converts a string to an internal storage format; OConv() converts an expression to an output format (see the example after this list).

How did you connect to DB2 in your last project?
Ans: Using DB2 ODBC drivers.

Explain MetaStage.
Ans: MetaStage is used to handle metadata, which is very useful for data lineage and data analysis later on. Metadata defines the type of data we are handling; these data definitions are stored in the repository and can be accessed with MetaStage.

Do you know about the Integrity/Quality stage?
Ans: QualityStage can be integrated with DataStage. QualityStage has stages such as Investigate, Match and Survive, so we can do the quality-related work there; to integrate it with DataStage we need the QualityStage plugin.

Explain the differences between Oracle 8i and 9i.
Ans: Oracle 8i does not support the pseudo column sysdate but 9i does; in Oracle 8i we can create 256 columns in a table, but in 9i we can create up to 1000 columns (fields).

How do you merge two files in DS?
Ans: Either use the Copy command as a before-job subroutine if the metadata of the two files is the same, or create a job to concatenate the two files into one if the metadata is different.

What is the DS Designer used for?
Ans: You use the Designer to build jobs by creating a visual design that models the flow and transformation of data from the data source through to the target warehouse. The Designer's graphical interface lets you select stage icons, drop them onto the Designer work area, and add links.

What is the DS Administrator used for?
Ans: The Administrator enables you to set up DataStage users, control the purging of the Repository and, if National Language Support (NLS) is enabled, install and manage maps and locales.

What is the DS Director used for?
Ans: The Director is used to run and validate jobs; we can go to the Director from the Designer itself.

What is the DS Manager used for?
Ans: The Manager is a graphical tool that enables you to view and manage the contents of the DataStage Repository.

What are static hash files and dynamic hash files?
Ans: As the names themselves suggest what they mean: in general we use Type-30 dynamic hash files. The data file has a default size limit of 2 GB, and the overflow file is used if the data exceeds that size.

What is the Hash File stage and what is it used for?
Ans: It is used for lookups, like a reference table. It is also used in place of ODBC or OCI tables for better performance.

How are dimension tables designed?
Ans: Find where the data for the dimension is located, figure out how to extract it, determine how to maintain changes to the dimension, and change the fact table and the DW population routine accordingly.
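A small worked example of the conversion functions (this assumes the usual UniVerse internal date format, where day 0 corresponds to 31 DEC 1967; the exact conversion codes are illustrative):

   Ex: IConv("12/31/1967", "D4/") returns 0, the internal day number
   Ex: OConv(0, "D") returns 31 DEC 1967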
