
1. General Datastage issues


1.1. What are the ways to execute datastage jobs?
A job can be run in several ways:
- from Datastage Director (menu Job -> Run now...)
- from the command line, using the dsjob command
- from a Datastage routine (DSRunJob function)
- from a job sequencer

1.2. How to invoke a Datastage shell command?
Datastage shell commands can be invoked from:
- Datastage Administrator (Projects tab -> Command)
- a Telnet client connected to the datastage server

1.3. How to stop a job when its status is running?
To stop a running job, go to DataStage Director and click the stop button (or choose Job -> Stop from the menu). If that doesn't help, go to Job -> Cleanup Resources, select the process which holds a lock and click Logout. If it still doesn't help, go to the datastage shell and invoke the following command:
    ds.tools
It will open an administration panel. Go to 4. Administer processes/locks, then try invoking one of the clear locks commands (options 7-10).

1.4. How to run and schedule a job from command line?
To run a job from the command line, use the dsjob command.
Command syntax:
    dsjob [-file <file> <server> | [-server <server>] [-user <user>] [-password <password>]] <primary command> [<arguments>]
The command can be placed in a batch file and run by a system scheduler.

1.5. How to release a lock held by jobs?
Go to the datastage shell and invoke the following command:
    ds.tools
It will open an administration panel. Go to 4. Administer processes/locks, then try invoking one of the clear locks commands (options 7-10).

1.6. User privileges for the default DataStage roles?
The role privileges are:
- DataStage Developer - user with full access to all areas of a DataStage project
- DataStage Operator - has privileges to run and manage deployed DataStage jobs
- -none- - no permission to log on to DataStage
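
Returning to the command-line run described in 1.4: when the dsjob call is generated by a scheduler wrapper, it can help to assemble the arguments in one place. The sketch below only builds the argument list and does not execute anything. The project name, job name and parameter are hypothetical, and the options used (-run, -param, -wait, -jobstatus) should be checked against the dsjob documentation for your DataStage version.

```python
import shlex


def build_dsjob_run(project, job, params=None, server=None, user=None,
                    password=None, wait=True):
    """Return the argument list for a `dsjob -run` call (nothing is executed)."""
    cmd = ["dsjob"]
    # Optional connection options come before the primary command.
    if server:
        cmd += ["-server", server]
    if user:
        cmd += ["-user", user]
    if password:
        cmd += ["-password", password]
    cmd.append("-run")
    # Each job parameter is passed as -param name=value.
    for name, value in (params or {}).items():
        cmd += ["-param", f"{name}={value}"]
    if wait:
        # Wait for the job to finish and report its status.
        cmd += ["-wait", "-jobstatus"]
    cmd += [project, job]
    return cmd


# Hypothetical project, job and parameter names, for illustration only.
cmd = build_dsjob_run("dstage_dev", "JOB_0100", params={"work_dir": "/data/in"})
print(shlex.join(cmd))
```

The printed line can be pasted into a batch file or a cron entry; passing the list to subprocess.run would execute it on a machine where dsjob is installed.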

1.7. What is a command to analyze hashed file?
There are two ways to analyze a hashed file. Both should be invoked from the datastage command shell:
- FILE.STAT command
- ANALYZE.FILE command

1.8. Is it possible to run two versions of datastage on the same pc?
Yes, even though different versions of Datastage use different system dll libraries. To switch dynamically between Datastage versions, install and run DataStage MultiClient Manager. That application can unregister and register the system libraries used by Datastage.

1.9. How to send notifications from Datastage as a text message (sms) to a cell phone

There are a few possible methods of sending sms messages from Datastage. However, there is no easy way to do this directly from Datastage, and all the methods described below require some effort.

The easiest way from the Datastage standpoint is to configure an SMTP (email) server as a mobile phone gateway. In that case, a Notification Activity can be used to send a message with a job log and any desired details. The DSSendMail before-job or after-job subroutine can also be used to send sms messages. If configured properly, the recipient's email address will have the following format: 600123456@oursmsgateway.com

If there is no possibility of configuring a mail server to send text messages, you can work around it by using an external application run directly from the operating system. There are plenty of unix scripts and applications for sending sms messages. In that solution, you need to create a batch script which takes care of sending the messages and invoke it from Datastage using the ExecDOS or ExecSh subroutine, passing the required parameters (like phone number and message body).

Please keep in mind that all these solutions may require contacting the local cellphone provider first. Depending on the country, the service may not be free of charge, and in some cases the provider may not support the capability at all.

2. Datastage development and job design

2.1. Error in Link collector - Stage does not support in-process active-to-active inputs or outputs
To get rid of the error, go to Job Properties -> Performance and select Enable row buffer. Then select Inter process, which lets the link collector run correctly. A buffer size of 128 KB should be fine, but it is a good idea to increase the timeout.

2.2. What is the DataStage equivalent to like option in ORACLE
The following statement in Oracle:
    select * from ARTICLES where article_name like '%WHT080%';
can be written in DataStage (for example as a constraint expression) as:
    incol.empname matches '...WHT080...'

2.3. What is the difference between logging text and final text message in terminator stage
Every stage has a 'Logging Text' area on its General tab, which logs an informational message when the stage is triggered or started.
- Logging Text - informational, shown as a green line (a DSLogInfo() type message)
- Final Warning Text - the red fatal message, which is included in the sequence abort message

2.4. Error in STPstage - SOURCE Procedures must have an output link
The error appears in the Stored Procedure (STP) stage when there are no links going out of that stage. To get rid of it, go to 'stage properties' -> 'Procedure type' and select Transform.

2.5. How to invoke an Oracle PLSQL stored procedure from a server job
To run a pl/sql procedure from Datastage, a Stored Procedure (STP) stage can be used. However, it needs a flow of at least one record to run. The job can be designed in the following way:
- A source odbc stage which fetches one record from the database and maps it to one column, for example: select sysdate from dual
- A transformer which passes that record through. If required, add the pl/sql procedure parameters as columns on the right-hand side of the transformer's mapping.
- A Stored Procedure (STP) stage as the destination. Fill in the connection parameters, type in the procedure name and select Transform as the procedure type. In the input tab select 'execute procedure for each row' (with a single input record it will be run once).

Design of a DataStage server job with Oracle plsql procedure call

2.6. Is it possible to run a server job in parallel?
Yes, even server jobs can be run in parallel. To do that, go to 'Job properties' -> General and check the Allow Multiple Instance option. The job can now be run simultaneously from one or many sequence jobs. When that happens, datastage creates new entries in Director and each new instance is named with an automatically generated suffix (for example, the second instance of a job named JOB_0100 will be named JOB_0100.JOB_0100_2). Such an entry can be deleted at any time and will be recreated automatically by datastage on the next run.

2.7. Error in STPstage - STDPROC property required for stage xxx
The error appears in the Stored Procedure (STP) stage when the 'Procedure name' field is empty. It occurs even if the Procedure call syntax is filled in correctly. To get rid of the error, fill in the 'Procedure name' field.

2.8. Datastage routine to open a text file with error catching
Note! work_dir and file1 are parameters passed to the routine.
    * open file1
    OPENSEQ work_dir : '\' : file1 TO H.FILE1 THEN
        CALL DSLogInfo("******************** File " : file1 : " opened successfully", "JobControl")
    END ELSE
        CALL DSLogInfo("Unable to open file", "JobControl")
        ABORT
    END

2.9. Datastage routine which reads the first line from a text file
Note! work_dir and file1 are parameters passed to the routine.
    * open file1
    OPENSEQ work_dir : '\' : file1 TO H.FILE1 THEN
        CALL DSLogInfo("******************** File " : file1 : " opened successfully", "JobControl")
    END ELSE
        CALL DSLogInfo("Unable to open file", "JobControl")
        ABORT
    END
    * read the first record; log a warning if the file is empty
    READSEQ FILE1.RECORD FROM H.FILE1 ELSE
        Call DSLogWarn("******************** File is empty", "JobControl")
    END
    * take the first 32 chars of the record and remove all spaces
    firstline = Trim(FILE1.RECORD[1,32]," ","A")
    Call DSLogInfo("******************** Record read: " : firstline, "JobControl")
    CLOSESEQ H.FILE1

2.10. How to test a datastage routine or transform?
To test a datastage routine or transform, go to the Datastage Manager. Navigate to Routines, select the routine you want to test and open it. First compile it and then click 'Test...', which opens a new window. Enter test parameters in the left-hand column and click Run All to see the results. Datastage remembers the test arguments for future tests.

2.11. When should hashed files be used? What are the benefits of using them?
Hashed files are the best way to store data for lookups. They are very fast when looking up key-value pairs. Hashed files are especially useful for storing dictionary-type data (customer details, countries, exchange rates). Stored this way, the data can be shared across the project and accessed from different jobs.

2.12. How to construct a container and deconstruct it or switch between local and shared?
To construct a container, go to Datastage Designer, select the stages to be included in the container, choose Edit -> Construct Container from the main menu and pick either local or shared. A local container is visible only in the current job, while a shared container can be re-used. Shared containers can be viewed and edited in Datastage Manager under the 'Routines' menu. A local container can be converted to a shared container at any time in Datastage Designer by right-clicking on the container and selecting 'Convert to Shared'. In the same way it can be converted back to local.

2.13. Corresponding datastage data types to ORACLE types?
Most of the datastage variable types map very well to oracle types. The biggest problem is mapping the oracle NUMBER(x,y) format correctly. The best way to do that in Datastage is to convert the oracle NUMBER format to the Datastage Decimal type and fill in the Length and Scale columns accordingly. There are no problems with string mappings: oracle Varchar2 maps to datastage Varchar, and oracle char to datastage char.

2.14. How to adjust commit interval when loading data to the database?
In earlier versions of datastage the commit interval could be set in General -> Transaction size (obsolete in version 7.x). Starting from Datastage 7.x it can be set in the properties of the ODBC or ORACLE stage, in Transaction handling -> Rows per transaction. If set to 0, the commit is issued at the end of a successful transaction.
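
The effect of a 'Rows per transaction' style setting can be illustrated outside DataStage. The following is a minimal sketch, not the stage's actual implementation: it uses Python's sqlite3 module with a hypothetical table named target, commits after every N inserted rows, and treats 0 as "one commit at the end", mirroring the behaviour described above.

```python
import sqlite3


def load_rows(conn, rows, rows_per_transaction=0):
    """Insert rows, committing every `rows_per_transaction` rows.

    A value of 0 means a single commit after the last row.
    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target (id, name) VALUES (?, ?)", row)
        pending += 1
        if rows_per_transaction and pending == rows_per_transaction:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit whatever is left (or everything, when the interval is 0).
        conn.commit()
        commits += 1
    return commits


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

rows = [(i, f"row{i}") for i in range(1, 11)]
commits_batched = load_rows(conn, rows[:5], rows_per_transaction=2)
commits_single = load_rows(conn, rows[5:], rows_per_transaction=0)
row_count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

A small interval limits how much work is lost when a load fails part-way, at the cost of more commits; a value of 0 keeps the whole load in one transaction.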

2.15. What is the use of INROWNUM and OUTROWNUM datastage variables?

@INROWNUM and @OUTROWNUM are internal datastage variables which do the following:
- @INROWNUM counts incoming rows to a transformer in a datastage job
- @OUTROWNUM counts outgoing rows from a transformer in a datastage job

These variables can be used to generate sequences, primary keys and ids, to number rows, and also for debugging and error tracing. They play a similar role to sequences in Oracle.

2.16. Datastage trim function cuts out more characters than expected
By default the datastage Trim function works this way: Trim("  a  b  c  d  ") returns "a b c d", while in many other programming/scripting languages the result "a  b  c  d" (inner spaces left untouched) would be expected. That is because the R parameter is assumed by default: R - removes leading and trailing occurrences of the character, and reduces multiple occurrences to a single occurrence. To get "a  b  c  d" as the result, use the Trim function in the following way:
    Trim("  a  b  c  d  "," ","B")

2.17. Database update actions in ORACLE stage
The destination table can be updated using various Update actions in the Oracle stage. Be aware that it is crucial to select the key columns properly, as they determine which columns appear in the WHERE part of the SQL statement. Available actions:
- Clear the table then insert rows - deletes the contents of the table (DELETE statement) and adds new rows (INSERT).
- Truncate the table then insert rows - deletes the contents of the table (TRUNCATE statement) and adds new rows (INSERT).
- Insert rows without clearing - only adds new rows (INSERT statement).
- Delete existing rows only - deletes matched rows (issues only the DELETE statement).
- Replace existing rows completely - deletes the existing rows (DELETE statement), then adds new rows (INSERT).
- Update existing rows only - updates existing rows (UPDATE statement).
- Update existing rows or insert new rows - updates existing data rows (UPDATE) or adds new rows (INSERT). An UPDATE is issued first and if it succeeds the INSERT is omitted.
- Insert new rows or update existing rows - adds new rows (INSERT) or updates existing rows (UPDATE). An INSERT is issued first and if it succeeds the UPDATE is omitted.
- User-defined SQL - the data is written using a user-defined SQL statement.
- User-defined SQL file - the data is written using a user-defined SQL statement from a file.
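
The 'Update existing rows or insert new rows' action can be pictured as the logic below. This is only a sketch of the idea, using Python's sqlite3 module rather than the Oracle stage: the articles table and its article_id key column are hypothetical, and "the UPDATE succeeds" is assumed to mean that it matched at least one row on the key column.

```python
import sqlite3


def update_or_insert(conn, key, name):
    """Issue an UPDATE first; INSERT only when no row matched the key."""
    cur = conn.execute(
        "UPDATE articles SET article_name = ? WHERE article_id = ?",
        (name, key),
    )
    if cur.rowcount == 0:
        # No existing row for this key, so add a new one.
        conn.execute(
            "INSERT INTO articles (article_id, article_name) VALUES (?, ?)",
            (key, name),
        )
        return "insert"
    return "update"


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (article_id INTEGER PRIMARY KEY, article_name TEXT)"
)
first = update_or_insert(conn, 1, "WHT080")   # key not present yet
second = update_or_insert(conn, 1, "WHT081")  # key now present
conn.commit()
stored = conn.execute(
    "SELECT article_name FROM articles WHERE article_id = 1"
).fetchone()[0]
```

The key column in the WHERE clause is what decides between the two branches, which is why choosing the key columns correctly matters so much. When most incoming rows already exist, update-first saves a failed statement per row; when most rows are new, the insert-first variant is the cheaper order.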

2.18. Use and examples of ICONV and OCONV functions?
ICONV and OCONV functions are quite often used to handle data in Datastage. ICONV converts a string to an internal storage format and OCONV converts an expression to an output format.
Syntax:
    Iconv(string, conversion code)
    Oconv(expression, conversion)
Some useful iconv and oconv examples:
    Iconv("10/14/06", "D2/") = 14167
    Oconv(14167, "D-E") = "14-10-2006"
    Oconv(14167, "D DMY[,A,]") = "14 OCTOBER 2006"
    Oconv(12003005, "MD2$,") = "$120,030.05"
The following expression formats a number and rounds it to 2 decimal places:
    Oconv(L01.TURNOVER_VALUE*100,"MD2")
Iconv and oconv can be combined in one expression to reformat a date easily:
    Oconv(Iconv("10/14/06", "D2/"),"D-E") = "14-10-2006"

2.19. ERROR 81021 Calling subroutine DSR_RECORD ACTION=2

Error message: DataStage Repository Interface: Error calling subroutine: DSR_RECORD (Action=2); check DataStage is set up correctly in project Development (Internal Error (81021))

Datastage system help gives the following error description:
    SYS.HELP. 081021
    MESSAGE.. dsrpc: Error writing to Pipe.
The problem appears when a job sequence is used and it contains many stages (usually more than 10), and very often when the network connection is slow. The underlying cause is a communication failure between the DataStage client and the server.

The solution to the issue is:
- Do not log in to Datastage Designer using the 'Omit' option on the login screen. Type in the username and password explicitly and the job should compile successfully.
- If the above does not help, execute the DS.REINDEX ALL command from the Datastage shell.

2.20. How to check Datastage internal error descriptions

To check the description of an error number, go to the datastage shell (from Administrator or by telnet to the server machine) and invoke the following command:
    SELECT * FROM SYS.MESSAGE WHERE @ID='081021';
where 081021 is the error number in this case. The command produces a brief error description which will probably not be enough to resolve the issue, but can be a good starting point for further analysis.

2.21. Error timeout waiting for mutex
The error message usually looks as follows:
    ... ds_ipcgetnext() - timeout waiting for mutex
There may be several reasons for the error, and therefore several ways to get rid of it. The error usually appears when using Link Collector, Link Partitioner and Interprocess (IPC) stages. It may also appear when doing a lookup against a hashed file, or when a job is very complex and uses many transformers. There are a few things to consider to work around the problem:
- increase the buffer size (up to 1024K) and the Timeout value in the Job properties (on the Performance tab).
- ensure that the key columns in active stages or hashed files are composed of allowed characters: get rid of nulls and try to avoid language-specific characters, which may cause the problem.
- try to simplify the job as much as possible (especially if it is very complex). Consider splitting it into two or three smaller jobs, review fetches and lookups and try to optimize them (especially have a look at the SQL statements).

2.22. ERROR 30107 Subroutine failed to complete successfully

Error message: Error calling subroutine:

Datastage system help gives the following error description:
    SYS.HELP. 930107
    MESSAGE.. DataStage/SQL: Illegal placement of parameter markers
The problem appears when a project is moved from one environment to another (for example when deploying a project from a development environment to production).
The solution to the issue is:
- Rebuild the repository index by executing the DS.REINDEX ALL command from the Datastage shell.

2.23. Datastage Designer hangs when editing job activity properties
The problem appears when running Datastage Designer under Windows XP after installing patches or Service Pack 2 for Windows. After opening a job sequence and navigating to the job activity properties window, the application freezes and the only way to close it is from the Windows Task Manager.
The solution is very simple: download and install the XP SP2 patch for the Datastage client. It can be found on the IBM client support site (login required): https://www.ascential.com/eservice/public/welcome.do
Go to the software updates section and select an appropriate patch from the Recommended DataStage patches section. Sometimes users face problems when trying to log in (for example when the license doesn't cover IBM Active Support). In that case it may be necessary to contact IBM support, which can be reached at WDISupport@us.ibm.com

2.24. Can Datastage use Excel files as a data input?
Microsoft Excel spreadsheets can be used as a data input in Datastage. There are two possible approaches:
- Access the Excel file via ODBC - this approach requires creating an ODBC connection to the Excel file on the Datastage server machine and using an ODBC stage in Datastage. The main disadvantage is that this cannot be done on a Unix machine. On Datastage servers running on Windows it can be set up here: Control Panel -> Administrative Tools -> Data Sources (ODBC) -> User DSN -> Add -> Microsoft Excel Driver (*.xls) -> Provide a Data source name -> Select the workbook -> OK
- Save the Excel file as CSV - save the data from the excel spreadsheet to a CSV text file and use a sequential file stage in Datastage to read the data.

A Big Bunch of DataStage Complaints and Solutions


I recently got hold of a list of problems with DataStage from users who came from a Teradata ETI background. Some of these are DataStage idiosyncrasies that have a fix. I have stumbled over most of these problems and in some cases spent weeks trying to find workarounds. So here are the problems I had solutions for. Each problem is followed by my solution.

Problem: Sequences run sequences run sequences. When you have a multi-level execution like this it helps a great deal to have the offending sequence(s) listed in the root sequence that fires them off. This means that people don't have to waste time hunting through each of the logs to find where the problem occurred.

Solution: Each sequence needs its own log or the log becomes too difficult to follow. The solution is to create a generic Sequence Job Check routine in DataStage BASIC that is run after every child sequence finishes. It retrieves all the warning and error messages from that child sequence and displays them in the parent sequence with the child job name as a prefix to the message. In previous projects I have used this technique to capture and email messages to support users.

Problem: Don't abort sequences that are still valid just because a child sequence or job they're using died. What's the point of this? I don't want to have to go through umpteen sequences and reset the damn things. I just want to go to the offending one and fix it up.

Solution: I like to have a starter sequence job that can never abort, as it doesn't wait for the child job to finish. It accepts a sequence job name, checks to see if that job is running, runs it and exits. It can't abort! That way the starter job that the third party scheduling tool uses is always in a runnable state and will return a meaningful message if the child job could not be started.

Problem: Locked objects do not allow you to open them up. What is the point of this? At the very least let a person BROWSE the damn thing so they can see what's in it. Also, the panel that comes up offers absolutely no clue as to who has it open. Notify people who has it open so we know what's going on.

Solution: This is all fixed in DataStage 8. You can open a locked job in read only mode! I reckon DataStage developers have been asking for this from release 1.0! In version 7 I think it is a great idea to generate HTML documentation for your entire project every night. If a job is locked, simply find the HTML version using desktop search or google desktop search. You will find a bitmap of the job that you can click on to jump to the job properties. This is faster than opening a job in DataStage. For more details read "10 Reasons why you should be generating HTML DataStage reports".

Problem: Why is there a search function in Director and not in Designer? What a stupid thing to do.

Solution: Now hang on just a minute, the guy who architected DataStage is now an IBM Distinguished Engineer! The design is not so much stupid as pragmatically challenged. DataStage 8 fixes this problem with a new Quick Find and Advanced Find that look for any type of object, including jobs, columns and properties. In the meantime your solution is to use the good old google desktop search (if you only want to view the job).

Problem: Environment variables... if I need to choose more than one, I don't want to have to open it up and hunt the variable down each time. Let me choose multiple ones at once. Also, let me sort them on variable name rather than description. Sometimes the description is meaningless or misleading.

Solution: I agree. Doesn't the environment property window suck? Two things alleviate this. 1) Use the Copy Job Utility from Ken Bland and Associates. See my post "DataStage tip 1 of 10: hacking job parameters". 2) Hack the DSParams file in the DataStage project home directory and manually add a folder structure to your new environment variables. This makes them a LOT easier to find. For this technique read my post "DataStage tip: using job parameters without losing your mind".

Problem: Why can we NOT open up multiple panels/panes? How much easier would it be if we could, to allow us to compare variable lists, priority information and so on?

Solution: When you are comparing two jobs it is much better to open two Designers. As in most Windows applications, many of the property windows are modal and you can only open one of them at a time. Having two Designers lets you open the same property box in two jobs and compare them side by side or flip between them. DataStage 8 introduces a compare job function inside the Designer.

Problem: Why can't I drag off the side of the screen when highlighting a long stream of stages within a job?

Solution: You wouldn't believe how easy drag and drop becomes if you use the Diagram Zoom Out function. You can zoom out several times on a big job to see every stage in that job with space to spare around the edges. This makes drag and drop easy.

Problem: MODIFY statement. Why can't Specification be chosen from a pick list of variables that allows for multiple variable selection?

Solution: I'm on a personal campaign to rid the world of unnecessary Modify stages. Use a Transformer first. If performance becomes a problem later on, then replace it with a Modify stage. For more details read my post "Is the DataStage parallel transformer evil?"

Problem: Copying jobs. Once a job has been copied and given its name, it cannot be copied again without an error. Why not allow this to take place but prefix the name with "Copy1Of..", "Copy2Of" and so forth?

Solution: I don't like copying jobs this way, as you then have to rename them. If I am copying multiple times I open the job and keep using the Save As.. function, as it lets me enter new job names as I go.

Please add your own feedback or solutions to any of these problems. If you have a DataStage problem that you cannot get fixed in the Ascential ITToolbox forum or the dsxchange forum you can email it to me at websphereblog at gmail dot com.
