
Teradata Utilities: TPump

Reprinted for KV Satish Kumar, IBM kvskumar@in.ibm.com Reprinted with permission as a subscription benefit of Books24x7, http://www.books24x7.com/

Table of Contents
Chapter 5: TPump
  Overview
  Why it is Called "TPump"
  TPump Has Many Unbelievable Abilities
  TPump Has Some Limits
  Supported Input Formats
  TPump Commands and Parameters
  LOAD Parameters IN COMMON with MultiLoad
  .BEGIN LOAD Parameters UNIQUE to TPump
  TPUMP Example
  Creating a Flatfile for our Tpump Job to Utilize
  Creating a Tpump Script
  Executing the Tpump Script
  TPump Script with Error Treatment Options
  A TPump Script that Uses Two Input Data Files
  A TPump UPSERT Sample Script
  Monitoring TPump
  Handling Errors in TPump Using the Error Table
  One Error Table
  Common Error Codes and What They Mean
  RESTARTing TPump
  TPump and MultiLoad Comparison Chart

Chapter 5: TPump
"Diplomacy is the art of saying "Nice Doggie" until you can find a rock." Will Rogers

Overview
The chemistry of relationships is very interesting. Frederick Buechner once stated, "My assumption is that the story of any one of us is in some measure the story of us all." In this chapter, you will find that TPump has similarities with the rest of the family of Teradata utilities. But this newer utility has been designed with fewer limitations and many distinguishing abilities that the other load utilities do not have. Do you remember the first Swiss Army knife you ever owned? Aside from its original intent as a compact survival tool, this knife has thrilled generations with its multiple capabilities. TPump is the Swiss Army knife of the Teradata load utilities. Just as this knife was designed for small tasks, TPump was developed to handle batch loads with low volumes. And, just as the Swiss Army knife easily fits in your pocket when you are loaded down with gear, TPump is a perfect fit when you have a large, busy system with few resources to spare. Let's look in more detail at the many facets of this amazing load tool.

Why it is Called "TPump"


TPump is the shortened name for the load utility Teradata Parallel Data Pump. To understand this, you must know how the load utilities move the data. Both FastLoad and MultiLoad assemble massive volumes of data rows into 64K blocks and then move those blocks. Picture in your mind the way that huge ice blocks used to be floated down long rivers to large cities prior to the advent of refrigeration. There they were cut up and distributed to the people.

TPump does NOT move data in large blocks. Instead, it loads data one row at a time, using row hash locks. Because it locks at this level, and not at the table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table.

Envision TPump as the water pump on a well: pumping in a very slow, gentle manner results in a steady trickle of water that could be pumped into a cup. But strong and steady pumping results in a powerful stream of water that would require a larger container. TPump is a data pump which, like the water pump, may allow either a trickle-feed of data to flow into the warehouse or a strong and steady stream. In essence, you may "throttle" the flow of data based upon your system and business user requirements. Remember, TPump is THE PUMP!

TPump Has Many Unbelievable Abilities


Just in Time: Transactional systems, such as those implemented for ATM machines or Point-of-Sale terminals, are known for their tremendous speed in executing transactions. But how soon can you get the information pertaining to that transaction into the data warehouse? Can you afford to wait until a nightly batch load? If not, then TPump may be the utility that you are looking for! TPump allows the user to accomplish near real-time updates from source systems into the Teradata data warehouse.

Throttle-switch Capability: What about the throttle capability that was mentioned above? With TPump you may stipulate how many updates may occur per minute. This is also called the statement rate. In fact, you may change the statement rate during the job, "throttling up" the rate with a higher number, or "throttling down" the number of updates with a lower one. An example: having this capability, you might want to throttle up the rate during the period from 12:00 noon to 1:30 PM, when most of the users have gone to lunch. You could then lower the rate when they return and begin running their business queries. This way, you need not have such clearly defined load windows as the other utilities require. You can have TPump running in the background all the time, and just control its flow rate.

Reprinted for ibmkvskumar@in.ibm.com, IBM Coffing Data Warehousing, Coffing Publishing (c) 2005, Copying Prohibited

Teradata Utilities: BTEQ, FastLoad, MultiLoad, TPump, and FastExport, Second Edition

DML Functions: Like MultiLoad, TPump does DML functions, including INSERT, UPDATE and DELETE. These can be run solo, or in combination with one another. Note that it also supports UPSERTs, like MultiLoad. But here is one place that TPump differs vastly from the other utilities: FastLoad can only load one table and MultiLoad can load up to five tables. But, when it pulls data from a single source, TPump can load more than 60 tables at a time! And the number of concurrent instances in such situations is unlimited. That's right, not 15, but unlimited for Teradata! Well OK, maybe limited by your computer. I cannot imagine my laptop running 20 TPumps, but Teradata does not care. How could you use this ability? Well, imagine partitioning a huge table horizontally into multiple smaller tables and then performing various DML functions on all of them in parallel. Keep in mind that TPump places no limit on the number of jobs that may be established. Now, think of ways you might use this ability in your data warehouse environment. The possibilities are endless.

More benefits: Just when you think you have pulled out all of the options on a Swiss Army knife, there always seems to be just one more blade or tool you had not noticed. Similar to the knife, TPump always seems to have another advantage in its list of capabilities. Here are several that relate to TPump requirements for target tables. TPump allows both Unique and Non-Unique Secondary Indexes (USIs and NUSIs), unlike FastLoad, which allows neither, and MultiLoad, which allows just NUSIs. Like MultiLoad, TPump allows the target tables to either be empty or to be populated with data rows. Tables allowing duplicate rows (MULTISET tables) are allowed. Besides this, Referential Integrity is allowed and need not be dropped. As to the existence of Triggers, TPump says, "No problem!"
Support Environment compatibility: The Support Environment (SE) works in tandem with TPump to enable the operator to have even more control in the TPump load environment. The SE coordinates TPump activities, assists in managing the acquisition of files, and aids in the processing of conditions for loads. The Support Environment also aids in the execution of DML and DDL that occur in Teradata, outside of the load utility.

Stopping without Repercussions: Finally, this utility can be stopped at any time and all of its locks may be dropped with no ill consequences.

Is this too good to be true? Are there no limits to this load utility? TPump does not like to steal any thunder from the other load utilities, but it just might become one of the most valuable survival tools for businesses in today's data warehouse environment.

TPump Has Some Limits


TPump has rightfully earned its place as a superstar in the family of Teradata load utilities. But this does not mean that it has no limits. It has a few that we will list here for you:

Rule #1: No concatenation of input data files is allowed. TPump is not designed to support this.

Rule #2: TPump will not process aggregates, arithmetic functions or exponentiation. If you need data conversions or math, you might consider using an INMOD to prepare the data prior to loading it.

Rule #3: The use of the SELECT function is not allowed. You may not use SELECT in your SQL statements.

Rule #4: No more than four IMPORT commands may be used in a single load task. This means that at most, four files can be directly read in a single run.

Rule #5: Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the date, not the default format of yy. This must be specified when you create the table. Any dates using the default yy format for the year are taken to mean 20th century years.

Rule #6: On some network attached systems, the maximum file size when using TPump is 2GB. This is true for a computer running under a 32-bit operating system.

Rule #7: TPump performance will be diminished if Access Logging is used. The reason for this is that TPump uses normal SQL to accomplish its tasks. Besides the extra overhead incurred, if you use Access Logging for successful table updates, then Teradata will make an entry in the Access Log table for each operation. This can cause the potential for row hash conflicts between the Access Log and the target tables.

Supported Input Formats


TPump, like MultiLoad, supports the following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT and VARTEXT. But TPump is quite finicky when it comes to data format errors. Such errors will generally cause TPump to terminate. You have got to be careful! However, you may specify an Error Limit to keep TPump from terminating prematurely when faced with a data format error. You can specify a number (n) of errors that are to be tolerated before TPump will halt. Here is a data format chart for your reference:

BINARY: Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the smallest address space you can have in Teradata.

FASTLOAD: This format is the same as BINARY, plus a marker (X'0A' or X'0D') that specifies the end of the record.

TEXT: Each record has a variable number of bytes and is followed by an end-of-record marker.

UNFORMAT: The format for these input records is defined in the LAYOUT statement of the TPump script using the components FIELD, FILLER and TABLE.

VARTEXT: This is a variable-length text RECORD format separated by delimiters such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your TPump LAYOUT. Note that two delimiter characters in a row denote a null value between them.

Figure 6-1
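To make the BINARY and VARTEXT descriptions concrete, here is a minimal Python sketch of how such records could be picked apart. It is not part of TPump. The function names are hypothetical, and the little-endian byte order and unsigned length prefix are assumptions made for illustration only, since the text says just "a 2-byte integer".

```python
import struct

def read_binary_records(buf, byteorder="<"):
    """Split a buffer of BINARY-style records: a 2-byte length n, then n bytes of data.

    byteorder is an assumption ("<" = little-endian); the real value depends on the client.
    """
    records = []
    pos = 0
    while pos < len(buf):
        if pos + 2 > len(buf):
            raise ValueError("truncated length prefix")
        (n,) = struct.unpack_from(byteorder + "H", buf, pos)
        pos += 2
        if pos + n > len(buf):
            raise ValueError("truncated record body")
        records.append(buf[pos:pos + n])
        pos += n
    return records

def split_vartext(line, delimiter=","):
    """Split a VARTEXT-style record; two delimiters in a row yield a null (None)."""
    return [field if field != "" else None for field in line.split(delimiter)]

# Two records of 3 and 2 bytes, each preceded by its 2-byte length.
sample = struct.pack("<H", 3) + b"abc" + struct.pack("<H", 2) + b"xy"
print(read_binary_records(sample))
print(split_vartext("123,,Smith"))
```

The FASTLOAD format described above would differ only by an end-of-record marker byte after each record body.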

TPump Commands and Parameters


Each command in TPump must begin on a new line, preceded by a dot. It may utilize several lines, but must always end in a semi-colon. Like MultiLoad, TPump makes use of several optional parameters in the .BEGIN LOAD command. Some are the same ones used by MultiLoad. However, TPump has other parameters. Let's look at each group.

LOAD Parameters IN COMMON with MultiLoad


PARAMETER / WHAT IT DOES

ERRLIMIT errcount [errpercent]: You may specify the maximum number of errors, or the percentage, that you will tolerate during the processing of a load job. The key point here is that you should set the ERRLIMIT to a number greater than the PACK number. The reason for this is that sometimes, if the ERRLIMIT is a smaller number than the PACK factor, the job will terminate, telling you that you have gone over the ERRLIMIT. When this happens, there will be no entries in the error tables.

CHECKPOINT (n): In TPump, the CHECKPOINT refers to the number of minutes, or frequency, at which you wish a checkpoint to occur. This is unlike MultiLoad, which allows either minutes or the number of rows.

SESSIONS (n): This refers to the number of SESSIONS that should be established with Teradata. TPump places no limit on the number of SESSIONS you may have. For TPump, the optimal number of sessions is dependent on your needs and your host computer (like a laptop).

TENACITY: Tells TPump how many hours to try logging on when fewer than the requested number of sessions are available.

SLEEP: Tells TPump how frequently, in minutes, to try establishing additional sessions on the system.

Figure 6-2

.BEGIN LOAD Parameters UNIQUE to TPump


MACRODB <databasename>: This parameter identifies a database that will contain any macros utilized by TPump. Remember, TPump does not run the SQL statements by itself. It places them into Macros and executes those Macros for efficiency.

NOMONITOR: Use this parameter when you wish to keep TPump from checking either statement rates or update status information for the TPump Monitor application.

PACK (n): Use this to state the number of statements TPump will "pack" into a multiple-statement request. Multi-statement requests improve efficiency in either a network or channel environment because they use fewer sends and receives between the application and Teradata.

RATE: This refers to the Statement Rate. It shows the initial maximum number of statements that will be sent per minute. A zero or no number at all means that the rate is unlimited. If the Statement Rate specified is less than the PACK number, then TPump will send requests that are smaller than the PACK number.

ROBUST ON/OFF: ROBUST defines how TPump will conduct a RESTART. ROBUST ON means that one row is written to the Logtable for every SQL transaction. The downside of running TPump in ROBUST mode is that it incurs additional, and possibly unneeded, overhead. ON is the default. If you specify ROBUST OFF, you are telling TPump to utilize "simple" RESTART logic: just start from the last successful CHECKPOINT. Be aware that if some statements are reprocessed, such as those processed after the last CHECKPOINT, then you may end up with extra rows in your error tables. Why? Because some of the statements in the original run may have found errors, in which case they would have recorded those errors in an error table.

SERIALIZE OFF/ON: You only use the SERIALIZE parameter when you are going to specify a PRIMARY KEY in the .FIELD command. For example, ".FIELD Salaryrate * DECIMAL KEY." If you specify SERIALIZE, TPump will ensure that all operations on a row will occur serially. If you code "SERIALIZE", but do not specify ON or OFF, the default is ON. Otherwise, the default is OFF unless doing an UPSERT.

Figure 6-3
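The interplay of ERRLIMIT, PACK and RATE described in the two charts can be worked through with a little arithmetic. The Python sketch below is only a simplified model for reasoning about settings, not anything TPump itself runs: the function name and the returned dictionary are hypothetical, and the requests-per-minute figure simply divides the statement rate by the statements sent per request.

```python
import math

def check_begin_load(errlimit, pack, rate):
    """Rough sanity check for hypothetical .BEGIN LOAD settings (a simplified model)."""
    warnings = []
    # The chart advises setting ERRLIMIT to a number greater than PACK.
    if errlimit <= pack:
        warnings.append("ERRLIMIT should be greater than PACK")
    if rate == 0:
        # A RATE of zero (or no RATE at all) means the statement rate is unlimited.
        return {"statements_per_request": pack,
                "requests_per_minute": None,
                "warnings": warnings}
    # If RATE is less than PACK, requests are smaller than the PACK number.
    per_request = min(pack, rate)
    return {"statements_per_request": per_request,
            "requests_per_minute": math.ceil(rate / per_request),
            "warnings": warnings}

# The values used by the sample scripts later in this chapter.
print(check_begin_load(errlimit=5, pack=40, rate=1000))
```

With ERRLIMIT 5, PACK 40 and RATE 1000, this model yields 25 requests per minute of 40 statements each, and it flags that the ERRLIMIT is not greater than the PACK number.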

TPUMP Example
"Don't use a big word where a diminutive one will suffice." - Unknown

Don't use a big utility where TPump will suffice. TPump is great when you just want to trickle information into a table at all times. Think of it as a water hose filling up a bucket. Instead of filling the bucket up a glass of water at a time (FastLoad), we can just trickle the information in using a hose (TPump). The great thing about TPump is that, like a pump, we can trickle in data or we can fire hose it in. If users are not on the system, then we want to crank up the fire hose. If users are on the system and many of them are accessing a table, we should trickle in the rows. For our TPump exercise, let's create an empty table:

Now execute the script:

Creating a Flatfile for our Tpump Job to Utilize


"In order to be irreplaceable one must always be different." Coco Chanel


TPump is irreplaceable because no other utility works like it. TPump can also use flat files to populate a table. While the script is somewhat different compared to other utilities, TPump's structure isn't completely foreign. Let's create our flat file to populate our empty table:

Now we can use the flat file to populate our table:

Creating a Tpump Script


"Acting is all about honesty. If you can fake that, you've got it made." George Burns

George Burns wasn't a big fan of Teradata, because there's no way that one could fake his or her way through a TPump script. The following two slides will show a basic TPump script and point out the important parts of that script. Let's create our TPump script:


Executing the Tpump Script


"Unless you believe, you will not understand." Saint Augustine

After running through these utility exercises, Teradata is destined to make you, the reader, a believer. Utilities such as TPump are very hard to grasp at first. But if you believe utilities work and continue to analyze them, enlightenment is just around the corner. Executing our new TPump script:

Let's check out our new table:

Much of the TPump command structure should look quite familiar to you. It is quite similar to MultiLoad. In this example, the Student_Names table is being loaded with new data from the university's registrar. It will be used as an associative table for linking various tables in the data warehouse.


Sets up a Logtable and then logs on with .RUN. The logon.txt file contains: .logon TDATA/SQL01,SQL01; It also specifies the database in which to find the necessary tables.

    /* This script inserts rows into a table called
       student_names from a single file */
    .LOGTABLE WORK_DB.LOG_PUMP;
    .RUN FILE C:\mydir\logon.txt;
    DATABASE SQL01;

Begins the load process; specifies optional parameters. Names the error table for this run.

    .BEGIN LOAD
       ERRLIMIT 5
       CHECKPOINT 1
       SESSIONS 64
       TENACITY 2
       PACK 40
       RATE 1000
       ERRORTABLE SQL01.ERR_PUMP;

Names the LAYOUT of the INPUT record. Notice the dots before the .FIELD and .FILLER commands and the semi-colons after each FIELD definition. Also, the More_Junk field moves the field pointer to the start of the First_Name data. Notice the comment in the script.

    .LAYOUT FILELAYOUT;
    .FIELD  Student_ID * INTEGER;
    .FIELD  Last_Name  * CHAR(20);
    .FILLER More_Junk  * CHAR(20);
    .FIELD  First_Name * CHAR(14);
    /* start comment - this could also be coded as:
    .FIELD Student_ID * INTEGER;
    .FIELD Last_Name * CHAR(20);
    .FIELD First_Name 45 CHAR(14);
    end of the comment */

Names the DML Label. Tells TPump to INSERT a row into the target table and defines the row format. Comma separators are placed in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed. Colons precede VALUES.

    .DML LABEL INSREC;
    INSERT INTO SQL01.Student_Names
       ( Student_ID
       ,Last_Name
       ,First_Name )
    VALUES
       ( :Student_ID
       ,:Last_Name
       ,:First_Name );

Names the IMPORT file; names the LAYOUT to be called from above; tells TPump which DML Label to APPLY.

    .IMPORT INFILE CDW_import.txt
       FORMAT TEXT
       LAYOUT FILELAYOUT
       APPLY INSREC;

Tells TPump to stop loading and logs off all sessions.

    .END LOAD;
    .LOGOFF;

Figure 6-4

Step One: Setting up a Logtable and Logging onto Teradata
First, you define the Logtable using the .LOGTABLE command. We have named it LOG_PUMP in the WORK_DB database. The Logtable is automatically created for you. It may be placed in any database by qualifying the table name with the name of the database, using syntax like this: <databasename>.<tablename>

Next, the connection is made to Teradata. Notice that the commands in TPump, like those in MultiLoad, require a dot in front of the command key word.

Step Two: Begin load process, add parameters, naming the Error Table
Here, the script reveals the parameters requested by the user to assist in managing the load for smooth operation. It also names the one error table, calling it SQL01.ERR_PUMP. Now let's look at each parameter:

ERRLIMIT 5 says that the job should terminate after encountering five errors. You may set the limit that is tolerable for the load.
CHECKPOINT 1 tells TPump to pause and evaluate the progress of the load in increments of one minute.
SESSIONS 64 tells TPump to establish 64 sessions with Teradata.
TENACITY 2 says that if there is any problem establishing sessions, then to keep on trying for a period of two hours.
PACK 40 tells TPump to "pack" 40 data rows and load them at one time.
RATE 1000 means that 1,000 data rows will be sent per minute.

Step Three: Defining the INPUT flat file structure
TPump, like MultiLoad, needs to know the structure of the INPUT flat file record. You use the .LAYOUT command to name the layout. Following that, you list the columns and data types of the INPUT file using the .FIELD, .FILLER or .TABLE commands. Did you notice that an asterisk is placed between the column name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use the .FILLER with the correct number of bytes as character to position the cursor to the next field, or the "*" can be replaced by a number that equals the lengths of all previous fields added together plus 1 extra byte. When you use this technique, the .FILLER is not needed. In our example, this says to begin with Student_ID, continue on to load Last_Name, and finish when First_Name is loaded.

Step Four: Defining the DML activities to occur
At this point, the .DML LABEL names and defines the SQL that is to execute. It also names the columns receiving data and defines the sequence in which the VALUES are to be arranged. In our example, TPump is to INSERT a row into the SQL01.Student_Names table. The data values coming in from the record are named in the VALUES with a colon prior to the name. This provides the PE with information on what substitution is to take place in the SQL. Each LABEL used must also be referenced in an APPLY clause of the .IMPORT command.

Step Five: Naming the INPUT file and defining its FORMAT
Using the .IMPORT INFILE command, we have identified the INPUT data file as "CDW_import.txt". The file was created using the TEXT format.

Step Six: Associate the data with the description
Next, we told the IMPORT command to use the LAYOUT called "FILELAYOUT."

Step Seven: Telling TPump to start loading
Finally, we told TPump to APPLY the DML LABEL called INSREC; that is, to INSERT the data rows into the target table.

Step Eight: Finishing loading and logging off of Teradata
The .END LOAD command tells TPump to finish the load process. Finally, TPump logs off of the Teradata system.
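The position arithmetic from Step Three can be checked with a short Python sketch. It is only an illustration of the counting rule, not TPump code: the function names are hypothetical, and the sketch assumes a 4-byte INTEGER and handles only the INTEGER and CHAR(n) types used in the Figure 6-4 layout.

```python
def type_width(type_name):
    """Byte width of a field type; covers only the types in the sample layout."""
    if type_name == "INTEGER":
        return 4  # assumption: a 4-byte integer
    if type_name.startswith("CHAR(") and type_name.endswith(")"):
        return int(type_name[5:-1])
    raise ValueError(f"unsupported type: {type_name}")

def field_positions(layout):
    """layout: list of (name, start, type); start is '*' or a 1-based byte position."""
    positions = {}
    next_free = 1
    for name, start, type_name in layout:
        # '*' means "continue right after the previous field".
        begin = next_free if start == "*" else int(start)
        positions[name] = begin
        next_free = begin + type_width(type_name)
    return positions

# The layout as coded in the script, using a .FILLER to skip 20 bytes.
with_filler = [
    ("Student_ID", "*", "INTEGER"),
    ("Last_Name", "*", "CHAR(20)"),
    ("More_Junk", "*", "CHAR(20)"),
    ("First_Name", "*", "CHAR(14)"),
]
# The alternative from the script comment: an explicit start position, no .FILLER.
with_position = [
    ("Student_ID", "*", "INTEGER"),
    ("Last_Name", "*", "CHAR(20)"),
    ("First_Name", "45", "CHAR(14)"),
]
print(field_positions(with_filler)["First_Name"])    # 4 + 20 + 20 + 1 = 45
print(field_positions(with_position)["First_Name"])  # 45
```

Both layouts place First_Name at byte 45, which is why the explicit position makes the .FILLER unnecessary.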

TPump Script with Error Treatment Options


Load with a Shell Script.

    /* !/bin/ksh* */

Names and describes the purpose of the script; names the author.

    /* ++++++++++++++++++++++++++++++++++ */
    /* TPUMP SCRIPT - CDW */
    /* This script loads SQL01.Student_Profile4 */
    /* Version 1.1 */
    /* Created by Coffing Data Warehousing */
    /* +++++++++++++++++++++++++++++++++++++ */

Sets up a Logtable and then logs on to Teradata. Specifies the database containing the table.

    /* Setup the TPUMP Logtables, Logon Statements
       and Database Default */
    .LOGTABLE SQL01.LOG_PUMP;
    .LOGON CDW/SQL01,SQL01;
    DATABASE SQL01;

BEGINS THE LOAD PROCESS. SPECIFIES MULTIPLE PARAMETERS TO AID IN PROCESS CONTROL. NAMES THE ERROR TABLE; TPump HAS ONLY ONE ERROR TABLE.

    /* Begin Load and Define TPUMP Parameters and Error Tables */
    .BEGIN LOAD
       ERRLIMIT 5
       CHECKPOINT 1
       SESSIONS 1
       TENACITY 2
       PACK 40
       RATE 1000
       ERRORTABLE SQL01.ERR_PUMP;

Names the LAYOUT of the INPUT file. Defines the structure of the INPUT file; here, all fields are Variable CHARACTER data and the file has a comma delimiter. See .IMPORT below for the file type and the declaration of the delimiter.

    .LAYOUT FILELAYOUT;
    .FIELD Student_ID * VARCHAR(11);
    .FIELD Last_Name  * VARCHAR(20);
    .FIELD First_Name * VARCHAR(14);
    .FIELD Class_Code * VARCHAR(2);
    .FIELD Grade_Pt   * VARCHAR(8);

Names the DML Label; SPECIFIES 3 ERROR TREATMENT OPTIONS with the ; after the last option. Tells TPump to INSERT a row into the target table and defines the row format. Note that we place comma separators in front of the following column or value for easier debugging. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.

    .DML LABEL INSREC
       IGNORE DUPLICATE ROWS
       IGNORE MISSING ROWS
       IGNORE EXTRA ROWS;
    INSERT INTO Student_Profile4
       ( Student_ID
       ,Last_Name
       ,First_Name
       ,Class_Code
       ,Grade_Pt )
    VALUES
       ( :Student_ID
       ,:Last_Name
       ,:First_Name
       ,:Class_Code
       ,:Grade_Pt );

Names the IMPORT file; names the LAYOUT to be called from above; tells TPump which DML Label to APPLY. Notice the FORMAT with a comma in the quotes to define the delimiter between fields in the input record.

    .IMPORT INFILE Cdw_import.txt
       FORMAT VARTEXT ","
       LAYOUT FILELAYOUT
       APPLY INSREC;

Tells TPump to stop loading and logs off all sessions.

    .END LOAD;
    .LOGOFF;

Figure 6-5

A TPump Script that Uses Two Input Data Files


Load runs from a Shell Script.

    /* !/bin/ksh* */

Names and describes the purpose of the script; names the author.

    /* ++++++++++++++++++++++++++++++++++ */
    /* TPUMP SCRIPT using 2 Input Files CDW */
    /* It loads STUDT_CONTACT Target Table CDW */
    /* This script loads SQL01. Student_Profile3 */
    /* Version 1.1 */
    /* Created by Coffing Data Warehousing */
    /* ++++++++++++++++++++++++++++++++++++++++ */

Sets up a Logtable and then logs on to Teradata. Specifies the database to work in (optional).

    .LOGTABLE SQL01.LOG_TPMP;
    .LOGON CDW/SQL01,SQL01;
    DATABASE SQL01;

Begins the load process. Specifies multiple parameters to aid in load management. Names the error table; TPump HAS ONLY ONE ERROR TABLE PER TARGET TABLE.

    .BEGIN LOAD
       ERRLIMIT 5
       CHECKPOINT 1
       SESSIONS 1
       TENACITY 2
       PACK 40
       RATE 1000
       ERRORTABLE WORK_DB.ERR_TPMP;

Defines the LAYOUT for the 1st INPUT file; it also has the indicators for NULL data.

    .LAYOUT REC_LAYOUT1 INDICATORS;
    .FIELD Student_ID * INTEGER;
    .FIELD Last_name  * CHAR(20);
    .FIELD First_name * VARCHAR(14);
    .FIELD Class_code * CHAR(2);
    .FIELD Grade_Pt   * DECIMAL(8,2);

Defines the LAYOUT for the 2nd INPUT file, with a different arrangement of fields.

    .LAYOUT REC_LAYOUT2;
    .FILLER Rec_Type   * CHAR(1);
    .FIELD  Last_name  * CHAR(20);
    .FIELD  First_name * VARCHAR(14);
    .FIELD  Student_ID * INTEGER;
    .FIELD  Class_code * CHAR(2);
    .FIELD  Grade_Pt   * DECIMAL(8,2);

Names the 1st DML Label and specifies 2 Error Treatment options. Tells TPump to INSERT a row into the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.

    .DML LABEL INSREC1
       IGNORE DUPLICATE ROWS
       IGNORE EXTRA ROWS;
    INSERT INTO Student_Profile_OLD
       ( Student_ID
       ,Last_Name
       ,First_Name
       ,Class_Code
       ,Grade_Pt )
    VALUES
       ( :Student_ID
       ,:Last_Name
       ,:First_Name
       ,:Class_Code
       ,:Grade_Pt );

Names the 2nd DML Label and specifies 1 Error Treatment option. Tells TPump to INSERT a row into the target table and defines the row format. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.

    .DML LABEL INSREC2
       IGNORE DUPLICATE ROWS;
    INSERT INTO Student_Profile_NEW
       ( Student_ID
       ,Last_Name
       ,First_Name
       ,Class_Code
       ,Grade_Pt )
    VALUES
       ( :Student_ID
       ,:Last_Name
       ,:First_Name
       ,:Class_Code
       ,:Grade_Pt );

Names the TWO Import Files as FILE-REC1.DAT and FILE-REC2.DAT. The file name is under Windows, so the "-" is fine. Names the TWO Layouts that define the structure of the INPUT DATA files.

    .IMPORT INFILE FILE-REC1.DAT
       FORMAT FASTLOAD
       LAYOUT REC_LAYOUT1
       APPLY INSREC1;
    .IMPORT INFILE FILE-REC2.DAT
       FORMAT TEXT
       LAYOUT REC_LAYOUT2
       APPLY INSREC2;

Tells TPump to stop loading and logs off all sessions.

    .END LOAD;
    .LOGOFF;

Figure 6-7

A TPump UPSERT Sample Script


/* this is an UPSERT TPump script */
.LOGTABLE SQL01.CDW_LOG;
.LOGON CDW/SQL01,SQL01;
.BEGIN LOAD
  ERRLIMIT 5
  CHECKPOINT 10
  SESSIONS 10
  TENACITY 2
  PACK 10
  RATE 10
  ERRORTABLE SQL01.SWA_ET;
.LAYOUT INREC INDICATORS;
.FIELD StudentID  * INTEGER;
.FIELD Last_name  * CHAR(20);
.FIELD First_name * VARCHAR(14);
.FIELD Class_code * CHAR(2);
.FIELD Grade_Pt   * DECIMAL(8,2);

Sets up a Logtable and then logs on to Teradata.
Begins the load process. Specifies multiple parameters to aid in load management. Names the error table; TPump HAS ONLY ONE ERROR TABLE PER TARGET TABLE.
Defines the LAYOUT for the INPUT file; also has the indicators for NULL data.

.DML LABEL UPSERTER
  DO INSERT FOR MISSING UPDATE ROWS;
UPDATE Student_Profile
  SET Last_Name  = :Last_Name
     ,First_Name = :First_Name
     ,Class_Code = :Class_Code
     ,Grade_Pt   = :Grade_Pt
  WHERE Student_ID = :StudentID;
INSERT INTO Student_Profile
  VALUES ( :StudentID ,:Last_Name ,:First_Name ,:Class_Code ,:Grade_Pt );
.IMPORT INFILE UPSERT-FILE.DAT
  FORMAT FASTLOAD
  LAYOUT INREC
  APPLY UPSERTER;
.END LOAD;
.LOGOFF;

Names the DML Label and specifies DO INSERT FOR MISSING UPDATE ROWS, which is what makes this an UPSERT. Tells TPump to UPDATE the row in the target table and, when the row to be updated is missing, to INSERT it instead. Lists, in order, the VALUES to be INSERTed. A colon always precedes values.

Names the Import File as UPSERT-FILE.DAT. The file resides on Windows, so the "-" in the file name is fine. The file type is FASTLOAD. Tells TPump to stop loading and logs off all sessions.

Figure 6-8

NOTE: The above UPSERT uses the same syntax as MultiLoad. This continues to work. However, there might soon be another way to accomplish this task. NCR has built an UPSERT and we have tested the following statement, without success:

UPDATE SQL01.Student_Profile
  SET Last_Name  = :Last_Name
     ,First_Name = :First_Name
     ,Class_Code = :Class_Code
     ,Grade_Pt   = :Grade_Pt
  WHERE Student_ID = :Student_ID;
ELSE INSERT INTO SQL01.Student_Profile
  VALUES (:Student_ID ,:Last_Name ,:First_Name ,:Class_Code ,:Grade_Pt);

We are not sure whether this will be a future technique for coding a TPump UPSERT, or whether it is handled internally. For now, use the original coding technique.
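The effect of DO INSERT FOR MISSING UPDATE ROWS can be pictured with a small Python sketch. It models only the semantics of the UPSERT, not how TPump carries it out internally. The dictionary student_profile and the function upsert are hypothetical stand-ins for the target table and the UPSERTER label, and the sample rows are made up.

```python
# Hypothetical in-memory stand-in for the Student_Profile table,
# keyed by Student_ID. This models UPSERT semantics only.
student_profile = {
    1001: {"Last_Name": "Smith", "First_Name": "Ann",
           "Class_Code": "FR", "Grade_Pt": 3.10},
}


def upsert(table, student_id, row):
    """Update the row if the key exists; otherwise insert it.

    Returns "UPDATE" or "INSERT" so the caller can see which path ran.
    """
    if student_id in table:
        # The UPDATE found a row, so no INSERT is needed.
        table[student_id].update(row)
        return "UPDATE"
    # The UPDATE found no row, so the INSERT that follows it
    # in the DML label is applied instead.
    table[student_id] = dict(row)
    return "INSERT"


first = upsert(student_profile, 1001,
               {"Last_Name": "Smith", "First_Name": "Ann",
                "Class_Code": "SO", "Grade_Pt": 3.25})
second = upsert(student_profile, 1002,
                {"Last_Name": "Jones", "First_Name": "Bob",
                 "Class_Code": "FR", "Grade_Pt": 2.90})
print(first, second, len(student_profile))
```

The first call takes the UPDATE path because Student_ID 1001 already exists; the second takes the INSERT path because 1002 does not.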

Monitoring TPump
TPump comes with a monitoring tool called the TPump Monitor. This tool allows you to check the status of TPump jobs as they run and to change the statement rate on the fly (remember "throttle up" and "throttle down"?). Key to this monitor is the "SysAdmin.TpumpStatusTbl" table in the Data Dictionary Directory. If your Database Administrator creates this table, TPump will update it on a minute-by-minute basis while it is running. You may update the table to change the statement rate for an IMPORT. If you want TPump to run unmonitored, then the table is not needed. You can start a monitor program under UNIX with the following command:
tpumpmon [-h] [TDPID/] <UserName>,<Password> [,<AccountID>]

Below is a chart that shows the Views and Macros used to access the "SysAdmin.TpumpStatusTbl" table. Queries may be written against the Views. The macros may be executed.

Views and Macros to access the table SysAdmin.TpumpStatusTbl
  View   SysAdmin.TPumpStatus
  View   SysAdmin.TPumpStatusX
  Macro  Sysadmin.TPumpUpdateSelect
  Macro  TPumpMacro.UserUpdateSelect

Figure 6-9

Handling Errors in TPump Using the Error Table


One Error Table
Unlike FastLoad and MultiLoad, TPump uses only ONE Error Table per target table, not two. If you name the table, TPump will create it automatically. Entries are made to this table whenever errors occur during the load process. Like MultiLoad, TPump offers the option to either MARK errors (include them in the error table) or IGNORE errors (pay no attention to them whatsoever). These options are listed in the .DML LABEL sections of the script and apply ONLY to the DML functions in that LABEL. The general default is to MARK. If you specify nothing, TPump will assume the default. When doing an UPSERT, this default does not apply.

The error table does the following:
  Identifies errors
  Provides some detail about the errors
  Stores a portion of the actual offending row for debugging

When compared to the error tables in MultiLoad, the TPump error table is most similar to the MultiLoad Acquisition error table. Like that table, it stores information about errors that take place while TPump is trying to acquire data. TPump reports the errors that occur while the data is being moved, such as data translation problems. It also reports any difficulties compiling valid Primary Indexes. Remember, TPump has less tolerance for errors than FastLoad or MultiLoad.

COLUMNS IN THE TPUMP ERROR TABLE
  ImportSeq   Sequence number that identifies the IMPORT command where the error occurred
  DMLSeq      Sequence number for the DML statement involved with the error
  SMTSeq      Sequence number of the DML statement being carried out when the error was discovered
  ApplySeq    Sequence number that tells which APPLY clause was running when the error occurred
  SourceSeq   The number of the data row in the client file that was being built when the error took place
  DataSeq     Identifies the INPUT data source where the error row came from
  ErrorCode   System code that identifies the error
  ErrorMsg    Generic description of the error
  ErrorField  Number of the column in the target table where the error happened; left blank if the offending column cannot be identified. This is different from MultiLoad, which supplies the column name.
  HostData    The data row that contains the error, limited to the first 63,728 bytes related to the error

Figure 6-10
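Once rows have been fetched from the error table, a first pass is often just to count them by ErrorCode and note which input rows were involved. The Python sketch below shows one way to do that. The rows in error_rows are made-up sample data, the function summarize is hypothetical, and only a few of the error table columns are used.

```python
from collections import Counter

# Hypothetical rows as they might be fetched from a TPump error table.
# Only some of the error table columns are shown; the values are made up.
error_rows = [
    {"ImportSeq": 1, "DMLSeq": 1, "SourceSeq": 17, "ErrorCode": 2816},
    {"ImportSeq": 1, "DMLSeq": 1, "SourceSeq": 42, "ErrorCode": 2816},
    {"ImportSeq": 1, "DMLSeq": 2, "SourceSeq": 58, "ErrorCode": 2818},
]


def summarize(rows):
    """Count errors per ErrorCode and collect the input row numbers involved."""
    counts = Counter(row["ErrorCode"] for row in rows)
    sources = {}
    for row in rows:
        # SourceSeq is the number of the data row in the client file.
        sources.setdefault(row["ErrorCode"], []).append(row["SourceSeq"])
    return counts, sources


counts, sources = summarize(error_rows)
for code in sorted(counts):
    print(code, counts[code], sources[code])
```

Grouping by ErrorCode first makes it easy to separate expected conditions, such as marked duplicates, from genuine data problems.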

Common Error Codes and What They Mean


TPump users often encounter three error codes that pertain to:
  Missing data rows
  Duplicate data rows
  Extra data rows

Become familiar with these error codes and what they mean. This could save you time getting to the root of some common errors you could see in your future!

#1: Error 2816: Failed to insert duplicate row into TPump Target Table.
Nothing is wrong when you see this error. In fact, it can be a very good thing. It means that TPump is notifying you that it discovered a DUPLICATE row. This error jumps to life when one of the following options has been stipulated in the .DML LABEL:
  MARK DUPLICATE INSERT ROWS
  MARK DUPLICATE UPDATE ROWS
Note that the original row will be inserted into the target table, but the duplicate row will not.

#2: Error 2817: Activity count greater than ONE for TPump UPDATE/DELETE.
Sometimes you want to know if there were too many "successes." This is the case when there are EXTRA rows when TPump is attempting an UPDATE or DELETE. TPump will log an error whenever it sees an activity count greater than one for any such extra rows if you have specified either of these options in a .DML LABEL:
  MARK EXTRA UPDATE ROWS
  MARK EXTRA DELETE ROWS
At the same time, the associated UPDATE or DELETE will be performed.

#3: Error 2818: Activity count zero for TPump UPDATE or DELETE.
Sometimes you want to know if a data row that was supposed to be updated or deleted wasn't! That is when you want to know that the activity count was zero, indicating that the UPDATE or DELETE did not occur. To see this error, you must have used one of the following parameters:
  MARK MISSING UPDATE ROWS
  MARK MISSING DELETE ROWS
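The relationship between the activity count, the MARK options, and errors 2817 and 2818 can be summarized in a short Python sketch. The function tpump_error_code and its parameters are hypothetical; it only restates the rules described above for UPDATE and DELETE and does not cover the duplicate-row error 2816.

```python
def tpump_error_code(statement, activity_count,
                     mark_missing=False, mark_extra=False):
    """Return the error code logged for an UPDATE or DELETE, or None.

    statement      -- "UPDATE" or "DELETE"
    activity_count -- number of rows the statement affected
    mark_missing   -- True if MARK MISSING UPDATE/DELETE ROWS was specified
    mark_extra     -- True if MARK EXTRA UPDATE/DELETE ROWS was specified
    """
    if statement not in ("UPDATE", "DELETE"):
        raise ValueError("only UPDATE and DELETE are modeled here")
    if activity_count == 0 and mark_missing:
        return 2818  # activity count zero: the row was missing
    if activity_count > 1 and mark_extra:
        return 2817  # activity count greater than one: extra rows
    return None  # nothing is logged for this statement


# One row affected is the normal case and logs nothing.
print(tpump_error_code("UPDATE", 1, mark_missing=True, mark_extra=True))
# No row affected, with MARK MISSING in effect.
print(tpump_error_code("UPDATE", 0, mark_missing=True))
# Three rows affected, with MARK EXTRA in effect.
print(tpump_error_code("DELETE", 3, mark_extra=True))
```

Without the corresponding MARK option the function returns None, mirroring the point that these errors appear only when the option has been specified in the .DML LABEL.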

RESTARTing TPump
Like the other utilities, a TPump script is fully restartable as long as the log table and error tables are not dropped. As mentioned earlier, you have a choice of setting ROBUST either ON (the default) or OFF. ROBUST ON carries more overhead, and therefore lower performance, but it provides a higher degree of data integrity.

TPump and MultiLoad Comparison Chart


Function | MultiLoad | TPump
Error Tables must be defined | Optional, 2 per target table | Optional, 1 per target table
Work Tables must be defined | Optional, 1 per target table | No
Logtable must be defined | Yes | Yes
Allows Referential Integrity | No | Yes
Allows Unique Secondary Indexes | No | Yes
Allows Non-Unique Secondary Indexes | Yes | Yes
Allows Triggers | No | Yes
Loads a maximum of n number of tables | Five | 60
Maximum Concurrent Load Instances | 15 | Unlimited
Locks at this level | Table | Row Hash
DML Statements Supported | INSERT, UPDATE, DELETE, "UPSERT" | INSERT, UPDATE, DELETE, "UPSERT"
How DML Statements are Performed | Runs actual DML commands | Compiles DML into MACROS and executes
DDL Statements Supported | All | All
Transfers data in 64K blocks | Yes | No, moves data at row level
RESTARTable | Yes | Yes
Stores UPI Violation Rows | Yes, with MARK option | Yes, with MARK option
Allows use of Aggregated, Arithmetic calculations or Conditional Exponentiation | No | No
Allows Data Conversion | Yes | Yes
Performance Improvement | As data volumes increase | By using multi-statement requests
Table Access During Load | Uses WRITE lock on tables in Application Phase | Allows simultaneous READ and WRITE access due to Row Hash Locking
Effects of Stopping the Load | Consequences | No repercussions
Resource Consumption | Hogs available resources | Allows consumption management via Parameters

Figure 6-11
