
What is Teradata?

Teradata is an RDBMS (relational database management system) with the following features:

 It is built on a completely parallel architecture: a single task is divided into smaller
chunks that are computed simultaneously, resulting in faster execution.
 Teradata is a shared-nothing system in which each node is independent and self-
sufficient, and each logical processor (AMP) is responsible only for its own portion of the
database.
 It supports industry-standard ANSI SQL for communicating with Teradata.
 The database can be accessed by multiple concurrent users from different client
applications via a standard TCP/IP connection or an IBM mainframe channel connection.

Why use Teradata?


There are numerous reasons why clients choose Teradata over other databases.

 Linear scalability supports more users, data, queries, and query complexity without losing
performance; as the system configuration grows, performance increases linearly.
 The system is built on an open architecture, so whenever a faster chip or device becomes
available it can be incorporated into the already built architecture.
 Data is distributed automatically and evenly across the processors (AMPs). Components divide
a task into approximately equal pieces, so all parts of the system stay busy and the task
finishes faster.
 Supports 50+ petabytes of data.
 Provides a parallel-aware Optimizer that runs queries efficiently without manual tuning.
 A single operational view of a large Teradata multi-node system via the SWS (Service
Workstation), mainly managed by Teradata GSC.
 A single point of control for the DBA to manage the database using Teradata Viewpoint.
 Compatible with a large number of BI tools for fetching data.

The main components of the Teradata architecture are the PE (Parsing Engine), the BYNET, the
AMPs (Access Module Processors), and the virtual disks. The logical view of the architecture is
described below.

Parsing Engine
When a user fires an SQL query, it first connects to the PE (Parsing Engine). Processes such as
planning and distributing the data to the AMPs are done here, and the PE finds the optimal plan
for query execution. The PE performs the following processes:

 Parser: checks the syntax and, if it is valid, forwards the query to the Session Handler.
 Session Handler: performs all the security checks, such as validating login credentials and
verifying that the user has permission to execute the query.
 Optimizer: finds the best possible, optimized plan to execute the query.
 Dispatcher: forwards the query to the AMPs.
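The four PE steps above can be sketched as a simple pipeline. This is an illustrative model only, not Teradata internals; the function names (`parse`, `session_check`, `optimize`, `dispatch`) and the permissions structure are hypothetical.

```python
def parse(query: str) -> str:
    """Parser: reject statements with obviously bad syntax."""
    if not query.strip().upper().startswith(("SELECT", "INSERT", "UPDATE", "DELETE")):
        raise SyntaxError("unsupported or malformed statement")
    return query.strip()

def session_check(user: str, permissions: dict, query: str) -> None:
    """Session Handler: verify the user may run this statement type."""
    verb = query.split()[0].upper()
    if verb not in permissions.get(user, set()):
        raise PermissionError(f"{user} may not run {verb}")

def optimize(query: str) -> str:
    """Optimizer: choose an execution plan (here, just a label)."""
    return f"PLAN[{query}]"

def dispatch(plan: str, amps: list) -> list:
    """Dispatcher: hand the plan to every AMP."""
    return [f"AMP{i} executing {plan}" for i in range(len(amps))]

permissions = {"alice": {"SELECT"}}
q = parse("SELECT * FROM student")
session_check("alice", permissions, q)
steps = dispatch(optimize(q), amps=[0, 1, 2, 3])
print(len(steps))  # one work item per AMP
```

The point of the sketch is the ordering: syntax is checked before security, and only an optimized plan reaches the AMPs.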

BYNET
The BYNET acts as a channel between the PE and the AMPs, serving as the communicator between
the two. There are two BYNETs in Teradata, 'BYNET 0' and 'BYNET 1', but they are referred to
as a single BYNET system. There are two reasons for having two BYNETs:

1. If one BYNET fails, the second one can take its place.
2. When the data volume is large, both BYNETs can be used at once, which improves communication
between the PE and the AMPs and thus speeds up the process.
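The two behaviors above (failover and load sharing) can be sketched with a toy router. This is purely illustrative; `route` and its flags are invented for the example and say nothing about the real BYNET protocol.

```python
def route(messages, bynet0_up=True, bynet1_up=True):
    """Spread messages across the healthy interconnects, round-robin."""
    if not (bynet0_up or bynet1_up):
        raise RuntimeError("no interconnect available")
    nets = [name for name, up in (("BYNET 0", bynet0_up), ("BYNET 1", bynet1_up)) if up]
    # With both networks up, traffic alternates; with one down, the survivor carries it all.
    return [(nets[i % len(nets)], m) for i, m in enumerate(messages)]

print(route(["a", "b", "c", "d"]))         # both networks share the load
print(route(["a", "b"], bynet0_up=False))  # failover to BYNET 1
```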

AMP
An Access Module Processor is a virtual processor connected to the PE via the BYNET. Each AMP
has its own disk and is allowed to read and write only on its OWN disk; this is known as a
'shared-nothing architecture'. Teradata distributes the rows of a table across all the AMPs,
and when a query calls for data, all AMPs work simultaneously to return it. This is called
PARALLELISM. An AMP executes any SQL request in three steps:

1. Lock the table.
2. Execute the requested operation.
3. End the transaction.
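The row distribution and parallel retrieval described above can be sketched as follows. Here `hashlib.md5` merely stands in for Teradata's actual row-hashing algorithm, and the AMP count and row values are made up for illustration.

```python
import hashlib

N_AMPS = 4
amp_disks = {i: [] for i in range(N_AMPS)}  # each AMP's private "disk"

def amp_for(pi_value: str) -> int:
    """Hash the primary-index value to pick the owning AMP."""
    digest = hashlib.md5(pi_value.encode()).digest()
    return int.from_bytes(digest[:4], "big") % N_AMPS

# Distribution: each row lands on exactly one AMP's disk.
for roll_no in ("R001", "R002", "R003", "R004", "R005", "R006"):
    amp_disks[amp_for(roll_no)].append(roll_no)

# Full-table read: every AMP returns its own portion simultaneously.
rows = [r for disk in amp_disks.values() for r in disk]
print(sorted(rows))
```

Because the hash is deterministic, a later lookup of the same primary-index value goes straight to the one AMP that owns the row; a full-table scan, by contrast, engages every AMP in parallel.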

Disk
Teradata provides a set of virtual disks, one for each AMP. The storage area of each AMP is
called a Virtual Disk, or Vdisk. The steps for executing a query are:

 The user fires a query, which is sent to the PE.
 The PE performs the security and syntax checks and finds the optimal plan to execute the
query.
 The table rows are distributed across the AMPs, and the data is retrieved from disk.
 The AMPs send the data back through the BYNET to the PE.
 The PE returns the data to the user.

Teradata MultiLoad, or MLoad, is a command-driven load utility for fast, high-volume data
maintenance on multiple tables or views in a Teradata database.
Why is it called MultiLoad?
MultiLoad can perform multiple DML operations, including INSERT, UPDATE, DELETE, and
UPSERT, on up to five (5) empty or populated target tables at the same time. FastLoad, by
contrast, loads data into only one target table, which must be empty.

MultiLoad Modes:-
MultiLoad Import-

 Each MultiLoad import task can perform multiple INSERT, UPDATE, DELETE, and UPSERT
operations on up to five target tables in parallel.
 Data can be imported from a network-attached system, or from a mainframe-attached system
using a custom access module.

MultiLoad DELETE-

 Each MultiLoad delete can remove large volumes of data from a single table.

MultiLoad DELETE performs a global (all-AMP) delete operation on just one table. Its main
features are that it bypasses the transient journal (TJ) and that it can be restarted if it
fails before finishing. A primary index cannot be used in the MultiLoad DELETE condition,
because primary index access is AMP-specific while MultiLoad DELETE is built for global
deletes.

How does MultiLoad DELETE work, and why is it fast?

An MLoad DELETE reads an entire block of data, deletes the eligible rows, writes the whole
block back once, and records one checkpoint. If it fails before finishing, on restart it
resumes deleting from the data block after the last checkpoint.
In the conventional delete method, Teradata uses the Transient Journal: on failure, all
deleted rows are put back into the table from the TJ as a rollback, which can take longer
than the delete itself. MultiLoad DELETE does not roll back; it restarts.
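The block-at-a-time delete with checkpointing can be sketched like this. It is a toy model of the idea, not MultiLoad's implementation; `mload_delete`, the block layout, and the `keep` predicate are invented for illustration.

```python
def mload_delete(blocks, keep, checkpoint=0):
    """Delete rows failing `keep` from each block, resuming at `checkpoint`.

    Each block is rewritten exactly once, and the checkpoint advances after
    every successfully processed block, so a restart skips finished blocks
    instead of rolling anything back.
    """
    for i in range(checkpoint, len(blocks)):
        blocks[i] = [row for row in blocks[i] if keep(row)]  # one write per block
        checkpoint = i + 1
    return blocks, checkpoint

# Delete all odd values from a table stored as three data blocks.
table = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table, cp = mload_delete(table, keep=lambda r: r % 2 == 0)
print(table, cp)
```

A conventional journaled delete would have to log every removed row and, on failure, reinsert them all; here a failure simply means calling `mload_delete` again with the saved `checkpoint`.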

BLOCK Level operation:-


Like FastLoad, Teradata MultiLoad uses block-level operations to overcome the I/O bottleneck.
It packs the data into 64 KB blocks on the client system and sends each block to an AMP,
which writes it to disk. This is much faster than writing one row at a time, as BTEQ does.
For fallback-protected tables, the fallback copy of the data is loaded in the background once
the base table has been loaded.
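The client-side packing step can be sketched as follows. This is an illustrative model of batching rows into 64 KB blocks, not MultiLoad's wire format; `pack_blocks` and the sample row layout are made up.

```python
BLOCK_SIZE = 64 * 1024  # 64 KB per block

def pack_blocks(rows):
    """Greedily pack variable-length text rows into blocks of <= 64 KB."""
    blocks, current, used = [], [], 0
    for row in rows:
        size = len(row.encode())
        if used + size > BLOCK_SIZE and current:
            blocks.append(current)   # block full: ship it, start a new one
            current, used = [], 0
        current.append(row)
        used += size
    if current:
        blocks.append(current)
    return blocks

rows = [f"R{i:05d}|First{i}|Last{i}" for i in range(10_000)]
blocks = pack_blocks(rows)
print(len(blocks), "block sends instead of", len(rows), "row-at-a-time sends")
```

The win is in round trips: thousands of rows travel in a handful of block-sized transfers rather than one network exchange per row.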

Data conversion capabilities:-


If an input data field with a character data type is targeted at a column with a date data
type, Teradata MultiLoad can convert the input data to a date before inserting it into the
target table.
Below are the conversions supported by MLoad:

 Numeric-to-numeric (for example, integer to decimal)
 Character-to-numeric
 Character-to-date
 Date-to-character
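The four conversions can be illustrated in plain Python; Teradata performs the equivalent coercions internally during the load, and the date format below (`YYYY-MM-DD`) is just an example choice.

```python
from datetime import date, datetime
from decimal import Decimal

def numeric_to_numeric(v: int) -> Decimal:      # e.g. integer -> decimal
    return Decimal(v)

def character_to_numeric(v: str) -> int:        # character -> numeric
    return int(v)

def character_to_date(v: str) -> date:          # character -> date
    return datetime.strptime(v, "%Y-%m-%d").date()

def date_to_character(v: date) -> str:          # date -> character
    return v.isoformat()

print(numeric_to_numeric(7), character_to_numeric("42"),
      character_to_date("2020-01-31"), date_to_character(date(2020, 1, 31)))
```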

Limitations of MultiLoad:-
No unique secondary index- Like FastLoad, MultiLoad does not support unique secondary
indexes (USIs), because with a USI the subtable row may be created on a different AMP from
the one holding the actual row. Unlike FastLoad, however, it supports non-unique secondary
indexes (NUSIs), because the NUSI subtable is created on the same AMP. In MultiLoad, each
AMP works in parallel and independently; this is the reason it supports NUSIs but not USIs.
No referential integrity- Referential integrity (RI) on the target table is not supported by
Teradata MultiLoad. RI would require too much system checking to maintain the referential
constraints against a different table.
Triggers are not allowed- Triggers involve more than one table, while MultiLoad deals with
only one target table. Simply ALTER the triggers to DISABLED status before using MultiLoad.
No concatenation of input files- Concatenation could break the restart process if the files
were concatenated in a different sequence or data was deleted between runs.
No aggregate, exponential operators, or arithmetic functions- If you need data conversions or
math, you may be better off using an INMOD to prepare the data before loading it.
Also note that MultiLoad does not support SELECT, foreign key references, hash indexes, join
indexes, or NOPI tables.
MultiLoad IMPORT has five phases.
Phase 1: Preliminary Phase

 Parses and validates all Teradata MultiLoad commands and Teradata SQL statements.
 Establishes MultiLoad sessions with the Teradata database. The default is the number of
available AMPs. For a small system, the general rule of thumb is: number of AMPs + two; the
two extra control sessions handle the SQL and logging. On larger systems with hundreds of
AMPs, the SESSIONS option is available to lower the default.
 Creates all support tables, i.e. the log table(s), error tables, and work table(s).
 Applies a utility lock to the target tables.

Phase 2: DML Transaction phase

 All the DML statements are parsed and sent to the appropriate work table for each target
table. Later, during the acquisition phase, the data will also be stored in the work table so
it can be applied in the application phase.

Phase 3: Acquisition Phase


 MultiLoad now starts importing unsorted data from the source in the form of 64 KB data
blocks and sends them to the AMPs.
 Teradata does not care which AMP receives a data block. As soon as an AMP receives a
block, it examines each row and sends it to the proper AMP using the hash algorithm. At this
point the data is stored in the work tables of the destination AMP.
 There is no acquisition phase for a MultiLoad DELETE operation.

Phase4: Application Phase

 Acquires load locks to the target tables and views in the Teradata database.
 Each block in the work table is read once; the DML statements (INSERT, UPDATE, or
DELETE) are applied, and the block of data is written to the actual target table.
 A checkpoint is created after each successful write of a data block, which allows the
process to RESTART from the point of failure.
 Any errors are written to the proper error table.

Phase 5: Clean up Phase

 Forces an automatic restart/rebuild if an AMP went offline and came back online during the
application phase
 Releases all locks on the target tables and views
 Drops the temporary work tables and all empty error tables from Teradata Database
 Reports the transaction statistics associated with the import and delete

A Sample Teradata MultiLoad script


/*Section 1*/
.LOGTABLE TERADATA.STUDENT_log;
/*Section 2*/
.logon IP_Address/username,password;
/*Section 3*/
.BEGIN IMPORT MLOAD
TABLES TERADATA.STUDENT
WORKTABLES TERADATA.STUDENT_log_wt
ERRORTABLES TERADATA.STUDENT_log_et
TERADATA.STUDENT_log_uv
AMPCHECK ALL;
/*Section 4*/
.LAYOUT STUDENT_SRC;
.FIELD ROLL_NO * VARCHAR(20) ;
.FIELD FIRST_NM * VARCHAR(20) ;
.FIELD LAST_NM * VARCHAR(20) ;
/*Section 5*/
.DML LABEL Insert_Add;
INSERT INTO TERADATA.STUDENT
(
ROLL_NO = :ROLL_NO
,FIRST_NM = :FIRST_NM
,LAST_NM = :LAST_NM
);
/*Section 6*/
.IMPORT INFILE C:\Student_data\Student_info LAYOUT
STUDENT_SRC FORMAT VARTEXT '|' APPLY Insert_Add;
/*Section 7*/
.END MLOAD;
.LOGOFF;

Section 1- In this section we specify the log table name, which is used during the restart
process. The log table can be created in the same database as the target table or in a
different database.

Section 2- We need to provide logon information like tdpid or ip_address of Teradata system,
username, password.

Section 3- In this section you tell Teradata which tables to use, via the .BEGIN IMPORT
MLOAD command. Optionally, you can provide the names of the error tables and the work table;
by default, MultiLoad creates these tables automatically.

Section 4- You inform MultiLoad about the structure of the source file using the .LAYOUT
command. The asterisk between the column name and the data type tells MultiLoad to calculate
the position of the next byte in the record.

Section 5- The .DML LABEL names and defines the SQL that is to be executed. It is like
setting up executable code in a programming language, but using SQL.

Section 6- Here we specify the input file path and file name, then the format type
(VARTEXT), the LAYOUT to apply, and the separator used in the source file.

Section 7- Teradata ends the MLoad here and logs off from the Teradata system.
