Teradata is an RDBMS (relational database management system) with the following features:
It is built on a fully parallel architecture: a single task is divided into smaller chunks
that are processed simultaneously, which gives faster execution.
Teradata is a shared-nothing architecture in which each node is independent and
self-sufficient. Each logical processor (AMP) is responsible only for its own portion of the
database.
It supports industry-standard ANSI SQL for communicating with Teradata.
A Teradata database can be accessed by multiple concurrent users from different client
applications over a TCP/IP connection or an IBM mainframe channel connection.
The main components of the Teradata architecture are the PE (Parsing Engine), BYNET, AMP (Access
Module Processor), and Virtual Disk. The logical view of the architecture is as follows:
Parsing Engine
When a user submits an SQL query, it first goes to the PE (Parsing Engine). Processes such as
planning the query and distributing the work to the AMPs are done here. The PE finds the optimal
plan for query execution. The PE performs the following processes:
Parser: The Parser checks the query syntax and, if it is valid, forwards the query to the Session Handler.
Session Handler: It performs the security checks, such as verifying the logon credentials and
whether the user has permission to execute the query.
Optimizer: It finds the best plan for executing the query.
Dispatcher: The Dispatcher forwards the query to the AMPs.
BYNET
The BYNET is the communication channel between the PE and the AMPs.
There are two BYNETs in Teradata, ‘BYNET 0’ and ‘BYNET 1’, but we refer to them as a single
BYNET system. There are two reasons for having two BYNETs:
1. If one BYNET fails, the second one can take its place.
2. When the data volume is large, both BYNETs can be used together, which improves communication
between the PE and the AMPs and speeds up processing.
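The two reasons above (failover and load sharing) can be illustrated with a small Python toy model. This is only a sketch: the class `DualChannel` and its methods are hypothetical names invented for illustration, and the simple alternation between channels is an assumption, not a description of how the real BYNET schedules traffic.

```python
from itertools import cycle


class DualChannel:
    """Toy model of two redundant channels, named after BYNET 0 and BYNET 1."""

    def __init__(self):
        # Both channels start in working order.
        self.up = {"BYNET 0": True, "BYNET 1": True}

    def fail(self, name):
        """Mark one channel as failed."""
        self.up[name] = False

    def route(self, messages):
        """Assign each message to a working channel, alternating when both are up."""
        working = [name for name, ok in self.up.items() if ok]
        if not working:
            raise RuntimeError("no BYNET available")
        channels = cycle(working)
        return [(message, next(channels)) for message in messages]


net = DualChannel()
# Both channels share the traffic while both are up.
both_up = net.route(["step 1", "step 2", "step 3", "step 4"])
# After a failure, the surviving channel carries everything.
net.fail("BYNET 0")
one_down = net.route(["step 5", "step 6"])
```

In this model `both_up` alternates between the two channels, while `one_down` uses only the surviving one.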
AMP
An Access Module Processor (AMP) is a virtual processor that is connected to the PE via the BYNET.
Each AMP has its own disk and is allowed to read and write only on its OWN disk. This is called
‘SHARED NOTHING ARCHITECTURE’. Teradata distributes the rows of a table across all the AMPs, and
when a query asks for data, all AMPs work simultaneously to return it.
This is called PARALLELISM. The AMP executes each SQL request in three steps.
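The row distribution and parallel retrieval described in this section can be sketched in Python. The text does not say how a row is assigned to an AMP; the sketch assumes the assignment comes from hashing a key column, and it uses MD5 purely as a stand-in hash. `NUM_AMPS`, `amp_for`, `distribute` and `parallel_select` are hypothetical names, and threads stand in for AMPs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

NUM_AMPS = 4  # hypothetical system size


def amp_for(key, num_amps=NUM_AMPS):
    """Map a key to an AMP number. MD5 is a stand-in, not Teradata's hash."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(rows, key_column):
    """Place each row on exactly one AMP's private storage (shared nothing)."""
    disks = {amp: [] for amp in range(NUM_AMPS)}
    for row in rows:
        disks[amp_for(row[key_column])].append(row)
    return disks


def scan_amp(own_rows, predicate):
    """Work done by one AMP: it reads only its own rows."""
    return [row for row in own_rows if predicate(row)]


def parallel_select(disks, predicate):
    """All AMPs scan their own disks at the same time; the results are merged."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        parts = list(pool.map(scan_amp, disks.values(), [predicate] * len(disks)))
    return [row for part in parts for row in part]


rows = [{"emp_id": i, "dept": i % 3} for i in range(20)]
disks = distribute(rows, "emp_id")
dept_one = parallel_select(disks, lambda row: row["dept"] == 1)
```

Each row lives on exactly one simulated AMP, and the query result is assembled from the partial results that the AMPs produce independently.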
Disk
Teradata provides a set of virtual disks for each AMP. The storage area of each AMP is called a
Virtual Disk or Vdisk. The steps for executing the query are below:
Teradata MultiLoad, or MLoad, is a command-driven load utility for fast, high-volume data
maintenance on multiple tables or views in a Teradata database.
Why is it called MultiLoad?
MultiLoad can perform multiple DML operations, including INSERT, UPDATE, DELETE, and
UPSERT, on up to five (5) empty or populated target tables at the same time. FastLoad, by
contrast, loads data into only one target table, which must be empty.
MultiLoad Modes:-
MultiLoad Import-
Each MultiLoad import task can perform multiple INSERT, UPDATE, DELETE and UPSERT
operations on up to five target tables in parallel.
It can import data from a network-attached or mainframe-attached system using a custom
access module.
MultiLoad DELETE-
Each MultiLoad delete task can remove large volumes of data from a single table.
MultiLoad DELETE is used to perform a global (all-AMP) delete operation on just one table.
Its main features are that it bypasses the transient journal (TJ) and can be restarted if
it fails before finishing. The primary index cannot be used in a MultiLoad delete operation, because
primary index access is AMP-specific, whereas MultiLoad delete is built for global deletes.
Limitations of MultiLoad:-
No Unique Secondary Index- Like FastLoad, MultiLoad does not support unique secondary indexes
(USI), because a USI subtable row may be created on a different AMP from the one where the actual
row resides. Unlike FastLoad, it does support non-unique secondary indexes (NUSI), because in that
case the subtable row is created on the same AMP. In MultiLoad, each AMP works in parallel and
independently, which is why NUSI is supported but USI is not.
No Referential Integrity- Teradata MultiLoad does not support Referential Integrity (RI) on the
target table. RI requires too much system checking to maintain the referential constraints
to a different table.
Triggers are not allowed- Triggers involve more than one table, whereas a MultiLoad task deals
only with its target tables. Simply ALTER the triggers to the DISABLED status before using
MultiLoad.
No concatenation of the input files- It could affect the restart process if the files were
concatenated in a different sequence or data was deleted between runs.
No aggregate, exponentiation operator or arithmetic functions- If you need data conversions or math,
you might be better off using an INMOD to prepare the data before loading it.
Also note that MultiLoad does not support SELECT, foreign key references, hash indexes, join
indexes, or NoPI tables.
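The USI versus NUSI limitation above can be made concrete with a short Python sketch. It models only what the text states: a USI subtable row may land on a different AMP from the base row, while a NUSI subtable row stays on the base row's AMP. The sketch assumes that AMP placement comes from hashing a column value; MD5 is a stand-in hash, and `NUM_AMPS`, `amp_for` and `placement` are hypothetical names.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value, num_amps=NUM_AMPS):
    """Map a value to an AMP number. MD5 is a stand-in, not Teradata's hash."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def placement(row, primary_index, secondary_index, unique):
    """Return (AMP of the base row, AMP of its secondary-index subtable row)."""
    base_amp = amp_for(row[primary_index])
    if unique:
        # Assumed USI model: the subtable row is placed by the index value itself.
        index_amp = amp_for(row[secondary_index])
    else:
        # NUSI model from the text: the subtable row stays on the base row's AMP.
        index_amp = base_amp
    return base_amp, index_amp


rows = [{"emp_id": i, "email": f"user{i}@example.com", "dept": i % 3} for i in range(50)]

# Count rows whose index entry lives on a different AMP from the base row.
usi_cross_amp = sum(
    1 for row in rows
    if len(set(placement(row, "emp_id", "email", unique=True))) == 2
)
nusi_cross_amp = sum(
    1 for row in rows
    if len(set(placement(row, "emp_id", "dept", unique=False))) == 2
)
```

In this model `nusi_cross_amp` is always zero, so each AMP can maintain its NUSI entries on its own. The USI case generally needs cross-AMP work, which conflicts with AMPs loading independently.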
MultiLoad IMPORT has five phases.
Phase 1: Preliminary Phase
Parses and validates all Teradata MultiLoad commands and Teradata SQL statements.
Establishes MultiLoad sessions with the Teradata database. The default is one session per
available AMP. For a small system the general rule of thumb is number of AMPs + two; the
two extra control sessions handle the SQL and logging. On a larger system with
hundreds of AMPs, the SESSIONS option can be used to lower the default.
Creates all support tables, i.e. log table(s), error tables, work table(s).
Applies a utility lock to the target tables.
All the DML statements are parsed and sent to the appropriate worktable for each target table.
Later, during the acquisition phase, the data is also stored in the worktable so that it can be
applied in the application phase.
Acquires load locks on the target tables and views in the Teradata database.
Each block in the worktable is read once; the DML statements (INSERT, UPDATE or DELETE)
are applied to it and the block of data is written to the actual target table.
A checkpoint is created after each successful write of a data block, which makes it possible to
RESTART the process from the point of failure.
Any error is written to the appropriate error table.
Forces an automatic restart/rebuild if an AMP went offline and came back online during the
application phase.
Releases all locks on the target tables and views.
Drops the temporary work tables and all empty error tables from the Teradata database.
Reports the transaction statistics associated with the import and delete tasks.
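The checkpoint behaviour described for the application phase can be sketched in Python. This is a toy model under stated assumptions: the worktable is a list of blocks, the target table is a dictionary, and the checkpoint is a counter recorded after each block is written. The names `apply_worktable`, `fail_at` and `next_block` are hypothetical, and the failure is simulated.

```python
def apply_worktable(blocks, target, checkpoint, fail_at=None):
    """Apply worktable blocks to the target, starting from the checkpoint.

    `fail_at` simulates a failure just before the block with that index.
    """
    for index in range(checkpoint["next_block"], len(blocks)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError(f"simulated failure before block {index}")
        for operation, key, value in blocks[index]:
            if operation in ("INSERT", "UPDATE"):
                target[key] = value
            elif operation == "DELETE":
                target.pop(key, None)
        # Record the checkpoint only after the block was written successfully.
        checkpoint["next_block"] = index + 1


blocks = [
    [("INSERT", 1, "a"), ("INSERT", 2, "b")],
    [("UPDATE", 1, "A")],
    [("DELETE", 2, None), ("INSERT", 3, "c")],
]
target, checkpoint = {}, {"next_block": 0}

try:
    apply_worktable(blocks, target, checkpoint, fail_at=2)  # first run fails
except RuntimeError:
    pass

resumed_from = checkpoint["next_block"]      # where the restart picks up
apply_worktable(blocks, target, checkpoint)  # restart applies only the rest
```

Because the checkpoint advances only after a successful write, the restart skips the blocks that were already applied and resumes at the failed one.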
Section 1- In this section, we specify the name of the log table, which is used during the
restart process. You can place it in the same database as the target table or in some other database.
Section 2- We provide the logon information: the tdpid or IP address of the Teradata system,
the username, and the password.
Section 3- In this section you must tell Teradata which tables to use. To do this, you use the
.BEGIN IMPORT MLOAD command. Optionally, you can provide the name of the error tables
and work table. By default, MultiLoad will create these tables automatically.
Section 4- You need to describe the structure of the source file to MultiLoad using the .LAYOUT
command. An asterisk is placed between the column name and the data type so that the position of the
next byte in the record is calculated automatically.
Section 5- The .DML LABEL names and defines the SQL that is to execute. It is like setting up
executable code in a programming language, but using SQL.
Section 6- Here we specify the input file path and file name. Then we give the format type
(VARTEXT), the LAYOUT to apply, and the separator used in the source file.
Section 7- Teradata ends the MLoad here and logs off from the Teradata system.
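The seven sections above can be tied together with a small Python helper that renders a script skeleton. This is a sketch, not a tested production job. The command layout follows commonly seen MultiLoad scripts and should be checked against the Teradata MultiLoad reference. The function `render_mload_script`, the database, table, field and file names, and the logon string are all hypothetical placeholders. It relies on the default work and error tables mentioned in Section 3.

```python
def render_mload_script(log_table, logon, target_table, fields, infile, delimiter=","):
    """Return MultiLoad script text laid out in the seven sections described above."""
    layout, label = "in_layout", "ins_rows"  # hypothetical layout and label names
    field_lines = "\n".join(f".FIELD in_{name} * {dtype};" for name, dtype in fields)
    columns = ", ".join(name for name, _ in fields)
    values = ", ".join(f":in_{name}" for name, _ in fields)
    return "\n".join([
        f".LOGTABLE {log_table};",                      # Section 1: restart log table
        f".LOGON {logon};",                             # Section 2: logon information
        f".BEGIN IMPORT MLOAD TABLES {target_table};",  # Section 3: target table(s)
        f".LAYOUT {layout};",                           # Section 4: input record layout
        field_lines,
        f".DML LABEL {label};",                         # Section 5: SQL to execute
        f"INSERT INTO {target_table} ({columns}) VALUES ({values});",
        f".IMPORT INFILE {infile}",                     # Section 6: input file and format
        f"    FORMAT VARTEXT '{delimiter}'",
        f"    LAYOUT {layout}",
        f"    APPLY {label};",
        ".END MLOAD;",                                  # Section 7: end task and log off
        ".LOGOFF;",
    ])


script = render_mload_script(
    log_table="demo_db.employee_log",
    logon="tdpid/username,password",
    target_table="demo_db.employee",
    fields=[("emp_id", "VARCHAR(10)"), ("emp_name", "VARCHAR(30)")],
    infile="employee.txt",
)
print(script)
```

The fields are declared as VARCHAR because delimited VARTEXT input is read as variable-length character data. The asterisk in each .FIELD line is the one described in Section 4.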