Teradata Architecture

Teradata is a massively parallel processing (MPP) system running a shared-nothing architecture.
Symmetric multiprocessing (SMP): A single node that contains multiple CPUs sharing a memory pool.

Massively parallel processing (MPP): Multiple SMP nodes working together form a larger configuration. The nodes are connected by the BYNET, which allows virtual processors on different system nodes to communicate with each other.

Shared-nothing architecture: Each vproc (Access Module Processors and Parsing Engines are virtual processors) is responsible for its own portion of the database and does not share common components with the others. Each AMP manages its own dedicated memory space and the data on its own vdisk; these are not shared with other AMPs. Because each AMP uses system resources independently of the other AMPs, they can all work in parallel, which gives high overall system performance.

Node: A node is made up of various hardware and software components.

Clique: A clique is a set of Teradata nodes that share a common set of disk arrays. Cabling a subset of nodes to the same disk arrays creates a clique.

Disk array: A disk array is a configuration of disk drives that uses specialized controllers to manage and distribute data and parity across the disks while providing fast access and data integrity.
- RAID 5: Data and parity protection striped across multiple disks.
- RAID 1: Each disk has a physical mirror replicating the data.

VPROCs: Teradata software components are known as "virtual processors", or VPROCs. VPROCs are software threads or processes, and there are two kinds:
1. Access Module Processors (AMPs): An AMP reads, writes and manipulates all database rows in the partition that the AMP owns.
2. Parsing Engines (PEs): A PE parses SQL statements, reducing them to their component executable steps.

Further properties of VPROCs:
- The number of VPROCs is configurable.
- VPROCs are in every Teradata node.
- VPROCs can migrate around the complex, as in the case of a failed node.
- VPROCs provide parallelism within a node.
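The point about VPROCs migrating after a node failure can be illustrated with a small Python sketch. It assumes a clique is just a mapping from node names to the vprocs running on them, and that the vprocs of a failed node are handed out round-robin to the surviving nodes of the same clique. The function name migrate_vprocs, the node and vproc names, and the round-robin policy are all hypothetical; the real placement rules are not described in this text.

```python
def migrate_vprocs(clique, failed_node):
    """Return a new layout with the failed node's vprocs moved to survivors.

    clique maps node name -> list of vproc names. Only nodes in the same
    clique are candidates, because only they are cabled to the same disk
    arrays. Round-robin placement is an assumption made for illustration.
    """
    survivors = sorted(node for node in clique if node != failed_node)
    if not survivors:
        raise RuntimeError("no surviving node in the clique can reach the disk arrays")
    # Copy the surviving nodes' vproc lists so the input is left untouched.
    new_layout = {node: list(clique[node]) for node in survivors}
    for i, vproc in enumerate(clique[failed_node]):
        new_layout[survivors[i % len(survivors)]].append(vproc)
    return new_layout


# Hypothetical three-node clique.
clique = {
    "node1": ["AMP0", "AMP1", "PE0"],
    "node2": ["AMP2", "AMP3", "PE1"],
    "node3": ["AMP4", "AMP5", "PE2"],
}
after = migrate_vprocs(clique, "node2")
```

After the call, node2 no longer appears in the layout and its three vprocs run on node1 and node3, so every AMP still has a node that can reach its data.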
A Logical View of the Teradata Architecture
Teradata Storage Process
1. The Parsing Engine interprets the SQL command and converts the data record from the host into an AMP message.
2. The BYNET distributes the row to the appropriate AMP.
3. The AMP formats the row and writes it to its associated disks.
4. The disk holds the row for subsequent access.

Teradata Retrieval Process
1. The Parsing Engine dispatches a request to retrieve one or more rows.
2. The BYNET ensures that the appropriate AMP(s) are activated.
3. The AMPs locate and retrieve the desired rows in parallel, and sort, aggregate or format them if needed.
4. The BYNET returns the retrieved rows to the Parsing Engine.
5. The Parsing Engine returns the row(s) to the requesting client application.

The BYNET is responsible for:
- Point-to-point communications between nodes and virtual processors
- Merging answer sets back to the PE
- Making Teradata parallelism possible

The Parsing Engine is responsible for:
- Managing individual sessions (up to 120)
- Parsing and optimizing SQL requests
- Dispatching the optimized plan to the AMPs
- Sending the answer set response back to the requesting client

The AMPs are responsible for:
- Storing and retrieving rows to and from the disks
- Lock management
- Sorting rows and aggregating columns
- Join processing
- Output conversions and formatting
- Creating answer sets for clients
- Disk space management and accounting

Introduction to Teradata RDBMS

Teradata RDBMS is a complete relational database management system. The system is based on off-the-shelf symmetric multiprocessing (SMP) technology combined with a communication network connecting the SMP systems to form a massively parallel processing (MPP) system. The BYNET is a hardware inter-processor network that links SMP nodes. All processors in the same SMP node are connected by a virtual BYNET. We use the following figure to explain how each component in this DBMS works together.
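To make the storage and retrieval processes described earlier in this section more concrete, here is a minimal Python sketch of a row travelling from the Parsing Engine over the BYNET to the one AMP that owns it, and back again. The class names AMP, Bynet and ParsingEngine are illustrative only. Teradata picks the owning AMP by hashing the row's primary index value; the SHA-256 call below is purely a stand-in for that hash, and the modulo mapping is an assumption made to keep the example short.

```python
import hashlib


class AMP:
    """One Access Module Processor with private storage (shared nothing)."""

    def __init__(self, amp_id):
        self.amp_id = amp_id
        self.disk = {}  # key -> row; stands in for the AMP's own vdisk

    def write(self, key, row):
        self.disk[key] = row

    def read(self, key):
        return self.disk.get(key)


class Bynet:
    """Delivers a message to the AMP that owns a given key."""

    def __init__(self, amps):
        self.amps = amps

    def owner(self, key):
        # Assumption: a stable hash of the key selects the AMP.
        # SHA-256 is a stand-in, not Teradata's actual hashing algorithm.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return self.amps[int.from_bytes(digest[:4], "big") % len(self.amps)]


class ParsingEngine:
    """Turns client requests into messages for the AMPs."""

    def __init__(self, bynet):
        self.bynet = bynet

    def store(self, key, row):
        # Storage process: PE -> BYNET -> owning AMP -> disk.
        self.bynet.owner(key).write(key, row)

    def retrieve(self, key):
        # Retrieval process: PE -> BYNET -> owning AMP -> back to the PE.
        return self.bynet.owner(key).read(key)


amps = [AMP(i) for i in range(4)]
pe = ParsingEngine(Bynet(amps))
for customer_id in range(100):
    pe.store(customer_id, {"id": customer_id})
row = pe.retrieve(42)
```

Each row ends up on exactly one AMP, and a single-row lookup touches only that AMP; the other AMPs are not involved.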
PDE (Parallel Database Extensions): This component is an interface layer on top of the operating system. Its functions include executing vprocs (virtual processors), providing a parallel environment, scheduling sessions, debugging, etc.

Teradata File System: It allows Teradata RDBMS to store and retrieve data regardless of the low-level operating system interface.

PE (Parsing Engine): Whenever a user logs in to Teradata, the session actually connects to a Parsing Engine (PE). When the user submits a query, the PE creates a plan and instructs the AMPs what to do to produce the result. The PE knows how many AMPs are in the system and how many rows are in each table (from collected statistics or estimates), and uses this to choose the best plan it can find for the query. This is why the PE is also called the 'OPTIMIZER'. Besides planning the query, the PE checks the user's access rights, that is, whether the user has the privileges to execute the query. In this way the PE also enforces security. Its functions are:
- Communicate with the client
- Manage sessions
- Parse SQL statements
- Communicate with AMPs
- Return the result to the client

AMP (Access Module Processor): Each AMP attached to the Teradata system listens to the PE via the BYNET for instructions. Each AMP is connected to its own disk and can read and write the data on that disk. An AMP is best thought of as a computer processor with its own disk attached. Whenever it receives instructions from the PE, it fetches the data from its disk and sends it back to the PE through the BYNET. Each AMP is allowed to read and write its own disk ONLY. This is known as the 'SHARED NOTHING ARCHITECTURE'. Teradata spreads the rows of a table evenly across all the AMPs; when the PE asks for data, all AMPs work simultaneously, each reading records from its own disk. This is known as parallelism. As a consequence, a query is only as fast as the slowest AMP in the system. Its functions are:
- BYNET interface
- Manage the database
- Interface to the disk subsystem

BYNET: The BYNET is the communication channel between the PE and the AMPs. It ensures that messages between the PE and the AMPs are delivered correctly. A Teradata MPP system normally has two BYNETs, called 'BYNET 0' and 'BYNET 1', although they are usually referred to as a single BYNET system. There are two reasons for having two BYNETs:
1) If one BYNET fails, the second BYNET takes over.
2) Two BYNETs improve the performance of the system: the PE and the AMPs can talk to each other over both BYNETs, which speeds up communication.

CLI (Call Level Interface): A SQL query is submitted and transferred in CLI packet format.
TDP (Teradata Director Program): Routes the packets to the specified Teradata RDBMS server.

Short Summary:
1. The PE checks the syntax of the query and the user's security rights.
2. The PE then comes up with the best plan it can find for executing the query.
3. The PE passes this plan through the BYNET to the AMPs.
4. The AMPs follow the plan and retrieve the data from their disks.
5. The AMPs pass the data to the PE through the BYNET.
6. The PE passes the data to the user.

Teradata RDBMS has the following components that support all data communication management:
- Call Level Interface (CLI)
- WinCLI & ODBC
- Teradata Director Program (TDP, for channel-attached clients)
- Micro TDP (TDP for network-attached clients)
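The request flow in the short summary, together with the earlier remark that a query is only as fast as the slowest AMP, can be sketched in Python as follows. This is a toy model under stated assumptions: each AMP's disk is a plain list of integers, the "plan" is a single filter step sent to every AMP, the access check is a simple set lookup, and the number of rows an AMP has to read stands in for its elapsed time. SQL parsing is skipped entirely. The names run_query and amp_scan are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def amp_scan(rows, predicate):
    """Work done by one AMP: scan only its own rows.

    Returns the sorted matches and the number of rows it had to read.
    """
    matches = sorted(r for r in rows if predicate(r))
    return matches, len(rows)


def run_query(amp_disks, predicate, user, allowed_users):
    # PE: check the user's access rights before any AMP is involved.
    if user not in allowed_users:
        raise PermissionError(f"{user} may not run this query")
    # PE dispatches the same step to every AMP; the AMPs work in parallel.
    with ThreadPoolExecutor(max_workers=len(amp_disks)) as pool:
        results = list(pool.map(lambda rows: amp_scan(rows, predicate), amp_disks))
    # BYNET: merge the sorted per-AMP answer sets on the way back to the PE.
    answer = list(heapq.merge(*(matches for matches, _ in results)))
    # The step finishes only when the most heavily loaded AMP has finished.
    slowest = max(rows_read for _, rows_read in results)
    return answer, slowest


def is_even(value):
    return value % 2 == 0


even_disks = [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]
skewed_disks = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [10], [11], [12]]

even_answer, even_cost = run_query(even_disks, is_even, "alice", {"alice"})
skew_answer, skew_cost = run_query(skewed_disks, is_even, "alice", {"alice"})
```

Both layouts return the same answer set, but with evenly spread rows the busiest AMP reads 3 rows, while in the skewed layout one AMP reads 9. This is why even row distribution matters for parallel performance.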
