Indexes
Ruihan Wang
Teradata Indexes
Introduction
Indexing is one of the most important features of the Teradata RDBMS. An index is used to define row uniqueness and to retrieve data rows; it can also be used to enforce the primary key and unique constraints for a table. The Teradata RDBMS supports five types of indexes:

Unique Primary Index (UPI)
Unique Secondary Index (USI)
Non-Unique Primary Index (NUPI)
Non-Unique Secondary Index (NUSI)
Join Index

A typical index contains two fields: a value and a pointer to instances of that value in a data table. Because the Teradata RDBMS uses hashing to distribute rows across the AMPs, the value is condensed into an entity called a row hash, which is used as the pointer. The row hash is not the value itself, but a mathematically transformed address. The Teradata RDBMS uses this transformed address as a retrieval index.

The following rules apply to the indexes used in the Teradata relational database:

An index is a scheme used to distribute and retrieve rows of a data table. It can be based on the values in one or more columns of the table.
A table can have a number of indexes, including one primary index, and up to 32 secondary indexes.
An index for a relational table may be primary or secondary, and may be unique or non-unique. Each kind of index affects system performance, and can be important to data integrity.
An index is usually defined on a table column whose values are frequently used in specifying WHERE constraints or join conditions.
An index is used to enforce PRIMARY KEY and UNIQUE constraints.
The CREATE TABLE statement allows UNIQUE and PRIMARY KEY to be defined as constraints on a table, and each index may be given a name so that Teradata SQL statements can refer to it.
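As a rough illustration of the rules above, the following Teradata SQL sketch defines a table with a unique primary index, a UNIQUE constraint, and a named secondary index. The database, table, column, constraint, and index names (sandbox.employee, badge_uq, dept_idx, and so on) are assumptions made up for this example; they do not come from the original text.

```sql
-- Hypothetical table; every name below is an illustrative assumption.
CREATE TABLE sandbox.employee (
    emp_no     INTEGER NOT NULL,
    badge_no   INTEGER NOT NULL,
    dept_name  VARCHAR(30),
    last_name  VARCHAR(30),
    -- UNIQUE constraint; the system enforces it through a unique index.
    CONSTRAINT badge_uq UNIQUE (badge_no)
)
UNIQUE PRIMARY INDEX (emp_no),   -- UPI: distributes rows and enforces uniqueness
INDEX dept_idx (dept_name);      -- named non-unique secondary index (NUSI)

-- Because the secondary index has a name, later statements can refer to it.
DROP INDEX dept_idx ON sandbox.employee;
```

Declaring the primary index without the UNIQUE keyword would make it a NUPI instead of a UPI.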
teradata.uark.edu/research/wang/indexes.html
11/3/12
Primary Index
A primary index determines the distribution of table rows on the disks controlled by the AMPs. In the Teradata RDBMS, a primary index is required for row distribution and storage. When a new row is inserted, its hash code is derived by applying a hashing algorithm to the value in the column(s) of the primary index (as shown in the following figure). Rows having the same primary index value are stored on the same AMP.
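The mapping from a primary index value to an AMP can be inspected with Teradata's built-in hash functions. The sketch below assumes a hypothetical table sandbox.employee whose primary index is the column emp_no; both names are invented for illustration.

```sql
-- Assumes a hypothetical table sandbox.employee with primary index (emp_no).
SELECT emp_no,
       HASHROW(emp_no)                      AS row_hash,     -- hash code of the value
       HASHBUCKET(HASHROW(emp_no))          AS hash_bucket,  -- bucket the hash maps to
       HASHAMP(HASHBUCKET(HASHROW(emp_no))) AS amp_no        -- AMP that owns the bucket
FROM sandbox.employee;
```

Two rows with the same emp_no value produce the same row_hash and therefore the same amp_no, which matches the rule that rows with equal primary index values are stored on the same AMP.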
The primary index values are stored as an integral part of the primary table. The primary index should be based on the set selection most frequently used to access rows from the table and on the uniqueness of the values.
This request is processed by hashing the value 1024 to do the following:

Locate the AMP where the row is stored.
Retrieve the row that contains a matching value in the hash code portion of its rowID.

The Teradata RDBMS processes data most efficiently if table rows are uniformly distributed (hashed) across the AMPs on which they are stored.
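A hypothetical request of this kind, and one way to check how evenly a primary index spreads rows across the AMPs, might look like the following sketch. The table sandbox.employee and its primary index column emp_no are assumptions for illustration only.

```sql
-- Hypothetical single-value request on the primary index column:
-- the value 1024 is hashed, and the row hash identifies the owning AMP.
SELECT *
FROM sandbox.employee
WHERE emp_no = 1024;

-- Rows per AMP for the same hypothetical table; similar counts on
-- every AMP suggest a uniform distribution.
SELECT HASHAMP(HASHBUCKET(HASHROW(emp_no))) AS amp_no,
       COUNT(*)                             AS row_count
FROM sandbox.employee
GROUP BY 1
ORDER BY 1;
```

If a few AMPs hold far more rows than the rest, the chosen primary index column is probably too non-unique for even distribution.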
Term | Primary Key | Primary Index
Definition | A relational concept used to determine relationships among entities and to define referential constraints | Used to store rows on disk
Requirement | Not required, unless referential integrity checks are to be performed | Required
Defining | Defined by CREATE TABLE statement | Defined by CREATE TABLE statement
Uniqueness | Unique | Unique or non-unique
Function | Identifies a row uniquely | Distributes rows
… | … | Yes
… | … | Yes
… | … | Yes
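To make the distinction concrete, a table can declare its primary key on one column and its primary index on another. The sketch below is an assumption-laden example: sandbox.orders and its columns and constraint name are hypothetical.

```sql
-- Hypothetical table: the primary key and the primary index differ.
CREATE TABLE sandbox.orders (
    order_id    INTEGER NOT NULL,
    customer_id INTEGER NOT NULL,
    order_date  DATE,
    -- Logical identifier of a row; enforced through a unique index.
    CONSTRAINT orders_pk PRIMARY KEY (order_id)
)
PRIMARY INDEX (customer_id);  -- NUPI: controls how rows are distributed
```

Here order_id identifies each row uniquely, while customer_id decides which AMP stores the row, so all orders of one customer land on the same AMP.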
Secondary Index
In addition to a primary index, up to 32 unique and non-unique secondary indexes can be defined for a table. Compared with primary indexes, secondary indexes allow access to information in a table by alternate, less frequently used paths. A secondary index is a subtable that is stored in all AMPs, but separately from the primary table. The subtables, which are built and maintained by the system, contain the following:

RowIDs of the subtable rows
Base table index column values
RowIDs of the base table rows (pointers)

As shown in the following figure, the secondary index subtable on each AMP is associated with the base table by the rowID.
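Secondary indexes can also be added to an existing table. The following sketch assumes a hypothetical table sandbox.employee with columns badge_no and dept_name; none of these names appear in the original text.

```sql
-- Unique secondary index (USI) on a hypothetical column.
CREATE UNIQUE INDEX (badge_no) ON sandbox.employee;

-- Non-unique secondary index (NUSI) on another hypothetical column.
CREATE INDEX (dept_name) ON sandbox.employee;

-- List the indexes currently defined on the table.
HELP INDEX sandbox.employee;
```

Each CREATE INDEX statement causes the system to build and maintain the corresponding subtable; no user statement writes to the subtable directly.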
In this example, the request is sent to AMP n, which contains the rowID for the secondary index value "Education". This AMP, in turn, sends the request to AMP m, where the data row containing that value is stored. Note that the rowID and the data row may reside on the same AMP, in which case only one AMP is involved.

A non-unique secondary index (NUSI) may have multiple rows per value. As a general rule, a NUSI should not be defined if the maximum number of rows per value exceeds the number of data blocks in the table. A NUSI is efficient only if the number of rows accessed is a small percentage of the total number of data rows in the table. It can be useful for complex conditional expressions or for processing aggregates. For example, if the contact_name column is defined as a secondary index for the customer_service.contact table, the following statement can be processed by secondary index:
SELECT * FROM customer_service.contact WHERE contact_name = 'Mike';
After the request is submitted, the optimizer first determines whether it is faster to do a full-table scan of the base table rows or a full-table scan of the secondary index subtable to get the rowIDs of the qualifying base table rows. In the latter case, it places those rowIDs into a spool file and then uses them to access the base table rows. Non-unique secondary index access is used for request processing only when it is less costly than a complete table search.
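Which of the two plans the optimizer picks can be inspected with EXPLAIN. The sketch below assumes that customer_service.contact exists as described in the text and that no index on contact_name has been created yet; the search value 'Mike' is only an example.

```sql
-- Define the NUSI on contact_name (assumes the table already exists).
CREATE INDEX (contact_name) ON customer_service.contact;

-- Show the plan without running the query. The output indicates whether
-- the optimizer intends to use the secondary index or scan all rows.
EXPLAIN
SELECT *
FROM customer_service.contact
WHERE contact_name = 'Mike';
```

The exact wording of the EXPLAIN output depends on the release and on the data, so it is not reproduced here.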
Join Index
A join index is an indexing structure containing columns from multiple tables, specifically the resulting columns from one or more tables. Rather than joining the individual tables each time the join operation is needed, a query can be resolved via the join index, which in most cases dramatically improves performance.
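A minimal sketch of a join index definition follows. It assumes two hypothetical base tables, orders (o_orderkey, o_custkey, o_date) and customer (c_custkey, c_name); the table, column, and index names are invented for illustration.

```sql
-- Hypothetical join index that stores the pre-joined columns.
CREATE JOIN INDEX ord_cust_ji AS
SELECT o_orderkey, o_custkey, o_date, c_name
FROM orders
INNER JOIN customer ON o_custkey = c_custkey
PRIMARY INDEX (o_custkey);

-- A query of this shape is a candidate for being answered from the
-- join index instead of joining the base tables again.
SELECT o_orderkey, o_date, c_name
FROM orders
INNER JOIN customer ON o_custkey = c_custkey
WHERE o_custkey = 1001;
```

The query still names only the base tables; whether the join index is actually used is the optimizer's decision.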
Triggers
A join index cannot be defined on a table with triggers.

Collecting Statistics
In general, there is no benefit in collecting statistics on a join index for joining columns specified in the join index definition itself. Statistics related to these columns should be collected on the underlying base table rather than on the join index.
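For instance, with hypothetical base tables orders and customer joined on o_custkey = c_custkey (names assumed for illustration only), statistics on the joining columns would be collected on the base tables like this:

```sql
-- Collect statistics on the joining columns of the hypothetical base
-- tables, not on the join index built over them.
COLLECT STATISTICS ON orders   COLUMN (o_custkey);
COLLECT STATISTICS ON customer COLUMN (c_custkey);
```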
or to access data in a table depends on the following factors:

How the statement is structured.
Whether current statistics exist for the table.
Whether PRIMARY KEY or UNIQUE constraints need to be validated.

Examples of Using an Index to Process SQL Statements and Access Data