This document provides an overview of the index structures that can be used for database files: primary indexes, clustering indexes, secondary indexes, multilevel indexes, B-tree indexes, and B+-tree indexes. A primary index is defined on the ordering key field of the file and contains one entry, with a block pointer, for each data block. A clustering index is defined on an ordering non-key field, so records with the same field value are stored together. A secondary index is defined on a non-ordering field and, in the dense form described here, contains one entry for each record. Multilevel indexes add further index levels on top of the first-level index to reduce search time. B-trees and B+-trees are balanced tree structures that keep search performance from degrading as files grow large.

Database Management Systems Unit 4 Sikkim Manipal University

Unit 4 Index Structures for Files

Structure
4.1 Introduction
    Objectives
    Self Assessment Question(s) (SAQs)
4.2 Primary index
    Self Assessment Question(s) (SAQs)
4.3 Clustering Index
    Self Assessment Question(s) (SAQs)
4.4 Secondary index
    Self Assessment Question(s) (SAQs)
4.5 Multilevel indexes
    Self Assessment Question(s) (SAQs)
4.6 B+-Tree Index Files
    Self Assessment Question(s) (SAQs)
4.7 B-tree
    Self Assessment Question(s) (SAQs)
4.8 Summary
4.9 Terminal Questions (TQs)
4.10 Multiple Choice Questions (MCQs)
4.11 Answers to SAQs, TQs, and MCQs
    4.11.1 Answers to Self Assessment Questions (SAQs)
    4.11.2 Answers to Terminal Questions (TQs)
    4.11.3 Answers to Multiple Choice Questions (MCQs)

4.1 Introduction
Indexes are used to speed up the retrieval of records. A database index is a data structure that improves the speed of data retrieval operations on a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordered access to records. The disk space required to store the index is typically less than that required by the table, since an index usually contains only the key fields according to which the table is to be arranged and excludes all the other details in the table.

An index file consists of two fields: the first field contains the value of the indexing field, and the second field contains a list of pointers to the disk blocks that hold that value. The DBMS searches the index to find the requested value and then follows the pointer to locate the requested row of the table. Searching an index is much faster than searching the table because the index is sorted and its rows are very small.

An index access structure is usually defined on a single field of a file, called an indexing field. There are several types of indexes:
1. Primary index
2. Clustering index
3. Secondary index
4. Multilevel index
5. B-trees and B+-trees
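
The lookup described in this introduction (search the sorted index, then follow the pointer to the row) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the table is an in-memory list, a "pointer" is just a position in that list, and the names rows, index, keys and lookup are hypothetical, not part of any DBMS.

```python
from bisect import bisect_left

# Hypothetical table: rows are stored in arrival order, not sorted by id.
rows = [
    {"id": 42, "name": "Asha"},
    {"id": 7, "name": "Ravi"},
    {"id": 19, "name": "Mina"},
]

# Index file: (key value, pointer) entries kept sorted by key value.
# Here the pointer is simply the row's position in `rows`.
index = sorted((row["id"], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]


def lookup(key):
    """Binary-search the sorted index, then follow the pointer to the row."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[index[i][1]]
    return None  # key is not present in the index
```

Calling lookup(19) inspects only index entries before touching the table, which is the reason a small sorted index is faster to search than the full set of rows.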

Objectives
To know about:
Primary index
Clustering index
Secondary index
Multilevel index
B-trees and B+-trees

Self Assessment Question(s) (SAQs) (for section 4.1)
1. Why do you need an index? List the different types of indexes.

4.2 Primary index
A primary index is an index specified on the ordering key field of a data file, that is, the field by which the records are physically ordered and whose value is unique for each record. The index file consists of two fields:
1. First field: This is of the same data type as the primary key field of the data file.
2. Second field: A pointer to a disk block.
For each block in the data file there is a corresponding index entry in the index file. Each index entry has the value of the primary key field for the first record in the block (the block anchor) and a pointer to that block. Searching the index is more efficient than searching the data file because the index file is much smaller than the data file.

Figure 4.1: Primary index on the ordering key field of the file

Self Assessment Question(s) (SAQs) (for section 4.2)
1. What do you mean by a primary index? Explain.

4.3 Clustering Index
A clustering index is an index specified on an ordering non-key field of a data file. That field is called the clustering field. Unlike the field of a primary index, the clustering field may have duplicate values.

Figure 4.2: Clustering index on the DEPTNUMBER ordering non-key field of an EMPLOYEE file

Self Assessment Question(s) (SAQs) (for section 4.3)
1. Define the concept of a clustering index.

4.4 Secondary Index
A secondary index is an ordered file with two fields. The first field is of the same data type as some non-ordering field of the data file, called the indexing field. The second field is either a block pointer or a record pointer.
In this case there is one index entry for each record in the data file, which contains the value of the secondary key for the record and a pointer either to the block in which the record is stored or to the record itself. Hence it is an example of a dense index. Because the records of the data file are not physically ordered by the values of the secondary key field, we cannot use block anchors. That is why a secondary index usually needs more storage space and longer search time than a primary index.

Figure 4.3: A dense secondary index on a non-ordering key field of a file

Self Assessment Question(s) (SAQs) (for section 4.4)
1. Explain the concept of a secondary index.

4.5 Multilevel indexes
Indexes with two or more levels are called multilevel indexes. The idea behind a multilevel index is to reduce the time required to search the index, and hence the data file. A multilevel index treats the index file, which is referred to as the first level of the multilevel index, as an ordered file with a distinct value for each k(i). Hence we can create a primary index for the first level; this index to the first level is called the second level of the multilevel index. Because the second level is a primary index, we can use block anchors, so that the second level has one entry for each block of the first level.

Figure 4.4: A two-level primary index resembling ISAM (Indexed Sequential Access Method) organization

Self Assessment Question(s) (SAQs) (for section 4.5)
1. Write a note on multilevel indexes.

4.6 B+-Tree Index Files
The main disadvantage of the index-sequential file organization is that performance degrades as the file grows. A B+-tree index takes the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length.
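
The multilevel idea from Section 4.5 and the equal path length property just described can be illustrated together. The sketch below is a simplification, not B+-tree insertion: it builds a static index bottom-up over keys that are already sorted, packing at most fan_out entries into each block and indexing every level by the first key of each block below it. The names build_levels and fan_out are hypothetical and chosen only for this example.

```python
def build_levels(sorted_keys, fan_out):
    """Return the index levels, leaf level first; each level is a list of blocks."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    # Leaf level: pack the sorted keys into blocks of at most fan_out entries.
    levels = [[sorted_keys[i:i + fan_out]
               for i in range(0, len(sorted_keys), fan_out)]]
    # Each higher level indexes the first key (block anchor) of every block
    # in the level below, until a single root block remains.
    while len(levels[-1]) > 1:
        anchors = [block[0] for block in levels[-1]]
        levels.append([anchors[i:i + fan_out]
                       for i in range(0, len(anchors), fan_out)])
    return levels


levels = build_levels(list(range(1, 1001)), fan_out=10)
# Every root-to-leaf path visits exactly one block per level.
path_length = len(levels)
```

With 1000 keys and a fan-out of 10 the result has three levels (100 leaf blocks, 10 blocks above them, and one root block), so every lookup reads three blocks regardless of which key is requested. A B+-tree keeps this same property while records are inserted and deleted by splitting and merging nodes.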

In a B-tree, every value of the search field appears once at some level in the tree, along with a data pointer (which may therefore be stored in an internal node). In a B+-tree, data pointers (the addresses of the records for a particular search value) are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs from the structure of internal nodes. The leaf nodes have an entry for every value of the search field, along with a data pointer to the record.

A B+-tree is a multilevel index, but it has a different structure. A typical node of the B+-tree contains up to n-1 search key values k1, k2, ..., kn-1 and n pointers p1, p2, ..., pn. The search key values within a node are kept in sorted order, so that ki < kj whenever i < j. The number of pointers in a node is called the fan-out of the node. The structure of a non-leaf node is the same as that of a leaf node, except that all its pointers are pointers to tree nodes.

Each internal node is of the form
<p1, k1, p2, k2, ..., pq-1, kq-1, pq>
The root node has at least 2 tree pointers if it is an internal node.

Each leaf node is of the form
<<k1, pr1>, <k2, pr2>, ..., <kn-1, prn-1>, pnext>
where each pri is a data pointer and pnext points to the next leaf node of the B+-tree. All leaf nodes are at the same level.

Consider an example. Assume that we wish to insert records into a B+-tree of order n = 3 (written p = 3 in Figure 4.5) and pleaf = 2, so that a leaf node holds at most two entries. At first the root is the only node in the tree, so it is also a leaf node. As soon as more than one level is created, the tree is divided into internal nodes and leaf nodes. Notice that every value must exist at the leaf level, because all the data pointers are at the leaf level. However, only some values exist in internal nodes to guide the search. Notice also that every value appearing in an internal node also appears as the rightmost value in the leaf level of the subtree pointed at by the tree pointer to the left of that value.

For example, when 12 is inserted into a leaf node that is already full, the node is split into two nodes.
The figure shows the two leaf nodes that result from inserting 12: the existing node keeps 7 and 8, and the remaining value 12 goes into a new node. The first $j = \lceil (p_{leaf} + 1)/2 \rceil = \lceil 3/2 \rceil = 2$ entries in the original node are kept there and the remaining entries are moved to a new leaf node. The j-th search value is replicated in the parent internal node, and an extra pointer to the new node is created in the parent. If the parent internal node is full, it must also be split. This splitting can propagate all the way up to create a new root node.

Figure 4.5: An example of insertion in a B+-tree with p = 3 and pleaf = 2

Self Assessment Question(s) (SAQs) (for section 4.6)
1. Explain the concept of a B+-tree.

4.7 B-tree
B-tree indexes are similar to B+-tree indexes. The main difference is that a B-tree eliminates the duplicate storage of search key values. In a B+-tree, every search key value appears in some leaf node and several are repeated in non-leaf nodes. A B-tree allows each search key value to appear only once, hence a B-tree may require fewer nodes. However, since the search keys appear only once, every search value in a B-tree node, including those in non-leaf nodes, is stored along with an address for that value (a pointer either to the file record or to a bucket of records that contain the search value).

Figure 4.6: B-tree structures. (a) A node in a B-tree with q - 1 search values. (b) A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.

Consider Figure 4.6(b), in which 5, 8, 1, 3, 6, 7, 9 and 12 are values of the indexed field. Consider the top node, which consists of two value entries [5 and 8] and three pointers.
Values less than 5 are placed in the left lower node, values greater than 5 and less than 8 are placed in the middle lower node, and values greater than 8 are placed in the right lower node. The values 5 and 8 themselves are stored only in the top node.

Consider an algorithm that searches this B-tree for a value V:
Set N to the top node
Repeat
    Let X, Y be the data values in node N [X < Y]
    If V occurs in node N then exit [found]
    If N has no lower nodes then exit [not found]
    If V < X then set N to the left lower node of N
    If X < V < Y then set N to the middle lower node of N
    If V > Y then set N to the right lower node of N
End

A B-tree starts with a single root node (which is also a leaf node) at level 0. Once the root node is full with p - 1 search key values and we attempt to insert another entry in the tree, the root node splits into two nodes at level 1. Only the middle value is kept in the root node, and the rest of the values are split evenly between the other two nodes. When a non-root node is full and a new entry is inserted into it, that node is split into two nodes at the same level, and the middle entry is moved to the parent node along with two pointers to the split nodes. If the parent node is full, it is also split. Splitting can propagate all the way to the root node.

Advantage: A B-tree eliminates the redundant storage of search key values.
Disadvantage: Deletion in a B-tree is more complicated. In a B+-tree, the deleted entry always appears in a leaf. In a B-tree, the deleted entry may appear in a non-leaf node.

Self Assessment Question(s) (SAQs) (for section 4.7)
1. Define B-tree.

4.8 Summary
In this unit we have learnt the following concepts:
Primary index
Clustering index
Secondary index
Multilevel index
B-trees and B+-trees

4.9 Terminal Questions (TQs)
1. List the different types of indexes.
2. Explain briefly the primary index, clustering index and secondary index.
3. How does multilevel indexing improve the efficiency of searching an index file?
4. What is a B+-tree?
Describe the structure of both internal and leaf nodes of a B+-tree.

4.10 Multiple Choice Questions (MCQs)
1. Indexes are used to ________ the retrieval of records.
(a) speed up
(b) read
(c) slow down
(d) None of the above
2. A ________ is an index specified on the ordering key field.
(a) Clustering index
(b) Primary index
(c) Secondary index
(d) None of the above
3. A ________ is an ordered file with two fields.
(a) Secondary index
(b) Primary index
(c) Clustering index
(d) None of the above
4. In a ________, data pointers are stored only at the leaf nodes of the tree; the space overhead of the directory table is negligible.
(a) Static hashing
(b) B-tree
(c) B+-tree
(d) None of the above

4.11 Answers to SAQs, TQs, and MCQs

4.11.1 Answers to Self Assessment Questions (SAQs)
For Section 4.1
1. Indexes are used to speed up the retrieval of records. (Refer section 4.1)
For Section 4.2
1. Primary index: A primary index is an index specified on the ordering key field. (Refer section 4.2)
For Section 4.3
1. Clustering index: A clustering index is an index specified on an ordering non-key field of a data file. (Refer section 4.3)
For Section 4.4
1. A secondary index is an ordered file with two fields. (Refer section 4.4)
For Section 4.5
1. Multilevel indexes: Indexes with two or more levels are called multilevel indexes. (Refer section 4.5)
For Section 4.6
1. A B+-tree index takes the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length. (Refer section 4.6)
For Section 4.7
1. B-tree indexes are similar to B+-tree indexes. The main difference is that a B-tree eliminates the duplicate storage of search key values. (Refer section 4.7)

4.11.2 Answers to Terminal Questions (TQs)
1. There are several types of indexes: primary index, clustering index, secondary index, multilevel index, B-trees and B+-trees. (Refer section 4.1)
2. A primary index is an index specified on the ordering key field. (Refer sections 4.2, 4.3, and 4.4)
3. Indexes with two or more levels are called multilevel indexes. (Refer section 4.5)
4. A B+-tree index takes the form of a balanced tree in which every path from the root of the tree to a leaf of the tree is of the same length. (Refer section 4.6)

4.11.3 Answers to Multiple Choice Questions (MCQs)
1. A
2. B
3. A
4. C