Unit 4 Index Structures For Files: Structure

Database Management Systems Unit 4
Sikkim Manipal University Page No.: 59

Unit 4 Index Structures for Files
Structure
4.1 Introduction
Objectives
Self Assessment Question(s) (SAQs)
4.2 Primary index
4.3 Clustering Index
4.4 Secondary index
4.5 Multilevel indexes
4.6 B+Tree Index Files
4.7 B-tree
4.8 Summary
4.9 Terminal Questions (TQs)
4.10 Multiple Choice Questions (MCQs)
4.11 Answers to SAQs, TQs, and MCQs
4.11.1 Answers to Self Assessment Questions (SAQs)
4.11.2 Answers to Terminal Questions (TQs)
4.11.3 Answers to Multiple Choice Questions (MCQs)
4.1 Introduction
Indexes are used to speed up the retrieval of records.
A database index is a data structure that improves the speed of
data retrieval operations on a table.
Indexes can be created using one or more columns, providing the basis
for both rapid random lookups and efficient ordering of access to
records.
The disk space required to store the index is typically less than the
storage of the table (since indexes usually contain only the key-fields
according to which the table is to be arranged, and exclude all the other
details in the table).
An index file consists of two fields: the first field contains the
indexed value, and the second field contains a list of pointers to the
disk blocks that hold the matching records.
To answer a query, the system searches the index for the requested value
and then follows the pointer to locate the requested row of the table.
Searching an index is much faster than searching the table because the
index is sorted and its rows are very small.
An index access structure is usually defined on a single field of a file,
called the indexing field.
There are several types of indexes:
1. Primary index
2. Clustering index
3. Secondary index
4. Multilevel index
5. B-trees and B+-trees
Objectives
To know about
Primary index
Clustering index
Secondary index
Multilevel index
B-trees and B+-trees
Self Assessment Question(s) (SAQs) (for section 4.1)
1. Why do you need an index? List the different types of indexes.
4.2 Primary index
A primary index is an index defined on the ordering key field of a data
file.
It consists of two fields:
1. First field: this is of the same data type as the primary key field of
the data file.
2. Second field: a pointer to a disk block.
For each block in the data file there is a corresponding index entry in the
index file.
Each index entry holds the value of the primary key field for the first
record in the block (the block anchor) and a pointer to that block.
Searching the index is more efficient than searching the data file
because the index file is much smaller than the data file.
Figure 4.1: Primary Index on the ordering key field of the file
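The lookup described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the block contents, the names `data_blocks`, `primary_index` and `lookup`, and the use of in-memory lists in place of disk blocks are all hypothetical and are not taken from the unit.

```python
from bisect import bisect_right

# Hypothetical data file: each inner list is one disk block, and the
# records are physically ordered by primary key across the blocks.
data_blocks = [
    [(1, "Aaron"), (3, "Adams"), (5, "Akers")],
    [(8, "Allen"), (11, "Alvarez"), (14, "Anders")],
    [(20, "Arnold"), (23, "Atkins"), (27, "Baker")],
]

# One index entry per block: (key of the first record in the block, block number).
primary_index = [(block[0][0], block_no) for block_no, block in enumerate(data_blocks)]


def lookup(key):
    """Return the record with the given key, or None if it is absent."""
    anchors = [anchor for anchor, _ in primary_index]
    # Binary search for the last index entry whose anchor key is <= key.
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None  # key is smaller than every block anchor
    block_no = primary_index[pos][1]
    # Only this single data block has to be read and scanned.
    for record in data_blocks[block_no]:
        if record[0] == key:
            return record
    return None


print(lookup(11))  # (11, 'Alvarez')
print(lookup(12))  # None
```

The index here has three entries for nine records, which is why a search over the index touches far less data than a search over the file itself.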
Self Assessment Question(s) (SAQs) (For Section 4.2)
1. What do you mean by a primary index? Explain.
4.3 Clustering Index
A clustering index is an index defined on an ordering non-key field of a
data file.
That field is called the clustering field.
Unlike the key field of a primary index, the clustering field may have
duplicate values.
Figure 4.2: Clustering index on the DEPTNUMBER ordering nonkey field of an
EMPLOYEE file
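A clustering index on DEPTNUMBER can be sketched in Python as shown here. The employee names, block layout and function names are hypothetical, and a dictionary stands in for what would be a sorted index file on disk. The sketch assumes the common arrangement of one index entry per distinct field value, pointing to the first block that contains that value.

```python
# Hypothetical EMPLOYEE blocks, physically ordered by the non-key field
# DEPTNUMBER. Each record is (DEPTNUMBER, NAME).
employee_blocks = [
    [(1, "Asha"), (1, "Bala"), (2, "Chitra")],
    [(2, "Dev"), (3, "Esha"), (3, "Farid")],
    [(3, "Gita"), (5, "Hari"), (5, "Indu")],
]


def build_clustering_index(blocks):
    """One entry per distinct field value: value -> first block containing it."""
    index = {}
    for block_no, block in enumerate(blocks):
        for dept, _name in block:
            index.setdefault(dept, block_no)  # keep only the first block seen
    return index


def employees_in(dept, blocks, index):
    """Follow the index to the first block, then scan forward while values match."""
    if dept not in index:
        return []
    names = []
    for block in blocks[index[dept]:]:
        if block[0][0] > dept:
            break  # the file is ordered, so no later block can match
        names.extend(name for d, name in block if d == dept)
    return names


clustering_index = build_clustering_index(employee_blocks)
print(clustering_index)                                     # {1: 0, 2: 0, 3: 1, 5: 2}
print(employees_in(3, employee_blocks, clustering_index))   # ['Esha', 'Farid', 'Gita']
```

Because the field has duplicate values, one index entry can lead to several records, possibly spread over consecutive blocks.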
Self Assessment Question(s) (SAQs) (For Section 4.3)
1. Define the concept of a clustering index.
4.4 Secondary Index
A secondary index is an ordered file with two fields.
The first field, the indexing field, is of the same data type as some
non-ordering field of the data file. The second field is either a block
pointer or a record pointer. When the indexing field is a key, there is
one index entry for each record in the data file. Each entry contains the
value of the secondary key for the record and a pointer either to the
block in which the record is stored or to the record itself. Hence it is
an example of a dense index. The records of the data file are not
physically ordered by the values of the secondary key, so we cannot use
block anchors. That is why a secondary index usually needs more storage
space and a longer search time than a primary index.
Figure 4.3: A dense secondary index on a non-ordering key field of a file
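A dense secondary index of this kind can be sketched in Python as follows. The field names ID and SSN, the sample values and the function name `find_by_ssn` are assumptions made for illustration only.

```python
from bisect import bisect_left

# Hypothetical data file ordered by ID; each record is (ID, SSN).
# SSN is a non-ordering key field: unique, but not sorted in the file.
data_blocks = [
    [(1, 907), (2, 315)],
    [(3, 642), (4, 128)],
    [(5, 850), (6, 233)],
]

# Dense index: one (SSN, block number) entry per record, sorted by SSN.
secondary_index = sorted(
    (ssn, block_no)
    for block_no, block in enumerate(data_blocks)
    for _id, ssn in block
)


def find_by_ssn(ssn):
    """Binary-search the index, then fetch the record from the block it points to."""
    keys = [k for k, _ in secondary_index]
    pos = bisect_left(keys, ssn)
    if pos == len(keys) or keys[pos] != ssn:
        return None
    block_no = secondary_index[pos][1]
    return next(rec for rec in data_blocks[block_no] if rec[1] == ssn)


print(len(secondary_index))  # 6, one entry per record
print(find_by_ssn(128))      # (4, 128)
```

The index needs six entries for six records, whereas a primary index on the same file would need only three, one per block.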
Self Assessment Question(s) (SAQs) (For Section 4.4)
1. Explain the concept of a secondary index.
4.5 Multilevel indexes
Indexes with two or more levels are called multilevel indexes. The idea
behind a multilevel index is to reduce the time required to search the
index itself. A multilevel index treats the index file, which is referred
to as the first level of the multilevel index, as an ordered file with a
distinct value for each k(i). Hence we can create a primary index for the
first level; this index to the first level is called the second level of
the multilevel index. Because the second level is a primary index, we can
use block anchors, so that the second level has one entry for each block
of the first level. The same step can be repeated until the top level
fits in a single block.
Figure 4.4: A two-level primary index resembling ISAM (Indexed Sequential
Access Method) organization
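The effect of adding levels can be worked out with a short calculation. The Python sketch below counts the blocks at each level, assuming every index block holds the same number of entries (the fan-out). The function name and the figures of 30,000 first-level entries and 68 entries per block are illustrative assumptions, not values from this unit.

```python
from math import ceil, log2


def index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each index level, first level first."""
    if first_level_entries < 1 or fan_out < 2:
        raise ValueError("need at least one entry and a fan-out of at least 2")
    blocks_per_level = []
    entries = first_level_entries
    while True:
        blocks = ceil(entries / fan_out)
        blocks_per_level.append(blocks)
        if blocks == 1:
            return blocks_per_level  # the top level fits in one block
        entries = blocks  # the next level has one entry per block of this level


levels = index_levels(30000, 68)
print(levels)                    # [442, 7, 1]
print(len(levels) + 1)           # 4 block accesses: one per level plus the data block
print(ceil(log2(levels[0])) + 1) # 10 accesses with binary search on the first level alone
```

Under these assumptions a three-level index finds a record in 4 block accesses, against about 10 when only the first level exists and is searched by binary search.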
Self Assessment Question(s) (SAQs) (For Section 4.5)
1. Write a note on multilevel indexes.
4.6 B+Tree Index Files
The main disadvantage of the index-sequential file organization is that
performance degrades as the file grows. A B+-tree index takes the form of a
balanced tree in which every path from the root of the tree to a leaf of the
tree is of the same length.
In a B-tree, every value of the search field appears once at some level in
the tree, along with a data pointer [which may be in an internal node]. In a
B+-tree, data pointers [the addresses of the records for a particular search
value] are stored only at the leaf nodes of the tree; hence, the structure of
leaf nodes differs from the structure of internal nodes. The leaf nodes have
an entry for every value of the search field, along with a data pointer to
the record.
A B+-tree is a multilevel index, but it has a different structure. A typical
node of the B+-tree contains up to n-1 search key values k1, k2, ..., kn-1
and n pointers p1, p2, ..., pn. The search key values within a node are kept
in sorted order, so that ki < kj whenever i < j.
The number of pointers in a node is called the fan-out of the node.
The structure of a non-leaf (internal) node is the same as that of a leaf
node, except that all its pointers are pointers to tree nodes.
Each internal node is of the form <p1, k1, p2, k2, ..., pq-1, kq-1, pq>,
where each pi is a tree pointer.
The root node has at least 2 tree pointers, unless it is also a leaf.
Each leaf node is of the form
<<k1, pr1>, <k2, pr2>, ..., <kn-1, prn-1>, pnext>
where each pri is a data pointer, and pnext points to the next leaf node of
the B+-tree.
All leaf nodes are at the same level.
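The two node layouts and the way a search moves between them can be sketched in Python. This is a minimal illustration, not a full B+-tree: the class and field names are hypothetical, strings such as "r7" stand in for data pointers, and the tree is built by hand rather than by insertion. The sketch assumes the usual convention that the subtree under pointer pi holds the values v with k(i-1) < v <= ki.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Leaf:
    keys: list = field(default_factory=list)       # k1 < k2 < ...
    data_ptrs: list = field(default_factory=list)  # pr_i, one per key
    next: Optional["Leaf"] = None                  # pnext, the next leaf to the right


@dataclass
class Internal:
    keys: list      # k1 < k2 < ... < k(q-1)
    children: list  # p1 ... pq, always one more pointer than keys


def search(node, key):
    """Return the data pointer stored for key, or None if key is absent."""
    # Internal nodes only guide the search; data pointers live in the leaves.
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, key)]
    pos = bisect_left(node.keys, key)
    if pos < len(node.keys) and node.keys[pos] == key:
        return node.data_ptrs[pos]
    return None


# A hand-built two-level tree with three chained leaves.
leaf3 = Leaf([12], ["r12"])
leaf2 = Leaf([7, 8], ["r7", "r8"], next=leaf3)
leaf1 = Leaf([5], ["r5"], next=leaf2)
root = Internal(keys=[5, 8], children=[leaf1, leaf2, leaf3])

print(search(root, 8))  # r8
print(search(root, 6))  # None
```

The values 5 and 8 appear both in the root and in a leaf, which is the duplication that distinguishes a B+-tree from a B-tree.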
Consider an example in which we insert records into a B+-tree of order n=3
and pleaf=2. At first the root is the only node in the tree, so it is also a
leaf node. As soon as more than one level is created, the tree is divided
into internal nodes and leaf nodes. Notice that every value must exist at the
leaf level, because all the data pointers are at the leaf level. However,
only some values exist in internal nodes to guide the search. Notice also
that every value appearing in an internal node also appears as the rightmost
value in the leaf level of the subtree to its left.
For example, inserting 12 into a full leaf node causes the node to be split
into two nodes. The figure shows the two leaf nodes that result from
inserting 12: the existing node contains 7 and 8, and the remaining value 12
is placed in a new node. The first
$j = \lceil (p_{leaf} + 1)/2 \rceil = \lceil 3/2 \rceil = 2$
entries in the original node are kept there and the remaining entries are
moved to a new leaf node. The j-th search value is replicated in the parent
internal node, and an extra pointer to the new node is created in the
parent. If the parent internal node is full, it must be split. This
splitting can propagate all the way up to create a new root node.
Figure 4.5: An example of insertion in a B+-tree with p=3 and pleaf=2
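The leaf split in this example can be sketched in Python. The function name `insert_into_leaf` and the representation of a leaf as a plain sorted list of keys are assumptions made for illustration; data pointers and the update of the parent node are left out.

```python
from bisect import insort
from math import ceil

P_LEAF = 2  # maximum number of entries in a leaf, as in the example


def insert_into_leaf(leaf, key):
    """Insert key into a sorted leaf given as a list of keys.

    Returns (left, right, separator). right and separator are None
    when the leaf did not overflow.
    """
    entries = list(leaf)
    insort(entries, key)
    if len(entries) <= P_LEAF:
        return entries, None, None
    j = ceil((P_LEAF + 1) / 2)  # number of entries that stay in the original leaf
    left, right = entries[:j], entries[j:]
    separator = left[-1]        # the j-th value, replicated in the parent
    return left, right, separator


print(insert_into_leaf([7], 8))      # ([7, 8], None, None)
print(insert_into_leaf([7, 8], 12))  # ([7, 8], [12], 8)
```

In the second call the value 8 stays in the leaf and is also copied up to the parent, so it ends up stored twice in the tree.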
Self Assessment Question(s) (SAQs) (For Section 4.6)
1. Explain the concept of a B+-tree.
4.7 B-tree
B-tree indexes are similar to B+-tree indexes. The main difference is that
a B-tree eliminates the duplicate storage of search key values. In the
B+-tree every search key value appears in some leaf node, and several are
repeated in non-leaf nodes. A B-tree allows each search key value to appear
only once, hence a B-tree requires fewer nodes. However, since each search
key appears only once, every node in a B-tree stores, along with each search
value, the address for that value [the pointer points either to a file
record or to a bucket that contains the search value].
Figure 4.6: B-tree structures. (a) A node in a B-tree with q-1 search values
(b) A B-tree of order p=3. The values were inserted in the order 8, 5, 1, 7,
3, 12, 9, 6.
Consider Figure 4.6(b), in which 5, 8, 1, 3, 6, 7, 9, 12 are values of the
indexed field. Consider the top node, which consists of two value entries [5
and 8] and three pointers. Values less than 5 are placed in the left lower
node, values greater than 5 and less than 8 are placed in the middle node,
and values greater than 8 are placed in the right lower node.
Consider the following algorithm for searching this B-tree for a value V.
Set N to the top node
While N exists
If V occurs in node N then exit [found]
Let X, Y be the data values in node N [X<Y]
If V<X
Then set N to the left lower node of N
If X<V<Y
Then set N to the middle lower node of N
If V>Y
Then set N to the right lower node of N
End
Exit [not found]
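A runnable version of this search can be sketched in Python for nodes with any number of values. The class name `BTreeNode` and the function name `btree_search` are hypothetical, and the tree of Figure 4.6(b) is assumed to have the root [5, 8] with the lower nodes [1, 3], [6, 7] and [9, 12], built here by hand rather than by insertion.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted search values
        self.children = children or []  # empty list for a leaf node


def btree_search(node, value):
    """Return True if value occurs somewhere in the B-tree rooted at node."""
    while node is not None:
        pos = bisect_left(node.keys, value)
        if pos < len(node.keys) and node.keys[pos] == value:
            return True               # found in this node, internal or leaf
        if not node.children:
            return False              # reached a leaf without a match
        node = node.children[pos]     # descend into the subtree that can hold value
    return False


root = BTreeNode(
    [5, 8],
    [BTreeNode([1, 3]), BTreeNode([6, 7]), BTreeNode([9, 12])],
)

print(btree_search(root, 5))  # True, found in the top node
print(btree_search(root, 7))  # True, found in the middle lower node
print(btree_search(root, 4))  # False
```

Unlike a B+-tree search, this one can stop at an internal node, because each value is stored exactly once and may sit at any level.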
A B-tree starts with a single root node [which is also a leaf node] at level
0 [zero]. Once the root node is full with p-1 search key values and we
attempt to insert another entry in the tree, the root node splits into two
nodes at level 1. Only the middle value is kept in the root node, and the
rest of the values are split evenly between the other two nodes. When a
non-root node is full and a new entry is inserted into it, that node is split
into two nodes at the same level, and the middle entry is moved to the parent
node along with two pointers to the split nodes. If the parent node is full,
it is also split. Splitting can propagate all the way to the root node.
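The split of a single full node can be sketched in Python. The function name `split_full_node` is hypothetical, a node is represented only by its sorted list of values, and child pointers and the propagation of the split to the parent are left out.

```python
from bisect import insort

P = 3  # order of the tree: a node holds at most P - 1 search values


def split_full_node(keys, new_key):
    """Insert new_key into a full node and split it.

    Returns (left_keys, middle_key, right_keys). The middle key moves up
    to the parent node, or into a new root if the node was the root.
    """
    if len(keys) != P - 1:
        raise ValueError("this sketch only handles a node that is already full")
    entries = list(keys)
    insort(entries, new_key)
    mid = len(entries) // 2
    return entries[:mid], entries[mid], entries[mid + 1:]


# Inserting 1 into a root that already holds 5 and 8:
print(split_full_node([5, 8], 1))  # ([1], 5, [8])
```

The middle value 5 is moved up, not copied, so it no longer appears in either of the two lower nodes.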
Advantage:
A B-tree eliminates the redundant storage of search key values.
Disadvantage:
Deletion in a B-tree is more complicated. In a B+-tree, the deleted entry
always appears in a leaf. In a B-tree, the deleted entry may appear in a
non-leaf node.
Self Assessment Question(s) (SAQs) (For Section 4.7)
1. Define B-Tree.
4.8 Summary
In this unit we have learnt the following concepts:
Primary index
Clustering index
Secondary index
Multilevel index
B-trees and B+-trees
4.9 Terminal Questions (TQs)
1. List the different types of indexes.
2. Explain briefly the primary index, clustering index and secondary index.
3. How does multilevel indexing improve the efficiency of searching an
index file?
4. What is a B+-tree? Describe the structure of both internal and leaf nodes
of a B+-tree.
4.10 Multiple Choice Questions (MCQs)
1. Indexes are used to ______ the retrieval of records.
(a) speed up
(b) read
(c) slowdown
(d) None of the above
2. ______ is an index defined on the ordering key field.
(a) Clustering index
(b) primary index
(c) Secondary index
3. ______ is an ordered file with two fields.
(a) Secondary index
(b) Primary index
(c) Clustering index
(d) None of the above
4. In a ______, data pointers are stored only at the leaf nodes of the tree;
the space overhead of the directory table is negligible.
(a) Static Hashing
(b) B -tree
(c) B+-tree
4.11 Answers to SAQs, TQs, and MCQs
4.11.1 Answers to Self Assessment Questions (SAQs)
For Section 4.1
1. Indexes are used to speed up the retrieval of records. (Refer section
4.1)
For Section 4.2
1. Primary index:
A primary index is an index defined on the ordering key field of a data
file. (Refer section 4.2)
For Section 4.3
1. Clustering index:
A clustering index is an index defined on an ordering non-key field of a
data file. (Refer section 4.3)
For Section 4.4
1. A secondary index is an ordered file with two fields. (Refer section 4.4)
For Section 4.5
1. Multilevel indexes: Indexes with two or more levels are called multilevel
indexes. (Refer section 4.5)
For Section 4.6
1. A B+-tree index takes the form of a balanced tree in which every path
from the root of the tree to a leaf of the tree is of the same length.
(Refer section 4.6)
For Section 4.7
1. B-tree indexes are similar to B+-tree indexes. The main difference is
that a B-tree eliminates the duplicate storage of search key values.
(Refer section 4.7)
4.11.2 Answers to Terminal Questions (TQs)
1. There are several types of indexes:
Primary index
Clustering index
Secondary index
Multilevel index
B-trees and B+-trees. (Refer section 4.1)
2. A primary index is an index defined on the ordering key field of a data
file. (Refer sections 4.2, 4.3, and 4.4)
3. Indexes with two or more levels are called multilevel indexes.
(Refer section 4.5)
4. A B+-tree index takes the form of a balanced tree in which every path
from the root of the tree to a leaf of the tree is of the same length.
(Refer section 4.6)
4.11.3 Answers to Multiple Choice Questions (MCQs)
1. A
2. B
3. A
4. C
