
Solution for assignment 4

Chapter 5:

5.21 Discuss the techniques for allowing a hash file to expand and shrink dynamically. What are the
advantages and disadvantages of each?

See pages 144 to 147 in the textbook.

Static hashing uses a fixed address space.

Extendible hashing:

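Uses a directory: an array of 2^d bucket addresses, where d is the global depth of the directory. The first d bits of a record's hash value index into the directory to locate its bucket.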

Linear hashing:

allows a hash file to expand and shrink its number of buckets dynamically without a directory file.
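The idea behind linear hashing can be sketched in a few lines. The following is a minimal in-memory illustration, not the textbook's algorithm verbatim: the class name, the choice to split whenever an insertion overflows a bucket, and the use of Python lists as buckets are all assumptions made for the example. Only expansion is shown; shrinking would reverse the split step.

```python
class LinearHashFile:
    """Illustrative linear hashing: buckets are split in order, no directory."""

    def __init__(self, initial_buckets=4, capacity=2):
        self.m = initial_buckets      # M, the initial number of buckets
        self.capacity = capacity      # records per bucket before overflow
        self.level = 0                # j: current hash function is K mod (2^j * M)
        self.split = 0                # n: next bucket to be split
        self.buckets = [[] for _ in range(initial_buckets)]

    def _address(self, key):
        b = key % (self.m * 2 ** self.level)
        if b < self.split:            # bucket already split: use the next function
            b = key % (self.m * 2 ** (self.level + 1))
        return b

    def insert(self, key):
        b = self._address(key)
        self.buckets[b].append(key)   # extra records stand in for an overflow chain
        if len(self.buckets[b]) > self.capacity:
            self._split_next()

    def _split_next(self):
        new_mod = self.m * 2 ** (self.level + 1)
        old = self.buckets[self.split]
        # Redistribute bucket n between itself and the new bucket n + 2^j * M.
        self.buckets[self.split] = [k for k in old if k % new_mod == self.split]
        self.buckets.append([k for k in old if k % new_mod != self.split])
        self.split += 1
        if self.split == self.m * 2 ** self.level:   # every bucket at this level split
            self.split = 0
            self.level += 1

    def search(self, key):
        return key in self.buckets[self._address(key)]
```

The split pointer and the level together replace the directory: a lookup needs only these two values to decide which of the two hash functions applies.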

5.22 What are mixed files used for? What are other types of primary file organizations?

A mixed file is a file which contains records of different record types. This would be used if related records
of different types were clustered (placed together) on disk blocks. For example, the GRADE_REPORT
records of a particular student may be placed following that STUDENT’s record.

Three primary file organizations: unordered (heap or pile), ordered (sorted), and hashed. Indexed (B-tree) organization is sometimes listed as a fourth.

5.27 A parts file with Part# as hash key includes records with the following Part# values: 2369, 3760, 4692,
4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943, 4750, 6975, 4981, 9208. The file uses eight buckets,
numbered 0 to 7. Each bucket is one disk block and holds two records. Load these records into the file in
the given order, using the hash function h(K) = K mod 8. Calculate the average number of block accesses
for a random retrieval on Part#.

If you don't have a mod function on your calculator:

h(K) = K – (floor(K/8) * 8)

For K = 2369:

2369/8 = 296.125, floor(296.125) = 296

h(2369) = 2369 – (296 * 8) = 1
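The same floor-based calculation can be written as a short function. This is only a sketch of the calculator workaround; the name mod_via_floor is made up for the example.

```python
import math

def mod_via_floor(k, m):
    """Compute k mod m as k - floor(k / m) * m, as done by hand above."""
    return k - math.floor(k / m) * m

print(mod_via_floor(2369, 8))  # bucket number for Part# 2369
```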

K       h(K) (bucket number)

2369    1
3760    0
4692    4
4871    7
5659    3
1821    5
1074    2
7115    3
1620    4
2428    4    Overflow
3943    7
4750    6
6975    7    Overflow
4981    5
9208    0
9209    Not in file

Bin Number    Key 1    Key 2    Overflow

0             3760     9208
1             2369
2             1074
3             5659     7115
4             4692     1620     Yes (2428)
5             1821     4981
6             4750
7             4871     3943     Yes (6975)

Two records out of 15 are in overflow, which will require an additional block access. The other records
require only one block access.

Average number of block accesses: (1 * (13/15)) + (2 * (2/15)) = 0.867 + 0.267 = 1.133 block accesses.
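The loading and the average can be checked with a short simulation. This sketch uses the keys as listed in the hash table above and assumes, as the solution does, that every overflow record costs exactly one extra block access; the function names are made up for the example.

```python
KEYS = [2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115,
        1620, 2428, 3943, 4750, 6975, 4981, 9208]

def load(keys, num_buckets=8, capacity=2):
    """Place each key in bucket K mod num_buckets; spill to overflow when full."""
    buckets = [[] for _ in range(num_buckets)]
    overflow = []
    for k in keys:
        b = k % num_buckets
        if len(buckets[b]) < capacity:
            buckets[b].append(k)
        else:
            overflow.append(k)
    return buckets, overflow

def average_accesses(keys, num_buckets=8, capacity=2):
    """One access for a record in its home bucket, two for an overflow record."""
    buckets, overflow = load(keys, num_buckets, capacity)
    in_home = sum(len(b) for b in buckets)
    return (1 * in_home + 2 * len(overflow)) / len(keys)

buckets, overflow = load(KEYS)
print(overflow)                 # the records that did not fit in their bucket
print(average_accesses(KEYS))   # 17/15 block accesses on average
```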

5.34 Suppose that we have a hash file of fixed-length records, and suppose that over-flow is handled by
chaining. Outline algorithms for insertion, deletion, and modification of a file record. State any
assumptions you make.

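First, we must understand what chaining is: each bucket holds a pointer to a linked list of overflow records for that bucket, kept in separate overflow locations.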


Assumptions:

Each key is unique.

Data file is open.

Overflow file is open.

A bucket record has been defined.

Insert:

Calculate the bucket address using the key-to-address transformation (KAT)

Read bucket
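One possible outline of all three operations, under the assumptions listed above, is sketched below. It is an in-memory stand-in, not a definitive answer: Python lists take the place of the data file and the overflow file, the class and method names are invented for the example, and the hash function K mod 8 is borrowed from exercise 5.27.

```python
class ChainedHashFile:
    """Illustrative hash file with one overflow chain per bucket."""

    def __init__(self, num_buckets=8, capacity=2):
        self.num_buckets = num_buckets
        self.capacity = capacity
        self.main = [[] for _ in range(num_buckets)]    # buckets in the data file
        self.chain = [[] for _ in range(num_buckets)]   # overflow chain per bucket

    def _locate(self, key):
        """Return (list holding the record, position), or (None, -1) if absent."""
        b = key % self.num_buckets
        for area in (self.main[b], self.chain[b]):
            for i, (k, _) in enumerate(area):
                if k == key:
                    return area, i
        return None, -1

    def insert(self, key, data):
        if self._locate(key)[0] is not None:
            raise KeyError(f"duplicate key {key}")      # keys are assumed unique
        b = key % self.num_buckets
        target = self.main[b] if len(self.main[b]) < self.capacity else self.chain[b]
        target.append((key, data))

    def delete(self, key):
        area, i = self._locate(key)
        if area is None:
            raise KeyError(f"key {key} not found")
        del area[i]
        b = key % self.num_buckets
        # If a slot opened in the bucket, pull the first overflow record into it.
        if len(self.main[b]) < self.capacity and self.chain[b]:
            self.main[b].append(self.chain[b].pop(0))

    def modify(self, key, new_data, new_key=None):
        if new_key is None or new_key == key:
            area, i = self._locate(key)                 # non-hash field: update in place
            if area is None:
                raise KeyError(f"key {key} not found")
            area[i] = (key, new_data)
        else:
            if self._locate(new_key)[0] is not None:
                raise KeyError(f"duplicate key {new_key}")
            self.delete(key)                            # hash field changed: the record
            self.insert(new_key, new_data)              # may belong in another bucket
```

The design choice worth noting is in modify: changing a non-hash field rewrites the record where it is, while changing the hash field is treated as a deletion followed by an insertion.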

Chapter 6:

6.1 Define the following terms:

Indexing field: (page 155 in the textbook) record fields that are used to construct an index. Any field in a
file can be used to create an index and multiple indexes on different fields can be constructed on a file.

Primary key field: (page 157 in the textbook) the ordering key field of the file. A field that uniquely
identifies a record.

Clustering field: (page 159 in the textbook) If the records of a file are physically ordered on a non-key field
– which does not have a distinct value for each record – that field is called the clustering field.

Secondary key field: (page 162 in the textbook) A secondary index is also an ordered file with two fields
(like a primary index). However the first field is of the same data type as some non-ordering field of the
data file that is an indexing field. If the secondary access structure uses a key field that has a distinct value
for every record it is called a secondary key field.

Block anchor: (page 157 in the textbook) The first record in each block of the data file.

Dense index: (page 157 in the textbook) An index that has an index entry for every search key value (and
hence every record) in the data file.

Non-dense (sparse) index: (page 157 in the textbook) An index that has entries for only some of the search
values.

6.5 What is the order p of a B-tree? Describe the structure of B-tree nodes.
Refer to page 170 in the text.

A search tree of order p is a tree such that each node contains at most p – 1 search values and p pointers in
the order <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq> where q <= p; each Pi is a pointer to a child node (or a null pointer) and each Ki is a search value.

All search values are assumed unique.

6.6 What is the order p of a B+-tree? Describe the structure of both internal and leaf nodes of a B+-tree.

Refer to page 175 in the textbook.

Each internal node is of the form <P1, K1, P2, K2, …, Pq-1, Kq-1, Pq>

Each internal node has at most p tree pointers.

Each internal node, except the root, has at least ceil(p/2) tree pointers. The root node has at least two tree
pointers if it is an internal node.

An internal node with q pointers, q <= p, has q – 1 search values.

Each leaf node is of the form << K1, Pr1>, <K2, Pr2>, …, <Kq-1,Prq-1>,Pnext>

Where q <= p, each Pri is a data pointer, and Pnext points to the next leaf node.

Each leaf node has at least ceil(p/2) values.

All leaf nodes are at the same level.
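The node layouts described above can be modelled with two small record types. This is a sketch for illustration only: the class names, the use of strings as stand-in data pointers, and the helper functions are assumptions, not part of the textbook definition.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

@dataclass
class LeafNode:
    # <<K1, Pr1>, ..., <Kq-1, Prq-1>, Pnext>; data pointers are plain strings here
    entries: List[Tuple[int, str]] = field(default_factory=list)
    next: Optional["LeafNode"] = None          # Pnext

@dataclass
class InternalNode:
    # <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>
    keys: List[int]
    children: List[Union["InternalNode", LeafNode]]

def check_internal(node, p, is_root=False):
    """Check the pointer-count rules for an internal node of a B+-tree of order p."""
    q = len(node.children)
    min_q = 2 if is_root else math.ceil(p / 2)
    return len(node.keys) == q - 1 and min_q <= q <= p

def scan_leaves(first_leaf):
    """Visit every key in order by following the Pnext pointers."""
    keys = []
    leaf = first_leaf
    while leaf is not None:
        keys.extend(k for k, _ in leaf.entries)
        leaf = leaf.next
    return keys

right = LeafNode(entries=[(7, "rec7"), (9, "rec9")])
left = LeafNode(entries=[(3, "rec3"), (5, "rec5")], next=right)
root = InternalNode(keys=[5], children=[left, right])
print(check_internal(root, p=3, is_root=True), scan_leaves(left))
```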

6.7 How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred as an access structure to a
data file?

A B-tree has data pointers in both internal and leaf nodes. A B+-tree has only tree pointers in internal nodes
and all data pointers are in leaf nodes.

Because entries in the internal nodes of a B+-tree contain only tree pointers and not data pointers more
entries can be packed into an internal node of a B+-tree leading to fewer levels improving search time. In
addition, the entire tree can be traversed in order using the Pnext pointers.
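The packing argument can be made concrete by computing the largest order p that fits in one block for each structure. The sketch below assumes the usual capacity constraints (p tree pointers plus p – 1 entries must fit in a block) and borrows its sample values from exercise 6.14: B = 512 bytes, block pointer P = 6, record pointer PR = 7, key size V = 9. The function names are made up for the example.

```python
def btree_order(B, P, PR, V):
    """Largest p with p*P + (p - 1)*(PR + V) <= B (each key carries a data pointer)."""
    return (B + PR + V) // (P + PR + V)

def bplus_internal_order(B, P, V):
    """Largest p with p*P + (p - 1)*V <= B (internal nodes hold no data pointers)."""
    return (B + V) // (P + V)

print(btree_order(512, 6, 7, 9))          # fan-out of a B-tree node
print(bplus_internal_order(512, 6, 9))    # fan-out of a B+-tree internal node
```

With these values a B-tree node holds 24 pointers while a B+-tree internal node holds 34, so the B+-tree needs fewer levels to index the same number of records.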

6.14 Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes long, and a record pointer is
PR = 7 bytes long. A file has r = 30,000 EMPLOYEE records of fixed length. Each record has the following
fields: NAME (30 bytes), SSN (9 bytes), DEPARTMENTCODE (9 bytes), ADDRESS (40 bytes), PHONE
(9 bytes), BIRTHDATE (8 bytes), SEX (1 byte), JOBCODE (4 bytes), SALARY (4 bytes, real number).
An additional byte is used as a deletion marker.

Calculate the record size R in bytes:

R = (30 +9 + 9 + 40 + 9 + 8 + 1 + 4 + 4) + 1 = 115 bytes.


Calculate the blocking factor bfr and the number of file blocks b assuming an unspanned organization.

bfr = floor(B/R) = floor(512/115) = 4 records per block

Number of blocks needed for file b = ceiling(30000/4) = 7500
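The record size, blocking factor, and block count can be reproduced with a few lines of arithmetic. This is a sketch using the field sizes given in the exercise; the variable names are chosen for the example.

```python
import math

BLOCK_SIZE = 512        # B, in bytes
NUM_RECORDS = 30_000    # r

FIELD_SIZES = {
    "NAME": 30, "SSN": 9, "DEPARTMENTCODE": 9, "ADDRESS": 40, "PHONE": 9,
    "BIRTHDATE": 8, "SEX": 1, "JOBCODE": 4, "SALARY": 4,
}

record_size = sum(FIELD_SIZES.values()) + 1      # +1 byte for the deletion marker
bfr = BLOCK_SIZE // record_size                  # unspanned: whole records per block
num_blocks = math.ceil(NUM_RECORDS / bfr)        # b

print(record_size, bfr, num_blocks)
```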

Suppose the file is ordered by the key field SSN and we want to construct a primary index on SSN (SSN is the
primary key, so its values are unique). Calculate:

i. The index blocking factor bfri (which is also the index fan-out fo)

The index record size Ri = (VSSN + P) = (9 + 6) = 15 bytes

Index blocking factor bfri = fo = floor(B/Ri) = floor(512/15) = 34

The number of first-level index entries r1 = number of file blocks b = 7500 entries

Number of first-level index blocks b1 = ceil(r1/bfri) = ceil(7500/34) = 221 blocks

Calculate the number of levels as follows:

Number of second-level index entries r2 = number of first-level blocks b1 = 221 entries

Number of second level index blocks b2 = ceil(r2/bfri) = ceil(221/34) = 7 blocks

Number of third-level index entries r3 = number of second-level index blocks b2 = 7 entries

Number of third-level index blocks b3 = ceil(r3/bfri) = ceil(7/34) = 1 block

Since the third level has only 1 block it is the top index level. Hence, the index has x = 3 levels

The total number of blocks for the index bi = b1 + b2 + b3 = 221 + 7 + 1 = 229 blocks

Number of block accesses to search for a record = x + 1 = 3 + 1 = 4
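The level-by-level calculation generalizes to a short loop that keeps building index levels until one fits in a single block. This is a sketch of the arithmetic only; the function name and parameter names are made up for the example.

```python
import math

def multilevel_index(num_file_blocks, block_size, key_size, block_ptr_size):
    """Return the fan-out and the number of index blocks at each level, bottom up."""
    fan_out = block_size // (key_size + block_ptr_size)   # bfri = fo
    blocks_per_level = []
    entries = num_file_blocks          # primary index: one entry per data block
    while True:
        blocks = math.ceil(entries / fan_out)
        blocks_per_level.append(blocks)
        if blocks == 1:                # a single block is the top level
            break
        entries = blocks               # next level indexes this level's blocks
    return fan_out, blocks_per_level

fan_out, levels = multilevel_index(7500, block_size=512, key_size=9, block_ptr_size=6)
print(fan_out, levels, sum(levels), len(levels) + 1)
```

The last two printed values are the total number of index blocks and the number of block accesses for a search (one per index level plus one for the data block).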

6.15 See solution handed out in class.
