
Chapter 11: Storage and File Structure

Rev. Aug 1, 2008


Database System Concepts, 5th Ed.
Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

Chapter 11: Storage and File Structure


Overview of Physical Storage Media
Magnetic Disks
RAID
Tertiary Storage
Storage Access
File Organization
Organization of Records in Files
Data-Dictionary Storage


Classification of Physical Storage Media


Speed with which data can be accessed
Cost per unit of data
Reliability
  data loss on power failure or system crash
  physical failure of the storage device

Can differentiate storage into:
  volatile storage: loses contents when power is switched off
  non-volatile storage:
    Contents persist even when power is switched off.
    Includes secondary and tertiary storage, as well as battery-backed-up main memory.


Physical Storage Media


Cache: fastest and most costly form of storage; volatile; managed by the
computer system hardware.
  (Note: Cache is pronounced as "cash")
Main memory:
  fast access (10s to 100s of nanoseconds; 1 nanosecond = 10^-9 seconds)
  generally too small (or too expensive) to store the entire database
  capacities of up to a few Gigabytes widely used currently
  Capacities have gone up and per-byte costs have decreased steadily and
  rapidly (roughly a factor of 2 every 2 to 3 years)
  Volatile: contents of main memory are usually lost if a power failure or
  system crash occurs.


Physical Storage Media (Cont.)


Magnetic disk
  Data is stored on a spinning disk, and read/written magnetically
  Primary medium for the long-term storage of data; typically stores the
  entire database.
  Data must be moved from disk to main memory for access, and written back
  for storage.
  direct access: possible to read data on disk in any order, unlike
  magnetic tape
  Survives power failures and system crashes
    disk failure can destroy data; such failure is rare but does happen


Storage Hierarchy


Storage Hierarchy (Cont.)


primary storage: Fastest media but volatile (cache, main

memory).

secondary storage: next level in hierarchy, non-volatile,

moderately fast access time

also called on-line storage

E.g. flash memory, magnetic disks

tertiary storage: lowest level in hierarchy, non-volatile, slow

access time

also called off-line storage

E.g. magnetic tape, optical storage


Magnetic Hard Disk Mechanism

NOTE: Diagram is schematic, and simplifies the structure of actual disk drives

Magnetic Disks
Read-write head

Positioned very close to the platter surface (almost touching it)

Reads or writes magnetically encoded information.

Surface of platter divided into circular tracks

Over 50K-100K tracks per platter on typical hard disks

Each track is divided into sectors.

Sector size typically 512 bytes

Typical sectors per track: 500 (on inner tracks) to 1000 (on outer
tracks)

To read/write a sector
  disk arm swings to position head on right track
  platter spins continually; data is read/written as sector passes under head


Storage Access
A database file is partitioned into fixed-length storage units called
blocks. Blocks are units of both storage allocation and data transfer.

The database system seeks to minimize the number of block transfers between
the disk and memory. We can reduce the number of disk accesses by keeping as
many blocks as possible in main memory.

Buffer: portion of main memory available to store copies of disk blocks.

Buffer manager: subsystem responsible for allocating buffer space in main
memory.


Buffer Manager
Programs call on the buffer manager when they need a block from disk.

The buffer manager does the following:
1. If the block is already in the buffer, return the address of the block in
   main memory.
2. If the block is not in the buffer:
   1. Allocate space in the buffer for the block,
      1. replacing (throwing out) some other block, if required, to make
         space for the new block;
      2. the replaced block is written back to disk only if it was modified
         since the most recent time that it was written to/fetched from the
         disk.
   2. Read the block from the disk to the buffer, and return the address of
      the block in main memory to the requester.
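
A minimal Python sketch of this get-block logic with an LRU pool (the Disk
interface, the capacity parameter, and the dirty-flag handling are
illustrative assumptions, not details from the slides):

    from collections import OrderedDict

    class BufferManager:
        """Keeps at most `capacity` disk blocks in memory, evicting the LRU block."""

        def __init__(self, disk, capacity):
            self.disk = disk                  # assumed to offer read_block()/write_block()
            self.capacity = capacity
            self.pool = OrderedDict()         # block_id -> (data, dirty flag)

        def get_block(self, block_id):
            # Case 1: block already in the buffer -- return it, mark as most recently used.
            if block_id in self.pool:
                self.pool.move_to_end(block_id)
                return self.pool[block_id][0]
            # Case 2: not in the buffer -- evict the least recently used block if full.
            if len(self.pool) >= self.capacity:
                victim_id, (victim_data, dirty) = self.pool.popitem(last=False)
                if dirty:                     # write back only if modified
                    self.disk.write_block(victim_id, victim_data)
            # Read the block from disk into the buffer and return it to the requester.
            data = self.disk.read_block(block_id)
            self.pool[block_id] = (data, False)
            return data

        def mark_dirty(self, block_id):
            data, _ = self.pool[block_id]
            self.pool[block_id] = (data, True)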


Buffer-Replacement Policies
Most operating systems replace the block least recently used (LRU strategy).

Idea behind LRU: use the past pattern of block references as a predictor of
future references.

Queries have well-defined access patterns (such as sequential scans), and a
database system can use the information in a user's query to predict future
references.

LRU can be a bad strategy for certain access patterns involving repeated
scans of data,
  e.g. when computing the join of two relations r and s by a nested-loop join:
       for each tuple tr of r do
         for each tuple ts of s do
           if the tuples tr and ts match ...

A mixed strategy with hints on the replacement strategy provided by the
query optimizer is preferable.


Buffer-Replacement Policies (Cont.)


Pinned block: a memory block that is not allowed to be written back to disk.

Toss-immediate strategy: frees the space occupied by a block as soon as the
final tuple of that block has been processed.

Most recently used (MRU) strategy: the system must pin the block currently
being processed. After the final tuple of that block has been processed, the
block is unpinned, and it becomes the most recently used block.

The buffer manager can use statistical information regarding the probability
that a request will reference a particular relation.
  E.g., the data dictionary is frequently accessed.
  Heuristic: keep data-dictionary blocks in the main-memory buffer.

Buffer managers also support forced output of blocks for the purpose of
recovery (more in Chapter 17).


File Organization
The database is stored as a collection of files. Each file is a

sequence of records. A record is a sequence of fields.

One approach: assume that
  record size is fixed
  each file has records of one particular type only
  different files are used for different relations

This case is easiest to implement; we will consider variable-length records
later.


Records
Fixed and variable length records
Records contain fields which have values of a particular type (e.g.,

amount, date, time, age)

Fields themselves may be fixed length or variable length


Variable length fields can be mixed into one record: separator

characters or length fields are needed so that the record can be


parsed.


Blocking
Blocking: refers to storing a number of records in one block on the disk.

Blocking factor (bfr): refers to the number of records per block.

There may be empty space in a block if an integral number of records do not
fit in one block.

Spanned records: records that exceed the size of a block and hence span a
number of blocks.


Files of Records

A file is a sequence of records, where each record is a collection of data


values (or data items).

A file descriptor (or file header ) includes information that describes the file,
such as the field names and their data types, and the addresses of the file
blocks on disk.

Records are stored on disk blocks. The blocking factor bfr for a file is the
(average) number of file records stored in a disk block.

A file can have fixed-length records or variable-length records.


Files of Records (cont.)

File records can be unspanned (no record can span two blocks) or spanned (a
record can be stored in more than one block).

The physical disk blocks that are allocated to hold the records of a file can be
contiguous, linked, or indexed.

In a file of fixed-length records, all records have the same format. Usually,
unspanned blocking is used with such files.

Files of variable-length records require additional information to be stored in


each record, such as separator characters and field types. Usually spanned
blocking is used with such files.


Record Representation
Records with fixed-length fields are easy to represent
  Similar to records (structs) in programming languages
  Extensions to represent null values
    E.g. a bitmap indicating which attributes are null

Variable-length fields can be represented by a pair (offset, length)
  offset: the location within the record
  length: field length
  All fields start at a predefined location, but extra indirection is
  required for variable-length fields

Example record structure of an account record: fields account_number
(A-102), branch_name (Perryridge) and balance (400 000), plus a null bitmap.
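
One possible byte-level encoding of such a record, sketched in Python (the
field widths, little-endian layout, and single-byte null bitmap are
illustrative assumptions, not the book's exact format):

    import struct

    def encode_account(account_number, branch_name, balance):
        """Fixed-length prefix: null bitmap, 10-byte account_number, 8-byte balance,
        then an (offset, length) pair locating the variable-length branch_name."""
        null_bitmap = 0
        if balance is None:
            null_bitmap |= 1                       # bit 0 set: balance is null
            balance = 0
        acct = account_number.encode().ljust(10)[:10]
        var = branch_name.encode()
        header_len = 1 + 10 + 8 + 4 + 4            # size of the fixed-length prefix
        header = struct.pack("<B10sqII", null_bitmap, acct, balance,
                             header_len, len(var))
        return header + var                        # variable part appended at the end

    record = encode_account("A-102", "Perryridge", 400_000)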

Fixed-Length Records
Simple approach:
  Store record i starting from byte n * (i - 1), where n is the size of each
  record.
  Record access is simple, but records may cross block boundaries.
    Modification: do not allow records to cross block boundaries.

Deletion of record i: alternatives:
  move records i + 1, ..., n to i, ..., n - 1
  move record n to i
  do not move records, but link all free records on a free list


Free Lists
Store the address of the first deleted record in the file header.
Use this first record to store the address of the second deleted

record, and so on

Can think of these stored addresses as pointers since they

point to the location of a record.

More space efficient representation: reuse space for normal

attributes of free records to store pointers. (No pointers stored


in in-use records.)
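
A small Python sketch of fixed-length slots with free-list deletion (the
in-memory header and dict-valued records are simplifying assumptions):

    class FixedLengthFile:
        """Equal-size slots; deleted slots are chained on a free list whose
        head would live in the file header (kept in memory here)."""

        def __init__(self):
            self.slots = []        # a record (dict), or the next free slot (int), or None
            self.free_head = None  # index of the first deleted slot

        def insert(self, record):
            if self.free_head is not None:        # reuse a deleted slot
                i = self.free_head
                self.free_head = self.slots[i]    # pointer stored inside the free slot
                self.slots[i] = record
            else:                                 # otherwise append at the end of the file
                i = len(self.slots)
                self.slots.append(record)
            return i

        def delete(self, i):
            self.slots[i] = self.free_head        # reuse the record space as a pointer
            self.free_head = i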


Variable-Length Records
Variable-length records arise in database systems in

several ways:

Storage of multiple record types in a file.

Record types that allow variable lengths for one or more


fields.

Record types that allow repeating fields (used in some


older data models).


Variable-Length Records: Slotted Page Structure

The slotted page header contains:
  number of record entries
  end of free space in the block
  location and size of each record

Records can be moved around within a page to keep them contiguous with no
empty space between them; the entry in the header must be updated.

Pointers should not point directly to the record; instead they should point
to the entry for the record in the header.
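
A compact Python sketch of a slotted page (the page size, in-memory slot
list, and the omission of compaction are simplifying assumptions):

    class SlottedPage:
        """Records grow from the end of the page; the slot directory grows from
        the front and holds (offset, size) for each record."""

        def __init__(self, page_size=4096):
            self.data = bytearray(page_size)
            self.slots = []                 # slot number -> (offset, size)
            self.free_end = page_size       # end of free space in the block

        def insert(self, record: bytes):
            offset = self.free_end - len(record)
            self.data[offset:self.free_end] = record
            self.free_end = offset
            self.slots.append((offset, len(record)))
            return len(self.slots) - 1      # external pointers store the slot number

        def get(self, slot_no):
            offset, size = self.slots[slot_no]
            return bytes(self.data[offset:offset + size])

Because external pointers hold only the slot number, a record can be moved
within the page (e.g. during compaction) by updating just its slot entry.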


Organization of Records in Files


Heap: a record can be placed anywhere in the file where there is space.

Sequential: store records in sequential order, based on the value of the
search key of each record.

Hashing: a hash function is computed on some attribute of each record; the
result specifies in which block of the file the record should be placed.

Records of each relation may be stored in a separate file. In a multitable
clustering file organization, records of several different relations can be
stored in the same file.
  Motivation: store related records on the same block to minimize I/O.


Sequential File Organization


Suitable for applications that require sequential

processing of the entire file

The records in the file are ordered by a search-key


Sequential File Organization (Cont.)


Deletion: use pointer chains.
Insertion: locate the position where the record is to be inserted
  if there is free space, insert there
  if there is no free space, insert the record in an overflow block
  In either case, the pointer chain must be updated
Need to reorganize the file from time to time to restore sequential order.


Multitable Clustering File Organization (cont.)


Store several relations in one file using a multitable clustering file
organization.

Multitable clustering organization of customer and depositor:
  good for queries involving depositor ⋈ customer, and for queries involving
  one single customer and his accounts
  bad for queries involving only customer


Data Dictionary Storage


Data dictionary (also called system catalog) stores
metadata: that is, data about data, such as
Information about relations

names of relations
names and types of attributes of each relation
names and definitions of views
integrity constraints
User and accounting information, including passwords
Statistical and descriptive data
number of tuples in each relation
Physical file organization information
How the relation is stored (sequential/hash/...)
Physical location of relation
Information about indices (Chapter 12)


Data Dictionary Storage (Cont.)


Catalog structure

Relational representation on disk

specialized data structures designed for efficient access, in


memory

A possible catalog representation:

Relation_metadata = (relation_name, number_of_attributes,
                     storage_organization, location)
Attribute_metadata = (relation_name, attribute_name, domain_type,
                      position, length)
User_metadata = (user_name, encrypted_password, group)
Index_metadata = (relation_name, index_name, index_type, index_attributes)
View_metadata = (view_name, definition)



Byte-String Representation of Variable-Length Records

Byte-string representation
  Attach an end-of-record (⊥) control character to the end of each record
  Difficulty with deletion
  Difficulty with growth


Fixed-Length Representation
Use one or more fixed length records:

reserved space

pointers

Reserved space can use fixed-length records of a known

maximum length; unused space in shorter records filled with a null


or end-of-record symbol.


Pointer Method

Pointer method

A variable-length record is represented by a list of fixed-length


records, chained together via pointers.

Can be used even if the maximum record length is not known


Pointer Method (Cont.)


Disadvantage of the pointer structure: space is wasted in all records except
the first in a chain.

Solution is to allow two kinds of blocks in the file:
  Anchor block: contains the first records of chains
  Overflow block: contains records other than those that are the first
  records of chains.


File Organization
The physical arrangement of data in a file into records and

pages on the disk


File organization determines the set of access methods for

Storing and retrieving records from a file

Therefore, file organization is roughly synonymous with access method


We study three types of file organization
Unordered or Heap files
Ordered or sequential files
Hash files
We examine each of them in terms of the operations we

perform on the database

Insert a new record


Search for a record (or update a record)
Delete a record


Unordered Or Heap File


Records are stored in the same order in which they are created.

Insert operation
  Fast, because the incoming record is written at the end of the last page
  of the file.

Search (or update) operation
  Slow, because a linear search is performed on pages.

Delete operation
  Slow, because the record to be deleted is first searched for.
  Deleting the record creates a hole in the page.
  Periodic file compacting work is required to reclaim the wasted space.


Ordered or Sequential File


Records are sorted on the values of one or more fields
  Ordering field: the field on which the records are sorted
  Ordering key: the key of the file when it is used for record sorting
Search (or update) operation
  Fast, because binary search is performed on the sorted records
  Update the ordering field?
Delete operation
  Fast, because searching for the record is fast
  Periodic file compacting work is, of course, required
Insert operation
  Poor, because if we insert the new record in the correct position we need
  to shift all the subsequent records in the file
  Alternatively, an overflow file is created which contains all the new
  records as a heap
  Periodically the overflow file is merged with the main file
  If an overflow file is used, search and delete operations for records in
  the overflow file have to be linear!
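
A small Python sketch of this search path, assuming records sorted on an
"id" field and an unsorted overflow list (both names are illustrative):

    import bisect

    def search_ordered_file(records, overflow, key):
        """Binary search on the sorted main file, then a linear scan of the
        overflow file if the key was not found."""
        keys = [r["id"] for r in records]        # in practice the search is on blocks
        i = bisect.bisect_left(keys, key)
        if i < len(records) and records[i]["id"] == key:
            return records[i]
        for r in overflow:                       # overflow records: linear search only
            if r["id"] == key:
                return r
        return None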


Hash File
A hash file is an array of buckets.
  Given a record r, a hash function h(r) computes the index of the bucket in
  which record r belongs.
  h uses one or more fields in the record, called hash fields.
  Hash key: the key of the file when it is used by the hash function.

Example hash function
  Assume that the staff last name is used as the hash field.
  Assume also that the hash file size is 26 buckets, each bucket
  corresponding to one of the letters of the alphabet.
  Then a hash function can be defined which computes the bucket address
  (index) based on the first letter in the last name.
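
That example hash function in Python (the 26-bucket size and the helper name
bucket_index are just for illustration):

    def bucket_index(last_name: str, num_buckets: int = 26) -> int:
        """Bucket address derived from the first letter of the last name (A-Z)."""
        return (ord(last_name[0].upper()) - ord("A")) % num_buckets

    buckets = [[] for _ in range(26)]
    for name in ("Smith", "Jones", "Adams"):
        buckets[bucket_index(name)].append(name)   # insert hashes straight to a bucket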


Hash File (2)


Insert operation
  Fast, because the hash function computes the index of the bucket to which
  the record belongs.
  If that bucket is full, you go to the next free one.
Search operation
  Fast, because the hash function computes the index of the bucket.
  Performance may degrade if the record is not found in the bucket suggested
  by the hash function.
Delete operation
  Fast, once again because the hash function is able to locate the record
  quickly.


Chapter 12: Indexing and Hashing

Rev. Sep 17, 2008

Database System Concepts, 5th Ed.


Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

Chapter 12: Indexing and Hashing


Basic Concepts
Ordered Indices
B+-Tree Index Files
B-Tree Index Files
Static Hashing
Dynamic Hashing
Comparison of Ordered Indexing and Hashing
Index Definition in SQL
Multiple-Key Access


Basic Concepts
Indexing mechanisms used to speed up access to desired data.

E.g., author catalog in library

Search key: attribute or set of attributes used to look up records in a
file.

An index file consists of records (called index entries) of the form
  <search-key, pointer>

Index files are typically much smaller than the original file
Two basic kinds of indices:

Ordered indices: search keys are stored in sorted order

Hash indices: search keys are distributed uniformly across


buckets using a hash function.


Index Evaluation Metrics


Access types supported efficiently. E.g.,

records with a specified value in the attribute

or records with an attribute value falling in a specified range


of values (e.g. 10000 < salary < 40000)

Access time
Insertion time
Deletion time
Space overhead


A simple index
Index file (sorted on acct_no): A-101, A-102, A-110, A-215, A-217, ...
each entry pointing into the data file below.

Data file (account records, in physical order):
  Brighton     A-217   700
  Downtown     A-101   500
  Downtown     A-110   600
  Mianus       A-215   700
  Perryridge   A-102   400
  ...
Index of depositors on acct_no


Index records: <search key value, pointer (block, offset or slot#)>
To answer a query for acct_no=A-110 we:
1. Do a binary search on index file, searching for A-110
2. Chase pointer of index record

Index Choices
1. Primary: index search key = physical order search key
vs Secondary: all other indexes
Q: how many primary indices per relation?
2. Dense: index entry for every search key value
vs Sparse: some search key values not in the index
3. Single level vs Multilevel (index on the indices)


Measuring goodness
On what basis do we compare different indices?
1. Access type: what types of queries can be answered:
   selection queries (ssn = 123)?
   range queries (100 <= ssn <= 200)?
2. Access time: what is the cost of evaluating queries?
   Measured in # of block accesses
3. Maintenance overhead: cost of insertion/deletion? (also in block accesses)
4. Space overhead: in # of blocks needed to store the index


Indexing Structures for Files


Types of Single-level Ordered Indexes
  Primary Indexes
  Clustering Indexes
  Secondary Indexes
Multilevel Indexes
Dynamic Multilevel Indexes Using B-Trees and B+-Trees
Indexes on Multiple Keys


Indexing

Primary (or clustering) index on Ssn:

Index entries 123, 234, 345, 456, 567, each pointing into the data file
STUDENT, which is ordered on Ssn.

STUDENT
Ssn   Name    Address
123   smith   main str
234   jones   forbes ave
345   smith   forbes ave
...

Types of Single-Level Indexes


Primary Index

Defined on an ordered data file

The data file is ordered on a key field

Includes one index entry for each block in the data file;
the index entry has the key field value for the first record
in the block, which is called the block anchor

A similar scheme can use the last record in a block.

A primary index is a nondense (sparse) index, since it includes an entry for
each disk block of the data file with the key of its anchor record, rather
than an entry for every search value.



Types of Single-Level Indexes


Clustering Index

Defined on an ordered data file

The data file is ordered on a non-key field unlike primary


index, which requires that the ordering field of the data
file have a distinct value for each record.

Includes one index entry for each distinct value of the


field; the index entry points to the first data block that
contains records with that field value.

It is another example of a nondense index. Insertion and deletion are
relatively straightforward with a clustering index.

Indexing
Secondary (or non-clustering) index:
duplicates may exist
Can have many secondary indices
but only one primary index

Address-index: index on the (non-ordering) Address field of STUDENT.

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave

Secondary Indexes
An index file that uses a non-primary field as the index, e.g. the City
field in the branch table.


They improve the performance of queries
that use attributes other than the primary key
You can use a separate index for every
attribute you wish to use in the WHERE
clause of your select query
But there is the overhead of maintaining a
large number of these indexes


Indexing
secondary index: typically, with postings lists

Postings lists: each distinct Address value (e.g. "forbes ave", "main str")
points to a list of all matching records.

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave

Types of Single-Level Indexes


Secondary Index

A secondary index provides a secondary means of


accessing a file for which some primary access already
exists.

The secondary index may be on a field which is a


candidate key and has a unique value in every record, or
a nonkey with duplicate values.

The index is an ordered file with two fields.


The first field is of the same data type as some nonordering field of the
data file that is an indexing field.
The second field is either a block pointer or a record pointer. There can be
many secondary indexes (and hence, indexing fields) for the same file.

Includes one entry for each record in the data file; hence, it is a dense
index.


A dense secondary index (with block pointers) on a nonordering key field of
a file.


A secondary index (with record pointers) on a nonkey field, implemented
using one level of indirection so that index entries are of fixed length and
have unique field values.



Indexes as Access Paths


A single-level index is an auxiliary file that makes it more efficient to search

for a record in the data file.

The index is usually specified on one field of the file (although it could be

specified on several fields)

One form of an index is a file of entries <field value, pointer to record>,

which is ordered by field value

The index is called an access path on the field.


Indexes as Access Paths (contd.)


The index file usually occupies considerably fewer disk blocks than the data
file, because its entries are much smaller.

A binary search on the index yields a pointer to the file record.

Indexes can also be characterized as dense or sparse.
  A dense index has an index entry for every search key value (and hence
  every record) in the data file.
  A sparse (or nondense) index, on the other hand, has index entries for
  only some of the search values.


Indexing
Primary/sparse index on ssn (primary key)
Sparse index entries 123 and 456 each point to the data block whose records
have ssn >= 123 and ssn >= 456, respectively.


Indexing
Secondary / dense index
Index (dense, on Ssn): 123, 234, 345, 456, 567

Secondary index on a candidate key: no duplicates, no need for postings
lists.

STUDENT (file not ordered on Ssn)
Ssn   Name      Address
345   tomson    main str
234   jones     forbes ave
123   smith     main str
567   smith     forbes ave
456   stevens   forbes ave

Summary
All combinations are possible

            Dense    Sparse
Primary     rare     usual
Secondary   usual

at most one sparse index


as many as desired dense indices
usually: one primary index (probably sparse) and a
few secondary indices (dense)


Indexes as Access Paths (contd.)


Example: Given the following data file:
  EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ...)
Suppose that:
  record size R = 150 bytes, block size B = 512 bytes, r = 30,000 records
Then we get:
  blocking factor bfr = B div R = 512 div 150 = 3 records/block
  number of file blocks b = ceil(r / bfr) = ceil(30000 / 3) = 10,000 blocks
For an index on the SSN field, assume the field size V_SSN = 9 bytes and the
record pointer size P_R = 7 bytes. Then:
  index entry size R_I = (V_SSN + P_R) = (9 + 7) = 16 bytes
  index blocking factor bfr_I = B div R_I = 512 div 16 = 32 entries/block
  number of index blocks b_I = ceil(r / bfr_I) = ceil(30000 / 32) = 938 blocks
  a binary search on the index needs ceil(log2 b_I) = ceil(log2 938) = 10
  block accesses
This is compared to an average linear search cost of:
  b / 2 = 10000 / 2 = 5000 block accesses
If the file records are ordered, the binary search cost would be:
  ceil(log2 b) = ceil(log2 10000) = 14 block accesses
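
The same arithmetic as a quick Python check (variable names are ad hoc):

    import math

    B, R, r = 512, 150, 30_000              # block size, record size, # records
    bfr = B // R                            # 3 records per block
    b = math.ceil(r / bfr)                  # 10,000 file blocks

    V_ssn, P_r = 9, 7                       # index field size, record pointer size
    R_i = V_ssn + P_r                       # 16-byte index entries
    bfr_i = B // R_i                        # 32 entries per index block
    b_i = math.ceil(r / bfr_i)              # 938 index blocks

    print(math.ceil(math.log2(b_i)))        # 10   block accesses via the index
    print(b // 2)                           # 5000 average linear-search accesses
    print(math.ceil(math.log2(b)))          # 14   binary search on the ordered file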

Creating indexes in SQL


You can create an index for every table you create in SQL
For example

CREATE INDEX branchNoIndex on branch(branchNo);

CREATE INDEX numberCityIndex on branch(branchNo,city);

DROP INDEX branchNoIndex;


Ordered Indices
In an ordered index, index entries are stored sorted on the search

key value. E.g., author catalog in library.

Primary index: in a sequentially ordered file, the index whose

search key specifies the sequential order of the file.

An ordering key is used to physically order all records.

Every record has a unique value for the field.

Clustering index : Ordering field is not a key field

Many records may have same value for this field.

The search key of a primary index is usually but not necessarily the

primary key.

Secondary index: an index whose search key specifies an order

different from the sequential order of the file. Also called


non-clustering index.

Index-sequential file: ordered sequential file with a primary index.



Dense Index Files


Dense index: an index record appears for every search-key value in the file.


Sparse Index Files


Sparse Index: contains index records for only some search-key values.

Applicable when records are sequentially ordered on search-key

To locate a record with search-key value K we:

Find index record with largest search-key value < K

Search file sequentially starting at the record to which the index


record points
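
A small Python sketch of this lookup (the list-of-blocks layout and the
"key" field name are illustrative assumptions):

    import bisect

    def sparse_index_lookup(index, blocks, key):
        """`index` is a sorted list of (anchor_search_key, block_no), one entry
        per block; records inside each block are sorted on the search key."""
        anchors = [k for k, _ in index]
        i = bisect.bisect_right(anchors, key) - 1   # largest anchor <= key
        if i < 0:
            return None
        _, block_no = index[i]
        for record in blocks[block_no]:             # sequential scan from there
            if record["key"] == key:
                return record
        return None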


Sparse Index Files (Cont.)


Compared to dense indices:

Less space and less maintenance overhead for insertions and


deletions.

Generally slower than dense index for locating records.

Good tradeoff: sparse index with an index entry for every block in

file, corresponding to least search-key value in the block.


Multilevel Index
If primary index does not fit in memory, access becomes

expensive.

Solution: treat primary index kept on disk as a sequential file

and construct a sparse index on it.

outer index: a sparse index of the primary index
inner index: the primary index file

If even outer index is too large to fit in main memory, yet

another level of index can be created, and so on.

Indices at all levels must be updated on insertion or deletion

from the file.



Multi-Level Indexes
Because a single-level index is an ordered file, we can create a primary index

to the index itself ; in this case, the original index file is called the first-level
index and the index to the index is called the second-level index.

We can repeat the process, creating a third, fourth, ..., top level until all entries

of the top level fit in one disk block

A multi-level index can be created for any type of first-level index (primary,

secondary, clustering) as long as the first-level index consists of more than one
disk block


A two-level primary index resembling ISAM (Indexed Sequential Access Method)
organization.


Dynamic Multi-Level Indexes


To retain the benefits of using multilevel indexing while reducing index insertion

and deletion problems

Leaves some space in each of its blocks for inserting new entries and uses

appropriate insertion/deletion algorithms for creating and deleting new index


blocks when the data file grows and shrinks.

Often implemented by using data structures called B-trees and B+-trees


Index Update: Record Deletion


If deleted record was the only record in the file with its particular search-

key value, the search-key is deleted from the index also.

Single-level index deletion:

Dense indices: deletion of the search-key is similar to file record deletion.
Sparse indices:
  if the deleted key value exists in the index, the value is replaced by the
  next search-key value in the file (in search-key order).
  If the next search-key value already has an index entry, the entry is
  deleted instead of being replaced.


Index Update: Record Insertion


Single-level index insertion:

Perform a lookup using the key value from inserted record

Dense indices: if the search-key value does not appear in the index, insert
it.
Sparse indices: if the index stores an entry for each block of the file, no
change needs to be made to the index unless a new block is created.
  If a new block is created, the first search-key value appearing in the new
  block is inserted into the index.

Multilevel insertion (as well as deletion) algorithms are simple

extensions of the single-level algorithms


Secondary Indices Example

Secondary index on balance field of account


Index record points to a bucket that contains pointers to all the

actual records with that particular search-key value.

Secondary indices have to be dense



Primary and Secondary Indices


Indices offer substantial benefits when searching for records.
BUT: updating indices imposes overhead on database modification -- when a
file is modified, every index on the file must be updated.

Sequential scan using primary index is efficient, but a

sequential scan using a secondary index is expensive

Each record access may fetch a new block from disk

A block fetch requires about 5 to 10 milliseconds, versus about 100
nanoseconds for a memory access.


Indexed Sequential Access Method

ISAM (indexed sequential access method) is based on a primary index.
  MyISAM, the default access method (table type) in MySQL, is an extension
  of ISAM.
  Insert and delete operations disturb the sorting.
  You need an overflow file, which periodically needs to be merged with the
  main file.


ISAM
What if index is too large to search in memory?
Build a 2nd-level sparse index on the values of the 1st-level index: e.g. a
2nd-level entry for key 123 points to the 1st-level index block whose
entries (123, 456, ...) in turn point to the STUDENT data blocks holding
records with ssn >= 123, ssn >= 456, and so on.

STUDENT
Ssn   Name      Address
123   smith     main str
234   jones     forbes ave
345   tomson    main str
456   stevens   forbes ave
567   smith     forbes ave

Another example

A two-level structure over student records (id, name, GPA): the index level
holds the separating keys 22, 31, 42, 49; the leaf blocks hold the records

  [12, Mark, 3.5]    [16, John, 3]     [18, Azer, 3.6]    [21, Wayne, 3.4]
  [22, Margrit, 3.8] [24, Stan, 3.7]   [26, Leonid, 4]    [27, Leo, 3.9]
  [31, Peter, 3.5]   [33, Steve, 3]    [34, George, 3.99] [35, Rich, 3.4]
  [42, Scott, 2.9]   [44, Murat, 3.8]  [45, Ibrahim, 3.6] [46, Gene, 3.4]
  [49, Stan, 3.5]    [54, Mike, 3.89]  [56, Teo, 3.6]     [58, Jose, 3.4]

ISAM observations

What about insertions/deletions? E.g. insert a new record
(124, peterson, fifth ave.) when the STUDENT block it belongs to is full:
  the new record goes into an overflow block chained off that primary block.

Problems? Overflow chains may become very long. Thus:
  shut down & reorganize the file periodically
  start with only ~80% utilization, to leave room for insertions

Tree-structured index

Index entry: <P0, K1, P1, K2, P2, ..., Km, Pm>

The index file may still be quite large. But we can apply the idea repeatedly!

Non-leaf pages direct the search; the leaf level consists of primary pages
plus overflow pages. Leaf pages contain the data entries.



Search Tree
A search tree of order p is a tree such that each node contains at most p - 1
search values and p pointers in the order <P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq>,
where q <= p. Each Pi is a pointer to a child node (or a NULL pointer), and
each Ki is a search value from some ordered set of values. All search values
are assumed to be unique.


A search tree of order p = 3.


Dynamic Multilevel Indexes Using B-Trees and B+-Trees
Because of the insertion and deletion problem, most multi-level indexes use B-

tree or B+-tree data structures, which leave space in each tree node (disk
block) to allow for new index entries

These data structures are variations of search trees that allow efficient

insertion and deletion of new search values.

In B-Tree and B+-Tree data structures, each node corresponds to a disk block
Each node is kept between half-full and completely full


Dynamic Multilevel Indexes Using B-Trees and B+-Trees (contd.)
An insertion into a node that is not full is quite efficient; if a node is full the

insertion causes a split into two nodes

Splitting may propagate to other tree levels


A deletion is quite efficient if a node does not become less than half full
If a deletion causes a node to become less than half full, it must be merged

with neighboring nodes


Difference between B-tree and B+-tree

In a B-tree, pointers to data records exist at all levels of the tree

In a B+-tree, all pointers to data records exists at the leaf-level nodes

A B+-tree can have fewer levels (or a higher capacity of search values) than
the corresponding B-tree.



B+ Tree: The Most Widely Used Index

Insert/delete at log_F(N) cost; keep the tree height-balanced. (F = fanout,
N = # leaf pages)

Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d
entries. The parameter d is called the order of the tree.

Supports equality and range-searches efficiently.

Structure: index entries (for direct search) in the non-leaf levels; data
entries (the "sequence set") in the leaves.

Example B+ Tree
Search begins at root, and key comparisons direct it to a leaf (as in

ISAM).

Search for 5*, 15*, all data entries >= 24* ...

Root: [13 | 17 | 24 | 30]
Leaves: [2* 3* 5* 7*]  [14* 16*]  [19* 20* 22*]  [24* 27* 29*]  [33* 34* 38* 39*]
Based on the search for 15*, we know it is not in the tree!



B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
  average fanout = 133
Typical capacities:
  Height 4: 133^4 = 312,900,700 records
  Height 3: 133^3 =   2,352,637 records
Can often hold top levels in buffer pool:
  Level 1 =      1 page  =   8 KBytes
  Level 2 =    133 pages =   1 MByte
  Level 3 = 17,689 pages = 133 MBytes

Inserting a Data Entry into a B+ Tree

Find correct leaf L.

Put data entry onto L.

If L has enough space, done!

Else, must split L (into L and a new node L2)

Redistribute entries evenly, copy up middle key.

Insert index entry pointing to L2 into parent of L.

This can happen recursively

To split index node, redistribute entries evenly, but push up middle key.
(Contrast with leaf splits.)

Splits grow tree; root split increases height.

Tree growth: gets wider or one level taller at top.


Example B+ Tree - Inserting 8*

Before the insertion:
Root: [13 | 17 | 24 | 30]
Leaves: [2* 3* 5* 7*]  [14* 16*]  [19* 20* 22*]  [24* 27* 29*]  [33* 34* 38* 39*]

Example B+ Tree - Inserting 8*

After inserting 8*:
Root: [17]
Internal: [5 | 13]   [24 | 30]
Leaves: [2* 3*]  [5* 7* 8*]  [14* 16*]  [19* 20* 22*]  [24* 27* 29*]  [33* 34* 38* 39*]

Notice that root was split, leading to increase in height.


In this example, we can avoid split by re-distributing
entries; however, this is usually not done in practice.

Inserting 8* into Example B+ Tree


Observe how

Entry to be inserted in parent node.


(Note that 5 is
s copied up and
continues to appear in the leaf.)

minimum
occupancy is
guaranteed in both
leaf and index pg
splits.

2*

5*

3*

7*

8*

Note difference

between copy-up
and push-up; be
sure you
understand the
reasons for this.

Database System Concepts - 5th Edition

17

24

13

11.103

Entry to be inserted in parent node.


(Note that 17 is pushed up and only
appears once in the index. Contrast
this with a leaf split.)

30

Silberschatz, Korth and Sudarshan

Deleting a Data Entry from a B+ Tree


Start at root, find leaf L where the entry belongs.
Remove the entry.
  If L is at least half-full, done!
  If L has only d-1 entries,
    Try to re-distribute, borrowing from a sibling (adjacent node with the
    same parent as L).
    If re-distribution fails, merge L and sibling.
If a merge occurred, must delete the entry (pointing to L or sibling) from
the parent of L.
The merge could propagate to the root, decreasing height.


Example Tree (including 8*)


Delete 19* and 20* ...
Root: [17]
Internal: [5 | 13]   [24 | 30]
Leaves: [2* 3*]  [5* 7* 8*]  [14* 16*]  [19* 20* 22*]  [24* 27* 29*]  [33* 34* 38* 39*]

Deleting 19* is easy.


Example Tree (including 8*)


Delete 19* and 20* ...
After deleting 19* and 20*:
Root: [17]
Internal: [5 | 13]   [27 | 30]
Leaves: [2* 3*]  [5* 7* 8*]  [14* 16*]  [22* 24*]  [27* 29*]  [33* 34* 38* 39*]

Deleting 19* is easy.
Deleting 20* is done with re-distribution. Notice how the middle key is
copied up.


... And Then Deleting 24*


Must merge: the leaf [22* 24*] loses 24* and is merged with its sibling.
Observe the 'toss' of the index entry (on the right), and the 'pull down' of
the index entry (below).

Resulting tree:
Root: [5 | 13 | 17 | 30]
Leaves: [2* 3*]  [5* 7* 8*]  [14* 16*]  [22* 27* 29*]  [33* 34* 38* 39*]

B+-Tree Index Files


B+-tree indices are an alternative to indexed-sequential files.
Disadvantage of indexed-sequential files

performance degrades as file grows, since many overflow


blocks get created.
Periodic reorganization of entire file is required.
Advantage of B+-tree index files:
automatically reorganizes itself with small, local, changes,
in the face of insertions and deletions.
Reorganization of entire file is not required to maintain
performance.
(Minor) disadvantage of B+-trees:
extra insertion and deletion overhead, space overhead.
Advantages of B+-trees outweigh disadvantages
B+-trees are used extensively


B+-Tree Index Files (Cont.)


A B+-tree is a rooted tree satisfying the following properties:
  All paths from root to leaf are of the same length.
  Each node that is not a root or a leaf has between ceil(n/2) and n
  children.
  A leaf node has between ceil((n-1)/2) and n-1 values.
  Special cases:
    If the root is not a leaf, it has at least 2 children.
    If the root is a leaf (that is, there are no other nodes in the tree),
    it can have between 0 and (n-1) values.


B+-Tree Node Structure


Typical node: <P1, K1, P2, K2, ..., Pn-1, Kn-1, Pn>
  Ki are the search-key values
  Pi are pointers to children (for non-leaf nodes) or pointers to records or
  buckets of records (for leaf nodes).
The search-keys in a node are ordered:
  K1 < K2 < K3 < ... < Kn-1


Leaf Nodes in B+-Trees


Properties of a leaf node:
  For i = 1, 2, ..., n-1, pointer Pi either points to a file record with
  search-key value Ki, or to a bucket of pointers to file records, each
  record having search-key value Ki. The bucket structure is only needed if
  the search-key does not form a primary key.
  If Li, Lj are leaf nodes and i < j, Li's search-key values are less than
  Lj's search-key values.
  Pn points to the next leaf node in search-key order.


Non-Leaf Nodes in B+-Trees


Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a
non-leaf node with m pointers:
  All the search-keys in the subtree to which P1 points are less than K1.
  For 2 <= i <= m - 1, all the search-keys in the subtree to which Pi points
  have values greater than or equal to Ki-1 and less than Ki.
  All the search-keys in the subtree to which Pm points have values greater
  than or equal to Km-1.


Example of a B+-tree

B+-tree for account file (n = 3)


Example of B+-tree

B+-tree for account file (n = 5)


Leaf nodes must have between 2 and 4 values

( (n1)/2 and n 1, with n = 5).

Non-leaf nodes other than root must have between 3

and 5 children ( (n/2 and n with n =5).

Root must have at least 2 children.


Observations about B+-trees


Since the inter-node connections are done by pointers,

logically close blocks need not be physically close.

The non-leaf levels of the B+-tree form a hierarchy of sparse

indices.

The B+-tree contains a relatively small number of levels:
  the level below the root has at least 2 * ceil(n/2) values
  the next level has at least 2 * ceil(n/2) * ceil(n/2) values
  etc.
If there are K search-key values in the file, the tree height is no more
than ceil( log_ceil(n/2) (K) ); thus searches can be conducted efficiently.
Insertions and deletions to the main file can be handled efficiently, as the
index can be restructured in logarithmic time (as we shall see).


Queries on B+-Trees

Find all records with a search-key value of k:
1. N = root
2. Repeat
   1. Examine N for the smallest search-key value > k.
   2. If such a value exists, assume it is Ki. Then set N = Pi.
   3. Otherwise k >= Kn-1. Set N = Pn.
   Until N is a leaf node
3. If for some i, key Ki = k, follow pointer Pi to the desired record or
   bucket.
4. Else no record with search-key value k exists.
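
A compact Python sketch of this lookup over dictionary-based nodes (the node
layout with "keys"/"children"/"recs" fields is an illustrative assumption):

    def bptree_find(root, k):
        """Non-leaf node: {'keys': [...], 'children': [...]} with
        len(children) == len(keys) + 1; leaf node: {'keys': [...], 'recs': [...]}."""
        n = root
        while "children" in n:                       # descend until N is a leaf
            i = 0
            while i < len(n["keys"]) and k >= n["keys"][i]:
                i += 1                               # keys[i] is the smallest key > k
            n = n["children"][i]                     # follow the corresponding pointer
        for key, rec in zip(n["keys"], n["recs"]):   # in the leaf, look for Ki = k
            if key == k:
                return rec
        return None                                  # no record with search key k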


Queries on B+-Trees (Cont.)


If there are K search-key values in the file, the height of the tree is no
more than ceil( log_ceil(n/2) (K) ).

A node is generally the same size as a disk block, typically 4 kilobytes,
and n is typically around 100 (40 bytes per index entry).

With 1 million search key values and n = 100:
  at most ceil(log50(1,000,000)) = 4 nodes are accessed in a lookup.

Contrast this with a balanced binary tree with 1 million search key values:
around 20 nodes are accessed in a lookup.
  The above difference is significant, since every node access may need a
  disk I/O, costing around 20 milliseconds.


Updates on B+-Trees: Insertion


1. Find the leaf node in which the search-key value would appear.
2. If the search-key value is already present in the leaf node:
   1. Add the record to the file.
3. If the search-key value is not present, then
   1. add the record to the main file (and create a bucket if necessary),
   2. if there is room in the leaf node, insert the (key-value, pointer)
      pair in the leaf node,
   3. otherwise, split the node (along with the new (key-value, pointer)
      entry) as discussed in the next slide.


Updates on B+-Trees: Insertion (Cont.)


Splitting a leaf node:
  take the n (search-key value, pointer) pairs (including the one being
  inserted) in sorted order. Place the first ceil(n/2) in the original node,
  and the rest in a new node.
  let the new node be p, and let k be the least key value in p. Insert (k,p)
  in the parent of the node being split.
  If the parent is full, split it and propagate the split further up.
Splitting of nodes proceeds upwards till a node that is not full is found.
  In the worst case the root node may be split, increasing the height of the
  tree by 1.

Result of splitting the node containing Brighton and Downtown on inserting
Clearview:
  next step: insert an entry with (Downtown, pointer-to-new-node) into the
  parent.
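
A small Python sketch of the leaf-split step described above (the dict-based
leaf layout and unique keys are illustrative assumptions):

    import math

    def split_leaf(leaf, key, rec, n):
        """Take the n (key, record) pairs in sorted order, keep the first ceil(n/2)
        in the old leaf, move the rest to a new leaf, and return the (k, p) pair
        that the caller inserts into the parent."""
        pairs = sorted(leaf["pairs"] + [(key, rec)])
        cut = math.ceil(n / 2)
        new_leaf = {"pairs": pairs[cut:], "next": leaf["next"]}
        leaf["pairs"], leaf["next"] = pairs[:cut], new_leaf
        return new_leaf["pairs"][0][0], new_leaf     # least key in p, and p itself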


Updates on B+-Trees: Insertion (Cont.)

B+-Tree before and after insertion of Clearview



Insertion in B+-Trees (Cont.)


Splitting a non-leaf node: when inserting (k,p) into an already full
internal node N
  Copy N to an in-memory area M with space for n+1 pointers and n keys
  Insert (k,p) into M
  Copy P1, K1, ..., K_ceil(n/2)-1, P_ceil(n/2) from M back into node N
  Copy P_ceil(n/2)+1, K_ceil(n/2)+1, ..., Kn, Pn+1 from M into a newly
  allocated node N'
  Insert (K_ceil(n/2), N') into the parent of N

Read the pseudocode in the book!

Example figure: splitting an internal node containing Downtown, Mianus,
Perryridge; Mianus moves up into the parent.

Updates on B+-Trees: Deletion


Find the record to be deleted, and remove it from the main file and from the
bucket (if present).

Remove the (search-key value, pointer) pair from the leaf node if there is
no bucket or if the bucket has become empty.

If the node has too few entries due to the removal, and the entries in the
node and a sibling fit into a single node, then merge siblings:
  Insert all the search-key values in the two nodes into a single node (the
  one on the left), and delete the other node.
  Delete the pair (Ki-1, Pi), where Pi is the pointer to the deleted node,
  from its parent, recursively using the above procedure.


Updates on B+-Trees: Deletion


Otherwise, if the node has too few entries due to the removal, but the
entries in the node and a sibling do not fit into a single node, then
redistribute pointers:
  Redistribute the pointers between the node and a sibling such that both
  have more than the minimum number of entries.
  Update the corresponding search-key value in the parent of the node.

The node deletions may cascade upwards till a node which has ceil(n/2) or
more pointers is found.

If the root node has only one pointer after deletion, it is deleted and the
sole child becomes the root.


Examples of B+-Tree Deletion

Before and after deleting "Downtown":
  Deleting "Downtown" causes merging of under-full leaves.
  A leaf node can become empty only for n = 3!


Examples of B+-Tree Deletion (Cont.)

Before and after deletion of "Perryridge" from the result of the previous
example.

Examples of B+-Tree Deletion (Cont.)

The leaf with "Perryridge" becomes underfull (actually empty, in this
special case) and is merged with its sibling.
  As a result the "Perryridge" node's parent became underfull, and was
  merged with its sibling.
  The value separating the two nodes (at the parent) moves into the merged
  node.
  The entry is deleted from the parent.
  The root node then has only one child, and is deleted.


Example of B+-tree Deletion (Cont.)

Before and after deletion of "Perryridge" from the earlier example:
  The parent of the leaf containing "Perryridge" became underfull, and
  borrowed a pointer from its left sibling.
  The search-key value in the parent's parent changes as a result.


B+-Tree File Organization


Index file degradation problem is solved by using B+-Tree indices.
Data file degradation problem is solved by using B+-Tree File

Organization.

The leaf nodes in a B+-tree file organization store records, instead

of pointers.

Leaf nodes are still required to be half full

Since records are larger than pointers, the maximum number


of records that can be stored in a leaf node is less than the
number of pointers in a nonleaf node.

Insertion and deletion are handled in the same way as insertion

and deletion of entries in a B+-tree index.


B+-Tree File Organization (Cont.)

Example of B+-tree File Organization (figure)
Good space utilization is important since records use more space than pointers.
To improve space utilization, involve more sibling nodes in redistribution during splits and merges
  Involving 2 siblings in redistribution (to avoid split / merge where possible) results in each node having at least ⌊2n/3⌋ entries

Example of Non-leaf Re-distribution

Tree is shown below during deletion of 24*. (What could be a possible initial tree?)
In contrast to the previous example, we can re-distribute an entry from the left child of the root to the right child.
(Figure: B+-tree during deletion of 24*; root entry 22, leaves 2* 3* | 5* 7* 8* | 14* 16* | 17* 18* | 20* 21* | 22* 27* 29* | 33* 34* 38* 39*)

After Re-distribution

Intuitively, entries are re-distributed by 'pushing through' the splitting entry in the parent node.
It suffices to re-distribute the index entry with key 20 through the parent (move 22 down and move 20 up to the parent).
(Figure: the same tree after re-distribution; the root entry is now 20, and 22 has moved down into the right child)

Prefix Key Compression

Important to increase fan-out. (Why?)
Key values in index entries only 'direct traffic'; can often compress them.
  E.g., if we have adjacent index entries with search key values 'Dannon Yogurt', 'David Smith' and 'Devarakonda Murthy', we can abbreviate 'David Smith' to 'Dav'. (The other keys can be compressed too ...)
  Is this correct? Not quite! What if there is a data entry 'Davey Jones'? (Can only compress 'David Smith' to 'Davi')
  In general, while compressing, must leave each index entry greater than every key value (in any subtree) to its left.
Insert/delete must be suitably modified.
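A small sketch (the helper name is assumed) of how a compressed separator can be chosen: take the shortest prefix of the right-hand key that is still strictly greater than the largest key to its left, which reproduces the 'Dav' and 'Davi' cases above:

def compressed_separator(left_max: str, right_min: str) -> str:
    # Shortest prefix of right_min that is still strictly greater than left_max,
    # so it can safely direct traffic between the left and right subtrees.
    for length in range(1, len(right_min) + 1):
        prefix = right_min[:length]
        if prefix > left_max:
            return prefix
    return right_min   # no shorter prefix works; keep the full key

assert compressed_separator("Dannon Yogurt", "David Smith") == "Dav"
assert compressed_separator("Davey Jones", "David Smith") == "Davi"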

Bulk Loading of a B+ Tree

If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow.
  Also leads to minimal leaf utilization --- why?
Bulk Loading can be done much more efficiently.
Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page.
(Figure: a new root page pointing to the first of the sorted pages of data entries 3* 4*, 6* 9*, 10* 11*, ..., 38* 41*, 44*; the pages are not yet in the B+ tree)

Bulk Loading (Contd.)

Index entries for leaf pages are always entered into the right-most index page just above the leaf level. When this page fills up, it splits. (The split may go up the right-most path to the root.)
Much faster than repeated inserts, especially when one considers locking!
(Figure: two snapshots of the tree during the bulk load over the sorted data entry pages; each new leaf's index entry goes into the right-most index page just above the leaf level)
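The idea can also be sketched bottom-up: pack the sorted entries into leaves, then build each index level from the first keys of the nodes below it. This is a common variant of the right-most-path insertion shown in the figure, not the slides' exact algorithm; the fanout value and the in-memory node representation are assumptions:

def bulk_load(sorted_keys, fanout):
    # Leaf level: pack the already-sorted keys into leaves of up to `fanout` entries
    # (packing fewer entries per leaf would control the fill factor).
    leaves = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    level = [(node[0], node) for node in leaves]      # (smallest key, node) pairs
    # Build index levels until a single root remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            children = [child for _, child in group]
            separators = [key for key, _ in group[1:]]   # keys separating the children
            next_level.append((group[0][0], {"keys": separators, "children": children}))
        level = next_level
    return level[0][1]    # the root node

root = bulk_load([3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44], fanout=2)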

Summary of Bulk Loading


Option 1: multiple inserts.

Slow.

Does not give sequential storage of leaves.

Option 2: Bulk Loading

Has advantages for concurrency control.

Fewer I/Os during build.

Leaves will be stored sequentially (and linked, of course).

Can control fill factor on pages.


A Note on 'Order'

Order (d) concept replaced by a physical space criterion in practice ('at least half-full').
  Index pages can typically hold many more entries than leaf pages.
  Variable sized records and search keys mean different nodes will contain different numbers of entries.
  Even with fixed length fields, multiple records with the same search key value (duplicates) can lead to variable-sized data entries (if we use Alternative (3)).
Many real systems are even sloppier than this --- only reclaim space when a page is completely empty.

Alternatives for Data Entry k* in Index

Three alternatives:
  1. Actual data record (with key value k)
  2. <k, rid of matching data record>
  3. <k, list of rids of matching data records>
Choice is orthogonal to the indexing technique.
  Examples of indexing techniques: B+ trees, hash-based structures, R trees, ...
Typically, the index contains auxiliary information that directs searches to the desired data entries.

Alternatives for Data Entries (Contd.)


Alternative 1:

Actual data record (with key value k)

If this is used, index structure is a file organization for data records (like Heap
files or sorted files).

At most one index on a given collection of data records can use Alternative 1.

This alternative saves pointer lookups but can be expensive to maintain with
insertions and deletions.


Alternatives for Data Entries (Contd.)


Alternative 2
<k, rid of matching data record>
and Alternative 3
<k, list of rids of matching data records>
Easier to maintain than Alt 1.
If more than one index is required on a given file, at most one index can use
Alternative 1; rest must use Alternatives 2 or 3.
Alternative 3 more compact than Alternative 2, but leads to variable sized data
entries even if search keys are of fixed length.
Even worse, for large rid lists the data entry would have to span multiple
blocks!


B-Tree Index Files

Similar to B+-tree, but a B-tree allows search-key values to appear only once; eliminates redundant storage of search keys.
Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for each search key in a nonleaf node must be included.
Generalized B-tree leaf node (figure)
Nonleaf node pointers B_i are the bucket or file record pointers.

B-Tree Index File Example

B-tree (above) and B+-tree (below) on same data


B-Tree Index Files (Cont.)

Advantages of B-Tree indices:
  May use fewer tree nodes than a corresponding B+-Tree.
  Sometimes possible to find a search-key value before reaching a leaf node.
Disadvantages of B-Tree indices:
  Only a small fraction of all search-key values are found early
  Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have greater depth than the corresponding B+-Tree
  Insertion and deletion are more complicated than in B+-Trees
  Implementation is harder than for B+-Trees.
Typically, the advantages of B-Trees do not outweigh the disadvantages.

Indexing Strings

Variable length strings as keys
  Variable fanout
  Use space utilization as the criterion for splitting, not number of pointers
Prefix compression
  Key values at internal nodes can be prefixes of the full key
    Keep enough characters to distinguish entries in the subtrees separated by the key value
    E.g. 'Silas' and 'Silberschatz' can be separated by 'Silb'
  Keys in leaf nodes can be compressed by sharing common prefixes

Multiple-Key Access

Use multiple indices for certain types of queries.
Example:
  select account_number
  from account
  where branch_name = 'Perryridge' and balance = 1000
Possible strategies for processing the query using indices on single attributes:
  1. Use index on branch_name to find accounts with branch name 'Perryridge'; test balance = 1000
  2. Use index on balance to find accounts with balances of $1000; test branch_name = 'Perryridge'.
  3. Use the branch_name index to find pointers to all records pertaining to the Perryridge branch. Similarly use the index on balance. Take the intersection of both sets of pointers obtained.

Indices on Multiple Keys

Composite search keys are search keys containing more than one attribute
  E.g. (branch_name, balance)
Lexicographic ordering: (a1, a2) < (b1, b2) if either
  a1 < b1, or
  a1 = b1 and a2 < b2
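A one-line comparator makes the ordering concrete (the function name and sample values are assumed; Python tuples happen to compare the same way):

def lex_less(a, b):
    (a1, a2), (b1, b2) = a, b
    return a1 < b1 or (a1 == b1 and a2 < b2)

assert lex_less(("Downtown", 500), ("Perryridge", 100))    # a1 < b1
assert lex_less(("Perryridge", 100), ("Perryridge", 900))  # a1 = b1 and a2 < b2
assert ("Perryridge", 100) < ("Perryridge", 900)           # built-in tuple ordering agrees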

Indexes on Multiple Keys


Ordered Index on Multiple Attributes

Create an index on a search key field that is a combination of multiple attributes

A lexicographic ordering of these tuple values establishes


an order on this composite search key

Partitioned Hashing

For a key consisting of n components, the hash function is


designed to produce a result with n separate hash
addresses

Grid Files

Construct a grid array with one linear scale (or dimension)


for each of the search attributes


Other Types of Indexes


Hash Indexes
Bitmap Indexes
Function based Indexing

(at home !!!)


Indices on Multiple Attributes

Suppose we have an index on the combined search-key (branch_name, balance).
For
  where branch_name = 'Perryridge' and balance = 1000
the index on (branch_name, balance) can be used to fetch only records that satisfy both conditions.
  Using separate indices is less efficient: we may fetch many records (or pointers) that satisfy only one of the conditions.
Can also efficiently handle
  where branch_name = 'Perryridge' and balance < 1000
But cannot efficiently handle
  where branch_name < 'Perryridge' and balance = 1000
  May fetch many records that satisfy the first but not the second condition

Non-Unique Search Keys

Alternatives:
  Buckets on separate block (bad idea)
  List of tuple pointers with each key
    Low space overhead, no extra cost for queries
    Extra code to handle read/update of long lists
    Deletion of a tuple can be expensive if there are many duplicates on the search key (why?)
  Make the search key unique by adding a record-identifier
    Extra storage overhead for keys
    Simpler code for insertion/deletion
    Widely used

Other Issues in Indexing


Covering indices

Add extra attributes to index so (some) queries can avoid


fetching the actual records
Particularly useful for secondary indices
Why?
Can store extra attributes only at leaf
Record relocation and secondary indices
If a record moves, all secondary indices that store record
pointers have to be updated
Node splits in B+-tree file organizations become very expensive
Solution: use primary-index search key instead of record
pointer in secondary index
Extra traversal of primary index to locate record
Higher cost for queries, but node splits are cheap
Add record-id if primary-index search key is non-unique


Hashing

Database System Concepts, 5th Ed.


Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

Static Hashing
A bucket is a unit of storage containing one or more records (a

bucket is typically a disk block).

In a hash file organization we obtain the bucket of a record directly

from its search-key value using a hash function.

Hash function h is a function from the set of all search-key values K

to the set of all bucket addresses B.

Hash function is used to locate records for access, insertion as well

as deletion.

Records with different search-key values may be mapped to the

same bucket; thus entire bucket has to be searched sequentially to


locate a record.


Example of Hash File Organization

Hash file organization of account file, using branch_name as key (see figure in next slide).
There are 10 buckets.
The binary representation of the ith character is assumed to be the integer i.
The hash function returns the sum of the binary representations of the characters modulo 10.
  E.g. h(Perryridge) = 5,  h(Round Hill) = 3,  h(Brighton) = 3
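One way to realise this hash function, assuming (as the slide's values suggest) that the i-th letter of the alphabet counts as the integer i and that non-letter characters such as the space in 'Round Hill' are ignored:

def h(branch_name: str, n_buckets: int = 10) -> int:
    # Sum of the letters' alphabet positions, modulo the number of buckets.
    total = sum(ord(c) - ord('a') + 1 for c in branch_name.lower() if c.isalpha())
    return total % n_buckets

assert h("Perryridge") == 5
assert h("Round Hill") == 3
assert h("Brighton") == 3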

Example of Hash File Organization


Hash file organization
of account file, using
branch_name as key
(see previous slide for
details).


Hash Functions
Worst hash function maps all search-key values to the same bucket;

this makes access time proportional to the number of search-key values


in the file.

An ideal hash function is uniform, i.e., each bucket is assigned the

same number of search-key values from the set of all possible values.

Ideal hash function is random, so each bucket will have the same

number of records assigned to it irrespective of the actual distribution of


search-key values in the file.

Typical hash functions perform computation on the internal binary

representation of the search-key.

  For example, for a string search-key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned.

Handling of Bucket Overflows


Bucket overflow can occur because of

Insufficient buckets

Skew in distribution of records. This can occur due to two


reasons:

multiple records have same search-key value


chosen hash function produces non-uniform distribution of key
values

Although the probability of bucket overflow can be reduced, it cannot

be eliminated; it is handled by using overflow buckets.


Handling of Bucket Overflows (Cont.)


Overflow chaining the overflow buckets of a given bucket are chained

together in a linked list.

Above scheme is called closed hashing.

An alternative, called open hashing, which does not use overflow


buckets, is not suitable for database applications.


Hash Indices
Hashing can be used not only for file organization, but also for index-

structure creation.

A hash index organizes the search keys, with their associated record

pointers, into a hash file structure.

Strictly speaking, hash indices are always secondary indices

if the file itself is organized using hashing, a separate primary hash


index on it using the same search-key is unnecessary.

However, we use the term hash index to refer to both secondary


index structures and hash organized files.


Example of Hash Index


Deficiencies of Static Hashing


In static hashing, function h maps search-key values to a fixed set B of bucket addresses. Databases grow or shrink with time.
  If the initial number of buckets is too small, and the file grows, performance will degrade due to too many overflows.

If space is allocated for anticipated growth, a significant amount of


space will be wasted initially (and buckets will be underfull).

If database shrinks, again space will be wasted.

One solution: periodic re-organization of the file with a new hash

function

Expensive, disrupts normal operations

Better solution: allow the number of buckets to be modified dynamically.


Dynamic Hashing

Good for a database that grows and shrinks in size
Allows the hash function to be modified dynamically
Extendable hashing is one form of dynamic hashing
  Hash function generates values over a large range (typically b-bit integers, with b = 32).
  At any time use only a prefix of the hash function to index into a table of bucket addresses.
  Let the length of the prefix be i bits, 0 ≤ i ≤ 32.
    Bucket address table size = 2^i. Initially i = 0.
    Value of i grows and shrinks as the size of the database grows and shrinks.
  Multiple entries in the bucket address table may point to a bucket (why?)
    Thus, the actual number of buckets is < 2^i
    The number of buckets also changes dynamically due to coalescing and splitting of buckets.

General Extendable Hash Structure

In this structure, i_2 = i_3 = i, whereas i_1 = i - 1 (see next slide for details)

Use of Extendable Hash Structure

Each bucket j stores a value i_j
  All the entries that point to the same bucket have the same values on the first i_j bits.
To locate the bucket containing search-key K_j:
  1. Compute h(K_j) = X
  2. Use the first i high-order bits of X as a displacement into the bucket address table, and follow the pointer to the appropriate bucket
To insert a record with search-key value K_j
  follow the same procedure as look-up and locate the bucket, say j.
  If there is room in bucket j, insert the record in the bucket.
  Else the bucket must be split and insertion re-attempted (next slide.)
    Overflow buckets used instead in some cases (will see shortly)

Insertion in Extendable Hash Structure (Cont.)

To split a bucket j when inserting a record with search-key value K_j:
If i > i_j (more than one pointer to bucket j)
  allocate a new bucket z, and set i_j = i_z = (i_j + 1)
  Update the second half of the bucket address table entries originally pointing to j, to point to z
  remove each record in bucket j and reinsert (in j or z)
  recompute the new bucket for K_j and insert the record in the bucket (further splitting is required if the bucket is still full)
If i = i_j (only one pointer to bucket j)
  If i reaches some limit b, or too many splits have happened in this insertion, create an overflow bucket
  Else
    increment i and double the size of the bucket address table.
    replace each entry in the table by two entries that point to the same bucket.
    recompute the new bucket address table entry for K_j
    Now i > i_j, so use the first case above.
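The lookup and split rules of the last two slides can be sketched as a small in-memory structure. The class names, the bit-mixing step, and the omission of overflow buckets are assumptions of this sketch, not part of the slides:

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth      # the slide's i_j
        self.records = {}                   # search-key value -> record

class ExtendableHash:
    def __init__(self, bucket_capacity=2, bits=32):
        self.capacity = bucket_capacity
        self.bits = bits                    # b-bit hash values (b = 32 here)
        self.global_i = 0                   # the slide's i
        self.directory = [Bucket(0)]        # bucket address table, 2^i entries

    def _hash(self, key):
        # Mix the bits so high-order prefixes differ even for small keys (assumption).
        return (hash(key) * 2654435761) & ((1 << self.bits) - 1)

    def _prefix(self, key, depth):
        # First `depth` high-order bits of h(key), used as the table index.
        return self._hash(key) >> (self.bits - depth) if depth else 0

    def lookup_bucket(self, key):
        return self.directory[self._prefix(key, self.global_i)]

    def insert(self, key, record):
        bucket = self.lookup_bucket(key)
        if key in bucket.records or len(bucket.records) < self.capacity:
            bucket.records[key] = record
            return
        if bucket.local_depth == self.global_i:
            # i = i_j: increment i and double the bucket address table,
            # replacing each entry by two adjacent entries to the same bucket.
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_i += 1
        # Now i > i_j: allocate a new bucket z with i_j = i_z = old i_j + 1 and
        # repoint the half of j's table entries whose next prefix bit is 1.
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        for idx, b in enumerate(self.directory):
            if b is bucket and (idx >> (self.global_i - bucket.local_depth)) & 1:
                self.directory[idx] = new_bucket
        # Re-distribute j's records between j and z, then retry the insert
        # (further splits may follow; overflow buckets are omitted here).
        old_records, bucket.records = bucket.records, {}
        for k, r in old_records.items():
            self.lookup_bucket(k).records[k] = r
        self.insert(key, record)

h = ExtendableHash(bucket_capacity=2)
for name in ["Brighton", "Downtown", "Mianus", "Perryridge", "Redwood", "Round Hill"]:
    h.insert(name, {"branch_name": name})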

Deletion in Extendable Hash Structure

To delete a key value,
  locate it in its bucket and remove it.
  The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table).
  Coalescing of buckets can be done (can coalesce only with a "buddy" bucket having the same value of i_j and the same i_j - 1 prefix, if it is present)
  Decreasing the bucket address table size is also possible
    Note: decreasing the bucket address table size is an expensive operation and should be done only if the number of buckets becomes much smaller than the size of the table

Use of Extendable Hash Structure:


Example

Initial Hash structure, bucket size = 2



Example (Cont.)
Hash structure after insertion of one Brighton and two Downtown

records


Example (Cont.)
Hash structure after insertion of Mianus record


Example (Cont.)

Hash structure after insertion of three Perryridge records



Example (Cont.)
Hash structure after insertion of Redwood and Round Hill records


Extendable Hashing vs. Other Schemes


Benefits of extendable hashing:

Hash performance does not degrade with growth of file


Minimal space overhead
Disadvantages of extendable hashing
Extra level of indirection to find desired record
Bucket address table may itself become very big (larger than
memory)
Cannot allocate very large contiguous areas on disk either
Solution: B+-tree file organization to store bucket address table
Changing size of bucket address table is an expensive operation
Linear hashing is an alternative mechanism
Allows incremental growth of its directory (equivalent to bucket
address table)
At the cost of more bucket overflows


Comparison of Ordered Indexing and Hashing


Cost of periodic re-organization
Relative frequency of insertions and deletions
Is it desirable to optimize average access time at the expense of

worst-case access time?

Expected type of queries:

Hashing is generally better at retrieving records having a specified


value of the key.

If range queries are common, ordered indices are to be preferred

In practice:

PostgreSQL supports hash indices, but discourages use due to


poor performance

Oracle supports static hash organization, but not hash indices

SQLServer supports only B+-trees


Hashing Techniques
Another type of primary file organization is usually

called a hash file.


The search condition must be an equality condition
on a single field, called the hash field of the file.
In most cases, the hash field is also a key field of the
file, in which case it is called the hash key.
The idea behind hashing is to provide a function h,
called a hash function or randomizing function,
that is applied to the hash field value of a record and
yields the address of the disk block in which the
record is stored.
For most records, we need only a single-block access to retrieve that record.

Internal Hashing (1/6)

For internal files, hashing is typically implemented as a hash table through the use of an array of records.
Suppose that the array index range is from 0 to M - 1; then we have M slots whose addresses correspond to the array indexes.
We choose a hash function that transforms the hash field value into an integer between 0 and M - 1.
One common hash function is the h(K) = K mod M function, which returns the remainder of an integer hash field value K after division by M; this value is then used for the record address.

Internal Hashing (2/6)

Other hashing functions can be used.
  One technique, called folding, involves applying an arithmetic function such as addition or a logical function such as exclusive or to different portions of the hash field value to calculate the hash address.
  Another technique involves picking some digits of the hash field value (for example, the third, fifth, and eighth digits) to form the hash address.

Internal Hashing (3/6)

The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses, because the hash field space (the number of possible values a hash field can take) is usually much larger than the address space (the number of available addresses for records).
The hashing function maps the hash field space to the address space.

Internal Hashing (4/6)


A collision occurs when the hash field value of a

record that is being inserted hashes to an address


that already contains a different record.

In this situation, we must insert the new record in

some other position, since its hash address is


occupied.

The process of finding another position is called

collision resolution.


Internal Hashing (5/6)

There are numerous methods for collision resolution, including the following:
  Open addressing: Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
  Chaining: For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. In addition, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. A linked list of overflow records for each hash address is thus maintained.
  Multiple hashing: The program applies a second hash function if the first results in a collision. If another collision results, the program applies a third hash function and then uses open addressing if necessary.
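A small sketch of the first of these methods, open addressing, carrying over the h(K) = K mod M function from the earlier slides (the value M = 11 and the table layout are assumptions):

M = 11
table = [None] * M           # None marks an empty slot

def h(key: int) -> int:
    return key % M

def insert_open_addressing(key, record):
    pos = h(key)
    for probe in range(M):                   # check subsequent positions in order
        slot = (pos + probe) % M
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, record)
            return slot
    raise RuntimeError("hash table is full")

insert_open_addressing(14, "rec A")   # 14 mod 11 = 3
insert_open_addressing(25, "rec B")   # 25 mod 11 = 3 -> collision, placed in slot 4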

Internal Hashing (6/6)

The goal of a good hashing function is to distribute the records uniformly over the address space so as to minimize collisions while not leaving many unused locations.
Simulation and analysis studies have shown that it is usually best to keep a hash table between 70 and 90 percent full so that the number of collisions remains low and we do not waste too much space.
Hence, if we expect to have r records to store in the table, we should choose M locations for the address space such that (r/M) is between 0.7 and 0.9.
It may also be useful to choose a prime number for M, since it has been demonstrated that this distributes the hash addresses better over the address space when the mod hashing function is used.
Other hash functions may require M to be a power of 2.

External Hashing for Disk Files (1/7)

Hashing for disk files is called external hashing.
To suit the characteristics of disk storage, the target address space is made of buckets, each of which holds multiple records.
  A bucket is either one disk block or a cluster of contiguous blocks.
The hashing function maps a key into a relative bucket number, rather than assigning an absolute block address to the bucket.
A table maintained in the file header converts the bucket number into the corresponding disk block address.

External Hashing for Disk Files (2/7)

The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to the same bucket without causing problems.
However, we must make provisions for the case where a bucket is filled to capacity and a new record being inserted hashes to that bucket.
We can use a variation of chaining in which a pointer is maintained in each bucket to a linked list of overflow records for the bucket.
The pointers in the linked list should be record pointers, which include both a block address and a relative record position within the block.

(Figure: handling overflow for buckets by chaining.)

(Figure: structure of the extendible hashing scheme.)

External Hashing for Disk Files (3/7)

Hashing provides the fastest possible access for retrieving an arbitrary record given the value of its hash field.
Although most good hash functions do not maintain records in order of hash field values, some functions, called order preserving, do.

External Hashing for Disk Files (4/7)

The hashing scheme described is called static hashing because a fixed number of buckets M is allocated.
This can be a serious drawback for dynamic files.
  If the number of records turns out to be substantially fewer than allocated, we are left with a lot of unused space.
  On the other hand, if the number of records increases to substantially more than allocated, numerous collisions will result and retrieval will be slowed down because of the long lists of overflow records.
In either case, we may have to change the number of blocks M allocated and then use a new hashing function (based on the new value of M) to redistribute the records.

External Hashing for Disk Files (5/7)

When using external hashing, searching for a record given a value of some field other than the hash field is as expensive as in the case of an unordered file.

External Hashing for Disk Files (6/7)

Record deletion can be implemented by removing the record from its bucket.
  If the bucket has an overflow chain, we can move one of the overflow records into the bucket to replace the deleted record.
  If the record to be deleted is already in overflow, we simply remove it from the linked list.
Notice that removing an overflow record implies that we should keep track of empty positions in overflow. This is done easily by maintaining a linked list of unused overflow locations.

External Hashing for Disk Files (7/7)

Modifying a record's field value depends on two factors: (1) the search condition to locate the record and (2) the field to be modified.
  If the search condition is an equality comparison on the hash field, we can locate the record efficiently by using the hashing function; otherwise, we must do a linear search.
  A nonhash field can be modified by changing the record and rewriting it in the same bucket.
  Modifying the hash field means that the record can move to another bucket, which requires deletion of the old record followed by insertion of the modified record.

Hashing Techniques That Allow Dynamic File Expansion

Extendible Hashing (1/5)

In extendible hashing, a type of directory (an array of 2^d bucket addresses) is maintained, where d is called the global depth of the directory.
The integer value corresponding to the first (high-order) d bits of a hash value is used as an index into the array to determine a directory entry, and the address in that entry determines the bucket in which the corresponding records are stored.

Extendible Hashing (2/5)

However, there does not have to be a distinct bucket for each of the 2^d directory locations.
Several directory locations with the same first d' bits for their hash values may contain the same bucket address if all the records that hash to these locations fit in a single bucket.
A local depth d', stored with each bucket, specifies the number of bits on which the bucket contents are based.

Extendible Hashing (3/5)

The value of d can be increased or decreased by one at a time, thus doubling or halving the number of entries in the directory array.
Doubling is needed if a bucket, whose local depth d' is equal to the global depth d, overflows.
Halving occurs if d > d' for all the buckets after some deletions occur.

Extendible Hashing (4/5)

The main advantage of extendible hashing that makes it attractive is that the performance of the file does not degrade as the file grows.
In addition, no space is allocated in extendible hashing for future growth, but additional buckets can be allocated dynamically as needed.
  The space overhead for the directory table is negligible.
  The maximum directory size is 2^k, where k is the number of bits in the hash value.
Another advantage is that splitting causes minor reorganization in most cases, since only the records in one bucket are redistributed to the two new buckets.
The only time a reorganization is more expensive is when the directory has to be doubled (or halved).

Extendible Hashing (5/5)


A disadvantage is that the directory must be

searched before accessing the buckets themselves,


resulting in two block accesses instead of one in
static hashing.

This performance penalty is considered minor and

hence the scheme is considered quite desirable for


dynamic files.


Linear Hashing (1/9)


The idea behind linear hashing is to allow a hash file

to expand and shrink its number of buckets


dynamically without needing a directory.


Linear Hashing (2/9)

Suppose that the file starts with M buckets numbered 0, 1, ..., M - 1 and uses the mod hash function h(K) = K mod M; this hash function is called the initial hash function h_i.
Overflow because of collisions is still needed and can be handled by maintaining individual overflow chains for each bucket.

Linear Hashing (3/9)

However, when a collision leads to an overflow record in any file bucket, the first bucket in the file (bucket 0) is split into two buckets: the original bucket 0 and a new bucket M at the end of the file.
The records originally in bucket 0 are distributed between the two buckets based on a different hashing function h_{i+1}(K) = K mod 2M.
A key property of the two hash functions h_i and h_{i+1} is that any records that hashed to bucket 0 based on h_i will hash to either bucket 0 or bucket M based on h_{i+1}; this is necessary for linear hashing to work.

Linear Hashing (4/9)

As further collisions lead to overflow records, additional buckets are split in the linear order 1, 2, 3, ....
If enough overflows occur, all the original file buckets 0, 1, ..., M - 1 will have been split, so the file now has 2M instead of M buckets, and all buckets use the h_{i+1} hash function.
Hence, the records in overflow are eventually redistributed into regular buckets, using the function h_{i+1} via a delayed split of their buckets.

Linear Hashing (5/9)

There is no directory; only a value n (which is initially set to 0 and is incremented by 1 whenever a split occurs) is needed to determine which buckets have been split.
To retrieve a record with hash key value K, first apply the function h_i to K; if h_i(K) < n, then apply the function h_{i+1} to K because the bucket is already split.
Initially, n = 0, indicating that the function h_i applies to all buckets; n grows linearly as buckets are split.
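The retrieval rule can be written down directly. In this sketch h_i is taken as K mod (2^i * M), which agrees with the slides' h_i = K mod M for the initial round and h_{i+1}(K) = K mod 2M after a split; the function and parameter names are assumptions:

def linear_hash_bucket(K: int, M: int, i: int, n: int) -> int:
    h_i = K % (M * 2**i)              # h_i(K) = K mod (2^i * M)
    if h_i < n:                       # bucket already split in this round:
        return K % (M * 2**(i + 1))   # use h_{i+1}(K) = K mod (2^{i+1} * M)
    return h_i

# e.g. M = 4, level i = 0, n = 1 (bucket 0 already split into buckets 0 and 4):
assert linear_hash_bucket(8, M=4, i=0, n=1) == 0    # h_1(8)  = 8 mod 8  = 0
assert linear_hash_bucket(12, M=4, i=0, n=1) == 4   # h_1(12) = 12 mod 8 = 4
assert linear_hash_bucket(7, M=4, i=0, n=1) == 3    # h_0(7)  = 3, not yet split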

Linear Hashing (6/9)

When n = M after being incremented, this signifies that all the original buckets have been split and the hash function h_{i+1} applies to all records in the file.
At this point, n is reset to 0 (zero), and any new collisions that cause overflow lead to the use of a new hashing function h_{i+2}(K) = K mod 4M.

Linear Hashing (7/9)

Splitting can be controlled by monitoring the file load factor instead of by splitting whenever an overflow occurs.
In general, the file load factor l can be defined as l = r / (bfr * N), where r is the current number of file records, bfr is the maximum number of records that can fit in a bucket, and N is the current number of file buckets.

Linear Hashing (8/9)


Buckets that have been split can also be recombined

if the load of the file falls below a certain threshold.

Blocks are combined linearly, and N is decremented

appropriately.


Linear Hashing (9/9)

The file load can be used to trigger both splits and combinations; in this manner the file load can be kept within a desired range.
Splits can be triggered when the load exceeds a certain threshold (say, 0.9) and combinations can be triggered when the load falls below another threshold (say, 0.7).

Bitmap Indices

Bitmap indices are a special type of index designed for efficient querying on multiple keys
Records in a relation are assumed to be numbered sequentially from, say, 0
  Given a number n it must be easy to retrieve record n
    Particularly easy if records are of fixed size
Applicable on attributes that take on a relatively small number of distinct values
  E.g. gender, country, state, ...
  E.g. income-level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000-infinity)
A bitmap is simply an array of bits

Bitmap Indices (Cont.)


In its simplest form a bitmap index on an attribute has a bitmap for

each value of the attribute

Bitmap has as many bits as records

In a bitmap for value v, the bit for a record is 1 if the record has the
value v for the attribute, and is 0 otherwise


Bitmap Indices (Cont.)

Bitmap indices are useful for queries on multiple attributes

not particularly useful for single attribute queries

Queries are answered using bitmap operations

Intersection (and)

Union (or)

Complementation (not)

Each operation takes two bitmaps of the same size and applies the
operation on corresponding bits to get the result bitmap

E.g. 100110 AND 110011 = 100010


100110 OR 110011 = 110111
NOT 100110 = 011001

Males with income level L1: 10010 AND 10100 = 10000

Can then retrieve required tuples.

Counting number of matching tuples is even faster

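A small sketch of these operations, using Python integers as bitmaps; the attribute names and the 5-record relation are assumed, matching the 'males with income level L1' example above:

def bits(s: str) -> int:
    return int(s, 2)

gender_m  = bits("10010")   # bitmap for gender = 'm'
income_L1 = bits("10100")   # bitmap for income-level = L1
existence = bits("11111")   # existence bitmap (all five records are valid)

males_L1 = gender_m & income_L1                    # intersection (and)
assert format(males_L1, "05b") == "10000"

not_L1 = ~income_L1 & existence                    # complementation, masked by existence
assert format(not_L1, "05b") == "01011"

print(bin(males_L1).count("1"), "matching record(s)")  # counting is even faster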

Bitmap Indices (Cont.)


Bitmap indices generally very small compared with relation size

E.g. if record is 100 bytes, space for a single bitmap is 1/800 of space
used by relation.

If number of distinct attribute values is 8, bitmap is only 1% of


relation size

Deletion needs to be handled properly

Existence bitmap to note if there is a valid record at a record location

Needed for complementation

not(A=v):

(NOT bitmap-A-v) AND ExistenceBitmap

Should keep bitmaps for all values, even null value

To correctly handle SQL null semantics for NOT(A=v):

intersect above result with (NOT bitmap-A-Null)


Efficient Implementation of Bitmap Operations

Bitmaps are packed into words; a single word and (a basic CPU instruction) computes the and of 32 or 64 bits at once
  E.g. 1-million-bit bitmaps can be and-ed with just 31,250 instructions
Counting the number of 1s can be done fast by a trick:
  Use each byte to index into a precomputed array of 256 elements, each storing the count of 1s in the binary representation
    Can use pairs of bytes to speed up further, at a higher memory cost
  Add up the retrieved counts
Bitmaps can be used instead of Tuple-ID lists at leaf levels of B+-trees, for values that have a large number of matching records
  Worthwhile if > 1/64 of the records have that value, assuming a tuple-id is 64 bits
  Above technique merges benefits of bitmap and B+-tree indices
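The byte-lookup counting trick in miniature (the table and function names are assumed):

POPCOUNT = [bin(b).count("1") for b in range(256)]   # precomputed 256-element array

def count_ones(bitmap: bytes) -> int:
    # Index the table with each byte of the bitmap and add up the retrieved counts.
    return sum(POPCOUNT[b] for b in bitmap)

assert count_ones(bytes([0b10011010, 0b11110000])) == 8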

Index Definition in SQL

Create an index
  create index <index-name> on <relation-name> (<attribute-list>)
  E.g.: create index b-index on branch(branch_name)
Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key.
  Not really required if the SQL unique integrity constraint is supported
To drop an index
  drop index <index-name>
Most database systems allow specification of the type of index, and clustering.

End of Chapter

Database System Concepts, 5th Ed.


Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

Partitioned Hashing

Hash values are split into segments that depend on each attribute of the search-key.
  (A1, A2, ..., An) for an n-attribute search-key
Example: n = 2, for customer, search-key being (customer-street, customer-city)

  search-key value       hash value
  (Main, Harrison)       101 111
  (Main, Brooklyn)       101 001
  (Park, Palo Alto)      010 010
  (Spring, Brooklyn)     001 001
  (Alma, Palo Alto)      110 010

To answer an equality query on a single attribute, need to look up multiple buckets. Similar in effect to grid files.
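A sketch of how such a composite hash value could be formed; the 3-bit per-attribute segments and helper names are assumptions, and the actual bit patterns in the table above come from unspecified hash functions:

def attribute_segment(value: str, bits: int = 3) -> str:
    # Hash one attribute of the composite key into a fixed-width bit segment.
    return format(hash(value) & ((1 << bits) - 1), f"0{bits}b")

def partitioned_hash(street: str, city: str) -> str:
    # Concatenate the per-attribute segments, e.g. '101' + '001' -> '101001'.
    return attribute_segment(street) + attribute_segment(city)

# An equality query on customer-city alone fixes only the second segment, so all
# 2^3 possible first segments (i.e. multiple buckets) must be examined.
city_segment = attribute_segment("Brooklyn")
candidate_buckets = [format(s, "03b") + city_segment for s in range(8)]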

(Figures: sequential file for account records; sample account file; Figures 12.2, 12.14 and 12.25)

Grid Files
Structure used to speed the processing of general multiple search-

key queries involving one or more comparison operators.

The grid file has a single grid array and one linear scale for each

search-key attribute. The grid array has number of dimensions


equal to number of search-key attributes.

Multiple cells of grid array can point to same bucket


To find the bucket for a search-key value, locate the row and column

of its cell using the linear scales and follow pointer


Example Grid File for account


Queries on a Grid File

A grid file on two attributes A and B can handle queries of all the following forms with reasonable efficiency
  (a1 ≤ A ≤ a2)
  (b1 ≤ B ≤ b2)
  (a1 ≤ A ≤ a2 and b1 ≤ B ≤ b2), ...
E.g., to answer (a1 ≤ A ≤ a2 and b1 ≤ B ≤ b2), use the linear scales to find the corresponding candidate grid array cells, and look up all the buckets pointed to from those cells.

Grid Files (Cont.)


During insertion, if a bucket becomes full, new bucket can be created

if more than one cell points to it.

Idea similar to extendable hashing, but on multiple dimensions

If only one cell points to it, either an overflow bucket must be


created or the grid size must be increased

Linear scales must be chosen to uniformly distribute records across

cells.

Otherwise there will be too many overflow buckets.

Periodic re-organization to increase grid size will help.

But reorganization can be very expensive.

Space overhead of grid array can be high.


R-trees (Chapter 23) are an alternative


Indexing

Can we do anything else to improve query performance other than selecting a good file organization?
  Yes, the answer lies in indexing
  Index - a data structure that allows the DBMS to locate particular records in a file more quickly
    Very similar to the index at the end of a book, used to locate the various topics covered in the book
Types of Index
  Primary index - one primary index per file
  Clustering index - one clustering index per file; the data file is ordered on a non-key field and the index file is built on that non-key field
  Secondary index - many secondary indexes per file
  Sparse index - has only some of the search key values in the file
  Dense index - has an index entry corresponding to every search key value in the file