Dynamic files
• Dynamic: records are added to and deleted from the data set
• Such files undergo a lot of growth
Static hashing
• described in chapter 11 (direct hashing)
• typically performs worse than a B-tree for dynamic files
• eventually requires file reorganization
Extendible hashing
• Robust, self-adjusting hashing for dynamic files
• Fagin, Nievergelt, Pippenger, and Strong (ACM TODS 1979)
Overview(1)
Overview(2)
Extendible Hashing
• Hashing function: primary key → H(key)
How Extendible Hashing works
[Figure: trie of example keys; the visible branches are r (andrews) and b (baird)]
Formal Definition of Trie
Extendible Hashing
• Bucket B11: d’=2, H(key)=11..
Turning the Trie into a Directory
Representation of Trie (1)
• Trie: branch 0 leads to bucket A; branch 1 splits into 10 (bucket B) and 11 (bucket C)
• Directory: 00 → A, 01 → A, 10 → B, 11 → C
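The directory for buckets A, B, and C can be treated as an array indexed by the leading bits of H(key). The following is a minimal sketch under stated assumptions: hash values are taken to be 16 bits wide, and `lookupBucket` and `kHashBits` are hypothetical names that do not appear in the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumption: hash values are 16 bits wide.
constexpr int kHashBits = 16;

// Return the bucket label for a hash value, using the first `depth` bits
// of the hash as the directory index. The directory must have 2^depth cells.
std::string lookupBucket(const std::vector<std::string>& directory,
                         int depth, std::uint16_t hashValue) {
    assert(directory.size() == (std::size_t{1} << depth));
    std::size_t index = 0;
    if (depth > 0) {
        index = static_cast<std::size_t>(hashValue >> (kHashBits - depth));
    }
    return directory[index];
}
```

With the directory {A, A, B, C} and depth 2, a hash value starting with bits 01 resolves to bucket A and one starting with 10 resolves to bucket B.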
Representation of Trie(2)
Retrieve a Record
Splitting to Handle Overflow (1)
• Before: 00 → A, 01 → A, 10 → B, 11 → C
• After bucket A splits: 00 → A, 01 → D, 10 → B, 11 → C
Splitting to Handle Overflow(2)
• Directory: 00 → A, 01 → A, 10 → B, 11 → C
1. Result of overflow of bucket B
• Trie: 0 → A, 100 → B, 101 → D, 11 → C
2. Complete Binary Tree
• All leaves at depth 3: 000, 001, 010, 011 → A; 100 → B; 101 → D; 110, 111 → C
3. Directory
• 000, 001, 010, 011 → A; 100 → B; 101 → D; 110, 111 → C
Another Example
Bucket B100 overflows, then…
Directory with d=3:
• 000, 001 → B00 (d’=2, H(key)=00..)
• 010, 011 → B01 (d’=2, H(key)=01..)
• 100 → B100 (d’=3, H(key)=100..)
• 101 → B101 (d’=3, H(key)=101..)
• 110, 111 → B11 (d’=2, H(key)=11..)
Another Example (cont’d)
Bucket B100 overflows, so d increases to 4. Directory with d=4:
• 0000, 0001, 0010, 0011 → B00 (d’=2, H(key)=00..)
• 0100, 0101, 0110, 0111 → B01 (d’=2, H(key)=01..)
• 1000 → B1000 (d’=4, H(key)=1000..)
• 1001 → B1001 (d’=4, H(key)=1001..)
• 1010, 1011 → B101 (d’=3, H(key)=101..)
• 1100, 1101, 1110, 1111 → B11 (d’=2, H(key)=11..)
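The step from d=3 to d=4 in this example can be sketched as a directory-doubling routine. This is an illustrative sketch, not the slides' implementation: it assumes the directory is indexed by leading hash bits (as in the example) and stores bucket labels as strings; `doubleDirectory` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Double a prefix-indexed directory: old cell i becomes new cells 2i and
// 2i+1, and both new cells refer to the same bucket as the old cell did.
std::vector<std::string> doubleDirectory(const std::vector<std::string>& dir) {
    std::vector<std::string> doubled;
    doubled.reserve(dir.size() * 2);
    for (const std::string& bucket : dir) {
        doubled.push_back(bucket);
        doubled.push_back(bucket);
    }
    return doubled;
}
```

After doubling, cells 1000 and 1001 both still refer to B100; splitting that bucket then repoints them to B1000 and B1001, while every other cell keeps its bucket.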
Contraction
Implementation
Creating Address
Function hash(KEY)
• Fold/Add hashing algorithm
• Do not MOD hashing value by address space since no fixed
address space exists
• Output from the hash function for a number of keys
bill 0000 0011 0110 1100
lee 0000 0100 0010 1000
pauline 0000 1111 0110 0101
alan 0100 1100 1010 0010
julie 0010 1110 0000 1001
mike 0000 0111 0100 1101
elizabeth 0010 1100 0110 1010
mark 0000 1010 0000 0111
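A fold/add hash of the kind named above might look like the following sketch. The modulus 19937 and the name `foldAddHash` are assumptions that do not appear in the slides; with that modulus the sketch reproduces the listed values for bill (876), lee (1064), and mark (2567).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fold/add hash: combine the key two characters at a time.
// Assumption: the running sum is reduced modulo 19937 to keep it in range;
// this constant is not given in the slides.
int foldAddHash(const std::string& key) {
    int sum = 0;
    for (std::size_t j = 0; j < key.size(); j += 2) {
        int first = static_cast<unsigned char>(key[j]);
        // A key of odd length is padded with a zero byte.
        int second = 0;
        if (j + 1 < key.size()) {
            second = static_cast<unsigned char>(key[j + 1]);
        }
        sum = (sum + 100 * first + second) % 19937;
    }
    return sum;
}
```

Note that the result is not reduced modulo any address-space size, in line with the point that no fixed address space exists.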
Example of Function Hash (key)
Function MakeAddress(key,depth)
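The slide gives only the function name, so the following is a sketch under stated assumptions rather than the original code: it assumes the address is built from the `depth` low-order bits of the hash value, taken in reverse order, and it takes the hash value directly instead of the key so that the block stays self-contained. The name `makeAddress` and its signature are hypothetical.

```cpp
#include <cassert>

// Build a directory address from the `depth` low-order bits of a hash value,
// taken least significant bit first (an assumed convention).
int makeAddress(int hashValue, int depth) {
    int address = 0;
    for (int j = 0; j < depth; ++j) {
        address = (address << 1) | (hashValue & 1);  // append the current low bit
        hashValue >>= 1;                             // move to the next bit
    }
    return address;
}
```

For the hash value of bill (0000 0011 0110 1100), a depth of 3 reads the low bits 0, 0, 1 and yields address 001.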
Main Members of class Bucket
class Directory
{
public:
    Directory(…..); ~Directory();
    int Open(..); int Create(…); int Close();
    int Insert(…); int Delete(…); int Search(…);
protected:
    int DoubleSize();
    int Collapse();
    int InsertBucket(….);
    int Find(…);
    int StoreBucket(…);
    int LoadBucket(…);
    …..
};
Deletion
Buddy Bucket
z = y XOR 1
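The relation z = y XOR 1 can be wrapped in a small helper. This is a sketch under assumptions not stated on the slide: a bucket is taken to have a buddy only when its depth equals the directory depth and the directory has more than one cell. The name `findBuddy` and the -1 return convention are hypothetical.

```cpp
#include <cassert>

// Return the directory address of the buddy bucket, or -1 if there is none.
// Assumption: a buddy exists only when the bucket depth equals the
// directory depth and the directory depth is greater than zero.
int findBuddy(int bucketAddress, int bucketDepth, int directoryDepth) {
    if (directoryDepth == 0) return -1;           // single-cell directory
    if (bucketDepth < directoryDepth) return -1;  // several cells share this bucket
    return bucketAddress ^ 1;                     // z = y XOR 1: flip the last bit
}
```

For example, with directory depth 3 the bucket at address 100 has its buddy at 101, and vice versa.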
Collapsing the Directory
Collapse condition
• If the directory has only a single cell, downsizing is impossible
• If there is a pair of directory cells that do not both point to the
same bucket, collapsing is impossible
Allocating space
• Allocate half the size of the original
• Copy the bucket references shared by each cell pair to a single
cell in the new directory
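The collapse condition and the copy step can be sketched together. This sketch assumes bucket references are stored as integer ids and that a cell pair means two adjacent cells (2i and 2i+1); `collapseDirectory` is a hypothetical name, not the class method from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Try to halve the directory. Returns true and shrinks `dir` on success;
// returns false and leaves `dir` unchanged if collapsing is impossible.
// Assumption: the directory size is a power of two.
bool collapseDirectory(std::vector<int>& dir) {
    if (dir.size() <= 1) return false;  // single cell: cannot downsize
    for (std::size_t i = 0; i < dir.size(); i += 2) {
        if (dir[i] != dir[i + 1]) return false;  // pair points to two buckets
    }
    std::vector<int> half(dir.size() / 2);  // allocate half the original size
    for (std::size_t i = 0; i < half.size(); ++i) {
        half[i] = dir[2 * i];  // copy the reference shared by the pair
    }
    dir = half;
    return true;
}
```

A directory {7, 7, 9, 9} collapses to {7, 9}, while {7, 8, 9, 9} is left unchanged because its first pair refers to two different buckets.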
Extendible Hashing
Performance
Time : O(1)
• If the directory can be kept in RAM: a single access
• Otherwise: two accesses are necessary
Space utilization of the bucket
• r (# of records), b (block size), N (# of Blocks)
• Utilization = r / bN
• Average utilization → 0.69
Space utilization for the directory
• How large a directory should we expect to have, given an
expected number of keys?
• Expected value for the directory size by Flajolet(1983)
– Estimated directory size = (3.92 / b) × r^(1 + 1/b)
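The two space formulas above can be evaluated directly. This sketch reads the directory estimate as 3.92/b multiplied by r raised to the power (1 + 1/b), which is an interpretation of the slide's notation; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Bucket space utilization: r records stored in N blocks of size b.
double bucketUtilization(double r, double b, double n) {
    return r / (b * n);
}

// Estimated directory size, read as (3.92 / b) * r^(1 + 1/b).
double estimatedDirectorySize(double r, double b) {
    return (3.92 / b) * std::pow(r, 1.0 + 1.0 / b);
}
```

For instance, 690 records in 100 blocks of size 10 give a utilization of 0.69, matching the average figure quoted above.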
Space Utilization for Buckets
Alternative Approaches(1):
Dynamic Hashing
Similar to extendible hashing
• Use a directory to track bucket addresses
• Extend the directory through the use of tries
Alternative Approaches(1):
Dynamic Hashing (cont’d)
(a) Original address space: 1 2 3 4
(b) Original address space 1 2 3 4; node 4 has split into 40 and 41
(c) Original address space 1 2 3 4; node 2 has split into 20 and 21, and node 41 into 410 and 411
Dynamic Hashing vs. Extendible Hashing(1)
Overflow handling
• Both schemes extend the hash function locally, as a binary
search trie
Space utilization
• Both schemes achieve the same space utilization (69%)
Dynamic Hashing and Extendible Hashing(2)
Growth of directory
• Dynamic hashing: slower, more gradual growth
• Extendible hashing: extend directory by doubling it
Page fault
• Dynamic hashing: more than one page fault (with linked
structure for the directory)
• Extendible hashing: single page fault
Alternative Approaches(2):
Linear Hashing
(a) Buckets a b c d at addresses 00 01 10 11
(b) Buckets a b c d A at addresses 000 01 10 11 100
(c) Buckets a b c d A B at addresses 00 01 10 11 100 101
(d) Buckets a b c d A B C at addresses 00 01 10 11 100 101 110
The growth of address space in linear hashing(2)
(e) Buckets a b c d A B C D at addresses 00 01 10 11 100 101 110 111
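The figure shows buckets being added one at a time, with a split bucket's address growing by one leading bit (00 becomes 000 and 100). The usual address rule for linear hashing, which the slides do not state, can be sketched as follows; `linearHashAddress`, `depth`, and `splitPointer` are hypothetical names.

```cpp
#include <cassert>

// Address of a key in linear hashing (standard rule, assumed here).
// depth:        number of address bits d in use before the current round
// splitPointer: number of buckets already split in the current round
unsigned linearHashAddress(unsigned hashValue, int depth, unsigned splitPointer) {
    unsigned address = hashValue % (1u << depth);  // h_d: low-order d bits
    if (address < splitPointer) {
        // This bucket has already been split, so use one more bit.
        address = hashValue % (1u << (depth + 1)); // h_(d+1)
    }
    return address;
}
```

With depth 2 and one bucket already split, as in step (b), a hash value ending in 100 maps to the new bucket A at address 100, while a value ending in 101 still maps to bucket b at address 01.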
Approaches to Controlling Splitting
Postpone splitting: increase space utilization
• B-Tree: redistribution rather than splitting
• Hashing: placing records in chains of overflow buckets to
postpone splitting
Approaches to Controlling Splitting (cont’d)
Postpone splitting for extendible hashing
• Chain overflow buckets to avoid doubling the directory space
• 1.1 seeks on average, 76% to 81% storage utilization