
External Sorting and Searching

B-Trees, etc.

m-Way Search Trees

❚ In a binary search tree, there is one key value per node and two children.
❚ There is no reason why I couldn’t
have (at most) m-1 key values per
node and m children.
❚ Such trees are called m-way search
trees.

m-Way Search Tree Example

      [120, 240]

[97]   [200]   [360, 440]

❚ Here is a 3-way search tree; each node has a maximum of 3 children.
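
The lookup in such a tree can be sketched as follows. This is a minimal Python sketch, not code from the slides: the names `MWayNode` and `search` are illustrative, and it assumes keys are distinct and a missing child is stored as `None`.

```python
from bisect import bisect_left

class MWayNode:
    """One node of an m-way search tree: sorted keys plus len(keys) + 1 children."""
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        # A missing child is represented by None (an assumption of this sketch).
        self.children = list(children) if children else [None] * (len(self.keys) + 1)

def search(node, key):
    """Return True if key is stored in the subtree rooted at node."""
    while node is not None:
        i = bisect_left(node.keys, key)        # first index with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        node = node.children[i]                # descend into the gap the key falls in
    return False

# The 3-way example tree: root [120, 240] with children [97], [200], [360, 440].
root = MWayNode([120, 240],
                [MWayNode([97]), MWayNode([200]), MWayNode([360, 440])])
print(search(root, 360), search(root, 100))
```

The binary search inside each node is the O(log2(m)) step; the number of nodes visited still depends on the height of the tree.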
m-Way Search Tree Example II

97

120, 240

360, 440

500

❚ Here is another one.


m-Way Time Complexity

❚ Clearly, the worst-case search and insert time for an m-way search tree is still O(n).
❙ In an unbalanced tree, the number of nodes visited is O(n/m).
❙ For each node, we must look at up to m-1 key values.
❙ We could binary-search within a node in O(log2(m)) time, yielding at best O(n/m * log2(m)).
❙ Of course, as n gets much larger than m, this is still O(n).

B-Trees

❚ What I want is a height-balanced m-way search tree to achieve the best search time.
❚ These are called B-Trees.
❚ As with height-balanced BSTs, we
will have a re-balancing algorithm to
run after every insert and delete.

B-Tree Properties

❚ The root may have between 2 and m children.
❚ All other nodes must have between ⌈m/2⌉ and m children.
❚ A node that has k children will have k-1 key values.
❚ Thus, the root may have only 2 children; all other nodes must be at least half full.
B-Tree Properties II

❚ If a B-tree node has k children ($T_0, T_1, \ldots, T_{k-1}$) and k-1 ordered key values ($D_1, D_2, \ldots, D_{k-1}$), then all the key values in $T_i$ are greater than $D_i$ but less than $D_{i+1}$ for $i = 1, \ldots, k-2$.
❚ All the key values in $T_0$ are less than $D_1$.
❚ All the key values in $T_{k-1}$ are greater than $D_{k-1}$.
B-Tree Insertion

❚ All insertions are done at the terminal level.
❚ First, search for the terminal-level node to insert the new key value into.
❚ If the number of children of this node does not exceed m (that is, it holds at most m-1 key values), stop.
❚ If the number of children does exceed m...
B-Tree Node Splitting

❚ Split this node into two nodes:
❙ Take the middle value out.
❙ Create one node with the lower half of the key values and one with the upper half.
❙ Insert the middle value into the parent node.
❙ Continue recursively until either the node can hold the new key value, or you split the root.
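
The insert-and-split steps above can be sketched in Python. This is a minimal in-memory sketch under stated assumptions, not the slides' implementation: `Node`, `insert`, `_insert`, and `levels` are hypothetical names, keys are assumed distinct, and a node with an empty `children` list is treated as a terminal-level node.

```python
from bisect import insort

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty list: terminal-level node

def insert(root, key, m=3):
    """Insert key into a B-tree of order m; return the (possibly new) root."""
    split = _insert(root, key, m)
    if split is not None:                  # the root itself was split
        middle, right = split
        return Node([middle], [root, right])
    return root

def _insert(node, key, m):
    """Insert below node; return (middle_key, new_right_node) if node was split."""
    if not node.children:                  # terminal level: place the key here
        insort(node.keys, key)
    else:
        i = sum(k < key for k in node.keys)        # index of the child to descend into
        split = _insert(node.children[i], key, m)
        if split is not None:                      # child split: absorb its middle key
            middle, right = split
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:            # node can hold the key: stop
        return None
    mid = len(node.keys) // 2              # take the middle value out
    middle = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])   # upper half
    node.keys = node.keys[:mid]                                  # lower half
    node.children = node.children[:mid + 1]
    return middle, right

def levels(root):
    """List the keys of every node, level by level."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out

root = Node()
for k in [120, 360, 240, 200, 97, 440, 280]:
    root = insert(root, k)
print(levels(root))
```

With m=3 and this key sequence, the sketch ends with root [240], middle level [120] and [360], and terminal nodes [97], [200], [280], [440].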
B-Tree Insert Example

❚ A B-Tree of order 3 (i.e. m=3) is the smallest possible.
❚ It is also the easiest to draw, so we’ll
use this order for our example.
❚ This is also called a “2-3 Tree”
because each node may have a
maximum of 2 key values and 3
children.
B-Tree Example
Key values left to insert: 360, 240, 200, 97, 440, 280

[120]

❚ Insert 120. A new root node is created and this value is placed in it.
B-Tree Example
Key values left to insert: 240, 200, 97, 440, 280

[120, 360]

❚ Insert 360. It goes into the root. No further action is required.
B-Tree Example
Key values left to insert: 200, 97, 440, 280

[120, 240, 360]

❚ Insert 240. It goes into the root. Since this node has 3 values, it must be split.
B-Tree Example
Key values left to insert: 200, 97, 440, 280

    [240]

[120]   [360]

❚ This shows the result of the split. 120 and 360 go into nodes by themselves, and 240 is placed into a new root node.
B-Tree Example
Key values left to insert: 97, 440, 280

       [240]

[120, 200]   [360]

❚ Insert value 200. It goes into the node with 120. No further action is required.
B-Tree Example
Key values left to insert: 440, 280

         [240]

[97, 120, 200]   [360]

❚ Insert value 97. It goes into the node with 120 and 200. Since this node contains too many values, it must be split.
B-Tree Example
Key values left to insert: 440, 280

    [120, 240]

[97]   [200]   [360]

❚ This shows the result of the split. 97 and 200 are placed into their own nodes, and 120 is moved up to the parent. The parent node is not over-full, so no further action is required.
B-Tree Example
Key values left to insert: 280

      [120, 240]

[97]   [200]   [360, 440]

❚ Insert 440. It goes into the node with 360. No further action is required.
B-Tree Example
Key values left to insert: DONE

        [120, 240]

[97]   [200]   [280, 360, 440]

❚ Insert the value 280. It goes into the node with 360 and 440. Since this node has 3 values, it must be split.
B-Tree Example

      [120, 240, 360]

[97]   [200]   [280]   [440]

❚ This shows the result of the split. 280 and 440 go into nodes by themselves, and 360 is moved up to the parent node.
B-Tree Example
            [240]

     [120]       [360]

[97]   [200]   [280]   [440]

❚ The parent node must be split as well. Because it is the root, we must create a new root node.
Time Complexity

❚ What is the order of a B-tree search? To answer this, we need to determine the worst-case number of levels in a B-tree of order m that has N key values.
❚ Let’s look at the minimum number of nodes per level:
❙ Level 1 (the root) has 1 node;
❙ Level 2 must have at least 2 nodes;
❙ Level 3 must have at least $2\lceil m/2\rceil$ nodes;
❙ In general, level k (k ≥ 2) must have at least $2\lceil m/2\rceil^{k-2}$ nodes.
Time Complexity II

❚ Observation: in any list of n elements, there are n+1 ways for the search to fail.
❚ In a B-tree with L levels, all the ways to fail are at level L+1 (these are sometimes called Failure Nodes).
❚ Thus, there is a relationship between the number of key values and the height of the tree:
Time Complexity III

❚ Because the previous analysis is a worst case, the minimum number of nodes at level L+1 must be less than or equal to N+1:

❚ $2\lceil m/2\rceil^{L-1} \le N+1$
❚ $\lceil m/2\rceil^{L-1} \le (N+1)/2$
❚ $L-1 \le \log_{\lceil m/2\rceil}[(N+1)/2]$
❚ $L \le \log_{\lceil m/2\rceil}[(N+1)/2] + 1$
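
To get a feel for this bound, the largest L satisfying the first inequality can be computed directly. The following is a small Python sketch; the function name `max_levels` is an illustrative choice, and it assumes m ≥ 3 and N ≥ 1. It uses integer arithmetic rather than floating-point logarithms to avoid rounding surprises.

```python
def max_levels(n, m):
    """Largest L with 2 * ceil(m/2)**(L-1) <= n + 1, for n >= 1 keys and order m >= 3."""
    half = (m + 1) // 2                  # ceil(m/2) in integer arithmetic
    levels = 1
    while 2 * half ** levels <= n + 1:   # would L = levels + 1 still satisfy the bound?
        levels += 1
    return levels

print(max_levels(7, 3))            # 3: the seven-key order-3 example tree has 3 levels
print(max_levels(1_000_000, 200))  # 3
```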
Time Complexity IV

❚ One node at each level must be accessed, so L gives the number of nodes to access.
❚ Each node in this worst-case tree contains ⌈m/2⌉-1 key values, so with a binary search inside each node the total number of comparisons is about

❚ $\{\log_{\lceil m/2\rceil}[(N+1)/2] + 1\} \cdot \{\log_2\lceil m/2\rceil\}$
Fun With Math

❚ Removing the constants, we may say this search is
❚ $O(\log_{\lceil m/2\rceil}(N) \cdot \log_2\lceil m/2\rceil)$
❚ $O\left(\frac{\log_2 N}{\log_2\lceil m/2\rceil} \cdot \log_2\lceil m/2\rceil\right)$
❚ $O(\log_2 N)$

Summing it up:

❚ WHAT??? ALL THIS WORK FOR THE SAME ORDER AS AN AVL-TREE!!!

❚ What’s going on here???


What Really Happens

❚ Remember this is external searching, so accessing the information on disk and doing comparisons in memory have very different costs.
❚ Each node in the B-tree is stored in a “block” on the disk; a “block” is the minimum amount of information which can be retrieved with one disk access.
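
To make the cost difference concrete, one can count block reads instead of comparisons. The sketch below is illustrative only: the function names are hypothetical, and the 10 ms cost per block read is an assumed round number, not a measurement. It compares the best case for any binary search tree stored one node per block with the worst case for a B-tree of order 200.

```python
def btree_max_levels(n, m):
    """Worst-case levels of a B-tree of order m (m >= 3) holding n >= 1 keys."""
    half = (m + 1) // 2                  # ceil(m/2)
    levels = 1
    while 2 * half ** levels <= n + 1:
        levels += 1
    return levels

def binary_min_levels(n):
    """Best-case levels of any binary search tree holding n >= 1 keys."""
    return n.bit_length()                # equals ceil(log2(n + 1))

DISK_ACCESS_MS = 10.0                    # assumed cost of one block read (hypothetical)

n = 1_000_000
for label, levels in [("binary tree, one node per block", binary_min_levels(n)),
                      ("B-tree of order 200", btree_max_levels(n, 200))]:
    print(f"{label}: {levels} block reads, about {levels * DISK_ACCESS_MS:.0f} ms")
```

Under these assumptions, a million keys cost at least 20 block reads in the binary tree but at most 3 in the B-tree, even though both searches are O(log2 N) in comparisons.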
What Really Happens II

❚ Thus, the number of disk accesses is the bottleneck; this is given by L.
❚ A B-tree is built on a field of a data file to speed access to that field.
❚ A “Clustered” or “Primary” B-tree stores the entire record of the file in the B-tree.
❚ An “Unclustered” or “Secondary” B-tree stores the field’s value and the record number of the associated record in the data file.
What Really Happens III

❚ It is the secondary B-trees that one usually means when one says “B-tree”.
❚ Thus, to do a search for a record on
a field which has a B-tree:
❙ Search the B-tree for the key value.
❙ When found, retrieve its associated
record number.
❙ Retrieve that record from the data file.
A Real Example.

❚ What follows is a real example of how a B-tree is used.

Sample Data File

Course    Teacher        Schedule#
CS 470    Prof. Green    23
CS 471    Prof. Green    45
CS 472    Prof. Green    46
CS 473    Prof. Smith    100
CS 474    Prof. Smith    110
CS 475    Prof. Smith    120
CS 476    Prof. Green    140
CS 477    Prof. Green    210

B-Tree on Schedule#

❚ This is the way we would normally view it:

             [100]

     [45]            [120]

[23]   [46]   [110]   [140, 210]

B-Tree on Schedule#

❚ This is how it really looks in a file:

Rec#   Child Ptr 1   Key value 1   Data Ptr 1   Child Ptr 2   Key value 2   Data Ptr 2   Child Ptr 3
1      2             100           4            6             0             0            0
2      3             45            2            4             0             0            0
3      0             23            1            0             0             0            0
4      0             46            3            0             0             0            0
5      0             110           5            0             0             0            0
6      5             120           6            7             0             0            0
7      0             140           7            0             210           8            0
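
A search over this file layout can be sketched as follows. This is a hedged illustration rather than the slides' code: the names `INDEX`, `DATA`, and `lookup` are hypothetical, and it assumes that record 1 is the root, that a pointer of 0 means "none", that a key value of 0 marks an unused slot, and that data pointers are row positions in the sample data file.

```python
# Index file: Rec# -> (child1, key1, data1, child2, key2, data2, child3)
INDEX = {
    1: (2, 100, 4, 6, 0, 0, 0),
    2: (3, 45, 2, 4, 0, 0, 0),
    3: (0, 23, 1, 0, 0, 0, 0),
    4: (0, 46, 3, 0, 0, 0, 0),
    5: (0, 110, 5, 0, 0, 0, 0),
    6: (5, 120, 6, 7, 0, 0, 0),
    7: (0, 140, 7, 0, 210, 8, 0),
}

# Data file: record number -> (course, teacher, schedule number)
DATA = {
    1: ("CS 470", "Prof. Green", 23), 2: ("CS 471", "Prof. Green", 45),
    3: ("CS 472", "Prof. Green", 46), 4: ("CS 473", "Prof. Smith", 100),
    5: ("CS 474", "Prof. Smith", 110), 6: ("CS 475", "Prof. Smith", 120),
    7: ("CS 476", "Prof. Green", 140), 8: ("CS 477", "Prof. Green", 210),
}

def lookup(schedule_no, root=1):
    """Follow child pointers through the index, then fetch the data record."""
    rec = root
    while rec != 0:                          # 0 is assumed to mean "no child"
        c1, k1, d1, c2, k2, d2, c3 = INDEX[rec]   # one block read per level
        if schedule_no == k1:
            return DATA[d1]
        if schedule_no < k1:
            rec = c1
        elif k2 == 0:                        # node holds only one key
            rec = c2
        elif schedule_no == k2:
            return DATA[d2]
        elif schedule_no < k2:
            rec = c2
        else:
            rec = c3
    return None                              # fell off the tree: key not present

print(lookup(210))
print(lookup(50))
```

Each pass through the loop corresponds to one index block read, and a successful search adds one more read for the data record itself.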

Deleting in a B-tree

❚ To delete from a B-tree, first locate the key value with the normal search routine.
❚ If the key value is not located in a terminal node, replace it with its in-order successor and delete the in-order successor.
❚ Thus, all deletes which reduce the number of key values occur at the terminal level.
Deleting From the Terminal Level

❚ Good news: because there are no children to worry about, we can just remove the key value from the node's list.
❚ Bad news: what if this removal reduces the number of children below ⌈m/2⌉ (that is, the number of key values below ⌈m/2⌉-1)?
❚ Reality: at some point we will need to reduce the number of nodes...
The “Borrow” Algorithm

❚ When a node is reduced below ⌈m/2⌉ children, first try to borrow a key value from one of its neighbors.
❚ If a neighbor has more than the minimum, then rotate the appropriate key from the neighbor up to the parent and the appropriate key from the parent down to the reduced child.
Borrow Example

      [120, 240]

[97]   [200]   [360, 440]

❚ Suppose I want to delete 200 from this B-tree of order 3.
❚ To do so, rotate 240 down into the middle child, and 360 up to the root:
Borrow Example

    [120, 360]

[97]   [240]   [440]

❚ This shows the result.
❚ Problem: what if I now want to delete 240?
❚ Borrowing won’t work...
Combining Nodes

❚ When borrowing won’t work, combine the node with the separating key value from the parent AND the neighbor node that has the minimum number of children.
❚ Repeat the deletion algorithm from the parent, looking first to borrow if possible.

❚ Now, let’s delete 240...
Combining Example

    [120, 360]

[97]   [240]   [440]

❚ First, remove 240.
Combining Example

     [120, 360]

[97]   <empty>   [440]

❚ Next, attempt to borrow.
❚ Borrowing fails.
❚ Combine the empty node with 360 and 440.
Combining Example

     [120]

[97]   [360, 440]

❚ This shows the result.
❚ The parent is OK, so we are done...
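
The borrow-then-combine repair for a terminal-level node can be sketched in Python. This is a simplified illustration under stated assumptions, not the slides' algorithm in full: `fix_underflow` is a hypothetical name, nodes are plain lists of keys, only terminal-level children are handled, and the caller is responsible for repeating the repair at the parent if the parent becomes too small.

```python
def fix_underflow(parent_keys, children, i, min_keys=1):
    """Repair terminal-level child i after a deletion; lists are modified in place.

    min_keys is ceil(m/2) - 1, which is 1 for a B-tree of order 3.
    Returns "ok", "borrow", or "combine" to report what was done.
    """
    node = children[i]
    if len(node) >= min_keys:
        return "ok"
    # Borrow from the right neighbor if it has more than the minimum.
    if i + 1 < len(children) and len(children[i + 1]) > min_keys:
        node.append(parent_keys[i])                   # separator comes down
        parent_keys[i] = children[i + 1].pop(0)       # neighbor's smallest goes up
        return "borrow"
    # Otherwise borrow from the left neighbor.
    if i > 0 and len(children[i - 1]) > min_keys:
        node.insert(0, parent_keys[i - 1])            # separator comes down
        parent_keys[i - 1] = children[i - 1].pop()    # neighbor's largest goes up
        return "borrow"
    # Combine: merge with a neighbor and the separating key from the parent.
    left = i if i + 1 < len(children) else i - 1
    merged = children[left] + [parent_keys[left]] + children[left + 1]
    children[left:left + 2] = [merged]
    del parent_keys[left]
    return "combine"

parent, kids = [120, 240], [[97], [200], [360, 440]]
kids[1].remove(200)
print(fix_underflow(parent, kids, 1), parent, kids)
kids[1].remove(240)
print(fix_underflow(parent, kids, 1), parent, kids)
```

Run on the order-3 example, the first call borrows (giving parent [120, 360]) and the second combines (giving parent [120] with children [97] and [360, 440]).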

A Larger Example

                [260]

     [120, 180]         [360]

[97]   [150]   [200]   [280]   [440, 500]

❚ Delete 280.
❚ This is a “borrow” case:
A Larger Example

                [260]

     [120, 180]         [440]

[97]   [150]   [200]   [360]   [500]

❚ Delete 360.
❚ This is a “combine” case:
A Larger Example

                [260]

     [120, 180]         [440]

[97]   [150]   [200]   <empty>   [500]

❚ First, remove 360...
A Larger Example

                [260]

     [120, 180]         [440]

[97]   [150]   [200]   <empty>   [500]

❚ Next, combine the node with its neighbor (500) and 440 from the parent...
A Larger Example

                [260]

     [120, 180]        <empty>

[97]   [150]   [200]   [440, 500]

❚ Parent now has a problem...
❚ This is a borrow case:
A Larger Example

              [180]

      [120]          [260]

[97]   [150]   [200]   [440, 500]

❚ Children must now be considered. What do I do with the node with 200?
A Larger Example

              [180]

      [120]          [260]

[97]   [150]   [200]   [440, 500]

❚ Link it under 260.
❚ Now, delete 97...
A Larger Example

                [180]

       [120]           [260]

<empty>   [150]   [200]   [440, 500]

❚ This is a combine case, so bring 120 down and combine with 150...
A Larger Example

               [180]

    <empty>          [260]

[120, 150]   [200]   [440, 500]

❚ The parent now has a problem.
❚ This is a combine case:
A Larger Example

           <empty>

         [180, 260]

[120, 150]   [200]   [440, 500]

❚ The old root is now empty; what to do with it?
A Larger Example

         [180, 260]

[120, 150]   [200]   [440, 500]

❚ Just dispose of it properly.
Red-black Trees

❚ Consider a B-tree of order 4.
❙ A node must have at least 2 children and as many as 4.
❙ A node must have at least 1 key value and as many as 3.
❚ We have always represented the key values as an array, but what if we did it as a tree?
Red-black Trees Example

             [180, 260, 550]

[120, 150]   [200]   [440, 500]   [600]

❚ This is a valid B-tree of order 4. Now store the key values of each node in a binary search tree:
Red-black Trees Example
260
180 550

150 200 440 600


120 500

❚ OK, now link up the B-tree node pointers to create a binary search tree:
Red-black Trees Example
260
180 550

150 200 440 600


120 500

❚ Now color inter-node links black and intra-node links red:
Red-black Trees Example
260
180 550

150 200 440 600


120 500

❚ Color each node the color of the edge incident on it:
Red-black Trees Example
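                  260(B)
        180(R)              550(R)

   150(B)    200(B)    440(B)    600(B)

120(R)                      500(R)

(B = black, R = red, following the link-coloring rule above; the root is black.)

❚ This is a Red-black Tree. It is a height-balanced binary search tree.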
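
The conversion from an order-4 B-tree to a red-black tree can be sketched as below. This is an illustrative Python sketch, not the slides' code: `RBNode`, `to_red_black`, and `inorder` are hypothetical names, and a B-tree node is assumed to be a `(keys, children)` tuple with an empty child list at the terminal level. For a two-key node either key may go on top; this sketch always puts the larger key on top, whereas the figure uses both orientations.

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key, self.color, self.left, self.right = key, color, left, right

def to_red_black(bnode):
    """Convert a B-tree node of order 4, given as (keys, children), to a red-black subtree."""
    keys, children = bnode
    # Convert the child B-tree nodes first; terminal-level nodes have no children.
    sub = [to_red_black(c) for c in children] or [None] * (len(keys) + 1)
    if len(keys) == 1:                       # one key: a single black node
        return RBNode(keys[0], "black", sub[0], sub[1])
    if len(keys) == 2:                       # two keys: black top with one red child
        red = RBNode(keys[0], "red", sub[0], sub[1])
        return RBNode(keys[1], "black", red, sub[2])
    # Three keys: the middle key is black, the outer keys are its red children.
    left = RBNode(keys[0], "red", sub[0], sub[1])
    right = RBNode(keys[2], "red", sub[2], sub[3])
    return RBNode(keys[1], "black", left, right)

def inorder(node):
    """Keys in sorted order, to confirm the result is a binary search tree."""
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

btree = ([180, 260, 550],
         [([120, 150], []), ([200], []), ([440, 500], []), ([600], [])])
rb = to_red_black(btree)
print(rb.key, rb.color, inorder(rb))
```

Links inside one B-tree node become red nodes, and each B-tree node contributes exactly one black node, which is why every root-to-bottom path sees the same number of black nodes.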
Red-black Properties

❚ A red node must have only black nodes as children.
❚ A black node may have either red or
black nodes as children.
❚ The path from the root to any
terminal level node must pass
through the same number of black
nodes.
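
These properties can be checked mechanically. The sketch below is a hedged illustration: `black_height` is a hypothetical name, a node is assumed to be a `(key, color, left, right)` tuple with `None` for a missing child, and black nodes are counted along every path down to a missing child, which is one common way of stating the last property.

```python
def black_height(node):
    """Return the black-node count of every downward path from node.

    Raises ValueError if a red node has a red child or if two paths
    pass through different numbers of black nodes.
    """
    if node is None:
        return 0
    key, color, left, right = node
    if color == "red":
        for child in (left, right):
            if child is not None and child[1] == "red":
                raise ValueError(f"red node {key} has a red child")
    lh, rh = black_height(left), black_height(right)
    if lh != rh:
        raise ValueError(f"black counts differ below {key}")
    return lh + (1 if color == "black" else 0)

# The example tree, colored by the link rule (inter-node black, intra-node red).
tree = (260, "black",
        (180, "red",
         (150, "black", (120, "red", None, None), None),
         (200, "black", None, None)),
        (550, "red",
         (440, "black", None, (500, "red", None, None)),
         (600, "black", None, None)))
print(black_height(tree))   # 2
```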
Red-black Insert

❚ The insert algorithm follows the rules for B-trees, but defines them in terms of node color:
❙ Search for the place to insert; all new insertions go in as red nodes. (In B-tree terms: all insertions go into an existing B-tree node.)
❙ If the parent of the new node is black, stop. (In B-tree terms: if the B-tree node is not full, no problem.)
❙ If the parent is red, see if a simple AVL-type rotation will work: look at the grandparent as the root of the rotation.
Red-black Insert II

❙ If rotation doesn’t work, move the nearest black ancestor to its parent by making it red and both of its children black. (In B-tree terms: split the B-tree node and move the middle key to the parent.)
❙ Repeat for the newly colored red node. (In B-tree terms: repeat for the parent B-tree node.)

Red-black Delete

❚ Same basic idea:
❙ Find the key to delete;
❙ If it is not at the terminal level, replace it with its in-order successor and delete the successor instead.
❙ Thus, all deletions which reduce the number of nodes occur at the terminal level of the B-tree.
❙ The rules follow those for deleting from a B-tree:
Red-black Delete II

❚ If the node is red, do your standard BST delete (in B-tree terms: the B-tree node is not empty).
❚ If the node is black, but has a red child, do your standard BST delete and make the red child black (again, the B-tree node is not empty).
❚ If the node is black and has no children…
Red-black Delete III

❚ …Attempt to “borrow” from the parent’s other child.
❚ …Failing that, “combine” nodes and repeat at the (B-tree) parent.

The End Slide
