
Outline Elementary Data Structures Hash Tables Binary Search Trees

Chapter 3
Data Structures

Yonas Y.
Algorithm Analysis and Design
School of Electrical and Computer Engineering


1 Elementary Data Structures


Stacks and Queues
Linked lists

2 Hash Tables
Direct-address tables
Hash tables
Hash functions

3 Binary Search Trees


Binary Search Trees
Insertion and deletion


Stacks and Queues


Stacks
Stack implements a last-in, first-out, or LIFO, policy.
The INSERT operation on a stack is often called PUSH, and the DELETE
operation, which does not take an element argument, is often called POP.
As Figure 3.1 shows, we can implement a stack of at most n elements with an
array S[1..n].
The array has an attribute S.top that indexes the most recently inserted
element.
It consists of elements S[1..S.top], where S[1] is the element at the
bottom of the stack and S[S.top] is the element at the top.
When S.top = 0, the stack contains no elements and is empty.

Figure 3.1: An array implementation of a stack S. Stack elements appear only in the
lightly shaded positions.


We can implement stack operations with just a few lines of code:


STACK-EMPTY(S)
1 if S.top == 0
2 return TRUE
3 else return FALSE

PUSH(S,x)
1 S.top = S.top + 1
2 S[S.top] = x

POP(S)
1 if STACK-EMPTY(S)
2 error ”underflow”
3 else S.top = S.top − 1
4 return S[S.top + 1]
Each of the three stack operations takes O(1) time.
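The pseudocode above can be sketched in Python. Here the array S becomes a fixed-size Python list with 0-based indexing, so top == -1 plays the role of S.top = 0; the class and method names are illustrative, not part of the text.

```python
class Stack:
    def __init__(self, n):
        self.S = [None] * n   # storage for at most n elements
        self.top = -1         # index of the most recently pushed element

    def empty(self):          # STACK-EMPTY
        return self.top == -1

    def push(self, x):        # PUSH
        if self.top + 1 == len(self.S):
            raise OverflowError("overflow")
        self.top += 1
        self.S[self.top] = x

    def pop(self):            # POP
        if self.empty():
            raise IndexError("underflow")
        self.top -= 1
        return self.S[self.top + 1]
```

As in the pseudocode, each operation does a constant amount of work, so all three run in O(1) time.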



Queues
In a queue, the element deleted is always the one that has been in the set for
the longest time: the queue implements a first-in, first-out, or FIFO, policy.
The queue has a head and a tail.
We call the INSERT operation on a queue ENQUEUE, and we call the
DELETE operation DEQUEUE.
Figure 3.2 shows one way to implement a queue of at most n − 1 elements
using an array Q[1..n].
The queue has an attribute Q.head that indexes, or points to, its head.
The attribute Q.tail indexes the next location at which a newly arriving
element will be inserted into the queue.
When Q.head = Q.tail, the queue is empty.

Figure 3.2: A queue implemented using an array Q[1..12]. Queue elements appear
only in the lightly shaded positions.

The procedures of ENQUEUE and DEQUEUE are:


ENQUEUE(Q,x)
1 Q[Q.tail] = x
2 if Q.tail == Q.length
3 Q.tail = 1
4 else Q.tail = Q.tail + 1

DEQUEUE(Q)
1 x = Q[Q.head]
2 if Q.head == Q.length
3 Q.head = 1
4 else Q.head = Q.head + 1
5 return x
Each operation takes O(1) time.
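A Python sketch of the same circular-array queue, using 0-based indexing. As in the text, an array of length n stores at most n − 1 elements, because head == tail is reserved to mean "empty"; the names here are illustrative.

```python
class Queue:
    def __init__(self, n):
        self.Q = [None] * n
        self.head = 0   # index of the front element
        self.tail = 0   # index where the next element will be placed

    def enqueue(self, x):           # ENQUEUE, with overflow check
        nxt = (self.tail + 1) % len(self.Q)
        if nxt == self.head:
            raise OverflowError("overflow")
        self.Q[self.tail] = x
        self.tail = nxt             # wraps around, as in lines 2-4 above

    def dequeue(self):              # DEQUEUE, with underflow check
        if self.head == self.tail:
            raise IndexError("underflow")
        x = self.Q[self.head]
        self.head = (self.head + 1) % len(self.Q)
        return x
```

The modulo arithmetic replaces the explicit "if index == Q.length then reset to 1" tests of the pseudocode; both operations remain O(1).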


Linked lists
A linked list is a data structure in which the objects are arranged in a linear
order.
Unlike an array, however, in which the linear order is determined by the
array indices, the order in a linked list is determined by a pointer in each
object.
As shown in Figure 3.3, each element of a doubly linked list L is an object
with an attribute key and two other pointer attributes: next and prev.
Given an element x in the list, x.next points to its successor in the linked
list, and x.prev points to its predecessor.
If x.prev = NIL, the element x has no predecessor and is therefore the first
element, or head, of the list.
If x.next = NIL, the element x has no successor and is therefore the last
element, or tail, of the list.

Figure 3.3: A doubly linked list L representing the dynamic set.



A list may have one of several forms.


either singly linked or doubly linked,
may be sorted or unsorted, and
may be circular or not.

Searching a linked list


The procedure LIST-SEARCH(L, k) finds the first element with key k in list L
by a simple linear search, returning a pointer to this element.
If no object with key k appears in the list, then the procedure returns NIL.

LIST-SEARCH(L, k)
1 x = L.head
2 while x ≠ NIL and x.key ≠ k
3 x = x.next
4 return x
The LIST-SEARCH procedure takes Θ(n) time in the worst case.


Inserting into a linked list


Given an element x whose key attribute has already been set, the
LIST-INSERT procedure ”splices” x onto the front of the linked list, as shown
in Figure 3.3(b).
LIST-INSERT(L, x)
1 x.next = L.head
2 if L.head ≠ NIL
3 L.head.prev = x
4 L.head = x
5 x.prev = NIL
The running time for LIST-INSERT on a list of n elements is O(1).

Deleting from a linked list


The procedure LIST-DELETE removes an element x from a linked list L.
It must be given a pointer to x, and it then ”splices” x out of the list by
updating pointers.


LIST-DELETE(L, x)
1 if x.prev ≠ NIL
2 x.prev.next = x.next
3 else L.head = x.next
4 if x.next ≠ NIL
5 x.next.prev = x.prev
LIST-DELETE runs in O(1) time.
If we wish to delete an element with a given key, Θ(n) time is required in
the worst case because we must first call LIST-SEARCH to find the
element.
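The three list procedures translate directly into Python, with NIL rendered as None; the class and method names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def search(self, k):
        """LIST-SEARCH: linear scan, Θ(n) worst case."""
        x = self.head
        while x is not None and x.key != k:
            x = x.next
        return x                      # the node, or None if k is absent

    def insert(self, x):
        """LIST-INSERT: splice x onto the front, O(1)."""
        x.next = self.head
        if self.head is not None:
            self.head.prev = x
        self.head = x
        x.prev = None

    def delete(self, x):
        """LIST-DELETE: splice x out, O(1) given a pointer to x."""
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev
```

Note that delete takes the node itself, not a key; deleting by key would first require a Θ(n) search, as stated above.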


Hash Tables:
Direct-address tables
Direct addressing is a simple technique that works well when the universe U of
keys is reasonably small.
Maintain a dynamic set.
Each element has a key drawn from a universe U = {0, 1, ..., m − 1}.
No two elements have the same key.
To represent the dynamic set, use a direct-address table, or array,
T[0..m − 1]:
Each slot, or position, corresponds to a key in U.
If there's an element x with key k, then T[k] contains a pointer to x.
Otherwise, T[k] is empty, represented by NIL.

Figure 3.4: How to implement a dynamic set by a direct-address table T.



The dictionary operations are trivial to implement:


DIRECT-ADDRESS-SEARCH(T, k)
1 return T[k]
DIRECT-ADDRESS-INSERT(T, x)
1 T[x.key] = x
DIRECT-ADDRESS-DELETE(T, x)
1 T[x.key] = NIL
Each of these operations takes only O(1) time.
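A direct-address table is just an array indexed by key. A minimal Python sketch follows; the Element class is an illustrative stand-in for any object with a key attribute.

```python
class Element:
    def __init__(self, key, data=None):
        self.key = key      # key drawn from U = {0, 1, ..., m-1}
        self.data = data    # satellite data

class DirectAddressTable:
    def __init__(self, m):
        self.T = [None] * m   # one slot per key in the universe U

    def search(self, k):      # DIRECT-ADDRESS-SEARCH, O(1)
        return self.T[k]

    def insert(self, x):      # DIRECT-ADDRESS-INSERT, O(1)
        self.T[x.key] = x

    def delete(self, x):      # DIRECT-ADDRESS-DELETE, O(1)
        self.T[x.key] = None
```

The table allocates one slot per possible key, which is exactly the space problem the next slide raises when |U| is large.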
Hash tables
The problem with direct addressing is if the universe U is large, storing a table
of size |U| may be impractical or impossible.
Often the set K of keys actually stored is small compared to U, so that most
of the space allocated for T is wasted.
When K is much smaller than U, a hash table requires much less space
than a direct-address table.
Can reduce storage requirements to Θ(|K|).
Can still get O(1) search time in the average case, though not the worst case.

Idea: Instead of storing an element with key k in slot k, use a function h and
store the element in slot h(k).
We call h a hash function.
h : U → {0, 1, ..., m − 1}, so that h(k) is a legal slot number in T.
We say that k hashes to slot h(k).

Figure 3.5: Using a hash function h to map keys to hash-table slots.


Collisions: When two or more keys hash to the same slot.
Can happen when there are more possible keys than slots (|U| > m).
For a given set K of keys with |K| ≤ m, collisions may or may not happen.
They definitely happen if |K| > m.
Therefore, we must be prepared to handle collisions in all cases.
Use two methods: chaining and open addressing (open addressing is a reading assignment).

Collision resolution by chaining


Put all elements that hash to the same slot into a linked list.
Slot j contains a pointer to the head of the list of all stored elements that
hash to j,
If there are no such elements, slot j contains NIL.

Figure 3.6: Collision resolution by chaining.

How to implement dictionary operations with chaining:


Insertion:
CHAINED-HASH-INSERT(T, x)
1 insert x at the head of list T [h(x.key )]
Worst-case running time is O(1).
Assumes that the element being inserted isn’t already in the list.

Search:
CHAINED-HASH-SEARCH(T, k)
1 search for an element with key k in list T [h(k)]
Running time is proportional to the length of the list of elements in slot
h(k).

Deletion:
CHAINED-HASH-DELETE(T, x)
1 delete x from the list T[h(x.key)]
Given pointer x to the element to delete, so no search is needed to find
this element.
Worst-case running time is O(1) time if the lists are doubly linked.
If the lists are singly linked, then deletion takes as long as searching,
because we must find x's predecessor in its list in order to correctly update
next pointers.
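The three chained-hash operations can be sketched in Python. Each slot holds a Python list standing in for the linked list of Figure 3.6, so deletion here costs O(chain length), whereas a real doubly linked list would make CHAINED-HASH-DELETE O(1) given a pointer to the element. The hash function h(k) = k mod m is an assumed placeholder.

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]    # one chain per slot

    def h(self, k):
        return k % self.m                  # placeholder hash function

    def insert(self, key, value):
        # Prepend, as in CHAINED-HASH-INSERT; assumes key is not already present.
        self.T[self.h(key)].insert(0, (key, value))

    def search(self, key):
        # CHAINED-HASH-SEARCH: walk only the chain for slot h(key).
        for k, v in self.T[self.h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # CHAINED-HASH-DELETE, by key: scan the chain and remove the pair.
        chain = self.T[self.h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```

With m = 5, the keys 7 and 12 collide (both hash to slot 2) and end up in the same chain, yet each remains individually searchable.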


Analysis of hashing with chaining


Given a key, how long does it take to find an element with that key, or to
determine that there is no element with that key?
Analysis is in terms of the load factor α = n/m:
n = # of elements in the table.
m = # of slots in the table = # of (possibly empty) linked lists.
Load factor is average number of elements per linked list.
Can have α < 1, α = 1, or α > 1.
Worst case is when all n keys hash to the same slot.
Will result in a single list of length n and worst-case time to search is
Θ(n), plus time to compute hash function.
Average case depends on how well the hash function distributes the keys
among the slots.
Assume simple uniform hashing: any given element is equally likely to hash
into any of the m slots.
For j = 0, 1, ..., m − 1, denote the length of list T[j] by nj. Then
n = n0 + n1 + · · · + nm−1.
Average value of nj is E[nj] = α = n/m.
Assume that we can compute the hash function in O(1) time; then the time
required to search for the element with key k depends on the length nh(k)
of the list T[h(k)].

Hash functions

What makes a good hash function?


Ideally, the hash function satisfies the assumption of simple uniform
hashing.
In practice, it’s not possible to satisfy this assumption, since we don’t
know in advance the probability distribution that keys are drawn from, and
the keys may not be drawn independently.
Often use heuristics, based on the domain of the keys, to create a hash
function that performs well.

Keys as natural numbers


Hash functions assume that the keys are natural numbers.
When they’re not, have to interpret them as natural numbers.
Example: Interpret a character string as an integer expressed in some
radix notation. Suppose the string is CLRS:
ASCII values: C = 67, L = 76, R = 82, S = 83.
There are 128 basic ASCII values.
So interpret CLRS as
(67 · 128^3) + (76 · 128^2) + (82 · 128^1) + (83 · 128^0) = 141,764,947.


Division method
In the division method for creating hash functions, we map a key k into one of
m slots by taking the remainder of k divided by m.
h(k) = k mod m.
Example: m = 20 and k = 91 ⇒ h(k) = 11.
Advantage: Fast, since requires just one division operation.
Disadvantage: Have to avoid certain values of m:
Powers of 2 are bad. If m = 2^p for integer p, then h(k) is just the least
significant p bits of k.
If k is a character string interpreted in radix 2^p (as in the CLRS example),
then m = 2^p − 1 is bad.
Good choice for m: A prime not too close to an exact power of 2.
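The division method, together with the radix-128 string interpretation from the previous slide, in Python; the function names are illustrative.

```python
def string_to_key(s, radix=128):
    """Interpret an ASCII string as a natural number in the given radix,
    as in the CLRS example on the previous slide."""
    k = 0
    for ch in s:
        k = k * radix + ord(ch)   # shift left one digit, add the next character
    return k

def division_hash(k, m):
    """Division method: h(k) = k mod m. One division, so very fast."""
    return k % m
```

Both worked examples from the slides check out: string_to_key("CLRS") reproduces 141,764,947 and division_hash(91, 20) yields 11.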
Multiplication method
The multiplication method for creating hash functions operates in the following steps.
1 Choose constant A in the range 0 < A < 1.

2 Multiply key k by A.

3 Extract the fractional part of kA.

4 Multiply the fractional part by m.

5 Take the floor of the result.

Put another way, h(k) = ⌊m(kA mod 1)⌋, where kA mod 1 = kA − ⌊kA⌋ is the
fractional part of kA.

Disadvantage: Slower than division method.


Advantage: Value of m is not critical.
(Relatively) easy implementation:
Choose m = 2^p for some integer p.
Let the word size of the machine be w bits.
Assume that k fits into a single word. (k takes w bits.)
Let s be an integer in the range 0 < s < 2^w. (s takes w bits.)
Restrict A to be of the form s/2^w.

Figure 3.7: The multiplication method of hashing.


Example: m = 8 (implies p = 3), w = 5, k = 21.
Must have 0 < s < 2^5 = 32; choose s = 13 ⇒ A = 13/32. (Knuth suggests using
A ≈ (√5 − 1)/2.)
Using the implementation: ks = 21 · 13 = 273 = 8 · 2^5 + 17 ⇒
r1 = 8, r0 = 17.
Written in w = 5 bits, r0 = 10001. Take the p = 3 most significant bits of
r0, get 100 in binary, or 4 in decimal, so that h(k) = 4.
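Both the real-number definition and the word-level implementation of the multiplication method can be sketched in Python; w, p, and s are as in the example, and Python's arbitrary-precision integers make the "keep the low-order w bits" step an explicit mask.

```python
import math

def mult_hash(k, m, A=(math.sqrt(5) - 1) / 2):
    """Real-number form: h(k) = floor(m * (kA mod 1)).
    Knuth's suggested A = (sqrt(5) - 1) / 2 is the default."""
    return math.floor(m * ((k * A) % 1))

def mult_hash_words(k, s, w, p):
    """Word-level form with m = 2^p and A = s / 2^w:
    multiply k by s, keep the low-order w bits (r0),
    and return the p most significant bits of r0."""
    r0 = (k * s) & ((1 << w) - 1)   # low-order word of k * s
    return r0 >> (w - p)            # top p bits of r0
```

With k = 21, s = 13, w = 5, p = 3 this reproduces the worked example, h(21) = 4; choosing A = 13/32 in the real-number form gives the same slot.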

Binary Search Trees


Binary search trees are an important data structure for dynamic sets.
Accomplish many dynamic-set operations in O(h) time, where h = height
of tree.
We represent a binary tree by a linked data structure in which each node is
an object.
root[T] points to the root of tree T.
Each node contains the fields:
key (and possibly other satellite data).
left: points to left child.
right: points to right child.
p: points to parent. p[root[T]] = NIL.
Stored keys must satisfy the binary-search-tree property.
If y is in the left subtree of x, then key[y] ≤ key[x].
If y is in the right subtree of x, then key[y] ≥ key[x].
Figure 3.8 shows a simple binary tree.

Figure 3.8: Binary search trees. For any node x, the keys in the left subtree of x are
at most x.key, and the keys in the right subtree of x are at least x.key.

The binary-search-tree property allows us to print the keys in a binary search tree
in sorted order, recursively, using an algorithm called an inorder tree walk. Elements
are printed in monotonically increasing order.
How INORDER-TREE-WALK works:
Check to make sure that x is not NIL.
Recursively print the keys of the nodes in x's left subtree.
Print x's key.
Recursively print the keys of the nodes in x's right subtree.

INORDER-TREE-WALK(x)
1 if x ≠ NIL
2 INORDER-TREE-WALK(x.left)
3 print x.key
4 INORDER-TREE-WALK(x.right)
Example: Doing the inorder tree walk on Figure 3.8 gives the output ABDFHK.
Correctness: Follows by induction directly from the binary-search-tree property.
Time: Intuitively, the walk takes Θ(n) time for a tree with n nodes, because we
visit and print each node once.
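A Python sketch of the walk; the Node class is a minimal stand-in for the linked representation described earlier, and the visit callback replaces "print" so the output can be collected.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def inorder_tree_walk(x, visit=print):
    """Visit keys in sorted order: left subtree, then x, then right subtree."""
    if x is not None:
        inorder_tree_walk(x.left, visit)
        visit(x.key)
        inorder_tree_walk(x.right, visit)
```

Collecting the visited keys into a list confirms they come out in monotonically increasing order, as the binary-search-tree property guarantees.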

Querying a binary search tree


Besides the SEARCH operation, binary search trees can support such queries as
MINIMUM, MAXIMUM, SUCCESSOR, and PREDECESSOR.
Searching
TREE-SEARCH(x, k)
1 if x == NIL or k == x.key
2 return x
3 if k < x.key
4 return TREE-SEARCH(x.left, k)
5 else return TREE-SEARCH(x.right, k)
Initial call is TREE-SEARCH(root[T], k).
Time: The algorithm recurses, visiting nodes on a downward path from the
root. Thus, running time is O(h), where h is the height of the tree.

Minimum and maximum


The binary-search-tree property guarantees that
the minimum key of a binary search tree is located at the leftmost node,
and
the maximum key of a binary search tree is located at the rightmost node.
Traverse the appropriate pointers (left or right) until NIL is reached.

TREE-MINIMUM(x)
1 while x.left ≠ NIL
2 x = x.left
3 return x
The pseudocode for TREE-MAXIMUM is symmetric.
Time: Both procedures visit nodes that form a downward path from the root
to a leaf. Both procedures run in O(h) time, where h is the height of the tree.

Successor and predecessor


Assuming that all keys are distinct, the successor of a node x is the node y
such that key[y] is the smallest key > key[x].
If x has the largest key in the binary search tree, then we say that x's
successor is NIL.
There are two cases:
1 If node x has a non-empty right subtree, then x's successor is the
minimum in x's right subtree.
2 If node x has an empty right subtree, notice that:
As long as we move to the left up the tree (move up through right
children), we're visiting smaller keys.
x's successor y is the node that x is the predecessor of (x is the maximum
in y's left subtree).

TREE-SUCCESSOR(x)
1 if x.right ≠ NIL
2 return TREE-MINIMUM(x.right)
3 y = x.p
4 while y ≠ NIL and x == y.right
5 x = y
6 y = y.p
7 return y
TREE-PREDECESSOR is symmetric to TREE-SUCCESSOR.
Time: For both the TREE-SUCCESSOR and TREE-PREDECESSOR
procedures, in both cases, we visit nodes on a path down the tree or up the
tree. Thus, running time is O(h), where h is the height of the tree.
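TREE-MINIMUM and TREE-SUCCESSOR in Python, over nodes carrying key, left, right, and parent (p) attributes. The link helper and the small hand-built tree in the usage are illustrative scaffolding, not part of the text.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def tree_minimum(x):
    while x.left is not None:     # keep following left pointers
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:       # case 1: minimum of the right subtree
        return tree_minimum(x.right)
    y = x.p                       # case 2: climb until x is a left child
    while y is not None and x is y.right:
        x, y = y, y.p
    return y                      # None if x held the largest key

def link(parent, child, side):
    """Wire up parent pointers when building a tree by hand."""
    setattr(parent, side, child)
    child.p = parent
```

For the tree with root 6, left child 3 (children 2 and 4), and right child 8: the successor of 3 is 4 (case 1), the successor of 4 is 6 (case 2), and 8, holding the largest key, has successor NIL.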


Example:

Figure 3.9: Queries on a binary search tree.

Find the successor of the node with key value 15. (Answer: Key value 17)
Find the successor of the node with key value 6. (Answer: Key value 7)
Find the successor of the node with key value 4. (Answer: Key value 6)
Find the predecessor of the node with key value 6. (Answer: Key value 4)


Insertion and deletion


Insertion and deletion allow the dynamic set represented by a binary
search tree to change.
The binary-search-tree property must hold after the change.
Insertion is more straightforward than deletion.
Insertion
To insert a new value v into a binary search tree T, we use the following procedure.
TREE-INSERT(T, z)
1 y = NIL
2 x = T.root
3 while x ≠ NIL
4 y = x
5 if z.key < x.key
6 x = x.left
7 else x = x.right
8 z.p = y
9 if y == NIL
10 T.root = z // tree T was empty
11 elseif z.key < y.key
12 y.left = z
13 else y.right = z

To insert value v into the binary search tree, the procedure is given node
z with z.key = v, z.left = NIL, and z.right = NIL.
Beginning at root of the tree, trace a downward path, maintaining two
pointers.
Pointer x: traces the downward path.
Pointer y: ”trailing pointer” to keep track of parent of x.
Traverse the tree downward by comparing the value of node at x with v,
and move to the left or right child accordingly.
When x is NIL, it is at the correct position for node z.
Compare z’s value with y’s value, and insert z at either y’s left or right,
appropriately.

Figure 3.10: Inserting an item with key 13 into a binary search tree.

Time: Same as TREE-SEARCH. On a tree of height h, this procedure takes
O(h) time.
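TREE-INSERT in Python, following the two-pointer scheme just described: x traces the downward path while the trailing pointer y tracks x's parent. The class names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, z):
        y = None                  # trailing pointer: parent of x
        x = self.root
        while x is not None:      # trace a downward path
            y = x
            if z.key < x.key:
                x = x.left
            else:
                x = x.right
        z.p = y
        if y is None:
            self.root = z         # tree was empty
        elif z.key < y.key:
            y.left = z
        else:
            y.right = z
```

Inserting 12, 5, 18, 2, 9 in that order makes 12 the root, with 5 and 18 as its children and 2 and 9 below 5, and every inserted node's parent pointer set correctly.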

Deletion
TREE-DELETE is broken into three cases.
Case 1: z has no children.
Delete z by making the parent of z point to NIL, instead of to z.
Case 2: z has one child.
Delete z by making the parent of z point to z's child, instead of to z.
Case 3: z has two children.
z's successor y has either no children or one child. (y is the minimum
node, with no left child, in z's right subtree.)
Delete y from the tree (via Case 1 or 2).
Replace z’s key and satellite data with y’s.
Time: O(h), on a tree of height h.
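The three deletion cases can be sketched in Python. As on the slide, Case 3 copies the successor's key into z and then removes the successor, which has at most one child; the Tree wrapper and the hand-wired parent pointers in the usage are illustrative scaffolding.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

class Tree:
    def __init__(self, root=None):
        self.root = root

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_delete(T, z):
    if z.left is not None and z.right is not None:
        # Case 3: two children -- copy the successor's key, then delete it.
        y = tree_minimum(z.right)   # z's successor; it has no left child
        z.key = y.key               # (satellite data would be copied too)
        z = y                       # fall through to Case 1 or 2 on y
    # Cases 1 and 2: z now has at most one child x (possibly None).
    x = z.left if z.left is not None else z.right
    if x is not None:
        x.p = z.p                   # splice z out above its child
    if z.p is None:
        T.root = x                  # z was the root
    elif z is z.p.left:
        z.p.left = x
    else:
        z.p.right = x
```

Deleting a node with two children, such as 3 in the tree 5(3(2, 4), 8), replaces its key with that of its successor 4 and removes the successor node, preserving the binary-search-tree property.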


Figure 3.11: Deleting a node from a binary search tree. 
