Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
searching
Searching Methods
Sequential: data on lists or arrays
O(N) time, may be unacceptably slow
Indexed search:
tree indexing: data in trees
hashing or direct access : data on tables
searching
Important Factors
Ordered or unordered data
Known or unknown data distribution
E.G.M. Petrakis
searching
Unordered Sequences
Lists or arrays of N elements
elements 10
15
Number of comparisons: X = i =1 p i x i
N
E.G.M. Petrakis
searching
1
N +1
xi = (1 + 2 + ... + N) =
N
2
i =1
N
x =
E.G.M. Petrakis
pe + (1 pe )N
2
searching
Other Cases
If p1 >= p2 >= >= pN: move elements
with higher probabilities to the front
element 10
pi
0.2
15
searching
I. Move to Front
Move the element to the front
e.g., if the user searches for 10
15
10
15
becomes:
10
searching
II. Transpositions
The element is shifted one position to
the right
e.g., search(10)
15
10
1
4
9 10 15
Easy for arrays and lists
becomes
E.G.M. Petrakis
searching
Critic
Move to front adapts rapidly to the
search conditions of the application
Transposition adapts slowly but is
more intuitively correct
Combine the two techniques:
use initially move to front and
transposition later
E.G.M. Petrakis
searching
Search techniques:
binary search
interpolation search
indexed sequential search
E.G.M. Petrakis
searching
10
I. Binary Search
d=2
levels
10
N = 20 + 2 1 + 2 2 + ... 2 i = 2 d +1 1 d log( N + 1 )
i =0
d: max number of
comparisons
E.G.M. Petrakis
searching
11
Complexity
Maximum number or comparisons: a
leaf is reached
log2 (N + 1)
Expected number of comparisons:
tree searching stops before a leaf is
reached
N +1
N
log2 (N + 1) 1 log2
N
2
E.G.M. Petrakis
searching
12
II. Interpolation
Searching is guided by the values of
the array
L: minimum value
U: maximum value
x L +1
N
search position h =
U L
searching
13
Example
if x[h] = key element found; else
search array on the left or on the
right of h
e.g.
100
E.G.M. Petrakis
searching
14
Complexity
Average case: O(loglogN) uniform
distribution of keys in the array
Worst case: O(N) on non uniform
distribution
Binary search is O(logN) always!
E.G.M. Petrakis
searching
15
searching
16
array
index
321
592
876
E.G.M. Petrakis
8
14
26
38
72
115
306
321
329
387
409
512
540
567
583
592
602
611
618
741
798
811
814
876
searching
17
array
321
index2
321
485
485
index1
591
591
742
647
591
706
647
742
706
742
E.G.M. Petrakis
searching
18
File Searching
Access a data page, load it in the main
memory and search for the key
unordered files: O(#blocks) disk accesses
ordered files: O(log#blocks) disk accesses
disk head moves back and forth
difficult to control the disk head moves
especially in multi-user environments
leave 20% extra space for insertions
E.G.M. Petrakis
searching
19
Ordered Files
Optimize the performance using an
auxiliary batch file
batch operations in ascending key order
process the operations one after the other
file
new file
transactions
batch a1 <= a2 <= <=aN
not searched
E.G.M. Petrakis
searching
a1
20
ISAM
B-trees
B+-trees,
E.G.M. Petrakis
searching
21
5 8
10 11 16
(38, ) (46, )
23 25 27
28 31 38
42 46
file
E.G.M. Petrakis
searching
22
ISAM Index
cylinder
track
surface
or
block
E.G.M. Petrakis
searching
23
Retrieval
block
search
Key
cylinder
index surface
index
records
E.G.M. Petrakis
searching
24
Overflows
1.
2.
chaining:
overflow pages
distribution of overflow space between
neighboring primary pages
file reorganization necessary soon or later!!
Dependence on hardware!
Pseudo dynamic behavior!
E.G.M. Petrakis
searching
25
Tree Search
The elements are stored in a Binary
Search Tree
10
18
25
15
20
E.G.M. Petrakis
searching
26
Complexity
Average number of key comparisons
or length of path traversed
average case: O(logN) comparisons
worst case: BST is reduced to list and
search is O(N) !!
searching
27
Theorem
N-i-1
>
<
searching
28
Proof
Average number of comparisons
i
N i 1
1
[p (N i 1 ) + 1 ] +
[p(i) + 1 ] +
p (N) =
N
N
N
left sub-tree
right sub-tree
root
P
(i)
=
N
i =0
N 1
E.G.M. Petrakis
P(N i 1) + 1
1 P(i) + 1
i
+
+
(N
1)
=
N
N
i =0
N 1
1
=1 + 2
N
N 1
[iP(i) + (N i 1)P(N i 1) ]
i 0
=
searching
29
Proof (cont.)
because a can be inserted first,
second, n-th element => n cases
N 1
2
N i - 1 i => P(N) = 1 +
iP(i)
2
N i =1
Prove by induction: P(N) <= 1 + 4logN
a more careful analysis shows that the
constant is about 1.4 => P(N) <= 1.4logN
E.G.M. Petrakis
searching
30
Trees
Arrays/Lists
Hashing
Main memory
(Static)
Rehashing
Optimal Trees Unsorted
(move-to-front, Coalesced
transposition)
chaining
Sorted (binary
search)
Main memory
(dynamic mem.
allocation)
BST
AVL
SPLAY
Disk
(static)
Disk
(dynamic mem.
allocation)
E.G.M. Petrakis
Separate
Unsorted
(move-to-front, chaining
transposition)
Files with
overflows
Indexed
sequential Files
(ISAM)
M-trees
B-trees, B+trees (VSAM)
searching
Table
Separate
chaining
Dynamic
Extendible
31
Linear