Data Structures
- Niraj Agarwal
A good program should:
run correctly
run efficiently
be easy to read and understand
be easy to debug, and
be easy to modify.
Collections
Typical operations on a collection:
add
delete
find
destroy
Analyzing an Algorithm
Simple loops:
for(i=0;i<n;i++) { s; }
where s is O(1) -> complexity O(n)
Loops that double a variable:
for(h=1;h<n;) { s; h = 2 * h; }
Complexity O(log n)
Nested loops:
for(i=0;i<n;i++)
  for(j=0;j<n;j++)
    { s; }
Complexity O(n^2)
Arrays
An Array is the simplest form of implementing a collection
Each object in an array is called an array element
Each element has the same data type (although they may have different
values)
Individual elements are accessed by index using a consecutive range of
integers
One Dimensional Array or vector
int A[10];
for ( i = 0; i < 10; i++)
A[i] = i +1;
A[0] = 1
A[1] = 2
A[2] = 3
...
A[n-2] = n-1
A[n-1] = n
Arrays (Cont.)
Multi-dimensional Array
A multi-dimensional array of dimension n (i.e., an n-dimensional array or simply
nD array) is a collection of items accessed via n subscript expressions. For
example, in a language that supports it, the (i,j)-th element of the
two-dimensional array x is accessed by writing x[i,j].
[Figure: a two-dimensional array x drawn as a grid, with rows numbered 0, 1,
2, ..., i, ..., m and numbered columns; element x[i,j] sits at row i, column j]
Arrays (Cont.)
Array : Limitations
If you want to insert/remove an element to/from a fixed position in the
list, then you must move elements already in the list to make room for it.
In the worst case, inserting into position 1 requires moving all the
elements.
An array has a fixed size; to grow it you must allocate a new
array of the appropriate size and copy the old array to the new array.
Linked Lists
The linked list is a very flexible dynamic data structure:
items may be added to it or deleted from it at will
Dynamically allocate space for each element as needed
Include a pointer to the next item
the number of items that may be added to a list is limited only by the
amount of memory available
The last node in the list contains a NULL pointer to indicate that it is the
end or tail of the list.
[Figure: a linked list. The Collection holds a Head pointer to the first node;
each node holds a Data pointer to an object and a Next pointer to the
following node; the last (Tail) node's Next pointer is NULL.]
struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

struct t_collection {
    Node head;
};
typedef struct t_collection *Collection;

int AddToCollection( Collection c, void *item ) {
    Node new = malloc( sizeof( struct t_node ) );
    new->item = item;
    new->next = c->head;
    c->head = new;
    return TRUE;    /* Error checking, asserts omitted for clarity! */
}
Add to head
Last-In-First-Out (LIFO) semantics
Modifications
Circular list: head is tail->next, giving either
LIFO or FIFO semantics using ONE pointer
First-In-First-Out (FIFO): add new items at the tail, take items from the head
Doubly linked list: each node carries a prev pointer as well as next,
for applications requiring both-way search
Binary Tree
The simplest form of Tree is a Binary Tree
A binary tree consists of a root together with two disjoint
binary trees, called its left and right subtrees.
Note the recursive definition! Each sub-tree is itself a binary tree.
[Figure: a small binary tree on vertices A, B, C, D]
Strictly Binary Tree: every nonleaf node in the binary tree has nonempty left
and right subtrees
Level of a node: the root has level 0; the level of any other node is one more
than the level of its father
Depth: the maximum level of any leaf in the tree
A binary tree can contain at most 2^l nodes at level l
Total nodes for a binary tree with depth d: at most 2^(d+1) - 1
struct t_node {
void *item;
struct t_node *left;
struct t_node *right;
};
typedef struct t_node *Node;
struct t_collection {
Node root;
};
Height, h
Nodes traversed in a path from the root to a leaf
Number of nodes, n, in a full tree of height h:
n = 1 + 2^1 + 2^2 + ... + 2^h = 2^(h+1) - 1
so h = floor( log2 n )
Since we need at most h+1 comparisons,
find is O(h+1) or O(log n)
Two binary trees are MIRROR SIMILAR if they are both empty, or if they are
nonempty and the left subtree of each is mirror similar to the right subtree
of the other
General Tree
A Hierarchical Tree
Heaps
Heaps are based on the notion of a complete tree
A binary tree is completely full if it is of height h and has 2^(h+1) - 1 nodes.
A binary tree of height h is complete iff
it is empty or
its left subtree is complete of height h-1 and its right subtree is completely
full of height h-2, or
its left subtree is completely full of height h-1 and its right subtree is
complete of height h-1.
A complete tree is filled from the left:
all nodes at the lowest level are as far to the left as possible.
A tree has the heap property iff
it is empty or
the key in the root is larger than that in either child and both subtrees have
the heap property.
Heaps (Cont.)
In a heap, the highest priority item is at the root and is trivially
extracted. But if the root is deleted, we are left with two sub-trees and we
must efficiently re-create a single tree with the heap property.
The value of the heap structure is that we can both extract the highest
priority item and insert a new one in O(log n) time.
Heaps (Cont.)
To work out how we're going to maintain
the heap property, use the fact that a
complete tree is filled from the left. So that
the position which must become empty is
the one occupied by the M. Put it in the
vacant root position.
Heaps (Cont.)
Addition to a Heap
To add an item to a heap, we follow the reverse procedure.
Place it in the next leaf position and move it up.
Again, we require O(h) or O(log n) exchanges.
Comparisons

              | Add             | Delete             | Find
Arrays        | O(1);           | O(n)               | O(n);
simple, fast, | O(n) inc sort   |                    | O(log n) binary search
inflexible    |                 |                    | (if sorted)
Linked list   | O(1);           | O(1) - any item    | O(n)
simple,       | sort -> no adv  | O(n) - specific    | (no bin search)
flexible      |                 | item               |
Trees         | O(log n)        | O(log n)           | O(log n)
still simple, |                 |                    |
flexible      |                 |                    |
Queues
Queues are dynamic collections which have some concept of order
FIFO queue
A queue in which the first item added is always the first one out.
LIFO queue
A queue in which the item most recently added is always the first one
out (a stack).
Priority queue
A queue in which the items are sorted so that the highest priority
item is always the next one to be extracted.
Stacks
Stacks are a special form of collection
with LIFO semantics
Two methods
int push( Stack s, void *item );
- add an item to the top of the stack
void *pop( Stack s );
- remove most recently pushed item from the top of the stack
Other methods
int IsEmpty( Stack s );
Determines whether the stack has anything in it
Stack (Cont.)
Stacks are very useful for recursion:
the key to call / return in functions &
procedures
function f( int x, int y ) {
  int a;
  if ( term_cond ) return ...;
  a = ...;
  return g( a );
}
function g( int z ) {
  int p, q;
  p = ...; q = ...;
  return f( p, q );
}
[Figure: the call stack holds a context for each execution of f]
Searching
Computer systems are often used to store large amounts of data
from which individual records must be retrieved according to
some search criterion. Thus the efficient storage of data to
facilitate fast searching is an important issue
Things to consider
the average time
the worst-case time and
the best possible time
Sequential Searches
Time is proportional to n
Binary Search
Logs: base 2 is by far the most common in this course.
Assume base 2 unless otherwise noted!
[Figure: plot of time against n comparing sequential search (time
proportional to n) with binary search (time proportional to 4 log n).
For small problems we're not interested; for large problems we're
interested in the gap between the two curves. Binary search is more
complex and has a higher constant factor.]
Sorting
A file is said to be SORTED on the key if i < j implies that k[i]
precedes k[j] in some ordering of the keys
Different types of Sorting
Exchange Sorts
Bubble Sort
Quick Sort
Insertion Sorts
Selection Sorts
Heap Sort
Insertion Sort
Like sorting a hand of cards:
The first card is already sorted
With all the rest,
scan back from the end until you find the first
card smaller than the new one        O(n)
move all the larger ones up one slot O(n)
insert it                            O(1)
Overall O(n^2)
Bubble Sort
From the first element
Exchange pairs if they're out of order
Repeat from the first to n-1
Stop when you have only one element to check
/* Bubble sort for integers */
#define SWAP(a,b) { int t; t=a; a=b; b=t; }
void bubble( int a[], int n ) {
int i, j;
for(i=0;i<n;i++) { /* n passes thru the array */
/* From start to the end of unsorted part */
for(j=1;j<(n-i);j++) {
/* If adjacent items out of order, swap */
if( a[j-1]>a[j] ) SWAP(a[j-1],a[j]);
}
}
}
Overall O(n^2)
Quick Sort
Two phases
Partition phase
Divides the work into half
Sort phase
Conquers the halves!
[Figure: the partition phase splits the array into elements < pivot, the
pivot, and elements > pivot; the sort phase then partitions and sorts each
half recursively.]
Heap Sort
Heaps also provide a means of sorting:
construct a heap,
add each item to it (maintaining the heap property!),
when all items have been added, remove them one by one
(restoring the heap property as each one is removed).
Addition and deletion are both O(log n) operations. We need to
perform n additions and deletions, leading to an O(n log n)
algorithm.
Generally slower than quick sort in practice.
Comparisons of Sorting
The Sorting Repertoire

Sort      | Typical    | Worst      | Notes
Insertion | O(n^2)     | Guaranteed |
Bubble    | O(n^2)     | Guaranteed |
Heap      | O(n log n) | Guaranteed |
Quick     | O(n log n) | O(n^2)     |
Bin       | O(n)       | O(n+m)     | Bounded keys/duplicates
Radix     | O(n)       | O(n log n) | Bounded keys/duplicates
Hashing
A Hash Table is a data structure that associates each element
(e) to be stored in our table with a particular value known as a
key (k)
We store items (k,e) in our tables
Simplest form of a Hash Table is an Array
A bucket array for a hash table is an array A of size N, where
each cell of A is thought of as a bucket and the integer N
defines the capacity of the array.
Bucket Arrays
If the keys (k) associated with each element (e) are well distributed in
the range [0, N-1] this bucket array is all that is needed.
An element (e) with key (k) is simply inserted into bucket A[k].
If keys are not unique, that is there exists element key pairs (e1, k) and
(e2, k) we will have two different elements mapped to the same bucket.
Hash Functions
Associated with each Hash Table is a function h, known as a
Hash Function.
This Hash Function maps each key in our set to an integer in the
range [0, N-1]. Where N is the capacity of the bucket array.
The idea is to use the hash function value, h(k) as an index into
our bucket array.
So we store the item (k, e) in our bucket at A[h(k)]. That is
A[h(k)] = (Item)(k, e);
Collision handling schemes:
chaining,
overflow areas,
re-hashing,
using neighbouring slots (linear probing),
quadratic probing,
random probing, ...
Chaining
One simple scheme is to chain all collisions in lists attached to
the appropriate slot. This allows an unlimited number of
collisions to be handled and doesn't require a priori knowledge
of how many elements are contained in the collection. The
tradeoff is the same as with linked lists versus array
implementations of collections: linked list overhead in space
and, to a lesser extent, in time.
Rehashing
Overflow
Divide the pre-allocated table into two sections: the primary
area to which keys are mapped and an area for collisions,
normally termed the overflow area.
When a collision occurs, a slot in the overflow area is used for
the new element and a link from the primary slot established as
in a chained system. This is essentially the same as chaining,
except that the overflow area is pre-allocated and thus possibly
faster to access. As with re-hashing, the maximum number of
elements must be known in advance, but in this case, two
parameters must be estimated: the optimum size of the primary
and overflow areas.
It is possible to design systems with
multiple overflow tables.
Comparisons
Graph
A graph consists of a set of nodes (or vertices) and a set of arcs (or edges)
Graph G = Nodes {A,B, C} Arcs {(A,C), (B,C)}
Terminology :
A Graph, G, is a pair G = (V, E), where V is a set of vertices
and E is a set of edges.
Labeled Graphs:
We may give edges and vertices labels.
Graphing applications often require the
labeling of vertices. Edges might also
be numerically labeled. For instance,
if the vertices represent cities, the
edges might be labeled to represent
distances.
Graph Terminology
Directed (or digraph) & Undirected Graphs
A directed graph is one in which every edge (u, v) has a direction,
so that (u, v) is different from (v, u). In an undirected graph, there
is no distinction between (u, v) and (v, u). There are two possible
situations that can arise in a directed graph between vertices u
and v.
i) only one of (u, v) and (v, u) is present.
ii) both (u, v) and (v, u) are present.
An edge (u, v) is said to be
directed from u to v if the pair
(u, v) is ordered with u
preceding v.
E.g. A Flight Route
An edge (u, v) is said to be
undirected if the pair (u, v) is
not ordered
Graph Terminology
Two vertices joined by an edge are called the end vertices or
endpoints of the edge.
If an edge is directed its first endpoint is called the origin and
the other is called the destination.
Two vertices are said to be adjacent if they are endpoints of
the same edge.
The degree of a vertex v, denoted deg(v), is the number of
incident edges of v.
The in-degree of a vertex v, denoted indeg(v) is the number of
incoming edges of v.
The out-degree of a vertex v, denoted outdeg(v) is the number
of outgoing edges of v.
Graph Terminology
[Figure: an example graph illustrating the definitions above: vertices A and
B are the endpoints of edge a; vertex A is the origin of edge a and vertex B
is its destination; A and B are adjacent as they are endpoints of the same
edge a.]
Graph Terminology
An edge is said to be incident on a vertex if the vertex is one of the
edges endpoints.
The outgoing edges of a vertex are the directed edges whose origin
is that vertex.
Graph Terminology
Path:
A sequence of alternating vertices and edges that starts at a vertex
and ends at a vertex
Simple Path:
A path where all its edges and vertices are
distinct
Graph Terminology
P1
X
W
f
h
i
Graph Terminology
h
X
c
W
f
P2 is not a simple path
Y
as not all its edges and
vertices are distinct.
P2 = {U, c, W, e, X, g, Y, f, W, d, V}
Graph Terminology
Cycle:
A path that starts and ends at the same vertex
Simple cycle, e.g.
{U, a, V, b, X, g, Y, f, W, c}
Non-simple cycle, e.g.
{U, c, W, e, X, g, Y, f, W, d, V, a}
Graph Properties
Graph Representation
Adjacency Matrix Implementation
A |V| x |V| matrix of 0's and 1's. A 1 represents a connection or an edge.
Storage = |V|^2
(this is huge!!)
For a non-directed graph there will always be symmetry along the top-left to
bottom-right diagonal. This diagonal will always be filled with zeros. This
simplifies coding.
Coding is concerned with storing the graph
in an efficient manner.
One way is to take all the bits from the
adjacency matrix and concatenate
them to form a binary string.
For undirected graphs, it suffices to concatenate the bits of the upper right
triangle of the adjacency matrix. Graph number zero is a graph with no edges.
Graph Applications
Some of the applications of graphs are :
Networks (computer, cities ....)
Maps (any geographic databases )
Graphics : Geometrical Objects
Neighborhood graphs
Flow Problem
Workflow
Reachability
Graphs
Depth First Search
Breadth First Search
Shortest Paths
Dijkstra's Algorithm
Algorithm DFS(G, v)
  Input: graph G and a start vertex v of G
  Output: labeling of the edges of G as
          discovery edges and back edges
  setLabel(v, Visited)
  for all e in G.incidentEdges(v)
    if getLabel(e) = Unexplored
      w <- opposite(v, e)
      if getLabel(w) = Unexplored
        setLabel(e, Discovery)
        DFS(G, w)
      else
        setLabel(e, Back)
Legend: Unexplored Vertex, Visited Vertex,
Unexplored Edge, Discovery Edge, Back Edge
Example: Start At Vertex A
[Figure: step-by-step DFS trace on a five-vertex graph. Starting at A, each
unexplored edge to an unexplored vertex becomes a discovery edge as B, C, D
and E are visited in turn; edges leading back to already-visited vertices
are labeled back edges.]
Algorithm BFS(G, v)
  L0 <- new empty list
  L0.insertLast(v)
  setLabel(v, Visited)
  i <- 0
  while( !Li.isEmpty() )
    Li+1 <- new empty list
    for all v in Li
      for all e in G.incidentEdges(v)
        if getLabel(e) = Unexplored
          w <- opposite(v, e)
          if getLabel(w) = Unexplored
            setLabel(e, Discovery)
            setLabel(w, Visited)
            Li+1.insertLast(w)
          else
            setLabel(e, Cross)
    i <- i + 1
Example: Start Vertex A
[Figure: step-by-step BFS trace. A sequence L0 is created and A is inserted
into it. For each vertex v in L0, each unexplored incident edge is examined:
if the vertex w opposite v is unexplored, the edge is labeled a discovery
edge and w is marked visited and inserted into L1. When L0 is exhausted the
process continues with L1, building L2, and so on; edges to already-visited
vertices are labeled cross edges.]
Weighted Graphs
Shortest Paths
The length of a path is the sum of the weights of the path's edges.
Dijkstra's Algorithm
The distance of a vertex v from a vertex s is the length of a shortest
path between s and v
Assumptions: the graph is connected, the edges are undirected, and the
edge weights are nonnegative.
Dijkstra's Algorithm
We grow a cloud of vertices, beginning with s and eventually covering all the vertices
We store with each vertex v a label d(v) representing the distance of v from s in the
subgraph consisting of the cloud and its adjacent vertices
At each step
We add to the cloud the vertex u outside the cloud with the smallest distance label,
d(u)
Edge Relaxation
Consider an edge e = (u,z) such that u is the vertex most recently added
to the cloud and z is not yet in the cloud. The relaxation of edge e
updates the distance label of z as follows:
d(z) <- min( d(z), d(u) + weight(e) )
[Figure: Dijkstra trace on a six-vertex example. Starting with A(0), the
distance labels of vertices adjacent to the cloud are repeatedly relaxed as
each closest vertex joins the cloud: C settles at 2, D improves from 4 to 3,
E settles at 5, F improves from 11 to 8, and B from 8 to 7.]
[Figure: a second trace showing the priority-queue view of Dijkstra's
algorithm. Vertices are inserted with tentative distances (A(0), B(2),
E(4), G(6), ...) and updated when shorter paths are found (G 6 -> 5,
H 9 -> 8), until every vertex has been inserted with its final distance,
e.g. F(6), H(8), C(9), D(10).]
Thank You