Advanced Data Structures in C

By: Anand B
E-mail: AnandBDOEACC@gmail.com
Index
• Searching/Sorting
• Linked Lists
– Singly
– Doubly
– Circular
• Queue
• Stacks
• Trees
• Graphs
• Symbol Tables
• Garbage Collection
Array Limitations
• Arrays
– Simple,
– Fast
but
– Must specify size at construction time
– Murphy’s law
• Construct an array with space for n
– n = twice your estimate of largest collection
• Tomorrow you’ll need n+1
– More flexible system?
Linked Lists
• Flexible space use
– Dynamically allocate space for each element as needed
– Include a pointer to the next item
➧ Linked list
– Each node of the list contains
• the data item (an object pointer in our ADT)
• a pointer to the next node

[Diagram: a list node with Data and Next fields; Data points to the stored object]

Linked Lists
• Collection structure has a pointer to the list head
– Initially NULL

[Diagram: Collection structure with a Head pointer, initially NULL]

Linked Lists
• Collection structure has a pointer to the list head
– Initially NULL
• Add first item
– Allocate space for node
– Set its data pointer to object
– Set Next to NULL
– Set Head to point to new node

[Diagram: after the first Add, Head points to the new node; the node's Data points to the object and its Next is NULL]
Linked Lists
• Add second item
– Allocate space for node
– Set its data pointer to object
– Set Next to current Head
– Set Head to point to new node

[Diagram: after the second Add, Head points to the new node (object2), whose Next points to the first node (object)]
Linked Lists - Add implementation
struct t_node {
    void *item;
    struct t_node *next;
} node;

typedef struct t_node *Node;

struct collection {
    Node head;
    ......
};

int AddToCollection( Collection c, void *item ) {
    Node new = malloc( sizeof( struct t_node ) );
    new->item = item;
    new->next = c->head;
    c->head = new;
    return TRUE;
}

Linked Lists - Add implementation
struct t_node {
    void *item;
    struct t_node *next;    /* Recursive type definition - C allows it! */
} node;

typedef struct t_node *Node;

struct collection {
    Node head;
    ......
};

int AddToCollection( Collection c, void *item ) {
    Node new = malloc( sizeof( struct t_node ) );
    new->item = item;
    new->next = c->head;
    c->head = new;
    return TRUE;            /* Error checking, asserts omitted for clarity! */
}

Linked Lists
• Add time
– Constant - independent of n
• Search time
– Worst case - n

[Diagram: two-node list - Head points to the object2 node, whose Next points to the object node]
Linked Lists - Find implementation
• Implementation

void *FindinCollection( Collection c, void *key ) {
    Node n = c->head;
    while ( n != NULL ) {
        if ( KeyCmp( ItemKey( n->item ), key ) == 0 )
            return n->item;
        n = n->next;
    }
    return NULL;
}

• A recursive implementation is also possible!

Linked Lists - Delete implementation
• Implementation

void *DeleteFromCollection( Collection c, void *key ) {
    Node n, prev;
    n = prev = c->head;
    while ( n != NULL ) {
        if ( KeyCmp( ItemKey( n->item ), key ) == 0 ) {
            prev->next = n->next;
            return n;
        }
        prev = n;
        n = n->next;
    }
    return NULL;
}

Linked Lists - Delete implementation
• Implementation

void *DeleteFromCollection( Collection c, void *key ) {
    Node n, prev;
    n = prev = c->head;
    while ( n != NULL ) {
        if ( KeyCmp( ItemKey( n->item ), key ) == 0 ) {
            prev->next = n->next;   /* Minor addition needed to allow for
                                       deleting the head node - an exercise! */
            return n;
        }
        prev = n;
        n = n->next;
    }
    return NULL;
}

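As a hedged sketch of the exercise above (not necessarily the intended solution), starting prev at NULL lets one branch handle deletion of the head node. ItemKey and KeyCmp here are illustrative stand-ins that treat items as ints; in the real ADT they come from the item's own module.

```c
#include <stdlib.h>

struct t_node { void *item; struct t_node *next; };
typedef struct t_node *Node;
struct t_collection { Node head; };
typedef struct t_collection *Collection;

/* Illustrative stand-ins for the ADT's ItemKey/KeyCmp; items are ints here */
static void *ItemKey( void *item ) { return item; }
static int KeyCmp( void *a, void *b ) { return *(int *)a - *(int *)b; }

void *DeleteFromCollection( Collection c, void *key ) {
    Node n = c->head, prev = NULL;
    while ( n != NULL ) {
        if ( KeyCmp( ItemKey( n->item ), key ) == 0 ) {
            if ( prev == NULL )
                c->head = n->next;     /* deleting the head node */
            else
                prev->next = n->next;  /* unlinking an interior node */
            void *item = n->item;
            free( n );
            return item;               /* return the item, not the freed node */
        }
        prev = n;
        n = n->next;
    }
    return NULL;
}
```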
Linked Lists - LIFO and FIFO
• Simplest implementation
– Add to head
➧ Last-In-First-Out (LIFO) semantics
• Modifications
– First-In-First-Out (FIFO)
– Keep a tail pointer
[Diagram: list with both head and tail pointers]

struct t_node {
    void *item;
    struct t_node *next;
} node;

typedef struct t_node *Node;

struct collection {
    Node head, tail;    /* tail is set in the AddToCollection
                           method if head == NULL */
};
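The note above can be sketched in C as follows; the names AddToTail and t_collection are illustrative, not from the original ADT. Appending at the tail while removing from the head gives FIFO order, and the empty-list case sets both pointers at once.

```c
#include <stdlib.h>

struct t_node {
    void *item;
    struct t_node *next;
};
typedef struct t_node *Node;

struct t_collection {
    Node head, tail;
};
typedef struct t_collection *Collection;

/* FIFO insertion: link the new node after tail; when the list is empty
   (head == NULL), head and tail are both pointed at the new node. */
int AddToTail( Collection c, void *item ) {
    Node n = malloc( sizeof( struct t_node ) );
    if ( n == NULL ) return 0;
    n->item = item;
    n->next = NULL;
    if ( c->head == NULL )
        c->head = n;          /* first node: it is both head and tail */
    else
        c->tail->next = n;    /* append after the current tail */
    c->tail = n;
    return 1;
}
```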
Linked Lists - Doubly linked
• Doubly linked lists
– Can be scanned in both directions
struct t_node {
    void *item;
    struct t_node *prev, *next;
} node;

typedef struct t_node *Node;

struct collection {
    Node head, tail;
};

[Diagram: doubly linked list with head and tail pointers; each node has prev and next links]

Stacks
• A stack is a data structure used to store and retrieve data.
• The stack supports two operations push and pop.
• The push operation places data on the stack and the pop operation
retrieves the data from the stack.
• The order in which data is retrieved determines the classification of the
  structure.
  – LIFO (Last In First Out) retrieves the data placed last - this is the stack discipline.
  – FIFO (First In First Out) retrieves the data placed first - this discipline corresponds to a queue rather than a stack.

Stacks
• Stacks are a special form of collection with LIFO semantics
• Two methods
– int push( Stack s, void *item );
- add item to the top of the stack
– void *pop( Stack s );
- remove an item from the top of the stack
• Like a plate stacker
• Other methods

Stacks - Implementation
• Arrays
– Provide a stack capacity to the constructor
– Flexibility limited but matches many real uses
• Capacity limited by some constraint
– Memory in your computer
– Size of the plate stacker, etc

• Linked list also possible


• push, pop methods

Stacks - Implementation

[Diagram: linked-list stack with head and tail pointers]

Stacks - Implementation
• Arrays common
– Provide a stack capacity to the constructor
– Flexibility limited but matches many real uses
• Stack created with limited capacity

struct t_node {
    void *item;
    struct t_node *prev,    /* prev is optional! */
                  *next;
} node;

typedef struct t_node *Node;

struct collection {
    Node head, tail;
};

[Diagram: linked-list stack with head and tail pointers]

Stack Frames - Functions in HLL
• Program
function f( int x, int y ) {
    int a;
    if ( term_cond ) return ...;
    a = ....;
    return g( a );
}

function g( int z ) {
    int p, q;
    p = .... ; q = .... ;
    return f( p, q );
}

[Diagram: stack frame holding the context for an execution of f]
Stacks
• Applications of Stacks
  – Stacks can be utilized to evaluate mathematical
    expressions
  – Stacks can be used to write non-recursive programs,
    avoiding recursion
• Expression Evaluation
  – Based on the position of the mathematical operator in the
    expression, expressions are classified into
    • Infix
    • Postfix
    • Prefix

Stacks
• Infix
  – The mathematical operator is preceded and succeeded by its operands
  – Ex: A+B
• Postfix
  – The operands are succeeded by the mathematical operator
  – Ex: AB+
• Prefix
  – The operands are preceded by the mathematical operator
  – Ex: +AB
• Note:
  – Postfix and prefix expressions are also called Polish expressions
    (prefix is Polish notation, postfix is Reverse Polish notation).
  – Postfix and prefix expressions are parenthesis-free expressions.

Stacks
• Converting Infix to Postfix (single-digit constants)
• The infix expression must be entered as a string
• Extract characters one by one until the end of the string and perform the following:
  – If the character is an open parenthesis "(", push it onto the operator stack (opstack).
  – Else, if it is an operand, place it directly in the postfix array.
  – Else, if it is an operator:
    • Pop from opstack every operator whose precedence is higher than or equal to
      that of the incoming operator, and place each popped operator in the postfix array.
    • After the popping is over, push the incoming operator onto opstack.
  – Else, if it is a closing parenthesis ")", pop operators from opstack until the
    open parenthesis is reached and place them in the postfix array
    (the parentheses themselves are discarded).
  – If none of the above is true, display an error message and terminate the program.
• At the end of the string, pop all operators remaining in opstack and place them in the postfix array.

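The steps above can be sketched in C for single-character operands and the four basic operators. The function and variable names (infix_to_postfix, opstack, prec) are illustrative; error handling for malformed input is omitted, as in the slides.

```c
#include <ctype.h>

/* Operator precedence; '(' gets 0 so it stays on opstack until ')' */
static int prec( char op ) {
    switch ( op ) {
    case '*': case '/': return 2;
    case '+': case '-': return 1;
    default:            return 0;
    }
}

/* Convert an infix string (single-character operands) to postfix */
void infix_to_postfix( const char *infix, char *postfix ) {
    char opstack[100];
    int top = 0, j = 0;
    for ( ; *infix; infix++ ) {
        char ch = *infix;
        if ( isalnum( (unsigned char)ch ) )
            postfix[j++] = ch;                  /* operands go straight out */
        else if ( ch == '(' )
            opstack[top++] = ch;
        else if ( ch == ')' ) {
            while ( top > 0 && opstack[top-1] != '(' )
                postfix[j++] = opstack[--top];  /* pop until '(' */
            if ( top > 0 ) top--;               /* discard the '(' */
        } else {
            while ( top > 0 && prec( opstack[top-1] ) >= prec( ch ) )
                postfix[j++] = opstack[--top];  /* pop higher/equal precedence */
            opstack[top++] = ch;                /* then push the new operator */
        }
    }
    while ( top > 0 )
        postfix[j++] = opstack[--top];          /* flush remaining operators */
    postfix[j] = '\0';
}
```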
Stacks
Infix Expression: A+B*C
Postfix Expression: ABC*+

1. Read A - operand: append to postfix
2. Read + - operator: opstack is empty, so push + onto opstack
3. Read B - operand: append to postfix
4. Read * - operator: + on opstack has lower precedence, so push * onto opstack
5. Read C - operand: append to postfix
6. End of string: pop * and then + from opstack and append them to postfix

Postfix array: A B C * +
Stacks
Infix Expression: A+B*C+D
Postfix Expression: ABC*+D+

1. Read A - operand: append to postfix
2. Read + - operator: opstack is empty, so push + onto opstack
3. Read B - operand: append to postfix
4. Read * - operator: + on opstack has lower precedence, so push * onto opstack
5. Read C - operand: append to postfix
6. Read + - operator: pop * and then + (precedence >= +) to postfix,
   then push the new + onto opstack
7. Read D - operand: append to postfix
8. End of string: pop the remaining + from opstack and append it to postfix

Postfix array: A B C * + D +
Stacks
• Evaluating the postfix expression
  – Read character by character from the postfix array and perform the following:
    • If it is an operand, push the value of the operand onto the value stack.
    • If it is an operator:
      – Perform two pop operations on the value stack
      – Apply the operator to the popped values
      – Push the resulting value back onto the value stack
  – Pop the value that remains in the value stack and present it as the result of
    the expression
  – Ex:
    • A+B*C = ABC*+         => 2+3*4 = 234*+
    • A+B*C+D = ABC*+D+     => 2+3*4+5 = 234*+5+
    • (A+B)*(C+D) = AB+CD+* => (2+3)*(4+5) = 23+45+*

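The evaluation procedure above can be sketched for single-digit operands; eval_postfix is an illustrative name, and the value stack is a plain int array. Note that the second pop yields the left operand, which matters for - and /.

```c
#include <ctype.h>

/* Evaluate a postfix expression of single-digit operands */
int eval_postfix( const char *postfix ) {
    int stack[100], top = 0;
    for ( ; *postfix; postfix++ ) {
        char ch = *postfix;
        if ( isdigit( (unsigned char)ch ) )
            stack[top++] = ch - '0';            /* operand: push its value */
        else {
            int b = stack[--top];               /* first pop: right operand */
            int a = stack[--top];               /* second pop: left operand */
            switch ( ch ) {
            case '+': stack[top++] = a + b; break;
            case '-': stack[top++] = a - b; break;
            case '*': stack[top++] = a * b; break;
            case '/': stack[top++] = a / b; break;
            }
        }
    }
    return stack[--top];                        /* final pop is the result */
}
```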
Queue
• The queue is another data structure.
• A physical analogy for a queue is a line at a bank. When you go to the bank,
customers go to the rear (end) of the line and customers come off of the line
(i.e., are serviced) from the front of the line.
• Like a stack, a queue usually holds things of the same type.
• The main property of a queue is that objects go on the rear and come off of the
front of the queue.

Queue:  A B C      (front = A, rear = C)
Add D:  A B C D    (D joins at the rear)
Delete: B C D      (A leaves from the front)
Implementing queue
Implementing a queue using an array

  qsize = 10, Q[qsize], indices 0 to 9
  Initially front = rear = -1

Push (item):
  if (rear >= qsize - 1) -> Overflow
  Q[++rear] = item
  if (rear == 0) front = 0

Pop:
  if (front == -1 || front > rear) -> Empty
  item = Q[front++]

Trace with items 10, 20, 90:
  push 10:  Q = [10]          front = rear = 0
  push 20:  Q = [10 20]       front = 0, rear = 1
  push 90:  Q = [10 20 90]    front = 0, rear = 2
  pop:      returns 10        front = 1, rear = 2
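The sketch above never reuses freed slots; a circular version wraps the indices with the modulus operator. This is a hedged variant, not the slide's exact routine: a count field replaces the front = rear = -1 sentinel to keep the full/empty tests simple.

```c
#define QSIZE 10

struct t_queue {
    int items[QSIZE];
    int front;    /* index of the oldest item */
    int rear;     /* index of the next free slot */
    int count;    /* number of stored items */
};

/* Insert at the rear; fails (returns 0) on overflow */
int enqueue( struct t_queue *q, int item ) {
    if ( q->count == QSIZE ) return 0;
    q->items[q->rear] = item;
    q->rear = ( q->rear + 1 ) % QSIZE;   /* wrap around: circular array */
    q->count++;
    return 1;
}

/* Delete from the front; fails (returns 0) when empty */
int dequeue( struct t_queue *q, int *item ) {
    if ( q->count == 0 ) return 0;
    *item = q->items[q->front];
    q->front = ( q->front + 1 ) % QSIZE;
    q->count--;
    return 1;
}
```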
Queue
• In a normal queue the insertion operation is performed at one end (the rear)
• And the deletion operation is performed at the other end (the front)
• Push and pop operations can also be performed in different ways;
  based on these methods queues are further classified into
  – Dequeue
  – Priority Queue
• Dequeue (double-ended queue)
  – It allows insertion and deletion at both ends
    • Input restricted
    • Output restricted
  – In the I/P restricted dequeue, insertion is done only at the rear end, while
    deletion can be done at both ends.
  – In the O/P restricted dequeue, deletion is done only at the front end, while
    insertion can be done at both ends.

Implementing Dequeue
• Implementing an I/P restricted dequeue
  – Display options for push & pop
  – For the push operation
    • Increase rear & place the item
  – For the pop operation
    • Display options to pop (Rear/Front)
    • Rear: pop the item by decreasing the rear value
    • Front: pop the item by increasing the front value
• Implementing an O/P restricted dequeue
  – Display options for push & pop
  – For the push operation
    • Display options to push (Rear/Front)
    • Rear: push the item by increasing rear
    • Front: push the item by decreasing front
      (front must be greater than "0", otherwise overflow)
  – For the pop operation
    • Delete the item by increasing the front value

Implementing Dequeue
Implementing an I/P restricted dequeue using an array

  qsize = 10, Q[qsize], indices 0 to 9
  Initially front = rear = -1

Push (rear only):
  if (rear >= qsize - 1) -> Overflow
  Q[++rear] = item
  if (rear == 0) front = 0

Pop from rear:
  if (front == -1 || front > rear) -> Empty
  item = Q[rear--]

Pop from front:
  if (front == -1 || front > rear) -> Empty
  item = Q[front++]

Trace: push 10 (front = rear = 0); push 20 (front = 0, rear = 1);
pop from rear returns 20, leaving [10] (front = rear = 0)
Implementing Dequeue
Implementing an O/P restricted dequeue using an array

  qsize = 10, Q[qsize], indices 0 to 9
  Initially front = rear = -1

Push at rear:
  if (rear >= qsize - 1) -> Overflow
  Q[++rear] = item
  if (rear == 0) front = 0

Push at front:
  if (front == -1) Q[++front] = item
  else if (front > 0) Q[--front] = item

Pop (front only):
  if (front == -1 || front > rear) -> Empty
  item = Q[front++]

Trace: push 10 at rear (front = rear = 0); push 20 at rear (front = 0, rear = 1);
pop returns 10, leaving [20]
Dequeue Implementation
• Implementing a dequeue using a linked list
  – The linked list must be a circular linked list
• Josephus problem
  – Let us consider a problem that can be solved using a circular list.
  – A group of soldiers is surrounded by an enemy force. There is no
    hope of surviving without reinforcement, but a single horse is
    available for escape. The soldiers agree on a pact to determine
    which of them is to escape. They form a circle and a number "n" is
    picked. Beginning with the soldier whose name is picked, they
    begin to count clockwise around the circle; when the count
    reaches "n", that soldier is removed and the count begins again.
    Any soldier removed from the circle is no longer counted. The
    last soldier remaining is to take the horse and escape.

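The survivor's position can also be computed without building the circular list, via the standard Josephus recurrence; this is a hedged complement to the list-based simulation the slide has in mind, and the function name is illustrative.

```c
/* Josephus problem: n soldiers in a circle, every k-th one is removed.
   Returns the 1-based position of the surviving soldier. */
int josephus( int n, int k ) {
    int survivor = 0;                     /* 0-based survivor for a circle of 1 */
    for ( int m = 2; m <= n; m++ )
        survivor = ( survivor + k ) % m;  /* grow the circle one soldier at a time */
    return survivor + 1;
}
```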
Dequeue Implementation
• Using a Doubly Linked List
• If a structure contains two self-referential members then it can be
  used to construct a DLL.
• In an SLL the last node contains NULL in its next reference field.
• In a DLL the last node contains NULL in its next reference and the
  first node contains NULL in its previous reference field.
• An SLL is a one-way traversal list: starting from any node you can
  only reach the last node.
• A DLL is a two-way traversal list: starting from any node we can
  reach either the beginning or the end of the list.
• If we can return to the same node by traversing all nodes of the list,
  then the list has a circular reference.

Dequeue Implementation

[Diagram: push - items 10, 90, 20 are added to a doubly linked dequeue, updating the F (front) and R (rear) pointers]

[Diagram: pop from front - F advances to the next node and the old front node is unlinked]

[Diagram: pop from rear (end) - R moves back to the previous node and the old rear node is unlinked]
Simple Queues
• Linked lists provide
– LIFO
– FIFO
semantics
– Constant ( O(1) ) addition and deletion
• What if the items in the queue have an order?
  – Usually termed a priority
  – We must sort the items so that
    the highest (or lowest) priority item is removed first

Priority Queues
• Items have some ordering relation
– It doesn’t matter much what it is
– As long as there’s some way to define order
• Maintaining order
– Items are added and deleted continuously
– Tree structure
• Mostly O(log n) behaviour
– but can become unbalanced
➧ O(n) behaviour
✖ Not acceptable in a life-critical system!!
Disastrous if your safety estimate assumed O(log n)!!

Symbol Tables
• A symbol table is a set of name-value pairs which
  contains symbols and their values or addresses
• In any language or package it supports
  – Processing of data
  – Maintenance of identifier tables, message tables & special
    tables
• Operations on symbol tables
  – Constructing symbol tables
  – Searching in symbol tables
  – Insertion/deletion of symbols in or from symbol tables

Symbol Tables
• Symbol tables can be represented by
  – Tree structures
  – Arrays
• The tree structures used to represent symbol tables are
  Binary Search Trees (BST) & Fibonacci search trees
  with perfect height balancing.
• Classification of Symbol Tables
  – Static Symbol Tables
  – Dynamic Symbol Tables

Symbol Tables
• Static Symbol Tables
  – These tables do not allow insertion and deletion of
    symbols once the table has been constructed
  – The scope of the symbols in a static table is
    throughout the program
  – Ex: COBOL, C & PASCAL language environments
• Dynamic Symbol Tables
  – These tables allow insertion and deletion of symbols
    during execution
  – Ex: BASIC, C++ & FORPRO
Symbol Tables
• Hashtable
  – An array representation of a symbol table is known as a
    hash table.
  – Hash tables are used to provide random access to key
    elements or records which are on external storage
    media.
  – They are also used for internal storage purposes
  – All symbol tables are memory-based tables
  – A hash table contains some number of buckets (rows),
    which determines the number of items it can hold.

Symbol Tables
• Hashtable
  – The hash number of an item can be calculated through
    user-defined routines.
  – This hash number can be used as an index to the item.
  – Depending on the size of the table, the type of the table &
    the method of calculating the hash number, hash tables are
    classified into
    • Closed hash tables (open addressing)
    • Open hash tables (separate chaining)

Symbol Tables
• Closed hash table (open addressing)
  – A closed hash table is a linear array which contains either
    values or addresses.
  – On insertion, the hash number is calculated from the key
    value by using some user-defined hash function.
  – The value or its address is placed in the table by using the
    generated hash number as the subscript.
  – In general the hash number must be unique.

Symbol Tables
• Closed hash table (open addressing)
  – Hash collision: in some cases two keys may produce the
    same hash reference, which is known as a hash collision.
  – A hash collision occurs when the cell referred to by the
    hash number is not an empty cell in the hash table.
  – When a hash collision occurs we have to place the value
    or the address of the identifier in the next available cell.

Symbol Tables
• Closed hash table (open addressing)
  – In resolving hash collisions the following probing
    methods are used
    • Linear probing
    • Quadratic probing
    • Double hashing
    • Rehashing

Symbol Tables
• Linear Probing
  – The search for the next available cell proceeds one cell
    after the other, and the table must be treated as circular.
  – The formal function is f(i) = i + 1
  – It is an advantageous method for finding a cell
  – But it is disadvantageous because it may require a large
    number of comparisons.
• Quadratic Probing
  – The cell to be checked for availability is based on the formula
    f(i) = i²
  – The main disadvantage is that in some cases we may not find an
    empty cell even though cells are empty at other positions.

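A minimal sketch of a closed hash table with linear probing, f(i) = i + 1 over a circular table, as described above. Integer keys and the helper names (table_init, table_insert, table_search) are illustrative assumptions; deletion is omitted because it needs tombstones.

```c
#define TABLE_SIZE 11
#define EMPTY (-1)

/* Mark every cell of the table as empty */
void table_init( int table[] ) {
    for ( int i = 0; i < TABLE_SIZE; i++ )
        table[i] = EMPTY;
}

int hash( int key ) { return key % TABLE_SIZE; }

/* Insert key; on collision probe the next cell, one after the other.
   Returns the slot used, or -1 if the table is full. */
int table_insert( int table[], int key ) {
    int i = hash( key );
    for ( int probes = 0; probes < TABLE_SIZE; probes++ ) {
        if ( table[i] == EMPTY ) {
            table[i] = key;
            return i;
        }
        i = ( i + 1 ) % TABLE_SIZE;   /* f(i) = i + 1, table is circular */
    }
    return -1;
}

/* Returns the slot holding key, or -1 if absent */
int table_search( const int table[], int key ) {
    int i = hash( key );
    for ( int probes = 0; probes < TABLE_SIZE; probes++ ) {
        if ( table[i] == key ) return i;
        if ( table[i] == EMPTY ) return -1;  /* key would have been placed here */
        i = ( i + 1 ) % TABLE_SIZE;
    }
    return -1;
}
```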
Symbol Tables
• Double hashing
  – The hash value is doubled to find the next available cell.
  – Efficiency can be achieved by making the table size a
    prime number.
  – The formal function is f(i) = 2i
• Rehashing
  – A series of hash functions can be executed to find the next
    available cell.
  – The main disadvantage is that we may not access a key value
    directly, because it may not be in the originally calculated cell.

Symbol Tables
• Open hash table (separate chaining)
  – Each bucket of the table anchors a linked list (chain) of all
    the items whose keys hash to that bucket

Storage Allocation & Garbage Collection
• Every language environment should provide a facility for
  reserving memory to handle the program's data; the space
  reserved depends on the language environment and the
  scope of the variables.
• Some language environments provide a facility to define
  intermediate variables and to allocate memory at
  runtime (dynamic memory allocation).
• There are two methods of allocating memory
  – Sequential allocation (fixed block allocation)
  – Dynamic allocation (varying-length block allocation)

Storage Allocation & Garbage Collection
• Sequential memory allocation:
  – The system automatically allocates memory to variables
    sequentially (continuous allocation).
  – It does not allow allocation of memory at runtime
  – Ex: COBOL language
• Dynamic memory allocation:
  – Allocation of memory is possible through system
    routines or through user-defined functions by
    specifying the size of memory to be allocated.
  – Ex: allocating memory to pointers at runtime
Storage Allocation & Garbage Collection
– The dynamic memory allocation technique can be used to
  allocate memory to a pointer which holds the starting
  address of a list.
– This pointer is known as the external pointer, and a pointer
  which points to the next node is known as an internal pointer.
– Allocation of memory to nodes can be performed by
  treating the whole available memory as a single block; we need
  • A pointer which holds the starting address of the free memory
  • A variable which represents the total size of memory that can be used for
    data.

Storage Allocation & Garbage Collection
– Let us consider
  • pointer p refers to the starting address of free memory
  • m is the maximum size of the block
  • n is the size of the requested block
– Allocation can be done by the following routine

  if ( p + n < m ) {
      var = p;
      p = p + n;
  }
  else
      var = NULL;
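The routine above can be sketched as a bump-pointer allocator over a static pool. This is an illustrative variant: pool, p and pool_alloc are assumed names, and the bound check uses > so the block can be filled exactly, where the slide's strict < would reject an allocation that ends precisely at m.

```c
#include <stddef.h>

#define POOL_SIZE 1024        /* m: total size of the free block */

static char pool[POOL_SIZE];
static size_t p = 0;          /* starting offset of free memory */

/* Allocate n bytes from the front of the free block; NULL on overflow */
void *pool_alloc( size_t n ) {
    if ( p + n > POOL_SIZE ) return NULL;   /* not enough room left */
    void *var = &pool[p];
    p = p + n;                              /* advance the free pointer */
    return var;
}
```

Note that nothing here frees memory: freed blocks would leave holes, which is exactly the fragmentation the next slide illustrates.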
Storage Allocation & Garbage Collection
• P: pointer to free memory; free space = 1024
• Request B1 = 150  ->  [150 | free]
• Request B2 = 200  ->  [150 | 200 | free]
• Request B3 = 100  ->  [150 | 200 | 100 | free]
• Request B4 = 175  ->  [150 | 200 | 100 | 175 | free]
• Request B5 = 275  ->  [150 | 200 | 100 | 175 | 275 | free(124)]
  (total allocated = 900)
• Request B6 = 150 will return NULL (only 124 bytes remain)
• Free blocks B1 & B3  ->  [free(150) | 200 | free(100) | 175 | 275 | free(124)]
• Request B6 = 150 will still return NULL: the total free
  memory (374) is greater than the request (150), but it is
  fragmented
• This can be solved using memory compaction

Storage Allocation & Garbage Collection
• Memory Compaction:
  – It is the process of de-fragmenting memory by moving the
    allocated blocks together, so that the free space forms one
    contiguous block.

Hash Tables
• All search structures so far
– Relied on a comparison operation
– Performance O(n) or O( log n)
• Assume I have a function
– f ( key ) → integer
ie one that maps a key to an integer
• What performance might I expect now?

Hash Tables - Structure
• Simplest case:
– Assume items have integer keys in the range 1 .. m
– Use the value of the key itself
to select a slot in a
direct access table
in which to store the item
– To search for an item with key, k,
  just look in slot k
  • If there's an item there,
    you've found it
  • If the slot is empty, it's missing.
– Constant time, O(1)

Hash Tables - Constraints
• Constraints
– Keys must be unique
– Keys must lie in a small range
– For storage efficiency,
keys must be dense in the range
– If they’re sparse (lots of gaps between values),
a lot of space is used to obtain speed
• Space for speed trade-off

Hash Tables - Relaxing the constraints
• Keys must be unique
– Construct a linked list of duplicates
“attached” to each slot
– If a search can be satisfied
by any item with key, k,
performance is still O(1)
but
– If the item has some
other distinguishing feature
which must be matched,
we get O(n_max)
where n_max is the largest number
of duplicates - or the length of the longest chain

Hash Tables - Relaxing the constraints
• Keys are integers
– Need a hash function
h( key ) → integer
ie one that maps a key to
an integer
– Applying this function to the
key produces an address
– If h maps each key to a unique
integer in the range 0 .. m-1
then search is O(1)

Hash Tables - Hash functions
• Form of the hash function
– Example - using an n-character key
int hash( char *s, int n ) {
int sum = 0;
while( n-- ) sum = sum + *s++;
return sum % 256;
}
returns a value in 0 .. 255
– xor function is also commonly used
sum = sum ^ *s++;
– But any function that generates integers in 0..m-1 for some suitable (not
too large) m will do
– As long as the hash function itself is O(1) !

Hash Tables - Collisions
• Hash function
– With this hash function
int hash( char *s, int n ) {
int sum = 0;
while( n-- ) sum = sum + *s++;
return sum % 256;
}
– hash( “AB”, 2 ) and
hash( “BA”, 2 )
return the same value!
– This is called a collision
– A variety of techniques are used for resolving collisions

Hash Tables - Collision handling
• Collisions
– Occur when the hash function maps
two different keys to the same address
– The table must be able to recognise and resolve this
– Recognise
• Store the actual key with the item in the hash table
• Compute the address
– k = h( key )
• Check for a hit
– if ( table[k].key == key ) then hit
else try next entry
– Resolution
• Variety of techniques
We’ll look at various
“try next entry” schemes

Hash Tables - Linked lists
• Collisions - Resolution
➊ Linked list attached
to each primary table slot
• h(i) == h(i1)
• h(k) == h(k1) == h(k2)
– Searching for i1
• Calculate h(i1)
• Item in table, i,
doesn’t match
• Follow linked list to i1
– If NULL found,
key isn’t in table

Hash Tables - Overflow area
➋ Overflow area
• Linked list constructed
in special area of table
called overflow area
– h(k) == h(j)
– k stored first
– Adding j
• Calculate h(j)
• Find k
• Get first slot in overflow area
• Put j in it
• k’s pointer points to this slot
– Searching - same as linked list

Hash Tables - Re-hashing
➌ Use a second hash function
• Many variations
• General term: re-hashing
– h(k) == h(j)
– k stored first
– Adding j
  • Calculate h(j)
  • Find k
  • Repeat until an empty slot is found:
    – Calculate h’(j)    (h’(x) - second hash function)
  • Put j in it
– Searching - use h(x), then h’(x)

Hash Tables - Re-hash functions
➌ The re-hash function
• Many variations
– Linear probing
• h’(x) is +1
• Go to the next slot
until you find one empty

– Can lead to bad clustering


– Re-hash keys fill in gaps
between other keys and exacerbate
the collision problem

Hash Tables - Re-hash functions
➌The re-hash function
• Many variations
– Quadratic probing
  • h’(x) is c·i² on the i-th probe
  • Avoids primary clustering
  • Secondary clustering occurs
    – All keys which collide on h(x) follow the same probe sequence
    – First a = h(j) = h(k)
    – Then a + c, a + 4c, a + 9c, ....
    – Secondary clustering is generally less of a problem

Hash Tables - Collision Resolution Summary
• Chaining
+ Unlimited number of elements
+ Unlimited number of collisions
- Overhead of multiple linked lists
• Re-hashing
+ Fast re-hashing
+ Fast access through use of main table space
- Maximum number of elements must be known
- Multiple collisions become probable
• Overflow area
+ Fast access
+ Collisions don't use primary table space
- Two parameters which govern performance need to be estimated

Hash Tables - Summary so far ...
• Potential O(1) search time
– If a suitable function h(key) → integer can be found
• Space for speed trade-off
– “Full” hash tables don’t work (more later!)
• Collisions
– Inevitable
• Hash function reduces amount of information in key
– Various resolution strategies
• Linked lists
• Overflow areas
• Re-hash functions
– Linear probing: h’ is +1
– Quadratic probing: h’ is +c·i²
– Any other hash function!
» or even sequence of functions!
Hash Tables - Choosing the Hash Function
• “Almost any function will do”
– But some functions are definitely better than others!
• Key criterion
– Minimum number of collisions
• Keeps chains short
• Maintains O(1) average

Hash Tables - Choosing the Hash Function
• Uniform hashing
– Ideal hash function
• P(k) = probability that a key, k, occurs
• If there are m slots in our hash table,
• a uniform hashing function, h(k), would ensure:
Σ_{k | h(k)=0} P(k)  =  Σ_{k | h(k)=1} P(k)  =  ....  =  Σ_{k | h(k)=m-1} P(k)  =  1/m

(read Σ_{k | h(k)=0} as the sum over all k such that h(k) = 0)

• or, in plain English,


• the number of keys that map to each slot is equal

Hash Tables - A Uniform Hash Function
• If the keys are integers
  randomly distributed in [ 0 , r )   (read as 0 ≤ k < r),
  then

    h(k) = ⌊ m·k / r ⌋

  is a uniform hash function

• Most hashing functions can be made to map the keys
  to [ 0 , r ) for some r
  – eg adding the ASCII codes for characters mod 256
    will give values in [ 0, 256 ), ie [ 0, 255 ]
  – Replace + by xor
    ➧ same range without the mod operation

Hash Tables - Reducing the range to [ 0, m )
• We’ve mapped the keys to a range of integers
0≤k<r
• Now we must reduce this range to [ 0, m )
where m is a reasonable size for the hash table

• Strategies
①Division - use a mod function
②Multiplication
③Universal hashing

Hash Tables - Reducing the range to [ 0, m )
① Division
• Use a mod function
h(k) = k mod m
– Choice of m?
• Powers of 2 are generally not good!
  h(k) = k mod 2^n
  selects just the last n bits of k
  (eg k mod 2^8 selects the last 8 bits of 0110010111000011010)
– All bit combinations are not generally equally likely


– Prime numbers close to 2^n seem to be good choices
  eg want ~4000 entry table, choose m = 4093

Hash Tables - Reducing the range to [ 0, m )
② Multiplication method
• Multiply the key by constant, A, 0 < A < 1
• Extract the fractional part of the product
( kA - ⌊kA⌋ )
• Multiply this by m
h(k) = ⌊ m * ( kA - ⌊kA⌋ ) ⌋
– Now m is not critical and a power of 2 can be chosen
– So this procedure is fast on a typical digital computer
• Set m = 2p
• Multiply k (w bits) by ⌊A•2^w⌋ ➧ 2w-bit product
• Extract p most significant bits of lower half
• A = ½(√5 -1) seems to be a good choice (see Knuth)
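The fixed-point trick above might look like this for 32-bit keys (w = 32); the constant is ⌊A·2^32⌋ for Knuth's A = ½(√5−1):

```c
#include <stdint.h>

/* Multiplication method: h(k) = the p most significant bits of the
   lower half of the 2w-bit product k * floor(A * 2^w), with w = 32. */
uint32_t hash_mult(uint32_t k, int p) {
    const uint32_t A = 2654435769u;        /* floor(((sqrt(5)-1)/2) * 2^32) */
    return (uint32_t)(k * A) >> (32 - p);  /* table size m = 2^p */
}
```

Since m is a power of two here, the reduction is a shift rather than a division, which is why the method is fast on a typical digital computer.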

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Hash Tables - Reducing the range to [ 0, m )
③ Universal Hashing
• A determined “adversary” can always find a set of data that will defeat any
hash function
• Hash all keys to same slot ➧ O(n) search
– Select the hash function randomly (at run time)
from a set of hash functions
➧ Reduced probability of poor performance
– Set of functions, H, which map keys to [ 0, m )
– H is universal if, for each pair of distinct keys, x and y,
the number of functions, h ∈ H,
for which h(x) = h(y) is |H|/m

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Hash Tables - Reducing the range to [ 0, m )
③ Universal Hashing
• A determined “adversary” can always find a set of data that
will defeat any hash function
• Hash all keys to same slot ➧ O(n) search
– Select the hash function randomly (at run time)
from a set of hash functions
– Functions are selected at run time
• Each run can give different results
• Even with the same data
• Good average performance obtainable

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Hash Tables - Reducing the range to [ 0, m )
③ Universal Hashing
• Can we design a set of universal hash functions?
• Quite easily
• Key, x = x0, x1, x2, ...., xr — the n-bit “bytes” of x
• Choose a = <a0, a1, a2, ...., ar>
a is a sequence of elements
chosen randomly from { 0, 1, ...., m-1 }
• ha(x) = Σ aixi mod m
• There are m^(r+1) sequences a,
so there are m^(r+1) functions, ha(x) Proof:
• Theorem See Cormen
• The ha form a set of universal hash functions
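A minimal sketch of one such ha, treating the key as r+1 bytes; the random coefficients a[i] would be drawn once, at run time (names are illustrative):

```c
/* ha(x) = (a0*x0 + a1*x1 + ... + ar*xr) mod m
   x: the key split into r+1 bytes; a: r+1 random coefficients in [0, m) */
unsigned universal_hash(const unsigned char *x, int r,
                        const unsigned *a, unsigned m) {
    unsigned long h = 0;
    for (int i = 0; i <= r; i++)
        h = (h + (unsigned long)a[i] * x[i]) % m;   /* reduce as we go */
    return (unsigned)h;
}
```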

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Collision Frequency
• Birthdays or the von Mises paradox
– There are 365 days in a normal year
➧ Birthdays on the same day unlikely?
– How many people do I need
before “it’s an even bet”
(ie the probability is > 50%)
that two have the same birthday?
– View
• the days of the year as the slots in a hash table
• the “birthday function” as mapping people to slots
– Answering von Mises’ question answers the question about the
probability of collisions in a hash table

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Distinct Birthdays
• Let Q(n) = probability that n people have distinct
birthdays
• Q(1) = 1
• With two people, the 2nd has only 364 “free” birthdays
364
Q(2) = Q(1) *
365

• The 3rd has only 363, and so on:

Q(n) = Q(1) * (364/365) * (363/365) * … * ((365-n+1)/365)
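Evaluating Q(n) numerically answers von Mises' question directly; this sketch returns the smallest n for which P(n) = 1 − Q(n) exceeds 50%:

```c
/* Smallest group size n for which P(two share a birthday) > 0.5 */
int first_even_bet(void) {
    double q = 1.0;   /* Q(1) = 1 */
    int n = 1;
    while (1.0 - q <= 0.5) {
        n++;
        q *= (365.0 - (n - 1)) / 365.0;   /* Q(n) = Q(n-1) * (365-n+1)/365 */
    }
    return n;   /* the classic answer: 23, where P(23) = 0.507 */
}
```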

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Coincident Birthdays
• Probability of having two identical birthdays
• P(n) = 1 - Q(n)
• P(23) = 0.507
[plot: P(n) rising from 0 towards 1 as n runs from 0 to 80]
• With 23 entries,
the table is only
23/365 = 6.3%
full!

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Hash Tables - Load factor
• Collisions are very probable!
• Table load factor
α = n/m    (n = number of items, m = number of slots)

must be kept low


• Detailed analyses of the average chain length
(or number of comparisons/search) are available
• Separate chaining
– linked lists attached to each slot
gives best performance
– but uses more space!
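A separate-chaining table along those lines might look like this sketch (the names, the hash function, and the table size 101 are all illustrative choices, not a prescribed design):

```c
#include <stdlib.h>
#include <string.h>

#define M 101   /* table size: a small prime */

struct entry {
    char *key;
    struct entry *next;   /* chain of keys hashing to the same slot */
};

static struct entry *table[M];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % M;
}

void chain_insert(const char *key) {
    struct entry *e = malloc(sizeof *e);
    e->key = malloc(strlen(key) + 1);
    strcpy(e->key, key);
    e->next = table[hash(key)];     /* prepend to the slot's chain */
    table[hash(key)] = e;
}

int chain_contains(const char *key) {
    for (struct entry *e = table[hash(key)]; e; e = e->next)
        if (strcmp(e->key, key) == 0)
            return 1;
    return 0;
}
```

With load factor kept low, the chains stay short and both routines run in O(1) on average.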

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Hash Tables - General Design
❶ Choose the table size
• Large tables reduce the probability of collisions!
• Table size, m
• n items
• Load factor α = n / m governs the collision probability
❷ Choose a table organisation
• Does the collection keep growing?
• Linked lists (....... but consider a tree!)
• Size relatively static?
• Overflow area or
• Re-hash
....
❸ Choose a hash function

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Hash Tables - General Design
❷ Choose a hash function
• A simple (and fast) one may well be fine ...
• Read your text for some ideas!
❸ Check the hash function against your data
➀ Fixed data
• Try various h, m
until the maximum collision chain is acceptable
➧ Known performance
② Changing data
• Choose some representative data
• Try various h, m until collision chain is OK
➧ Usually predictable performance

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Hash Tables - Review
• If you can meet the constraints
+ Hash Tables will generally give good performance
+ O(1) search
• Like radix sort,
they rely on calculating an address from a key
• But, unlike radix sort,
relatively easy to get good performance
• with a little experimentation
∴ not advisable for unknown data
• collection size relatively static
• memory management is actually simpler
• All memory is pre-allocated!

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees
• A tree represents a list of items in a hierarchical (bottom-up drawn) fashion.
• Every item is represented as a NODE in the tree.
• The NODE at the top is called the root node.
• The nodes connected to the root node are the roots of its sub
trees.
• A node which does not contain any sub node is called a leaf node.
• A node which contains sub nodes is called a non leaf node (also
referred to as an internal node).
• In the father-child relation a node can be referred to as the father
(parent) of the sub nodes directly connected to it, and those
sub nodes are called its children.
• The children of the same father are called siblings.

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees
• Root nodes: A A
• Leaf Nodes: C,E,G,H,I,J,K
• Non Leaf Nodes : A,B,D,F
• Siblings
– B,C,D B C D
– E,F
– G,H,I
– J,K E F G H I
• Children of A: B,C,D
• Children of B: EF
• Children of D: G,H,I J K
• Children of F: J,K
• Ancestors of J & K : F,B,A
• Ancestors of G,H,I : D,A
• Order of Tree : 3

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees
• The order of a tree is the max no of nodes we can connect to any node
of the tree. (Above tree: 3)
• The degree of a node specifies its no of active connections (sub nodes).
• There is no restriction on the order of a general tree.
• Based on the implementation we have to define the restrictions
– The degree of node A is 3 & of D is 3
– The degrees of nodes B & F are 2
• Depth of the tree: If the tree is viewed with a level structure then
the level no starts with 0 at the root & increments by 1 moving
towards the descendants (downwards)

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• If the order of the tree is two A
then that tree can be referred
as Binary tree.
• In BT any non leaf node can B C
have only 2 sub nodes.
• The first node which is at
top is root node. D E F G
• First sub node is known as
Left Son (Left sub tree) H I J
• And the second node is
known as Right son (Right
sub node)

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Complete Trees
• A binary tree is completely full if
– it has height, h, and
– it has 2h+1-1 nodes
• A binary tree of height, h, is complete iff
– it is empty or
– its left subtree is complete
of height h-1 and
its right subtree is completely full
of height h-2
or
– its left subtree is completely full
of height h-1 and
its right subtree is complete
of height h-1

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Complete Trees
• If we examine the examples, we see that a complete tree is “filled
in” from the left

➊ ➋ ➌

Order for nodes to be added

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Methods of traversal A
– Level Order traversal (LOT)
– Pre order traversal (POT)
– In order traversal (IOT)
– Post order traversal (PtOT)

D E F G

H I J

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Level Order traversal A
– In this method NODES
are traversed level by
level starting from the root B C
node
– Before traversing the
D E F G
nodes which are at level n
the control must traverse
all the nodes which are at H I J
level n-1
– A, B,C, D,E,F,G, H,I,J

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Pre Order traversal +
– Nodes are traversed root-left-right
– Ex: +AB
– A,B,D,E,H,I,C,F,J,G A B
• In order traversal
– Nodes are traversed left-root-right +
– Ex: A+B
– D,B,H,E,I,A,J,F,C,G
– In a BST an in order traversal visits the data in ascending A B
order
• Post order traversal
– Nodes are traversed left-right-root +
– Ex: AB+
– D,H,I,E,B,J,F,G,C,A A B
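The three orders reduce to the usual recursive routines. In this sketch each visit appends the node's label to a string so the small +/A/B examples above can be checked (the struct and helper names are illustrative):

```c
#include <string.h>

struct node { char data; struct node *left, *right; };

static void visit(char *out, char c) {   /* append one label to the result */
    size_t n = strlen(out);
    out[n] = c;
    out[n + 1] = '\0';
}

void preorder(struct node *t, char *out) {   /* root-left-right */
    if (!t) return;
    visit(out, t->data);
    preorder(t->left, out);
    preorder(t->right, out);
}

void inorder(struct node *t, char *out) {    /* left-root-right */
    if (!t) return;
    inorder(t->left, out);
    visit(out, t->data);
    inorder(t->right, out);
}

void postorder(struct node *t, char *out) {  /* left-right-root */
    if (!t) return;
    postorder(t->left, out);
    postorder(t->right, out);
    visit(out, t->data);
}
```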
Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
➊ Pre-order
– Root
– Left sub-tree
– Right sub-tree

Pre-order result: x A + x + B C x D E F

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
➋ In-order
– Left sub-tree
– Root
– Right sub-tree

In-order result: A x B + C x D x E + F

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Tree Traversal
• Traversal = visiting every node of a tree
• Three basic alternatives
➌ Post-order
– Left sub-tree
– Right sub-tree
– Root

Post-order result: A B C + D E x x F + x

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Tree Traversal
➌ Post-order
– Left sub-tree
– Right sub-tree
– Root

➥ Reverse-Polish
(A (((BC+)(DEx)x) F+) x)
• Normal algebraic form
(A x (((B+C)x(DxE)) + F))

= which traversal? (in-order)

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Constructing BT
– To construct a binary tree we require a self referential structure with
two pointers
• One to refer to the left sub tree
• The other to refer to the right sub tree
– A node which contains NULL in both references is a
leaf node
– To insert a node at level n, we first have to fill level (n-1)
with nodes.
– The method of constructing a BT is level order construction &
it requires an O/P restricted Dequeue.

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Constructing BT –Steps
– Make the first node the root node & push the address of the first node into the O/P
restricted Dequeue
– From the second node onwards, for each new node
• Pop an address from the O/P dequeue
• If its left is empty, connect the new node as left son, push the popped address back at
the front, and push the newly constructed node's address at the rear
• If the left is not empty, connect the new node as right son & push only the new
node's address into the dequeue at the rear.

– Representation in Data Structure


typedef struct tree
{ int no;
struct tree *left;
struct tree *right;
}TREE;
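The steps above can be sketched with a plain array-based queue standing in for the output restricted dequeue (a simplification; the helper name is illustrative):

```c
#include <stdlib.h>

typedef struct tree {
    int no;
    struct tree *left;
    struct tree *right;
} TREE;

/* Build a binary tree level by level from vals[0..n-1].
   The front node receives children until both sons are filled,
   then it is popped; every new node is pushed at the rear. */
TREE *build_level_order(const int *vals, int n) {
    if (n <= 0) return NULL;
    TREE **q = malloc(n * sizeof *q);
    int front = 0, rear = 0;
    TREE *root = calloc(1, sizeof *root);
    root->no = vals[0];
    q[rear++] = root;
    for (int i = 1; i < n; i++) {
        TREE *node = calloc(1, sizeof *node);
        node->no = vals[i];
        if (!q[front]->left)
            q[front]->left = node;      /* left son still empty */
        else
            q[front++]->right = node;   /* right son: parent now full, pop it */
        q[rear++] = node;
    }
    free(q);   /* only the pointer array; the tree itself remains */
    return root;
}
```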

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
1 H 2 *DEQ[10]
T
5 H
T1 N 10 N Rear=0

Front=0
3 T N 10 N

T
4 N 10 N
H If h is null

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
N 20 N *DEQ[10]
6 T 7 T1
H
Front=1

10 N Rear=0
8 N 20 N
Front=0

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
N 20 N *DEQ[10]
6 T 7 T1
H
Front=1
10 N Front=0
N 20 N Rear=1
9

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
N 30 N *DEQ[10]
10 T 11 T1
H
Front=1
10 12
Front=0
N 20 N N 30 N
Rear=1

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
N 30 N *DEQ[10]
10 T 11 T1
H
Front=1
10
Front=0
N 20 N
Rear=1 N 30 N
13

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree - LOT
Root Info Left Right
Queue[20] K 0 0
A
C 3 6

G 0 0
B C 14

Root A 10 2

H 17 1
D E G H
L 0 0

Avail 9
F J K
4

B 18 13

L 19

F 0 0

E 12 0

15

16

11

J 7 0

D 0 0

20

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree - LOT
*DEQ[10]
11 T1
H
Front=1
10
Front=0
N 20 N
Rear=1 N 30 N
13

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• A BT is a finite set of elements that is either empty or partitioned into 3 disjoint subsets.

• The first subset contains a single element called the root of the tree.

• Other two sets are themselves binary trees called left & right sub trees of original tree.

• Each element of a binary tree is called a node of the tree.

• Whereas in a multi-way BT a node contains more than one key value (element), and
the no of key values of a node depends upon the order of the tree.

• The order of BT is two.

• If “A” is the root of a BT & “B” is the root of Left or Right subtree then “A” is said to
be the father of “B” & “B” is said to be left or right son of “A”.

• A node that has no sons is called a Leaf Node.

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Node N1 is an ancestor of N2 if N1 is either father of N2 or father of some
ancestor of N2.

• A father is an ancestor of its sons, but an ancestor need not be the father.

• A node N2 is a left descendant of node N1 if N2 is either the left son of N1 or
a descendant of the left son of N1.

• Moving from a leaf node to the root is called climbing. The reverse is called
descending.

• Tree structure can be logically viewed as Bottom up tree.

• Non leaf nodes are called internal nodes & leaf nodes are called external nodes.

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• In every non leaf node of a BT, if it A
has non empty left & right subtrees
then it is termed as Strictly binary tree.


B C
IF “n” is the no of leaf nodes of a SBT
then the no of non leaf nodes must be
equal to (n-1)
D E
• A SBT with “n” leaf nodes always
contains 2n - 1 nodes.
F G
– Total no of Nodes : 2(4) - 1 = 7
– (Sum of leaf nodes + sum of non leaf
nodes)

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• The Depth of BT is the max level of any A
leaf in the tree. That is the longest path
from root to any leaf node.
• A SBT whose leaves are all at level “d” is a
complete BT
• A SBT may not be CBT but CBT is
always SBT.
• If a BT contains “n” nodes at level “l” then D E F G
it contains at most “2n” nodes at level
“l+1”
• Max no of nodes at level l = 2^l
H I J K L M N O
• If “d” is the depth of the tree and the tree is
CBT then the total no of nodes of the tree
is 2^(d+1) - 1
– Total no of Nodes in CBT = 2^(d+1) - 1
– Total no of leaf nodes in CBT = 2^d
– Total no of non leaf nodes = 2^d - 1

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• If there is a Complete Binary Tree (CBT) & “n” is the total no of
nodes of that tree then the depth of tree :
– d = log2(n+1) - 1
– In general, d = ⌊log2 n⌋

• A BT of a depth “d” is an Almost Complete Binary Tree (ACBT)


– If all leaves of the tree are at level “d” or at level d-1
– Any node “nd” in the tree with right descendants must be either at level
“l” or at level “l+1”
– A SBT may be ACBT but ACBT may not be SBT
– A fully BT is generally CBT.
• A S.B.T may not be Fully BT but C.B.T is a Fully BT.

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree

B.T S.B.T A.C.B.T A.C.B.T Not A.C.B.T

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• With 2 nodes we can construct 2 different binary trees
• With 3 nodes we can construct 5 different binary trees

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Height balancing of a S.B.T which contains duplicate values may
not be possible in some instances.
• Level order traversal is also called Breadth First Search (BFS)
• Pre order traversal is also called Depth First Search (DFS)
• In order traversal is also called symmetric order
• Non recursive traversal functions without using stacks require either a father
field or a thread field.

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Constructing a B.S. Array which 0 75
represents a B.S.T
– Consider the initial values of the array
to be zeros, which represent 1 2
availability of cells. 65 85
– q=0 at the root
4 5 6
– If the node number is q then its 3
55 70 80 95
• Left son : 2q+1
• Right son: 2q+2
– Node no= 10
• 2*10+1 = 21 105
• 2*10+2 = 22

75 65 85 55 70 80 95 . . . . . . . 105
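The index arithmetic above can be captured in a few helpers; father() is the inverse mapping, included here for completeness:

```c
/* Array representation of a binary tree:
   children and father of the node stored at index q */
int left_son(int q)  { return 2 * q + 1; }
int right_son(int q) { return 2 * q + 2; }
int father(int q)    { return (q - 1) / 2; }   /* q > 0 */
```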
Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Deleting a node from a B.T
– While deleting from a BT, the node which replaces the deleted node's position
must be the in order successor of the node that is to be deleted.
– First locate the node in the tree
– Check whether it is a leaf node: if yes, free that node, making its parent's ref
Null, else
• If the node does not have any right sub tree then
– move the left son into that position & free the node.
• If the node has only a single right son, or has a right sub tree with a single
node, then
– move the right son into the deleted position & free the node.
• If the right son contains a left sub tree then
– place the left most node of the right sub tree at the deleted position & free the node.
• Note: If a node is deleted from a BT/BST its inorder sequence should not
change
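A recursive BST deletion following those rules might be sketched like this. Rather than re-linking nodes, this version copies the value of the in-order successor (the leftmost node of the right subtree) into the deleted position and then removes the successor, which preserves the inorder sequence of the remaining keys (a sketch, not the only possible implementation):

```c
#include <stdlib.h>

typedef struct tree {
    int no;
    struct tree *left;
    struct tree *right;
} TREE;

TREE *bst_delete(TREE *root, int key) {
    if (!root) return NULL;
    if (key < root->no)
        root->left = bst_delete(root->left, key);
    else if (key > root->no)
        root->right = bst_delete(root->right, key);
    else if (!root->right) {          /* no right subtree (covers leaves too) */
        TREE *l = root->left;         /* move the left son up */
        free(root);
        return l;
    } else {
        TREE *succ = root->right;     /* find the in-order successor */
        while (succ->left)
            succ = succ->left;
        root->no = succ->no;          /* copy its value here ... */
        root->right = bst_delete(root->right, succ->no);  /* ... and remove it */
    }
    return root;
}
```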

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Delete node I (30) 100
– It is leaf node
– D->right = Null & free (I) A
• Delete node H (5) 50 200
– It is leaf node
– D->left = Null & free (H)
B C
• Delete node P (350)
25 60 150 300
– It is non leaf node not having right son
– M->left = R & free (P) D E F G
• Delete node R (325)
5 30 125 175 250 400
– It is non leaf node having right sub tree
without having left sub tree H I J K L M
– P->left = S & free (R)
• Delete node G (300) – Need to clear 160 190 350 500
300
– It is non leaf node having right subtree with
left subtree attached to that. G N O P Q
– C->Right = R
325
– P->left = S 250 400
– R->left = g-> left R
L M
– R->right = g-> right 340
– Free (g) P 340 500 Q S

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Finding no of nodes in B.T (Recursive/Non Recursive)

int nc = 0;
void nodeCount (TREE *head)
{
if (!head)
return;
nc++;
nodeCount (head->left);
nodeCount (head->right);
}

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Binary Tree
• Finding depth of the Tree

int depth = 0;
void TreeDepth (TREE *head , int level)
{
if (!head)
return;
if ( level > depth )
depth = level;
TreeDepth (head->left , level+1 );
TreeDepth (head->right , level+1 );
}

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees
• These trees are also known as
height balanced tree. 70,80,60,50,90,100,40,30,20,110,120
• The concept of AVL tree is to
improve the efficiency of BST
in minimizing the no of 70
comparisons required for
searching.
60 80
• While constructing a BST,
depending on the input order, the tree
may not be built in a 50 90
balanced way, so searching can
need more than the ideal ⌊log2 N⌋ 40 100
+ 1 comparisons in the worst case.
• This tree structure violates the
Binary search property while 30 110
performing searching either for
insertion or for deletion.
20 120
• In such case the tree requires
height balancing .

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees
• The balance at a particular node can be evaluated through the following formula.
– Height diff = height of left sub tree – height of right sub tree
• If the height diff is -1, 0 or 1 then height balancing is not
required at that node; otherwise balancing is performed by rotating
nodes.
• While performing rotation the in order property should not be
changed
• Based on the height diff the rotations are classified into
– Left Rotation
– Right Rotation
• If the height diff is < -1 then the rotation should be left rotation.
• If the height diff is > +1 then the rotation must be right rotation.
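The height difference can be computed directly from the definition. This is an O(n) check per node; real AVL implementations cache the height or balance factor inside each node instead (the names here are illustrative):

```c
typedef struct tree {
    int no;
    struct tree *left;
    struct tree *right;
} TREE;

int height(const TREE *t) {
    if (!t) return -1;                /* convention: empty tree has height -1 */
    int hl = height(t->left);
    int hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Height diff = height(left subtree) - height(right subtree);
   the node needs rebalancing when this falls outside [-1, +1]. */
int height_diff(const TREE *t) {
    return t ? height(t->left) - height(t->right) : 0;
}
```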

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees
• Based on the node values again rotation is classified into
– Single Right Rotation
– Single Left Rotation
– Double Left Right Rotation
– Double Right Left Rotation

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees
• Single Right Rotation
– The height diff must be >+1
– Value of node must satisfy the following property A>B>C
– A : Node where rotation will require
– B,C: Descendents of A

B
A
70 60

B 60 50 70

C C A
50

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees
x 70
• Before Rotation • After Rotation x

– – A
X -> Left = A X -> Left = B 60 80
– A -> Left = B – A -> Left =Y
b 50
– B -> Left = C – B -> Left = C 65 75 90

– B -> Right = Y – B -> Right = A c


40 55
– A -> Father = X – A -> Father = B Y
– B -> Father = A – B -> Father = X 30
– Y -> Father = B – Y -> Father = A
x 70

B
50 80

C A
40 60 75 90

30 55 65
Y
Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees
• Single Left Rotation
– The height diff must be < -1
– Value of node must satisfy the following property A<B<C
– A : Node where rotation will require
– B Should become the sub root
– A Should become the left son & C remains as right son

B
A
70 80

B 80 70 90

A C
90
C

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees
x 70
• Before Rotation • After Rotation
– X -> Right = A – X -> Right = B 60 80 A
– A -> Right = B – A -> Right =Y
– – 50 90 b
B -> Right = C B -> Right = C 65 75

– B -> Left = Y – B -> Left = A 100 c


85
– Y -> Father = B – A -> Father = B Y
– – 110
B -> Father = A B -> Father = X
– A -> Father = x – C -> Father = B x 70
– C -> Father = B
B
50 90

A C
40 60 80 100

75 85 110
Y
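Ignoring the father pointers, the two single rotations reduce to a handful of pointer moves; the in-order sequence of the nodes is unchanged (a sketch with illustrative names):

```c
typedef struct tree {
    int no;
    struct tree *left;
    struct tree *right;
} TREE;

/* Single right rotation about a: b = a->left becomes the sub-root,
   and b's old right subtree (Y) becomes a's left subtree. */
TREE *rotate_right(TREE *a) {
    TREE *b = a->left;
    a->left = b->right;
    b->right = a;
    return b;            /* caller re-attaches b where a used to hang */
}

/* Single left rotation: the mirror image. */
TREE *rotate_left(TREE *a) {
    TREE *b = a->right;
    a->right = b->left;
    b->left = a;
    return b;
}
```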
Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL and other balanced trees
• AVL Trees
– First balanced tree algorithm
– Discoverers: Adelson-Velskii and Landis
• Properties
– Binary tree
– Height of left and right-subtrees differ by at most 1
– Subtrees are AVL trees

AVL Tree AVL Tree

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL trees - Height
• Theorem
– An AVL tree of height h has at least Fh+3-1 nodes
• Proof
– Let Sh be the size of the smallest AVL tree of height h
– Clearly, S0 = 1 and S1 = 2

– Also, Sh = Sh-1 + Sh-2 + 1


– A minimum height tree must be
composed of min height trees
differing in height by at most 1
– By induction ..
• Sh = Fh+3-1

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees - Rebalancing
• Insertion leads to non-AVL tree
– 4 cases

1 2 3 4

– 1 and 4 are mirror images


– 2 and 3 are mirror images

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees - Rebalancing
• Case 1 solved by rotation

– Case 4 is the mirror image rotation

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees - Rebalancing
• Case 2 needs a double rotation

– Case 3 is the mirror image rotation

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees - Data Structures
• AVL trees can be implemented with a flag to indicate the balance state

typedef enum { LeftHeavy, Balanced, RightHeavy }


BalanceFactor;
struct AVL_node {
BalanceFactor bf;
void *item;
struct AVL_node *left, *right;
};
• Insertion
• Insert a new node (as any binary tree)
• Work up the tree re-balancing as necessary to restore
the AVL property

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Dynamic Trees - Red-Black or AVL
• Insertion
– AVL : two passes through the tree
• Down to insert the node
• Up to re-balance
– Red-Black : two passes through the tree
• Down to insert the node
• Up to re-balance
but Red-Black is more popular??

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Forest
• An ordered set of trees forms a forest
• The binary tree representation of an ordered tree must satisfy the following criteria
– The preorder traversal must be the same.
– The inorder traversal of the binary tree must be the same as the
postorder traversal of the ordered tree.
• After constructing ordered trees. If we connect them in a
proper way that can be represented as Forest

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Forest
• Converting a Binary Tree to an
Ordered Tree A A
– The right son of the parent should
become the left son's descendant (ie,
the right son should connect to the left B C B
son as its right son)
– In this process the preorder property C
must not change
– Similarly, for a general tree, the sons
which are in a brother relation should
be represented as right descendants
of the first son.

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Searching - Re-visited
• Binary tree O(log n) if it stays balanced
– Simple binary tree good for static collections
– Low (preferably zero) frequency of
insertions/deletions
but my collection keeps changing!
– It’s dynamic
– Need to keep the tree balanced
• First, examine some basic tree operations
– Useful in several ways!

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Searching
• Binary search tree
– Produces a sorted list by in-order traversal

• In order: A D E G H K L M N O P T V

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Searching
• Binary search tree
– Preserving the order
– Observe that this transformation preserves the
search tree

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Searching
• Binary search tree
– Preserving the order
– Observe that this transformation preserves the
search tree

• We’ve performed a rotation of the sub-tree


about the T and O nodes

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees - Rotations
• Binary search tree
– Rotations can be either left- or right-rotations

– For both trees: the inorder traversal is

A x B y C

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
AVL Trees - Rotations
• Binary search tree
– Rotations can be either left- or right-rotations

– Note that in this rotation, it was necessary to move


B from the right child of x to the left child of y

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Red-Black Trees
• A Red-Black Tree
– Binary search tree
– Each node is “coloured” red or black

– An ordinary binary search tree with node colourings


to make a red-black tree
Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Red-Black Trees
• A Red-Black Tree
– Every node is RED or
BLACK
– Every leaf is BLACK

Sentinel nodes (black)

When you examine


rb-tree code, you will
see sentinel nodes (black)
added as the leaves.
They contain no data.

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Red-Black Trees
• A Red-Black Tree
– Every node is RED or BLACK
– Every leaf is BLACK
– If a node is RED,
then both children
are BLACK

This implies that no path


may have two adjacent
RED nodes.
(But any number of BLACK
nodes may be adjacent.)

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Red-Black Trees
• A Red-Black Tree
– Every node is RED or BLACK
– Every leaf is BLACK
– If a node is RED,
then both children
are BLACK
– Every path
from a node to a leaf
contains the same number
of BLACK nodes
From the root,
there are 3 BLACK nodes
on every path

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Red-Black Trees
• A Red-Black Tree
– Every node is RED or BLACK
– Every leaf is BLACK
– If a node is RED,
then both children
are BLACK
– Every path
from a node to a leaf
contains the same number
of BLACK nodes

The length of this path is the


black height of the tree

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Red-Black Trees
• Lemma
A RB-Tree with n nodes has
height ≤ 2 log(n+1)
– Proof .. See Cormen

• Essentially,
height ≤ 2 black height
• Search time
O( log n )

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Red-Black Trees
• Data structure
– As we’ll see, nodes in red-black trees need to know their parents,
– so we need this data structure

struct t_red_black_node {
enum { red, black } colour;
void *item;
struct t_red_black_node *left,
*right,
*parent;
};
/* Same as a binary tree with the two attributes colour and parent added */

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion
• Insertion of a new node
– Requires a re-balance of the tree

Insert node
4
Mark it red

Label the current node


x

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

While we haven’t reached the root


and x’s parent is red

x->parent

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

If x is to the left of its grandparent

x->parent->parent
x->parent

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

y is x’s right uncle

x->parent->parent
x->parent
right “uncle”

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

If the uncle is red, change

the colours x->parent->parent
of y, the grand-parent
and the parent
x->parent
right “uncle”

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

x’s parent is a left child again,

mark x’s uncle
but the uncle is black this time

New x

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

.. but the uncle is black this time


and x is to the right of its parent

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

.. So move x up and
rotate about x as root ...

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

.. but x’s parent is still red ...

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

.. The uncle is black ..

uncle

.. and x is to the left of its parent

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

.. So we have the final case ..

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

.. Change colours
and rotate ..

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

This is now a red-black tree ..


So we’re finished!

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion

There’s an equivalent set of


cases when the parent is to
the right of the grandparent!

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Red-black trees - Analysis
• Addition
– Insertion Comparisons O(log n)
– Fix-up
• At every stage,
x moves up the tree
at least one level O(log n)
– Overall O(log n)
• Deletion
– Also O(log n)
• More complex
• ... but gives O(log n) behaviour in dynamic cases

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Red Black Trees - What you need to know?
• Code?
– This is not a course for masochists!
• You can find it in a text-book
• You need to know
– The algorithm exists
– What it’s called
– When to use it
• ie what problem does it solve?
– Its complexity
– Basically how it works
– Where to find an implementation
• How to transform it to your application

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Dynamic Trees - A cautionary tale
• Insertion
– If you read Cormen et al,
• There’s no reason to prefer a red-black tree
– However, in Weiss’ text
M A Weiss, Algorithms, Data Structures and Problem Solving with
C++, Addison-Wesley, 1996
– you find that you can balance a red-black tree
in one pass!
– Making red-black more efficient than AVL
if coded properly!!!
Moral: You need to read the literature!

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Dynamic Trees - A cautionary tale
• Insertion in one pass
– As you proceed down the tree,
if you find a node with two red children,
make it red and the children black
– This doesn’t alter the number of black nodes in any path
– If the parent of this node was red,
a rotation is needed ...
– May need to be a single or a double rotation

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion
• Adding 4 ...
Discover two red
children here

Swap colours around

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion
• Adding 4 ...

Red sequence,
violates
red-black property

Rotate

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Trees - Insertion
• Adding 4 ...

Rotate

Add the 4

Anand B AnandBDOEACC@gmail.com
H I 1 2 3 4 5 6 7 C *
Balanced Trees - Yet more variants
• Basically the same ideas
– 2-3 Trees
– 2-3-4 Trees
• Special cases of m-way trees ... coming!
• Variable number of children per node
➧ A more complex implementation
• 2-3-4 trees
– Map to red-black trees
∴ Possibly useful to understand red-black trees

Lecture 12 - Key Points
• AVL Trees
– First dynamically balanced tree
– Height within 44% of optimum
– Rebalanced with rotations
– O(log n)
• Less efficient than properly coded red-black trees
• 2-3, 2-3-4 trees
– m-way trees - Yet more variations
– 2-3-4 trees map to red-black trees

m-way trees
• Only two children per node?
• Reduce the depth of the tree to O(log_m n)
  with m-way trees

• m children, m-1 keys per node

• m = 10 : 10⁶ keys in 6 levels vs 20 for a binary tree
• but ........
m-way trees
• But you have to search through
the m keys in each node!
• Reduces your gain from having fewer levels!
➧ A curiosity only?

B-trees
• All leaves are on the same level
• All nodes except for the root and the leaves have
  – at least m/2 children
  – at most m children
  i.e. each node is at least half full of keys
• B+ trees
– All the keys in the nodes are dummies
– Only the keys in the leaves point to “real” data
– Linking the leaves
• Ability to scan the collection in order
without passing through the higher nodes

B+-trees
• B+ trees
– All the keys in the nodes are dummies
– Only the keys in the leaves point to “real” data
– Data records kept in a separate area

B+-trees - Scanning in order
• B+ trees
– Linking the leaves
• Ability to scan the collection in order
without passing through the higher nodes

B+-trees - Use
• Use - Large Databases
– Reading a disc block is much slower than reading memory ( ~ms vs ~ns )
– Put each block of keys in one disc block

Physical disc
blocks

B-trees - Insertion
• Insertion
– B-tree property : block is at least half-full of keys
– Insertion into block with m keys
• block overflows
• split block
• promote one key
• split parent if necessary
• if root is split, tree becomes one level deeper

B-trees - Insertion
• Insertion
– Insert 9
– Leaf node overflows,
split it
– Promote middle (8)
– Root overflows,
split it
– Promote middle (6)
– New root node formed
– Height increased by 1
B-trees on disc
• Disc blocks
– 512 - 8k bytes
∴ 100s of keys
➧ Use binary search within the block
• Overall
– O( log n )
– Matched to hardware!
• Deletion similar
– But merge blocks to maintain B-tree property
(at least half full)
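Within one block, the binary search mentioned above might look like this sketch; the function name, the use of int keys and the `found` flag are assumptions for illustration:

```c
/* Binary search within one B-tree block's sorted key array.
   Sets *found to 1 and returns the key's index when the key is present;
   otherwise sets *found to 0 and returns the child slot to descend into
   (the number of keys smaller than key). */
int block_search(const int *keys, int n_keys, int key, int *found)
{
    int lo = 0, hi = n_keys;
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (keys[mid] == key) { *found = 1; return mid; }
        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = 0;
    return lo;          /* descend into child lo */
}
```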

Graphs

Graphs
• A graph consists of a set of nodes (vertices) & a set of arcs
  (edges) which connect the nodes.
• Not all nodes need be connected.
• An arc is written as the pair of nodes it connects, either
  ordered or unordered.
• In an undirected graph an arc is written as (n1,n2).
• In a directed graph an arc is written as <n1,n2>,
  which is known as an ordered pair.
• Digraph : if the arcs are drawn as arrow-headed lines
  then the graph is known as a directed graph (digraph)

Graphs
• This is an undirected graph on the nodes (vertices)
  A, B, C, D, E, F and H; its arcs (edges) are
  – (A,B) or (B,A)
  – (A,C) or (C,A)
  – (C,D) or (D,C)
  – (B,E) or (E,B)
  – (D,E) or (E,D)
  – (D,F) or (F,D)
• F is a pendant vertex (exactly one arc)
• H is an isolated vertex (no arcs)

Graphs
• Directed Graph
  – Nodes at arc heads are known as head nodes
  – Nodes at arc tails are known as tail nodes
  – The head node is adjacent to the tail node
• Cyclic Graph
  – A path leads from some node back to itself
    (here <A,B>, <B,D>, <D,A> lead from A back to A)
• Acyclic
  – No path leads from any node back to itself
• Directed Acyclic Graph
  – A directed graph without any cycle
• Arcs in the example:
  <A,B> <A,C> <C,D> <B,D> <D,C> <D,A> <E,B> <E,D> <F,D>

Graphs
• An arc x is incident to the two nodes which form its
  ordered pair.
• Degree of a node : the number of arcs incident to it.
• In-degree : the number of incident arcs which contain that
  node as head node.
• Out-degree : the number of incident arcs which contain that
  node as tail node.
• Eg: For node D
  – Degree is 6
  – In-degree is 4
  – Out-degree is 2
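Working from a directed adjacency matrix (introduced on a later slide), the in- and out-degree are just column and row counts. A sketch, assuming a fixed MAXV bound and Boolean entries:

```c
#define MAXV 10   /* assumed maximum number of nodes */

/* Out-degree of v = number of set entries in row v of the matrix */
int out_degree(int n, int adj[MAXV][MAXV], int v)
{
    int j, d = 0;
    for (j = 0; j < n; j++)
        if (adj[v][j]) d++;
    return d;
}

/* In-degree of v = number of set entries in column v */
int in_degree(int n, int adj[MAXV][MAXV], int v)
{
    int i, d = 0;
    for (i = 0; i < n; i++)
        if (adj[i][v]) d++;
    return d;
}
```

The degree of v is then the sum of its in-degree and out-degree.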

Graphs
• Applications of Graph
– Operations Research
• PERT charts
• CPM charts
– Flow problem
– Network problems
• If an arc carries a value then that value is known as the
  weight of the arc & the graph is referred to as a
  weighted graph (e.g. an arc from A to B labelled 50).

Graphs
• The graph can be represented through
– Arrays
– Tree structures
– Sparse Matrix
• Adjacency Matrix
  – When the graph is represented with a 2-dim array which shows the
    relations, that 2-dim array is called the Adjacency Matrix.
  – The node data can be stored in a separate hash table, giving numbers to
    the nodes starting from zero.
  – The elements of the matrix can be either weights or Boolean values.
  – The matrix with Boolean values is known as the Adjacency Matrix.
  – The order of the matrix depends on the number of nodes in the graph:
    if n is the number of nodes then the order of the matrix is n*n.
  – For a directed graph it represents ordered pairs.

Graphs - Data Structures
• Vertices
– Map to consecutive integers
– Store vertices in an array
• Edges
– Adjacency Matrix
• Booleans -
TRUE - edge exists
FALSE - no edge
• O(|V|²) space
• Can be compacted
– 1 bit/entry
– If undirected,
top half only
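A minimal adjacency-matrix sketch; the `Graph` type, the `add_edge` name and the fixed MAXV bound are assumptions for illustration, not the course ADT:

```c
#define MAXV 10   /* assumed maximum number of vertices */

typedef struct {
    int n_nodes;
    int adj[MAXV][MAXV];   /* TRUE (1) - edge exists, FALSE (0) - no edge */
} Graph;

void add_edge(Graph *g, int i, int j, int directed)
{
    g->adj[i][j] = 1;
    if (!directed)
        g->adj[j][i] = 1;  /* mirror the entry for an undirected graph */
}
```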

Graphs - Data Structures
• Edges
– Adjacency Lists
• For each vertex
– List of vertices “attached” to it
• For each edge
– 2 entries
– One in the list for each end
• O(|E|) space
∴ Better for sparse graphs
Undirected representation

Graphs
• Graph Operations are
– Establishing relations in the adjacency Matrix or in
the weighted matrix.
– Removing the relations

– Finding the path matrix using adjacency Matrix


– Finding the transitive closure matrix for weighted graph
– Finding the shortest distance between two nodes

Graphs
• Finding the path matrix using adjacency Matrix
– The adjacency matrix can be known as the PATH of LENGTH 1 matrix.
– If nodes A & B are in a direct relation then the number of paths between
  A & B is 1, and it is referred to as a PATH of LENGTH 1.
– Here the number of nodes is 2, the number of intermediate nodes is 0 and
  the number of paths is 1.
– In general, a path of length k connects 2 nodes which are in an indirect
  relation through k-1 intermediate nodes; the total number of nodes is k+1.
– Taking the adjacency matrix as the PATH-1 matrix (p1), the Boolean
  product of p1 and the adjacency matrix gives the PATH-2 matrix (p2);
  the Boolean product of p2 and the adjacency matrix gives the PATH-3
  matrix, and so on.
– The PATH-k matrix (Pk) is the Boolean product of Pk-1 & the adjacency
  matrix.
– The matrix which shows the possible paths of length k is known as the
  PATH of length k matrix; combining these gives the transitive closure.
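The Boolean product described above can be sketched as follows (MAXV and the function name are assumptions for illustration):

```c
#define MAXV 10   /* assumed maximum number of nodes */

/* Boolean matrix product: c[i][j] becomes 1 when some node k gives a
   path i -> k in a and k -> j in b. Multiplying the PATH-(k-1) matrix
   by the adjacency matrix in this way yields the PATH-k matrix. */
void bool_product(int n, int a[MAXV][MAXV], int b[MAXV][MAXV],
                  int c[MAXV][MAXV])
{
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            c[i][j] = 0;
            for (k = 0; k < n; k++)
                if (a[i][k] && b[k][j]) { c[i][j] = 1; break; }
        }
}
```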

Graphs
• Representing Graph through Multi-way linked list
– While representing graph with a LL, the main LL must contain all graph nodes &
the sub list which are connected to LL nodes should represent the Ordered pairs

A B C D E F N

B C N D N D N B D N D N

A C N

Graphs
• Nodes which are in the main LL are known as Graph nodes.
• Nodes which are in the sub list are known as ARC nodes.
• Hence the graph nodes structure definition
– Data members
– Two Self ref pointer for DLL
– A pointer for arc node
• The arc node structure Definition
– A Self ref pointer to indicate next arc
– A pointer for graph node

Graphs
typedef struct gnode
{
    int n1, n2, n3;
    struct gnode *prev;     /* DLL links for the main list */
    struct gnode *next;
    struct arc *arcptr;     /* first arc node in this node's sub list */
} GNODE;

typedef struct arc
{
    struct gnode *gphptr;   /* the graph node this arc leads to */
    struct arc *next;       /* next arc in the sub list */
} ARCNODE;

Graphs
• Finding the transitive closure matrix for weighted graph
– While representing a weighted graph through a matrix, the
  elements which represent relations contain the weights.
– All other elements must be initialised with some sentinel value
  before applying WARSHALL’S algorithm, or its shortest-distance
  variant, between two nodes.
– To find the transitive closure matrix, construct an adjacency
  matrix from the weighted matrix & apply WARSHALL’S
  algorithm on the adjacency matrix.
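WARSHALL’S algorithm itself can be sketched on the Boolean adjacency matrix (MAXV and the function name are assumptions; `path` ends up as the transitive closure):

```c
#include <string.h>

#define MAXV 10   /* assumed maximum number of nodes */

/* Warshall's algorithm: starting from the adjacency matrix (the
   PATH-of-length-1 matrix), path[i][j] ends up 1 whenever any path
   from i to j exists - the transitive closure. */
void warshall(int n, int adj[MAXV][MAXV], int path[MAXV][MAXV])
{
    int i, j, k;
    memcpy(path, adj, sizeof(int) * MAXV * MAXV);
    for (k = 0; k < n; k++)           /* allow k as an intermediate node */
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                if (path[i][k] && path[k][j])
                    path[i][j] = 1;
}
```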

Graphs
• DIJKSTRA’S Algorithm
• This algorithm can be used to find the shortest route from the
  source node to the target node.

  – Consider the source node & make it permanent, with a label which
    contains its distance and its predecessor node.
  – For the source node the distance will be zero.
  – Its predecessor will be NULL.

  – Identify all nodes reachable from the current node and construct labels
    with the sum of the distance so far and the distance to the reachable
    node, and with the predecessor.
  – If the already existing label is permanent, keep it and discard the
    current label; otherwise make the least-distance label permanent.

  – Continue with the second step until all labels become permanent or there
    is no node to reach from the current node. In the second case make all
    the temporary labels permanent.
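The labelling steps above can be sketched in C using a weight matrix and a linear scan for the cheapest temporary label; the names, the MAXV bound and the INF sentinel are all assumptions for illustration:

```c
#include <limits.h>

#define MAXV 10          /* assumed maximum number of nodes */
#define INF  INT_MAX     /* sentinel: no edge / label still "infinite" */

/* dist[v] ends up as the shortest distance from src to v, pred[v] as
   v's predecessor on that route (-1 plays the role of NULL). perm[]
   marks the permanent labels. */
void dijkstra(int n, int w[MAXV][MAXV], int src, int dist[], int pred[])
{
    int perm[MAXV] = {0};
    int i, j, u;
    for (i = 0; i < n; i++) { dist[i] = INF; pred[i] = -1; }
    dist[src] = 0;                    /* source: distance 0, no predecessor */
    for (i = 0; i < n; i++) {
        u = -1;                       /* pick the least temporary label */
        for (j = 0; j < n; j++)
            if (!perm[j] && (u < 0 || dist[j] < dist[u]))
                u = j;
        if (u < 0 || dist[u] == INF)  /* no node left to reach */
            break;
        perm[u] = 1;                  /* make its label permanent */
        for (j = 0; j < n; j++)       /* relabel nodes reachable from u */
            if (!perm[j] && w[u][j] != INF && dist[u] + w[u][j] < dist[j]) {
                dist[j] = dist[u] + w[u][j];
                pred[j] = u;
            }
    }
}
```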

Graphs
• DIJKSTRA’S Algorithm - worked example
  [diagram: a weighted graph on nodes A, B, C, D, F]
Graphs - Traversing
• Choices
– Depth-First / Breadth-first
• Depth First
– Use an array of flags to mark
“visited” nodes

Graphs - Depth-First
struct t_graph {            /* graph data structure */
    int n_nodes;
    graph_node *nodes;
    int *visited;
    AdjMatrix am;           /* adjacency matrix ADT */
};
static int search_index = 0;

void search( graph g ) {
    int k;
    /* mark all nodes "not visited" */
    for( k = 0; k < g->n_nodes; k++ ) g->visited[k] = 0;
    search_index = 0;
    /* visit all the nodes attached to node 0, then .. */
    for( k = 0; k < g->n_nodes; k++ ) {
        if ( !g->visited[k] ) visit( g, k );
    }
}

Graphs - Depth-First
void visit( graph g, int k ) {
    int j;
    g->visited[k] = ++search_index;   /* mark the order in which this
                                         node was visited */
    for( j = 0; j < g->n_nodes; j++ ) {
        if ( adjacent( g->am, k, j ) ) {
            /* visit all the nodes adjacent to this one */
            if ( !g->visited[j] ) visit( g, j );
        }
    }
}

Graphs - Depth-First
• A C hack in visit ...
  – !g->visited[j] should strictly be g->visited[j] == 0
  – search_index == 0 means not visited yet!

Graphs - Depth-First
Adjacency List version of visit
void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    g->visited[k] = ++search_index;
    al_node = ListHead( g->adj_list[k] );
    while( al_node != NULL ) {
        j = ANodeIndex( ListItem( al_node ) );
        if ( !g->visited[j] ) visit( g, j );
        al_node = ListNext( al_node );
    }
}

Graphs - Depth-First
Adjacency List version of visit
• Assumes a List ADT with methods
  – ListHead
  – ANodeIndex
  – ListItem
  – ListNext

Graph - Breadth-first Traversal
• Adjacency List
– Time complexity
• Visited set for each node
• Each edge visited twice
– Once in each adjacency list
• O(|V| + |E|)
➧ O(|V|²) for dense |E| ~ |V|² graphs
• but O(|V|) for sparse |E| ~ |V| graphs
• Adjacency Lists perform better for sparse graphs

Graph - Breadth-first Traversal
• Breadth-first requires a FIFO queue
static queue q;
void search( graph g ) {
    int k;
    q = ConsQueue( g->n_nodes );
    for( k = 0; k < g->n_nodes; k++ ) g->visited[k] = 0;
    search_index = 0;
    for( k = 0; k < g->n_nodes; k++ ) {
        if ( !g->visited[k] ) visit( g, k );
    }
}

void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    AddIntToQueue( q, k );
    while( !Empty( q ) ) {
        k = QueueHead( q );
        g->visited[k] = ++search_index;
        ......

Graph - Breadth-first Traversal
• Breadth-first requires a FIFO queue
void visit( graph g, int k ) {
    AdjListNode al_node;
    int j;
    AddIntToQueue( q, k );              /* put this node on the queue */
    while( !Empty( q ) ) {
        k = QueueHead( q );
        g->visited[k] = ++search_index;
        al_node = ListHead( g->adj_list[k] );
        while( al_node != NULL ) {
            j = ANodeIndex( al_node );
            if ( !g->visited[j] ) {
                AddIntToQueue( q, j );
                g->visited[j] = -1;     /* C hack, 0 = false! */
            }
            al_node = ListNext( al_node );
        }
    }
}

Key Points - Lecture 19
• Dynamic Algorithms
• Optimal Binary Search Tree
– Used when
• some items are requested more often than others
• frequency for each item is known
– Minimises cost of all searches
– Build the search tree by
• Considering all trees of size 2, then 3, 4, ....
• Larger tree costs computed from smaller tree costs
– Sub-trees of optimal trees are optimal trees!
• Construct optimal search tree by saving root of each optimal sub-tree
and tracing back
• O(n³) time / O(n²) space
Key Points - Lecture 19
• Other Problems using Dynamic Algorithms
• Matrix chain multiplication
– Find optimal parenthesisation of a matrix product
• Expressions within parentheses
– optimal parenthesisations themselves
• Optimal sub-structure characteristic of dynamic algorithms
• Similar to optimal binary search tree
• Longest common subsequence
– Longest string of symbols found in each of two sequences
• Optimal triangulation
– Least cost division of a polygon into triangles
– Maps to matrix chain multiplication

Graphs - Definitions
• Graph
– Set of vertices (nodes) and edges connecting them
– Write
G = ( V, E )
where
  • V is a set of vertices: V = { vi }
  • An edge connects two vertices: e = ( vi, vj )
  • E is a set of edges: E = { ( vi, vj ) }

Graphs - Definitions
• Path
  – A path, p, of length, k, is a sequence of connected
    vertices
  – p = <v0,v1,...,vk> where (vi,vi+1) ∈ E

  < i, c, f, g, h >
  Path of length 5

  < a, b >
  Path of length 2

Graphs - Definitions
• Cycle
– A graph contains no cycles if there is no path
– p = <v0,v1,...,vk> such that v0 = vk
< i, c, f, g, i >
is a cycle

Graphs - Definitions
• Spanning Tree
– A spanning tree is a set of |V|-1 edges that connect
all the vertices of a graph

The red path connects all vertices,
so it’s a spanning tree

Graphs - Definitions
• Minimum Spanning Tree
  – Generally there is more than one spanning tree
  – If a cost cij is associated with edge eij = (vi,vj)
    then the minimum spanning tree is the set of edges Espan
    such that
    C = Σ ( cij | ∀ eij ∈ Espan ) is a minimum
  – The red tree is the Min ST
  – Other STs can be formed ..
    • Replace 2 with 7
    • Replace 4 with 11

Graphs - Kruskal’s Algorithm
• Calculate the minimum spanning tree
– Put all the vertices into single node trees by themselves
– Put all the edges in a priority queue
– Repeat until we’ve constructed a spanning tree
• Extract cheapest edge
• If it forms a cycle, ignore it
else add it to the forest of trees
(it will join two trees into a larger tree)
– Return the spanning tree

Graphs - Kruskal’s Algorithm
• Note that this algorithm makes no attempt to be clever
  • it makes no sophisticated choice of the next edge
  • it just tries the cheapest one!

Graphs - Kruskal’s Algorithm in C
Forest MinimumSpanningTree( Graph g, int n,
                            double **costs ) {
    Forest T;
    Queue q;
    Edge e;
    int i;
    T = ConsForest( g );            /* initial forest: single vertex trees */
    q = ConsEdgeQueue( g, costs );  /* priority queue of edges */
    for( i = 0; i < (n-1); i++ ) {
        do {
            e = ExtractCheapestEdge( q );
        } while ( Cycle( e, T ) );  /* skip edges that would form a cycle */
        AddEdge( T, e );
    }
    return T;
}

Graphs - Kruskal’s Algorithm in C
• We need n-1 edges to fully connect (span) n vertices,
  so the loop runs n-1 times

Graphs - Kruskal’s Algorithm in C
• The inner loop
  – tries the cheapest edge
  – until we find one that doesn’t form a cycle
  – ... and adds it to the forest

Kruskal’s Algorithm
• Priority Queue
  – We already know about this!!
  – ConsEdgeQueue adds the edges to a heap
  – ExtractCheapestEdge extracts from the heap
Kruskal’s Algorithm
• Cycle detection
  – But how do we detect a cycle?
Kruskal’s Algorithm
• Cycle detection
– Uses a Union-find structure
– For which we need to understand a partition of a set
• Partition
– A set of sets of elements of a set
• Every element belongs to one of the sub-sets
• No element belongs to more than one sub-set
– Formally:
  • Set, S = { si }
  • Partition(S) = { Pi }, where Pi = { si }   Pi are subsets of S
  • ∀ si ∈ S, si ∈ Pj            All si belong to one of the Pj
  • ∀ j, k : Pj ∩ Pk = ∅         None of the Pi have common elements
  • S = ∪ Pj                     S is the union of all the Pi
Kruskal’s Algorithm
• Partition
– The elements of each set of a partition
• are related by an equivalence relation
• equivalence relations are
– reflexive x~x
– transitive
if x ~ y and y ~ z, then x ~ z
– symmetric
if x ~ y, then y ~ x
– The sets of a partition are equivalence classes
• Each element of the set is related to every other element

Kruskal’s Algorithm
• Partitions
– In the MST algorithm,
the connected vertices form equivalence classes
• “Being connected” is the equivalence relation
– Initially, each vertex is in a class by itself
– As edges are added,
more vertices become related
and the equivalence classes grow
– Until finally all the vertices are in a single equivalence class

Kruskal’s Algorithm
• Representatives
– One vertex in each class may be chosen as the representative of
that class
– We arrange the vertices in lists that lead to the representative
• This is the union-find structure

• Cycle determination

Kruskal’s Algorithm
• Cycle determination
– If two vertices have the same representative,
they’re already connected and adding a further
connection between them is pointless
– Procedure:
• For each end-point of the edge that you’re going to add
• follow the lists and find its representative
• if the two representatives are equal,
then the edge will form a cycle
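The representative lists and the cycle test can be sketched as a simple union-find; the array bound and the names are assumptions, and, matching the naive lists described above, there is no path compression or union by rank:

```c
#define MAXV 100   /* assumed maximum number of vertices */

static int rep[MAXV];   /* rep[v] chains v towards its representative */

void uf_init(int n)
{
    int v;
    for (v = 0; v < n; v++) rep[v] = v;  /* each vertex is its own rep */
}

int uf_find(int v)           /* follow the list to the representative */
{
    while (rep[v] != v) v = rep[v];
    return v;
}

/* Returns 1 (a cycle) when u and v already share a representative;
   otherwise merges the two classes and returns 0. */
int uf_union(int u, int v)
{
    int ru = uf_find(u), rv = uf_find(v);
    if (ru == rv) return 1;
    rep[ru] = rv;            /* keep rv as the merged class's rep */
    return 0;
}
```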

Kruskal’s Algorithm in operation
All the vertices are in
single element trees

Each vertex is its
own representative

Kruskal’s Algorithm in operation
All the vertices are in
single element trees

The cheapest edge is h-g
Add it to the forest,
joining h and g into a 2-element tree

Kruskal’s Algorithm in operation
The cheapest edge is h-g

Add it to the forest,
joining h and g into a 2-element tree
Choose g as its representative

Kruskal’s Algorithm in operation
The next cheapest edge
is c-i
Add it to the forest,
joining c and i into a
2-element tree

Choose c as its
representative

Our forest now has 2 two-element trees
and 5 single vertex ones

Kruskal’s Algorithm in operation
The next cheapest edge
is a-b
Add it to the forest,
joining a and b into a
2-element tree

Choose b as its
representative

Our forest now has 3 two-element trees
and 4 single vertex ones

Kruskal’s Algorithm in operation
The next cheapest edge
is c-f
Add it to the forest,
merging two
2-element trees

Choose the rep of one
as its representative

Kruskal’s Algorithm in operation
The next cheapest edge
is g-i
The rep of g is c

The rep of i is also c

∴ g-i forms a cycle

It’s clearly not needed!

Kruskal’s Algorithm in operation
The next cheapest edge
is c-d
The rep of c is c

The rep of d is d

∴ c-d joins two trees, so we add it

.. and keep c as the representative

Kruskal’s Algorithm in operation
The next cheapest edge
is h-i

The rep of h is c

The rep of i is c

∴ h-i forms a cycle,
so we skip it

Kruskal’s Algorithm in operation
The next cheapest edge
is a-h

The rep of a is b

The rep of h is c

∴ a-h joins two trees,
and we add it

Kruskal’s Algorithm in operation
The next cheapest edge is b-c,
but b-c forms a cycle

So add d-e instead

... and we now have a spanning tree

Greedy Algorithms
• At no stage did we attempt to “look ahead”
• We simply made the naïve choice
– Choose the cheapest edge!
• MST is an example of a greedy algorithm
• Greedy algorithms
– Take the “best” choice at each step
– Don’t look ahead and try alternatives
– Don’t work in many situations
• Try playing chess with a greedy approach!
– Are often difficult to prove
• because of their naive approach
• what if we made this other (more expensive) choice now and later on ..... ???

Proving Greedy Algorithms
• MST Proof
– “Proof by contradiction” is usually the best approach!
– Note that
• any edge creating a cycle is not needed
∴ Each edge must join two sub-trees
– Suppose that the next cheapest edge, ex, would join trees Ta and Tb
– Suppose that instead of ex we choose ez - a more expensive edge, which
joins Ta and Tc
– But we still need to join Tb to Ta or some other tree to which Ta is
connected
– The cheapest way to do this is to add ex
– So we should have added ex instead of ez
– Proving that the greedy approach is correct for MST
MST - Time complexity
• Steps
– Initialise forest             O( |V| )
– Sort edges                    O( |E| log|E| )
• Check edge for cycles         O( |V| ) x
• Number of edges               O( |V| )        O( |V|² )
– Total                         O( |V| + |E| log|E| + |V|² )
– Since |E| = O( |V|² )         O( |V|² log|V| )

– Thus we would class MST as O( n² log n )


for a graph with n vertices
– This is an upper bound,
some improvements on this are known ...
• Prim’s Algorithm can be O( |E|+|V|log|V| )
using Fibonacci heaps
• even better variants are known for restricted cases,
such as sparse graphs ( |E| ≈ |V| )

MST - Time complexity
• Here’s the “professionals read textbooks” theme
  recurring again!
Thanking you

Good Luck
