
Unit I

LINEAR DATA STRUCTURES


Abstract Data Types- Stacks-Stack applications- Balancing symbols, Infix to postfix
expression conversion, Postfix Expression evaluation, Function calls- Queues- Linked lists-
Hash Tables - Direct-address tables, Hash tables, Hash functions - Open addressing.

2.2 STACK:
 A stack is a collection of data items that can be accessed at only one end, called top.
 Items can be inserted and deleted in a stack only at the top.
 The last item which was inserted in a stack will be the first one to be deleted.
 Therefore, a stack is called a Last-In-First-Out (LIFO) data structure.
There are two basic operations that are performed on stacks:
 PUSH- It is the process of inserting a new element on the top of a stack.
 POP- It is the process of deleting an element from the top of a stack.

Stacks can be implemented in two ways


i) Using an array
ii) Using a linked list.
First, we will discuss the implementation of a stack using an array:

 A stack is similar to a list in which insertion and deletion are allowed only at one end.
 Therefore, similar to a list, stack can be implemented using both arrays and linked
lists.
 To implement a stack using an array:
 Consider that the size of the stack is 5, i.e., we can hold only 5 elements in the stack.

Declare an array:

int Stack[5]; /* maximum size needs to be specified in advance */

Declare a variable, top, to hold the index of the topmost element in the stack:

int top; /* stack elements are referred to using the variable top */

Initially, when the stack is empty, set:

top = –1 /* now top= -1 so stack is empty*/

PUSH:
Let us now write an algorithm for the PUSH operation.
Before we PUSH an element onto the stack, we must check whether the stack is in an
overflow state. (Overflow means the array is already full, i.e., it holds 5 elements.) If so,
the PUSH operation cannot be performed.
 To avoid the stack overflow, you need to check for the stack full condition before pushing
an element into the stack.

Algorithm to PUSH:
1. If top = MAX – 1:
a. Display “Stack Full”
b. Exit
2. Increment top by 1.
3. Store the value to be pushed at index top in the array.
Now top contains the index of the topmost element.
Stack representation: [figure omitted]

POP:
 For a linked-list implementation (covered later in this unit), the POP steps are:
1. Make a variable/pointer temp point to the topmost node.
2. Retrieve the value contained in the topmost node.
3. Make top point to the next node in sequence.
4. Release the memory allocated to the node marked by temp.

Algorithm for POP operation (array implementation):


1. If top = – 1:
a. Display “Stack Empty”
b. Exit
2. Retrieve the value stored at index top
3. Decrement top by 1

Suppose the stack contains 5 elements. When you POP, the topmost element is
removed from the stack. When all the elements have been popped, there are no more
elements to remove; attempting to POP from an empty stack is called the ‘underflow’
condition.

When you keep on removing elements, eventually all of them are removed and
the stack is empty.

STACK USING ARRAY

/****** Program to Implement Stack using Array ******/


#include <stdio.h>
#include<conio.h>
#include<process.h>
#define MAX 6
void push();
void pop();
void display();
int stack[MAX], top=-1, item;
main()
{
int ch;
do
{
printf("\n\n\n\n1.\tPush\n2.\tPop\n3.\tDisplay\n4.\tExit\n");
printf("\nEnter your choice: ");
scanf("%d", &ch);
switch(ch)
{
case 1:
push();
break;
case 2:
pop();
break;
case 3:
display();
break;
case 4:
exit(0);
default:
printf("\n\nInvalid entry. Please try again...\n");
}
} while(ch!=4);
getch();
}
void push(void)
{
if(top == MAX-1)
printf("\n\nStack is full.");
else
{
printf("\n\nEnter ITEM: ");
scanf("%d", &item);
top++;
stack[top] = item;
printf("\n\nITEM inserted = %d", item);
}
}
void pop(void)
{
if(top == -1)
printf("\n\nStack is empty.");
else
{
item = stack[top];
top--;
printf("\n\nITEM deleted = %d", item);
}
}
void display(void)
{
int i;
if(top == -1)
printf("\n\nStack is empty.");
else
{
for(i=top; i>=0; i--)
printf("\n%d", stack[i]);
}
}
STACK USING LINKED LIST

When a stack is implemented as a linked list, there is no upper bound on the
size of the stack. Therefore, there will be no stack full condition in this case.
A linked list is a chain of structs or records called nodes. Each node has at least
two members: one points to the next node in the list and the other holds the data.
These are called singly linked lists because each node points only to the next node in
the list.
struct node
{
int data;
struct node *link; // to maintain the link to other nodes
};
struct node *top=NULL,*temp;
We use the above structure for a node in our example. The member data holds the data in
the node, while link, a pointer of type struct node, holds the address of the next node in
the list. top is a pointer of type struct node which acts as the top of the stack.
Initially we set top to NULL, which means the list is empty.

Procedure to create a list:


i) Create a new empty node temp.
ii) Read the stack element and store it in temp's data area.
iii) Assign temp's link part as NULL (i.e. temp->link=NULL).
iv) Assign temp as top (i.e. top=temp).
/* create function create the head node */
void create( )
{
printf("\nENTER THE FIRST ELEMENT: ");
temp=(struct node *)malloc(sizeof(struct node));
scanf("%d",&temp->data);
temp->link=NULL;
top=temp;
}

Procedure to PUSH an element into the List:


a) Check Main memory for node creation.
b) Create a new node ‘temp’.
c) Read the stack element and store it in temp's data area.
d) Assign temp's link part as top (i.e. temp->link=top).
e) Assign temp as top (i.e. top=temp).
Syntax:
void push()
{
printf("\nENTER THE NEXT ELEMENT: ");
temp=(struct node *)malloc(sizeof(struct node));
scanf("%d",&temp->data);
temp->link=top;
top=temp;
}

Procedure to POP an element from the list:

a) If top is NULL then display stack is empty.

b) Otherwise assign top to temp (i.e. temp=top, so that temp marks the node to be removed)

c) Assign top's link to top (i.e. top=top->link, moving top to the next node).

d) Delete temp from memory.

Syntax:

void pop()
{
if(top==NULL)
{
printf("\nSTACK IS EMPTY\n");
}
else
{
temp=top;
printf("\nDELETED ELEMENT IS %d\n",top->data);
top=top->link;
free(temp);
}
}

Traversing the List


To traverse the list (visit all the nodes), process the following steps
a) Start a temporary pointer at the stack’s top position (i.e. temp=top)
b) Repeat until temp becomes NULL
i) Display temp's data.
ii) Assign temp as temp's link (temp=temp->link).
Syntax:

/* display function visits the linked list from top to end */

void display()
{
temp=top; // start the traversal from the top of the stack
printf("\n");
while(temp!=NULL)
{
printf("%d\n",temp->data);
temp=temp->link; // move to the next node in the list
}
}

IMPLEMENTATION OF STACK USING LINKED LIST

Program
#include<stdio.h>
#include<conio.h>
#include<alloc.h>
#include<stdlib.h>

/* Node declaration */
struct node
{
int data;
struct node *link; // to maintain the link to other nodes
};
struct node *top,*temp;

void create();
void push();
void pop();
void display();
void main()
{
int ch;
clrscr();
while(1)
{
printf("\n\n 1.CREATE \n 2.PUSH \n 3.POP \n 4.EXIT \n");
printf("\n ENTER YOUR CHOICE : ");
scanf("%d",&ch);
switch(ch)
{
case 1:
create();
display();
break;
case 2:
push();
display();
break;
case 3:
pop();
display();
break;
case 4:
exit(0);
}
}
}

/* create function create the head node */


void create()
{
printf("\nENTER THE FIRST ELEMENT: ");
temp=(struct node *)malloc(sizeof(struct node));
scanf("%d",&temp->data);
temp->link=NULL;
top=temp;
}
void push()
{
printf("\nENTER THE NEXT ELEMENT: ");
temp=(struct node *)malloc(sizeof(struct node));
scanf("%d",&temp->data);
temp->link=top;
top=temp;
}

void pop()
{
if(top==NULL)
{
printf("\nSTACK IS EMPTY\n");
}
else
{
temp=top;
printf("\nDELETED ELEMENT IS %d\n",top->data);
top=top->link;
free(temp);
}
}

/* display function visits the linked list from top to end */


void display()
{
temp=top; // start the traversal from the top of the stack
printf("\n");
while(temp!=NULL)
{
printf("%d\n",temp->data);
temp=temp->link; // move to the next node in the list
}
}

SAMPLE INPUT AND OUTPUT:


STACK
1. CREATE
2. PUSH
3. POP
4. EXIT
ENTER YOUR CHOICE : 1
ENTER THE FIRST ELEMENT : 10
10
STACK
1. CREATE
2. PUSH
3. POP
4. EXIT
ENTER YOUR CHOICE: 2
ENTER THE NEXT ELEMENT: 30
10
30
STACK
1. CREATE
2. PUSH
3. POP
4. EXIT
ENTER YOUR CHOICE: 3
DELETED ELEMENT IS 30
STACK
1. CREATE
2. PUSH
3. POP
4. EXIT
ENTER YOUR CHOICE: 3
DELETED ELEMENT IS 10
STACK
1. CREATE
2. PUSH
3. POP
4. EXIT
ENTER YOUR CHOICE: 3
STACK IS EMPTY.

2.3 STACK APPLICATIONS:

Some of the applications of stacks are:


1. Balancing Symbols
2. Function Calls
3. Postfix Evaluation
4. Infix to Postfix Conversion

Balancing Symbols

Compilers check your programs for syntax errors, but frequently a lack of one
symbol (such as a missing brace or comment starter) will cause the compiler to spill out a
hundred lines of diagnostics without identifying the real error.

A useful tool in this situation is a program that checks whether everything is
balanced: every right brace, bracket, and parenthesis must correspond to its left
counterpart. The sequence [()] is legal, but [(]) is wrong.

Function Calls

The algorithm to check balanced symbols suggests a way to implement function
calls. The problem here is that when a call is made to a new function, all the variables
local to the calling routine need to be saved by the system, since otherwise the new
function would overwrite the calling routine's variables.

Furthermore, the current location in the routine must be saved so that the new
function knows where to go after it is done.

When there is a function call, all the important information that needs to be saved,
such as register values (corresponding to variable names) and the return address (which
can be obtained from the program counter, which is typically in a register), is saved "on a
piece of paper" in an abstract way and put at the top of a pile. Then the control is
transferred to the new function, which is free to replace the registers with its values.
When the function wants to return, it looks at the "paper" at the top of the pile and
restores all the registers. It then makes the return jump.

2.4 Balancing Symbols, Infix to Postfix Expression Conversion, Postfix Expression
Evaluation and Function Calls

Balancing Symbols:

The simple algorithm uses a stack and is as follows:

Make an empty stack. Read characters until end of file. If the character is an opening
symbol, push it onto the stack. If it is a closing symbol, then if the stack is empty report
an error. Otherwise, pop the stack. If the symbol popped is not the corresponding opening
symbol, then report an error. At end of file, if the stack is not empty, report an error.
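Below is a minimal C sketch of this balancing checker. The function name check_balance and the fixed stack size are illustrative assumptions, not part of the original text; the sketch handles the three bracket pairs only.

#include <stdio.h>

#define SIZE 100

/* Returns 1 if every ')', ']' and '}' in s matches the most recent
   unmatched opening symbol, 0 otherwise. */
int check_balance(const char *s)
{
    char stack[SIZE];
    int top = -1;
    for (; *s; s++)
    {
        if (*s == '(' || *s == '[' || *s == '{')
            stack[++top] = *s;            /* push the opening symbol */
        else if (*s == ')' || *s == ']' || *s == '}')
        {
            char open;
            if (top == -1)
                return 0;                 /* closing symbol, but stack is empty */
            open = stack[top--];          /* pop and compare */
            if ((*s == ')' && open != '(') ||
                (*s == ']' && open != '[') ||
                (*s == '}' && open != '{'))
                return 0;
        }
    }
    return top == -1;                     /* stack must be empty at end of input */
}

int main()
{
    printf("%d\n", check_balance("[()]"));   /* prints 1: legal */
    printf("%d\n", check_balance("[(])"));   /* prints 0: wrong */
    return 0;
}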

Postfix Evaluation:

To evaluate a postfix expression, a stack is again used: when a number is seen, it is
pushed onto the stack; when an operator is seen, it is applied to the two numbers popped
from the stack, and the result is pushed back. For example, the postfix expression

6 5 2 3 + 8 * + 3 + *

is evaluated as follows: the first four symbols are placed on the stack.

Next a '+' is read, so 3 and 2 are popped from the stack and their sum, 5, is pushed.

Next 8 is pushed.
Now a '*' is seen, so 8 and 5 are popped and 8 * 5 = 40 is pushed.

Next a '+' is seen, so 40 and 5 are popped and 40 + 5 = 45 is pushed.

Now, 3 is pushed.

Next '+' pops 3 and 45 and pushes 45 + 3 = 48.


Finally, a '*' is seen; 48 and 6 are popped, and the result 6 * 48 = 288 is pushed.

When an expression is given in postfix notation, there is no need to know any
precedence rules; this is an obvious advantage.
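A compact C sketch of this evaluation algorithm is shown below. To keep it short, it assumes single-digit operands and only the '+' and '*' operators (spaces are skipped); these restrictions are assumptions made here, not part of the original text.

#include <stdio.h>

#define SIZE 100

/* Evaluates a postfix expression of single-digit numbers, '+' and '*'. */
int eval_postfix(const char *expr)
{
    int stack[SIZE];
    int top = -1;
    for (; *expr; expr++)
    {
        if (*expr >= '0' && *expr <= '9')
            stack[++top] = *expr - '0';      /* push the operand */
        else if (*expr == '+' || *expr == '*')
        {
            int b = stack[top--];            /* pop two operands...      */
            int a = stack[top--];
            stack[++top] = (*expr == '+') ? a + b : a * b;  /* ...push result */
        }
    }
    return stack[top];                       /* the final result is on top */
}

int main()
{
    printf("%d\n", eval_postfix("6 5 2 3 + 8 * + 3 + *"));  /* prints 288 */
    return 0;
}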

INFIX to POSTFIX CONVERSION

 Not only can a stack be used to evaluate a postfix expression, but we can also use a
stack to convert an expression in standard form (otherwise known as infix) into
postfix.
 We will concentrate on a small version of the general problem by allowing only the
operators +, *, and (, ), and insisting on the usual precedence rules.

 Suppose we want to convert the infix expression

a+b*c+(d*e+f)*g

into postfix. A correct answer is a b c * + d e * f + g * +.

Rules:

 When an operand is read, it is immediately placed onto the output.


 Operators are not immediately output, so they must be saved somewhere. The correct
thing to do is to place operators that have been seen, but not placed on the output,
onto the stack. We will also stack left parentheses when they are encountered.
 We start with an initially empty stack.

 If we see a right parenthesis ')', then we pop the stack, writing symbols until we
encounter the (corresponding) left parenthesis '(', which is popped but not placed onto
the output.

 If we see any other symbol ('+', '*', '('), then we pop entries from the stack until
we find an entry of lower priority. One exception is that we never remove a '(' from
the stack except when processing a ')'. For the purposes of this operation, '+' has
lowest priority and '(' highest. When the popping is done, we push the operator onto
the stack.

 Finally, if we read the end of input, we pop the stack until it is empty, writing symbols
onto the output.

 To see how this algorithm performs, we will convert the infix expression above into
its postfix form. First, the symbol ‘a’ is read, so it is passed through to the output.
Then '+' is read and pushed onto the stack. Next ‘b’ is read and passed through to the
output. The state of affairs at this juncture is as follows:

 Next a '*' is read. The top entry on the operator stack has lower precedence than '*', so
nothing is output and '*' is put on the stack. Next, ‘c’ is read and output. Thus far, we
have
 The next symbol is a '+'. Checking the stack, we find that we will pop the '*' and place
it on the output, then pop the other '+', which is of equal (not lower) priority, and place
it on the output, and finally push the new '+' onto the stack.

 The next symbol read is an '(', which, being of highest precedence, is placed on the
stack. Then d is read and output.

 We continue by reading a '*'. Since open parentheses do not get removed except when
a closed parenthesis is being processed, there is no output. Next, e is read and output.
 The next symbol read is a '+'. We pop and output '*' and then push '+'. Then we read
and output ‘f’.

 We read a '*' next; it is pushed onto the stack. Then ‘g’ is read and output.

 Now we read a ')', so the stack is emptied back to the '('. We output a '+'.

 The input is now empty, so we pop and output symbols from the stack until it is empty.
Finally, the output is: a b c * + d e * f + g * +

Note: We can add subtraction and division to this repertoire by assigning subtraction and
addition equal priority and multiplication and division equal priority. A subtle point is that
the expression a - b - c will be converted to a b - c - and not a b c - -. Our algorithm does
the right thing, because these operators associate from left to right.
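The rules above translate into the following C sketch. As in the discussion, it is limited to single-letter operands and the operators '+', '*', '(' and ')'; the helper function prio is an assumption introduced here to encode the priorities.

#include <stdio.h>

#define SIZE 100

/* In-stack priority: '+' is the lowest of the operators, and '(' is given
   priority 0 so that it is never popped by an operator. */
int prio(char op)
{
    if (op == '*') return 2;
    if (op == '+') return 1;
    return 0;                                /* '(' */
}

void infix_to_postfix(const char *s)
{
    char stack[SIZE];
    int top = -1;
    for (; *s; s++)
    {
        if (*s >= 'a' && *s <= 'z')
            putchar(*s);                     /* operands go straight to the output */
        else if (*s == '(')
            stack[++top] = *s;               /* always stack a left parenthesis */
        else if (*s == ')')
        {
            while (stack[top] != '(')
                putchar(stack[top--]);       /* pop until the matching '(' */
            top--;                           /* pop the '(' but do not output it */
        }
        else if (*s == '+' || *s == '*')
        {
            while (top != -1 && prio(stack[top]) >= prio(*s))
                putchar(stack[top--]);       /* pop operators of equal or higher priority */
            stack[++top] = *s;               /* then push the new operator */
        }
    }
    while (top != -1)
        putchar(stack[top--]);               /* end of input: empty the stack */
    putchar('\n');
}

int main()
{
    infix_to_postfix("a+b*c+(d*e+f)*g");     /* prints abc*+de*f+g*+ */
    return 0;
}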

2.5 QUEUE
Introduction:

 Consider a situation where you have to create an application with the following set of
requirements:
 Application should serve the requests of multiple users.
 At a time, only one request can be processed.
 The request, which came first should be given priority.
 However, the rate at which the requests are received is much faster than the rate at
which they are processed.
 Therefore, you need to store the request somewhere until they are processed.
 How can you solve this problem?
 You can solve this problem by storing the requests in such a manner so that
they are retrieved in the order of their arrival.
 A data structure called queue stores and retrieves data in the order of its
arrival.
 A queue is also called a FIFO (First In First Out) list.
 Queue is a list of elements in which an element is inserted at one end and deleted
from the other end of the queue.
Elements are inserted at one end called REAR end and deleted at the other end
called FRONT end.
 Various operations are implemented on a queue. But the most important are:
 Enqueue, which refers to inserting an element
 Dequeue, which refers to deleting an element.

Enqueue(Insert):
It refers to the addition of an item in the queue.
 Suppose you want to add an item F in the following queue.
 Since the items are inserted at the rear end, therefore, F is inserted after D.
 Now F becomes the rear end.

Now REAR moves to the next position to point to the newly inserted
element F. After insertion, REAR points to F.
Dequeue(Delete):

It refers to the deletion of an item from the queue.


 Since the items are deleted from the front end, item B is removed
from the queue.
 Now A becomes the front end of the queue.

In this Dequeue operation, the element at the FRONT end is removed. So the
element B is removed and the value of FRONT is incremented to point to the next
element A. Now, in this new queue, FRONT points to A.

IMPLEMENTING A QUEUE USING AN ARRAY:


Let us implement a Queue using an array that stores the elements in the order of
their arrival.
 To keep track of the rear and front positions, you need to declare two integer
variables, REAR and FRONT.
 If the queue is empty, REAR and FRONT are set to –1.

ENQUEUE:

 To insert an element, you need to perform the following steps:


 Increment the value of REAR by 1.
 Insert the element at index position REAR in the array.

Algorithm to insert (enqueue) an element in a queue:

1. If the queue is empty:


a. Set FRONT = 0.
2. Increment REAR by 1.
3. Store the element at index position REAR in the array.

When the queue is empty, both FRONT = –1 and REAR = –1.

If the first element is inserted, FRONT = 0 and REAR = 0.

When the next elements are inserted, only REAR is incremented.

Subsequent elements are inserted the same way. When the last element is inserted,
REAR becomes 4.

After inserting all the elements in a queue using an array, no more elements can be
inserted, because in array the size is fixed (static).

DEQUEUE(delete):

 While insertion is performed at the REAR end, deletion is done at the FRONT end.
 Algorithm to delete an element from the queue:
1. Retrieve the element at index FRONT.
2. Increment FRONT by 1.

Let us see how elements are deleted from the queue once they get processed.
When the dequeue is performed, FRONT will be incremented.

To implement an insert or delete operation, you need to increment the value of
REAR or FRONT by 1, respectively.
However, these values are never decremented.
 As you delete elements from the queue, the queue moves down the array.
 The disadvantage of this approach is that the storage space in the beginning is
discarded and never used again.
Consider the next fig.
 REAR is at the last index position.
 Therefore, you cannot insert elements in this queue, even though there is space for
them.
 This means that all the initial vacant positions go waste.
If you implement a queue in the form of a linear array, you can add elements only
in the successive index positions. However, when you reach the end of the queue, you
cannot start inserting elements from the beginning, even if there is space for them at the
beginning. You can overcome this disadvantage by implementing a queue in the form of a
circular array. In this case, you can keep inserting elements till all the index positions are
filled. Hence, it solves the problem of unutilized space.

 In this approach(circular array), if REAR is at the last index position and if there is
space in the beginning of an array, then you can set the value of REAR to zero and
start inserting elements from the beginning.(we are not touching this topic here).

 What is the disadvantage of implementing a queue as an array?


 To implement a queue using an array, you must know the maximum number of
elements in the queue in advance.
 To solve this problem, you should implement the queue in the form of a linked list.
QUEUE USING ARRAY

/****** Program to Implement Queue using Array ******/

#include <stdio.h>
#include<conio.h>
#include<stdlib.h>
#include<process.h>
#define MAX 6
void insert();
void dequeue(); /* renamed from remove(), which conflicts with remove() in stdio.h */
void display();
int queue[MAX], rear=-1, front=-1, item;
main()
{
int ch;
do
{
printf("\n\n1. Insert\n2. Delete\n3. Display\n4. Exit\n");
printf("\nEnter your choice: ");
scanf("%d", &ch);

switch(ch)
{
case 1:
insert();
break;
case 2:
dequeue();
break;
case 3:
display();
break;

case 4:
exit(0);

default:
printf("\n\nInvalid entry. Please try again...\n");
}
} while(ch!=4);
getch();
}

void insert()
{
if(rear == MAX-1)
printf("\n\nQueue is full.");
else
{
printf("\n\nEnter ITEM: ");
scanf("%d", &item);

if (rear == -1 && front == -1)


{
rear = 0;
front = 0;
}
else
rear++;
queue[rear] = item;
printf("\n\nItem inserted: %d", item);
}
}
void dequeue()
{
if(front == -1)
printf("\n\nQueue is empty.");
else
{
item = queue[front];

if (front == rear)
{
front = -1;
rear = -1;
}
else
front++;

printf("\n\nItem deleted: %d", item);


}
}

void display()
{
int i;

if(front == -1)
printf("\n\nQueue is empty.");
else
{
printf("\n\n");

for(i=front; i<=rear; i++)


printf(" %d", queue[i]);
}
}

*******

IMPLEMENTATION OF QUEUE USING LINKED LIST


 To keep track of the rear and front positions, you need to declare two
variables/pointers, REAR and FRONT, that will always point to the rear and front end
of the queue respectively.
 If the queue is empty, REAR and FRONT point to NULL.

Algorithm to insert an element in QUEUE:

1. Allocate memory for the new node. (Using malloc)


2. Assign value to the data field of the new node.

3. Make the next field of the new node point to NULL.

4. If the queue is empty, execute the following steps:

a. Make FRONT point to the new node

b. Make REAR point to the new node

c. Exit

5. Make the next field of REAR point to the new node.

6. Make REAR point to the new node.

Once the node is created using malloc, a value is assigned to its data field and its
next field is set to NULL.
When the next element is inserted, the link is created by changing the address
field of the previous node to point to the newly created node.
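A minimal C sketch of this insertion algorithm follows; the node structure qnode and the global front/rear pointers are assumptions made for the sketch, and the numbered comments map to the algorithm steps above.

#include <stdio.h>
#include <stdlib.h>

struct qnode
{
    int data;
    struct qnode *next;
};

struct qnode *front = NULL, *rear = NULL;

/* Inserts an element at the rear end of the linked queue. */
void enqueue(int value)
{
    struct qnode *nw = (struct qnode *)malloc(sizeof(struct qnode)); /* step 1 */
    nw->data = value;                 /* step 2: fill the data field */
    nw->next = NULL;                  /* step 3: the new node is the last node */
    if (front == NULL)                /* step 4: the queue was empty */
    {
        front = rear = nw;
        return;
    }
    rear->next = nw;                  /* step 5: link the old rear to the new node */
    rear = nw;                        /* step 6: the new node becomes the rear */
}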

DELETION:

Algorithm to DELETE an element in a QUEUE:

1. If the queue is empty (FRONT = NULL):

a. Display “Queue empty”

b. Exit

2. Mark the node pointed to by FRONT as current.

3. Make FRONT point to the next node in its sequence.

4. Release the memory for the node marked as current.

In a linked list, you can insert and delete elements anywhere in the list. However,
in a linked queue, insertion and deletion takes place only from the ends. More
specifically, insertion takes place at the rear end and deletion takes place at the front end
of the queue.
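The matching deletion can be sketched as below; it reuses the qnode structure and the front/rear pointers from the enqueue sketch above, which were assumptions made for these examples.

/* Deletes an element from the front end of the linked queue. */
void dequeue()
{
    struct qnode *current;
    if (front == NULL)                /* step 1: the queue is empty */
    {
        printf("Queue empty\n");
        return;
    }
    current = front;                  /* step 2: mark the front node as current */
    front = front->next;              /* step 3: make FRONT point to the next node */
    if (front == NULL)
        rear = NULL;                  /* the queue became empty: reset rear too */
    printf("Deleted %d\n", current->data);
    free(current);                    /* step 4: release the memory of the old front node */
}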

 Queues offer a lot of practical applications. Some of them are:

 Printer Spooling

 CPU Scheduling

 Mail Service

 Keyboard Buffering

Printer Spooling

 A printer may receive multiple print requests in a short span of time.


 The rate at which these requests are received is much faster than the rate at which they
are processed.

 Therefore, a temporary storage mechanism is required to store these requests in the


order of their arrival.

 A queue is the best choice in this case, which stores the print requests in such a
manner so that they are processed on a first-come-first-served basis.

CPU Scheduling

 A CPU can process one request at a time.


 The rate at which the CPU receives requests is usually much greater than the rate at
which the CPU processes the requests.

 Therefore, the requests are temporarily stored in a queue in the order of their arrival.
Whenever CPU becomes free, it obtains the requests from the queue.

 Once a request is processed, its reference is deleted from the queue. The CPU then
obtains the next request in sequence and the process continues.
 In a time sharing system, CPU is allocated to each request for a fixed time period.

 All these requests are temporarily stored in a queue.

 CPU processes each request one by one for a fixed time period.

 If the request is processed within that time period, its reference is deleted from the
queue.

 If the request is not processed within that specified time period, the request is shifted
to the end of the queue.

 CPU then processes the next request in the queue and the process continues.

Mail Service

 In various organizations, many transactions are conducted through mails.


 If the mail server goes down, and someone sends you a mail, the mail is bounced back
to the sender.

 To avoid any such situation, many organizations implement a mail backup service.

 Whenever there is some problem with the mail server because of which the messages
are not delivered, the mail is routed to the mail’s backup server.

 The backup server stores the mails temporarily in a queue.

 Whenever the mail server is up, all the mails are transferred to the recipient in the
order in which they arrived.

Keyboard Buffering

 Queues are used for storing the keystrokes as you type through the keyboard.
 Sometimes the data, which you type through the keyboard is not immediately
displayed on the screen.
 This is because during that time, the processor might be busy doing some other task.

 In this situation, the data is temporarily stored in a queue, till the processor reads it.

 Once the processor is free, all the keystrokes are read in the sequence of their arrival
and displayed on the screen.

2.6 LINKED LIST

 We cannot use an array to store a set of elements if we do not know the total number
of elements in advance. We therefore need some way to allocate memory as
and when it is required.
 When you declare an array, a contiguous block of memory is allocated.

 If you know the address of the first element in the array, you can calculate the address
of the next elements.

Linked list:

 A linked list is a dynamic data structure which allows memory to be allocated as and
when it is required.
 It consists of a chain of elements, in which each element is referred to as a node.

 A node is the basic building block of a linked list.

 A node consists of two parts:

 Data: Refers to the information held by the node

 Link: Holds the address of the next node in the list


 All the nodes in a linked list are present at arbitrary memory locations. Therefore,
every node in a linked list stores the address of the next node in sequence.
 The last node in a linked list does not point to any other node. Therefore, it points to
NULL.

 To keep track of the first node, declare a variable/pointer, START, which always
points to the first node.
 When the list is empty, START contains NULL.

Algorithm to insert a node in a linked list.

1. Allocate memory for the new node.


2. Assign value to the data field of the new node.

3. If START is NULL, then:

a. Make START point to the new node.

b. Go to step 6.

4. Locate the last node in the list, and mark it as current Node. To locate the last node
in the list, execute the following steps:

a. Mark the first node as current Node.


b. Repeat step c until the successor of current Node becomes NULL.

c. Make current Node point to the next node in sequence.

5. Make the next field of current Node point to the new node.

6. Make the next field of the new node point to NULL.

Consider that the list is initially empty:

START = NULL

***Implementation of Linked List ***

#include <stdio.h>
#include<conio.h>
#include<stdlib.h>
#include<alloc.h>
void insert_first();
void display();
void insert_last();
void insert_specific();
void delete_last();
void delete_first();
void delete_specific();
struct node
{
int data;
struct node *next;
} *start=NULL;

int item;
void main()
{
int ch;
clrscr();
do
{
printf("\n\n\n1. Insert First\n2. insert last\n3. insert specific\n4. Delete first\n 5.
Delete last\n 6. Delete specific\n 7.Exit\n\n\n");
printf("\nEnter your choice: ");
scanf("%d",&ch);

switch(ch)
{
case 1:
insert_first();
display();
break;

case 2:
insert_last();
display();
break;

case 3:
insert_specific();
display();
break;
case 4:
delete_first();
display();
break;

case 5:
delete_last();
display();
break;

case 6:
delete_specific();
display();
break;

case 7:
exit(0);

default:
printf("\n\nInvalid choice. Please try again.\n");
}
} while (1);
}

void insert_first()
{
struct node *ptr;

printf("\n\nEnter item: ");


scanf("%d", &item);

if(start == NULL)
{
start = (struct node *)malloc(sizeof(struct node));
start->data = item;
start->next = NULL;
}
else
{
ptr= start;
start = (struct node *)malloc(sizeof(struct node));
start->data = item;
start->next = ptr;
}

printf("\nItem inserted: %d\n", item);


}

void insert_last()
{
struct node *ptr;
printf("\n\nEnter item: ");
scanf("%d", &item);

if(start == NULL)
{
start = (struct node *)malloc(sizeof(struct node));
start->data = item;
start->next = NULL;
}
else
{
ptr = start;
while (ptr->next != NULL)
ptr = ptr-> next;

ptr-> next = (struct node *)malloc(sizeof(struct node));


ptr = ptr-> next;
ptr-> data = item;
ptr-> next = NULL;
}

printf("\nItem inserted: %d\n", item);


}

void insert_specific()
{
int n;
struct node *nw, *ptr;

if (start == NULL)
printf("\n\nNexted list is empty. It must have at least one node.\n");
else
{
printf("\n\nEnter DATA after which new node is to be inserted: ");
scanf("%d", &n);
printf("\n\nEnter ITEM: ");
scanf("%d", &item);

ptr = start;
nw = start;

while (ptr != NULL)


{
if (ptr->data == n)
{
nw = (struct node *)malloc(sizeof(struct node));
nw->data = item;
nw->next = ptr->next;
ptr->next = nw;
printf("\n\nItem inserted: %d", item);
return;
}
else
ptr = ptr->next;
}
}
}

void display()
{
struct node *ptr = start;
int i=1;

if (ptr == NULL)
printf("\nNextlist is empty.\n");
else
{
printf("**********************************");
printf("\nSr. No.\t\tAddress\t\tData\t\tNext\n");
while(ptr != NULL)
{
printf("\n%d. \t\t%d \t\t%d \t\t%d \n", i, ptr, ptr->data, ptr->next);
ptr = ptr->next;
i++;
}
printf("*********************************");
}
}

void delete_first()
{
struct node *ptr;

if (start == NULL)
printf("\n\nNexted list is empty.\n");
else
{
ptr = start;
item = start->data;
start = start->next;
free(ptr);
printf("\n\nItem deleted: %d", item);
}
}

void delete_last()
{
struct node *ptr, *prev;

if (start == NULL)
printf("\n\nNexted list is empty.\n");
else
{
ptr = start;
prev = start;
while (ptr->next != NULL)
{
prev = ptr;
ptr = ptr->next;
}

item = ptr->data;

if (start->next == NULL) /* the list had only one node */
start = NULL;
else
prev->next = NULL;
free(ptr);
printf("\n\nItem deleted: %d", item);
}
}

void delete_specific()
{
struct node *ptr, *prev;

printf("\n\nEnter ITEM which is to be deleted: ");


scanf("%d", &item);

if (start == NULL)
printf("\n\nNexted list is empty.\n");
else if (start->data == item)
{
ptr = start;
start = start->next;
free(ptr);
}
else
{
ptr = start;
prev = start;

while (ptr != NULL)
{
if (ptr->data == item)
{
prev->next = ptr->next;
free(ptr);
printf("\n\nItem deleted: %d", item);
return; /* ptr is no longer valid, so stop here */
}
else
{
prev = ptr;
ptr = ptr->next;
}
}
printf("\n\nItem not found.");
}
}

*******
EXERCISES
PART A
1. Define Data Structures
2. Define primary data structures
3. Define static data structures
4. List some of the static data structures in C
5. Define dynamic data structures
6. List some of the dynamic data structures in C
7. Define linear data structures
8. Define non-linear data structures
9. Define Linked Lists
10. State the different types of linked lists
11. List the basic operations carried out in a linked list
12. List out the advantages of using a linked list
13. List out the disadvantages of using a linked list
14. List out the applications of a linked list
15. State the difference between arrays and linked lists
16. Define a stack
17. List out the basic operations that can be performed on a stack
18. State the different ways of representing expressions
19. State the advantages of using infix notations
20. State the advantages of using postfix notations
21. State the rules to be followed during infix to postfix conversions
22. State the rules to be followed during infix to prefix conversions
23. State the difference between stacks and linked lists
24. Mention the advantages of representing stacks using linked lists than arrays
25. Define a queue
26. Define a priority queue
27. State the difference between queues and linked lists
28. Define a Dequeue
29. Why do you need a data structure?
30. Difference between Abstract Data Type, Data Type and Data Structure
31. Define data type and what are the types of data type?
32. Define an Abstract Data Type (ADT)
33. What are the advantages of modularity?
34. State the difference between primitive and non-primitive data types
35. State the difference between persistent and ephemeral data structure
36. What are the objectives of studying data structures?
37. What are the types of queues?
38. List the applications of stacks
39. List the applications of queues
40. Define Hashing.
41. What do you mean by hash table?
42. What do you mean by hash function?
43. Write the importance of hashing.
44. What do you mean by collision in hashing?
45. What are the collision resolution methods?
46. What do you mean by separate chaining?
47. Write the advantage of separate chaining.
48. Write the disadvantages of separate chaining.
49. What do you mean by open addressing?
50. What are the types of collision resolution strategies in open addressing?
51. What do you mean by Probing?
52. What do you mean by linear probing?
53. What do you mean by primary clustering?
54. What do you mean by quadratic probing?
55. What do you mean by secondary clustering?
56. What do you mean by double hashing?
57. What do you mean by rehashing?
58. What is the need for extendible hashing?
59. List the limitations of linear probing.
60. Mention one advantage and disadvantage of using quadratic probing.
61. Why do we need a cursor implementation of linked lists?
PART – B

1. What is a Stack? Explain with example?


2. Write the algorithm for converting infix expression to postfix expression?
3. What is a Queue? Explain its operation with example?
4. Explain the applications of stack?
5. Write an algorithm for inserting and deleting an element from doubly linked list?
6. Explain linear linked implementation of Stack and Queue?
7. What is an Abstract Data Type (ADT)? Explain.
8. Define Structure. Explain in detail.
9. What is Union? Explain in detail
10. Define recursion. Explain it with the Fibonacci series.
11. Explain allocation of storage variable and scope variables.
12. Explain hashing with example.
13. Explain collision resolution strategies?
14. Explain extendible hashing?
UNIT II
TREES
Tree Terminologies - Binary tree - Binary tree traversal - Expression tree construction - Binary
Search Trees - Querying a binary search tree, Insertion and deletion - AVL trees - rotations,
insertion. B-Trees - Definition of B-trees - Basic operations on B-trees - insertion and deletion.
Priority Queues (Heaps) - Model - Simple implementations - Binary Heap - Properties.

3.1 Tree Terminologies

 Consider a scenario where you are required to represent the directory structure of your
operating system.
 The directory structure contains various folders and files. A folder may further contain
any number of sub folders and files.
 In such a case, it is not possible to represent the structure linearly because all the
items have a hierarchical relationship among themselves.
 In such a case, it would be good if you have a data structure that enables you to store
your data in a nonlinear fashion.

DEFINITION:

 A tree is a nonlinear data structure that represent a hierarchical relationship among the
various data elements
 Trees are used in applications in which the relation between data elements needs to be
represented in a hierarchy.
 Each element in a tree is referred to as a node.
 The topmost node in a tree is called root.

 Each node in a tree can further have subtrees below its hierarchy.
 Let us discuss various terms that are most frequently used with trees.
 Leaf node: It refers to a node with no children.
 Nodes E, F, G, H, I, J, L, and M are leaf nodes.
 Subtree: A portion of a tree, which can be viewed as a separate tree in itself is called a
subtree.
 A subtree can also consist of just one node, namely a leaf node.
The tree with root B, containing nodes E, F, G, and H, is a subtree of node A.
 Children of a node: The roots of the subtrees of a node are called the children of the
node.
o F, G, and H are children of node B. B is the parent of these nodes.
 Degree of a node: It refers to the number of subtrees of a node in a tree.
Degree of node C is 1
Degree of node D is 2
Degree of node A is 3
Degree of node B is 4
 Edge: A link from the parent to a child node is referred to as an edge.
 Siblings/Brothers: It refers to the children of the same node.
Nodes B, C, and D are siblings of each other.
Nodes E, F, G, and H are siblings of each other.
 Level of a node: It refers to the distance (in number of nodes) of a node from the root.
Root always lies at level 0.
 As you move down the tree, the level increases by one.

 Depth of a tree: Refers to the total number of levels in the tree.


 The depth of the following tree is 4.
 Internal node: It refers to any node between the root and a leaf node.
Nodes B, C, D, and K are internal nodes.

Example:

 Consider the above tree and answer the questions that follow:
a. What is the depth of the tree?
b. Which nodes are children of node B?
c. Which node is the parent of node F?
d. What is the level of node E?
e. Which nodes are the siblings of node H?
f. Which nodes are the siblings of node D?
g. Which nodes are leaf nodes?
 Answer:
a. 4
b. D and E
c. C
d. 2
e. H does not have any siblings
f. The only sibling of D is E
g. F, G, H, and I

3.2 Binary tree:


A binary tree is a tree in which each node has at most two children; a node cannot
have more than two children. In a binary tree, the children are named the “left” and
“right” child. In some representations, the child nodes also contain a reference to their parent.
 There are various types of binary trees, the most important are:
 Full binary tree
 Complete binary tree

A full binary tree is a tree in which every node in the tree has two children except
the leaves of the tree.
A complete binary tree is a binary tree in which every level of the binary tree is
completely filled except the last level. In the unfilled level, the nodes are attached starting
from the left-most position.

What is Full Binary Tree?


Full binary tree is a binary tree in which every node in the tree has exactly zero or
two children. In other words, every node in the tree except the leaves has exactly two
children. Figure 1 below depicts a full binary tree. In a full binary tree, the number of
nodes (n), the number of leaves (l) and the number of internal nodes (i) are related in a special
way such that if you know any one of them you can determine the other two values as
follows:
1. If a full binary tree has i internal nodes:
– Number of leaves l = i+1
– Total number of nodes n = 2*i+1
2. If a full binary tree has n nodes:
– Number of internal nodes i = (n-1)/2
– Number of leaves l=(n+1)/2
3. If a full binary tree has l leaves:
– Total Number of nodes n=2*l-1
– Number of internal nodes i = l-1
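For example, a full binary tree with i = 3 internal nodes has l = 3 + 1 = 4 leaves and n = 2*3 + 1 = 7 nodes in total; the relations are consistent, since 4 leaves + 3 internal nodes = 7 nodes.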
What is Complete Binary Tree?
As shown in figure 2, a complete binary tree is a binary tree in which every level
of the tree is completely filled except the last level. Also, in the last level, nodes should
be attached starting from the left-most position. A complete binary tree of height h
satisfies the following conditions:

– From the root node, the level above last level represents a full binary tree of height h-1
– One or more nodes in last level may have 0 or 1 children
– If a, b are two nodes in the level above the last level, then a has more children than b if
and only if a is situated left of b

What is the difference between Complete Binary Tree and Full Binary Tree?
Complete binary trees and full binary trees have a clear difference. While a full
binary tree is a binary tree in which every node has zero or two children, a complete
binary tree is a binary tree in which every level of the binary tree is completely filled
except the last level. Some special data structures like heaps need to be complete binary
trees while they don’t need to be full binary trees. In a full binary tree, if you know the
number of total nodes or the number of leaves or the number of internal nodes, you can
find the other two very easily. But a complete binary tree does not have a special property
relating these three attributes.
Binary tree can be represented by Array and linked list.

Array representation of a binary tree:

 All the nodes are represented as the elements of an array.

 If there are n nodes in a binary tree, then for any node with index i, where 0 ≤ i ≤ n – 1:
o Parent of i (for i > 0) is at (i – 1)/2.
o Left child of i is at 2i + 1:
 If 2i + 1 > n – 1, then the node does not have a left child.
o Right child of i is at 2i + 2:
 If 2i + 2 > n – 1, then the node does not have a right child.
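These index formulas can be written directly as small C helpers. The function names and the convention of returning –1 for a missing node are assumptions chosen for this sketch.

#include <stdio.h>

/* Index arithmetic for the array representation of a binary tree with n nodes;
   a result of -1 means the requested node does not exist. */
int parent(int i)             { return (i == 0) ? -1 : (i - 1) / 2; }
int left_child(int i, int n)  { return (2*i + 1 <= n - 1) ? 2*i + 1 : -1; }
int right_child(int i, int n) { return (2*i + 2 <= n - 1) ? 2*i + 2 : -1; }

int main()
{
    /* For a 6-node tree: node 2 has parent 0, left child 5, and no right child. */
    printf("%d %d %d\n", parent(2), left_child(2, 6), right_child(2, 6));
    return 0;
}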
Linked representation of a binary tree:

 It uses a linked list to implement a binary tree.


 Each node in the linked representation holds the following information:
 Data
 Reference to the left child
 Reference to the right child
 If a node does not have a left child or a right child or both, the respective left or right
child fields of that node point to NULL.

3.3 Binary Tree Traversal:

 You can implement various operations on a binary tree.


 A common operation on a binary tree is traversal.
 Traversal refers to the process of visiting all the nodes of a binary tree once.
 There are three ways for traversing a binary tree:
 Inorder traversal
 Preorder traversal
 Postorder traversal
InOrder Traversal:

In this traversal, the tree is visited starting from the root node. At a
particular node, the traversal is continued with its left node, recursively, until no
further left node is found. Then the data at the current node (the left most node in the
sub tree) is visited, and the procedure shifts to the right of the current node, and the
procedure is continued. This can be explained as:
1. Traverse the left subtree
2. Visit root
3. Traverse the right subtree (Left Data Right)

 Let us consider an example.

Step-by-step inorder traversal (the numbers correspond to the figure panels):

1. The left subtree of node A is not NULL. Therefore, move to node B to traverse the left subtree of A.
2. The left subtree of node B is not NULL. Therefore, move to node D to traverse the left subtree of B.
3. The left subtree of node D is NULL. Therefore, visit node D.
4. Move to the right subtree of D. The left subtree of H is empty. Therefore, visit node H.
5. The right subtree of H is empty. Therefore, move back to node B.
6. The left subtree of B has been visited. Therefore, visit node B.
7. The right subtree of B is not empty. Therefore, move to the right subtree of B.
8. The left subtree of E is empty. Therefore, visit node E.
9. The right subtree of E is empty. Therefore, move back to node A.
10. The left subtree of A has been visited. Therefore, visit node A.
11. The right subtree of A is not empty. Therefore, move to the right subtree of A.
12. The left subtree of C is not empty. Therefore, move to the left subtree of C.
13. The left subtree of F is empty. Therefore, visit node F.
14. The right subtree of F is empty. Therefore, move back to node C.
15. The left subtree of node C has been visited. Therefore, visit node C.
16. The right subtree of C is not empty. Therefore, move to the right subtree of node C.
17. The left subtree of G is not empty. Therefore, move to the left subtree of node G.
18. The left subtree of I is empty. Therefore, visit node I.
19. The right subtree of I is empty. Therefore, move back to node G and visit node G.
20. The right subtree of G is empty, so the traversal is complete.

The inorder traversal thus visits the nodes in the order D, H, B, E, A, F, C, I, G.

Preorder Traversal

In this traversal, the tree is visited starting from the root node. At a particular
node, the data is read (visited), then the traversal continues with its left node,
recursively, until no further left node is found. Then the right node of the recent left node
is set as the current node, and the procedure is continued. This can be explained as:
1. Visit root
2. Traverse the left subtree
3. Traverse the right subtree

PostOrder Traversal:

In this traversal, the tree is visited starting from the root node. At a particular node,
the traversal first continues with the left subtree, recursively, until no further left node is
found, then with the right subtree, and only after both subtrees have been processed is
the data at the node itself visited. This can be explained as:

1. Traverse the left subtree


2. Traverse the right subtree
3. Visit the root
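All three traversals translate directly into short recursive C functions. The sketch below assumes the node structure used later in this unit (data, lchild, rchild).

#include <stdio.h>

struct tree
{
    int data;
    struct tree *lchild;
    struct tree *rchild;
};

void inorder(struct tree *t)      /* Left, Data, Right */
{
    if (t == NULL) return;
    inorder(t->lchild);
    printf("%d ", t->data);
    inorder(t->rchild);
}

void preorder(struct tree *t)     /* Data, Left, Right */
{
    if (t == NULL) return;
    printf("%d ", t->data);
    preorder(t->lchild);
    preorder(t->rchild);
}

void postorder(struct tree *t)    /* Left, Right, Data */
{
    if (t == NULL) return;
    postorder(t->lchild);
    postorder(t->rchild);
    printf("%d ", t->data);
}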
3.4 Expression Tree Construction:

The leaves of a binary expression tree are operands, such as constants or variable names, and the
other nodes contain operators. These particular trees happen to be binary, because all of the
operations are binary, and although this is the simplest case, it is possible for nodes to have more
than two children. It is also possible for a node to have only one child, as is the case with the
unary minus operator. An expression tree, T, can be evaluated by applying the operator at the
root to the values obtained by recursively evaluating the left and right subtrees

An algebraic expression can be produced from a binary expression tree by recursively producing
a parenthesized left expression, then printing out the operator at the root, and finally recursively
producing a parenthesized right expression. This general strategy (left, node, right) is known as
an in-order traversal. An alternate traversal strategy is to recursively print out the left subtree, the
right subtree, and then the operator. This traversal strategy is generally known as post-order
traversal. A third strategy is to print out the operator first and then recursively print out the left
and right subtrees.

These three standard depth-first traversals are representations of the three different expression
formats: infix, postfix, and prefix. An infix expression is produced by the inorder traversal, a
postfix expression is produced by the post-order traversal, and a prefix expression is produced by
the pre-order traversal. When an infix expression is printed, an opening and closing parenthesis
must be added at the beginning and ending of each expression. As every subtree represents a
subexpression, an opening parenthesis is printed at its start and the closing parenthesis is printed
after processing all of its children.

Pseudocode:

Algorithm infix (tree)


/*Print the infix expression for an expression tree.
Pre : tree is a pointer to an expression tree
Post: the infix expression has been printed*/
if (tree not empty)
if (tree token is operator)
print (open parenthesis)
end if
infix (tree left subtree)
print (tree token)
infix (tree right subtree)
if (tree token is operator)
print (close parenthesis)
end if
end if
end infix

Postfix Traversal

The postfix expression is formed by the basic postorder traversal of any binary tree. It does not
require parentheses.

Pseudocode:

Algorithm postfix (tree)


/*Print the postfix expression for an expression tree.
Pre : tree is a pointer to an expression tree
Post: the postfix expression has been printed*/
if (tree not empty)
postfix (tree left subtree)
postfix (tree right subtree)
print (tree token)
end if
end postfix

Prefix Traversal

The prefix expression formed by prefix traversal uses the standard pre-order tree traversal. No
parentheses are necessary.
Pseudocode:

Algorithm prefix (tree)


/*Print the prefix expression for an expression tree.
Pre : tree is a pointer to an expression tree
Post: the prefix expression has been printed*/
if (tree not empty)
print (tree token)
prefix (tree left subtree)
prefix (tree right subtree)
end if
end prefix

Construction of Expression Tree

The tree is constructed by reading the postfix expression one symbol at a time. If
the symbol is an operand, a one-node tree is created and a pointer to it is pushed onto a stack. If
the symbol is an operator, pointers to two trees T1 and T2 are popped from the stack and a new
tree, whose root is the operator and whose left and right children point to T2 and T1 respectively,
is formed. A pointer to this new tree is then pushed onto the stack.
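The construction can be sketched in C as follows. The node structure enode and the function name build_expression_tree are assumptions for this sketch, which expects single-letter operands and single-character operators.

#include <stdlib.h>
#include <ctype.h>

struct enode
{
    char token;
    struct enode *left, *right;
};

struct enode *build_expression_tree(const char *postfix)
{
    struct enode *stack[100];
    int top = -1;
    for (; *postfix; postfix++)
    {
        struct enode *n;
        if (isspace((unsigned char)*postfix))
            continue;                           /* skip blanks */
        n = (struct enode *)malloc(sizeof(struct enode));
        n->token = *postfix;
        n->left = n->right = NULL;
        if (!isalpha((unsigned char)*postfix))  /* operator: pop T1, then T2 */
        {
            n->right = stack[top--];            /* first pop  -> right child (T1) */
            n->left  = stack[top--];            /* second pop -> left child (T2)  */
        }
        stack[++top] = n;                       /* push the new (sub)tree */
    }
    return stack[top];                          /* pointer to the final tree */
}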

Example

The input is: a b + c d e + * * Since the first two symbols are operands, one-node trees are
created and pointers are pushed to them onto a stack. For convenience the stack will grow from
left to right.

Stack growing from left to right

The next symbol is a '+'. It pops the two pointers to the trees, a new tree is formed, and a pointer
to it is pushed onto the stack.
Formation of New Tree

Next, c, d, and e are read. A one-node tree is created for each and a pointer to the corresponding
tree is pushed onto the stack.

Continuing, a '+' is read, and it merges the last two trees.


Merging Two trees

Now, a '*' is read. The last two tree pointers are popped and a new tree is formed with a '*' as the
root.

Forming a new tree with a root

Finally, the last symbol is read. The two trees are merged and a pointer to the final tree remains
on the stack.
Steps to construct an expression tree a b + c d e + * *

Algebraic expressions

Binary algebraic expression tree equivalent to ((5 + z) / -8) * (4 ^ 2)

Algebraic expression trees represent expressions that contain numbers, variables, and unary and
binary operators. Some of the common operators are × (multiplication), ÷ (division), +
(addition), − (subtraction), ^ (exponentiation), and - (negation). The operators are contained in
the internal nodes of the tree, with the numbers and variables in the leaf nodes. The nodes of
binary operators have two child nodes, and the unary operators have one child node.

Boolean expressions

[Figure: binary Boolean expression tree]

Boolean expressions are represented very similarly to algebraic expressions, the only difference
being the specific values and operators used. Boolean expressions use true and false as constant
values, and the operators include ∧ (AND), ∨ (OR) and ¬ (NOT).

3.5 BINARY SEARCH TREE

First of all, a binary search tree (BST) is a dynamic data structure, which means
that its size is limited only by the amount of free memory in the operating system, and the
number of elements may vary during the program run. The main advantage of binary search
trees is rapid search, while addition is quite cheap. Let us see a more formal definition of
a BST.

Binary search tree is a data structure, which meets the following requirements:

 it is a binary tree;
 the left subtree of a node contains only values less than the node's value;
 the right subtree of a node contains only values greater than the node's value.

Notice that the definition above does not allow duplicates.

Example of a binary search tree


In the figure above, the first tree is a binary search tree, but the second one is not
a binary search tree.

What are binary search trees used for?

A binary search tree can be used to construct a map data structure. In practice, data can
often be associated with some unique key. For instance, in a phone book such a key is a
telephone number. Storing such data in a binary search tree allows looking up a
record by key faster than if it were stored in an unordered list. Also, a BST can be utilized to
construct a set data structure, which allows storing an unordered collection of unique
values and making operations on such collections.

The performance of a binary search tree depends on its height. In order to keep a tree
balanced and minimize its height, the idea of binary search trees was advanced into
balanced search trees (AVL trees, Red-Black trees, Splay trees). Here we will discuss the
basic ideas lying at the foundation of binary search trees.

Binary tree

A binary tree is a widely used tree data structure. The feature of a binary tree which
distinguishes it from a common tree is that each node has at most two children. A widespread
usage of the binary tree is as the basic structure for a binary search tree. Each binary tree has the
following groups of nodes:
 Root: the topmost node in a tree. It is a kind of "main node" in the tree, because
all other nodes can be reached from the root. Also, the root has no parent. It is the node at
which operations on the tree begin (commonly).
 Internal nodes: these nodes have a parent (the root node is not an internal node) and at
least one child.
 Leaf nodes: these nodes have a parent, but have no children.

Let us see an example of a binary tree.

Example of a binary tree

Operations
Basically, we can define only traversals for a binary tree as possible operations:
root-left-right (preorder), left-right-root (postorder) and left-root-right (inorder)
traversals. We will speak about them in detail later.

3.6 Querying a binary search tree

Binary search tree – Insertion:

Adding a value to BST can be divided into two stages:

 Search for a place to put a new element;


 Insert the new element to this place.
Let us see these stages in more detail.
Search for a place

At this stage the algorithm should follow the binary search tree property. If a new value is
less than the current node's value, go to the left subtree, else go to the right subtree.
Following this simple rule, the algorithm reaches a node which has no left or right
subtree. By the moment a place for insertion is found, we can say for sure that the new
value has no duplicate in the tree. Initially, a new node has no children, so it is a leaf. Let
us see it in the picture. Gray circles indicate possible places for a new node.

3.7 Insertion and Deletion


Now, let's go down to the algorithm itself. Here, and in almost every operation on a BST,
recursion is utilized. Starting from the root:

1. Check whether the value in the current node and the new value are equal. If so, a duplicate is
found. Otherwise,
2. if the new value is less than the node's value:
o if the current node has no left child, a place for insertion has been found;
o otherwise, handle the left child with the same algorithm.
3. if the new value is greater than the node's value:
o if the current node has no right child, a place for insertion has been found;
o otherwise, handle the right child with the same algorithm.

Let's have a look at an example demonstrating a case of insertion in the binary search
tree.
Example

Insert 4 to the tree, shown above.
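This insertion algorithm can be written recursively in C as below. The sketch assumes the struct tree node declared in the program at the end of this unit, and it silently ignores duplicates, as the definition requires.

#include <stdlib.h>

struct tree
{
    int data;
    struct tree *lchild, *rchild;
};

/* Inserts element into the BST rooted at t; returns the (possibly new) root. */
struct tree *insert(struct tree *t, int element)
{
    if (t == NULL)                                /* place for insertion found */
    {
        t = (struct tree *)malloc(sizeof(struct tree));
        t->data = element;
        t->lchild = t->rchild = NULL;
    }
    else if (element < t->data)
        t->lchild = insert(t->lchild, element);   /* go to the left subtree  */
    else if (element > t->data)
        t->rchild = insert(t->rchild, element);   /* go to the right subtree */
    /* element == t->data: duplicate, nothing to do */
    return t;
}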


Binary Search Tree Search operation

Searching for a value in a BST is very similar to the add operation. The search algorithm
traverses the tree "in depth", choosing the appropriate way to go according to the binary
search tree property, and compares the value of each visited node with the one we are
looking for. The algorithm stops in two cases:

 a node with necessary value is found;


 Algorithm has no way to go.

Search algorithm in detail

Now, let's see a more detailed description of the search algorithm. Like the add operation,
and almost every operation on a BST, the search algorithm utilizes recursion. Starting from the
root:

1. Check whether the value in the current node and the searched value are equal. If so, the value
is found. Otherwise,
2. if the searched value is less than the node's value:
o if the current node has no left child, the searched value doesn't exist in the BST;
o otherwise, handle the left child with the same algorithm.
3. if the searched value is greater than the node's value:
o if the current node has no right child, the searched value doesn't exist in the BST;
o otherwise, handle the right child with the same algorithm.

Let us have a look at an example demonstrating searching for a value in the binary
search tree.

Example

Search for 3 in the tree, shown above.


As in the add operation, first check whether the root exists. If not, the tree is empty and,
therefore, the searched value doesn't exist in the tree.
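In C, the search algorithm is a direct recursive translation; the sketch below reuses the struct tree node from the insertion sketch above.

/* Returns a pointer to the node containing element, or NULL if absent. */
struct tree *find(struct tree *t, int element)
{
    if (t == NULL)
        return NULL;                          /* no way to go: value is absent */
    if (element < t->data)
        return find(t->lchild, element);      /* search the left subtree  */
    if (element > t->data)
        return find(t->rchild, element);      /* search the right subtree */
    return t;                                 /* values are equal: found  */
}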

Binary search tree: Removing a node

The remove operation on a binary search tree is more complicated than add and search.
Basically, it can be divided into two stages:

 search for a node to remove;


 if the node is found, run remove algorithm.

Remove algorithm in detail

Now, let's see a more detailed description of the remove algorithm. The first stage is
identical to the algorithm for lookup, except that we should track the parent of the current
node. The second part is trickier. There are three cases, which are described below.

1. Node to be removed has no children.

This case is quite simple. The algorithm sets the corresponding link of the parent to NULL
and disposes of the node.

Example. Remove -4 from a BST.

2. Node to be removed has one child.

In this case, the node is cut from the tree and the algorithm links its single child (with its
subtree) directly to the parent of the removed node.
Example. Remove 18 from a BST.

3. Node to be removed has two children.

This is the most complex case. To solve it, let us first look at one useful BST property.
We are going to use the idea that the same set of values may be represented as
different binary search trees. For example, the BSTs shown in the figure
contain the same values {5, 19, 21, 25}. To transform the first tree into the second one, we
can do the following:

o choose minimum element from the right subtree (19 in the example);
o replace 5 by 19;
o hang 5 as a left child.

The same approach can be utilized to remove a node which has two children:

o find a minimum value in the right subtree;


o replace the value of the node to be removed with the found minimum. Now the right
subtree contains a duplicate!
o apply remove to the right subtree to remove the duplicate.
Notice that the node with the minimum value has no left child and, therefore, its removal
may result only in the first or second case.

Example. Remove 12 from a BST.

Find the minimum element in the right subtree of the node to be removed. In the current
example it is 19.

Replace 12 with 19. Notice that only the values are replaced, not the nodes. Now we have two
nodes with the same value.
Remove 19 from the right subtree.
First, check whether the root exists. If not, the tree is empty and, therefore, the value that
should be removed doesn't exist in the tree. Then, check whether the root value is the one to be
removed. This is a special case, and there are several approaches to solve it. We propose
the dummy root method: a dummy root node is created and the real root is hung on it as its
left child. When the remove is done, set the root link to the left child of the dummy
root.

Binary search tree. List values in order

To construct an algorithm listing a BST's values in order, let us recall the binary search tree
property:

 the left subtree of a node contains only values less than the node's value;
 the right subtree of a node contains only values greater than the node's value.

The algorithm looks as follows:


1. get the values in order from the left subtree;
2. get the values in order from the right subtree;
3. the result for the current node is (result for left subtree) join (current node's value) join
(result for right subtree).

Running this algorithm recursively, starting from the root, we'll get the result for the whole
tree. Let us see an example of the algorithm described above.
Example
Algorithm Steps:
1. Create the memory space for the root node and initialize its value to zero.
2. Read the value.
3. If there is no value in the root, the new value is assigned as the root. Else if the new
value is less than the root value, it is inserted in the left subtree of the root; else if the
new value is greater than the root value, it is inserted in the right subtree of the root.
4. Steps (2) and (3) are repeated to insert n values.

Search Operation:

1. Read the value to be searched.


2. Check whether the root is not null.
3. If the value to be searched is less than the root, consider the left sub-tree for
searching the particular element; else if the value is greater than the root, consider
the right sub-tree; else the values are equal, so return the value, which is the value
that was searched for.

Program

#include<stdio.h>
#include<conio.h>
#include<process.h>
#include<stdlib.h>

struct tree
{
int data;
struct tree *lchild;
struct tree *rchild;
}*t, *temp;

int element;
void inorder (struct tree *);
struct tree *create (struct tree *, int);
struct tree *find (struct tree *, int);
struct tree *insert (struct tree *, int);
struct tree *del (struct tree *, int);
struct tree *findmin (struct tree *);
struct tree *findmax (struct tree *);
void main( )
{
int ch;
printf ("\n\n\t BINARY SEARCH TREE");
do
{
printf ("\nMain Menu\n");
printf ("\n1.Create \n2.Insert \n3.Delete \n4.Find \n5.Findmax \n6.Findmin");
printf ("\n7.Exit");
printf ("\nEnter your choice:");
scanf ("%d", &ch);
switch(ch)
{
case 1:
printf ("Enter the element\n");
scanf ("%d", &element);
t = create (t, element);
inorder(t);
break;

case 2:
printf ("Enter the element\n");
scanf ("%d", &element);
t = insert (t, element);
inorder(t);
break;

case 3:
printf ("Enter the data");
scanf ("%d", &element);
t = del (t, element);
inorder(t);
break;

case 4:
printf ("Enter the data");
scanf ("%d", &element);
temp = find (t, element);
if (temp != NULL && temp->data == element)
printf ("Element %d is found", element);
else
printf ("Element is not found");
break;

case 5:
temp = findmax(t);
if (temp != NULL)
printf ("Maximum element is %d", temp->data);
break;

case 6:
temp = findmin(t);
if (temp != NULL)
printf ("Minimum element is %d", temp->data);
break;
}
} while(ch != 7);
}

struct tree *create (struct tree* t, int element)


{
t = (struct tree*) malloc (sizeof(struct tree));
t->data = element;
t-> lchild = NULL;
t-> rchild = NULL;
return t;
}

struct tree *find (struct tree* t, int element)


{
if (t == NULL)
return NULL;
if (element < t->data)
return (find (t->lchild, element));
else if (element > t->data)
return (find (t->rchild, element));
else
return t;
}

struct tree *findmin (struct tree* t)


{
if ( t == NULL)
return NULL;
else
if (t->lchild == NULL)
return t;
else
return (findmin (t->lchild));
}

struct tree *findmax (struct tree* t)


{
if (t != NULL)
{
while (t->rchild != NULL)
t = t->rchild;
}
return t;
}

struct tree *insert (struct tree* t, int element)


{
if (t== NULL)
{
t = (struct tree*) malloc (sizeof(struct tree));
t->data = element;
t->lchild = NULL;
t->rchild = NULL;
return t;
}
else
{
if(element< t->data)
t->lchild = insert(t->lchild, element);
else
if (element> t->data)
t->rchild = insert (t->rchild, element);
else
if(element == t->data)
printf ("Element already present\n");
return t;
}
}
struct tree* del (struct tree* t, int element)
{
if (t == NULL)
printf ("Element not found\n");
else
if(element< t->data)
t->lchild = del (t->lchild, element);
else
if(element> t->data)
t->rchild = del (t->rchild, element);
else
if(t->lchild && t->rchild)
{
temp = findmin (t->rchild);
t->data = temp->data;
t->rchild = del (t->rchild, t->data);
}
else
{
temp = t;
if (t->lchild == NULL)
t = t->rchild;
else
if (t->rchild == NULL)
t = t->lchild;
free (temp);
}
return t;
}
void inorder (struct tree *t)
{
if (t == NULL)
return;
else
{
inorder (t->lchild);
printf ("\t%d", t->data);
inorder (t->rchild);
}
}

SAMPLE INPUT AND OUTPUT:

Main Menu

1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 1
Enter the element
12
12
Main Menu

1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 2
Enter the element
13
12 13
Main Menu

1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 3
Enter the data12
13
Main Menu

1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 4
Enter the data13
Element 13 is found

Main Menu

1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 2
Enter the element
14
13 14
Main Menu

1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 2
Enter the element
15
13 14 15

Main Menu
1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 5
Maximum element is 15

Main Menu

1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 6
Minimum element is 13

Main Menu

1.Create
2.Insert
3.Delete
4.Find
5.Findmax
6.Findmin
7.Exit
Enter your choice: 7
3.8 AVL Tree

 In a binary search tree, the time required to search for a particular value depends upon
its height.
 The shorter the height, the faster is the search.
 However, binary search trees tend to attain large heights because of continuous insert
and delete operations.
 Consider an example in which you want to insert some numeric values in a binary
search tree in the following order:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
 After inserting values in the specified order, the binary search tree appears as follows:

 This process can be very time consuming if the number of values stored in a binary
search tree is large.
 Now if you want to search for a value 14 in the given binary search tree, you will
have to traverse all its preceding nodes before you reach node 14. In this case, you
need to make 14 comparisons.
 Therefore, such a structure loses its property of a binary search tree in which after
every comparison, the search operations are reduced to half.
 To solve this problem, it is desirable to keep the height of the tree to a minimum.
 Therefore, the following binary search tree can be modified to reduce its height.

 The height of the binary search tree has now been reduced to 4
 Now if you want to search for a value 14, you just need to traverse nodes 8 and 12,
before you reach node 14
 In this case, the total number of comparisons to be made to search for node 14 is
three.
 This approach reduces the time to search for a particular value in a binary search tree.
 This can be implemented with the help of a height balanced tree.

Height Balanced Tree

 A height balanced tree is a binary tree in which the difference between the heights of
the left subtree and right subtree of a node is not more than one.
 In a height balanced tree, each node has a Balance Factor (BF) associated with it.
 For the tree to be balanced, BF can have three values:
 0: A Balance Factor value of 0 indicates that the height of the left subtree of a
node is equal to the height of its right subtree.
 1: A Balance Factor value of 1 indicates that the height of the left subtree is
greater than the height of the right subtree by one. A node in this state is said to
be left heavy.
 – 1: A Balance Factor value of –1 indicates that the height of the right subtree
is greater than the height of the left subtree by one. A node in this state is said
to be right heavy.

Fig. Balanced Binary Search Trees
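
As a small illustration, the balance factor can be computed directly from subtree heights. The following is a minimal sketch in C, assuming the same node structure (data, lchild, rchild) used in the BST program above; a real AVL implementation would store the height in each node rather than recomputing it:

int height (struct tree *t)
{
int hl, hr;
if (t == NULL)
return -1; /* height of an empty tree */
hl = height (t->lchild);
hr = height (t->rchild);
return 1 + (hl > hr ? hl : hr);
}

int balance_factor (struct tree *t)
{
return height (t->lchild) - height (t->rchild); /* 0, 1 or -1 when balanced */
}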

 After inserting a new node in a height balanced tree, the balance factor of one or more
nodes may attain a value other than 1, 0, or –1.
 This makes the tree unbalanced.
 In such a case, you first need to locate the pivot node.
 A pivot node is the nearest ancestor of the newly inserted node, which has a balance
factor other than 1, 0 or –1.
 To restore the balance, you need to perform appropriate rotations around the pivot
node.

Inserting Nodes in a Height Balanced Tree


 Insert operation in a height balanced tree is similar to that in a simple binary search
tree.
 However, inserting a node can make the tree unbalanced.
 To restore the balance, you need to perform appropriate rotations around the pivot
node.
 This involves two cases:
 When the pivot node is initially right heavy and the new node is inserted in the
right subtree of the pivot node.
 When the pivot node is initially left heavy and the new node is inserted in the left
subtree of the pivot node.
 Let us first consider a case in which the pivot node is initially right heavy and the new
node is inserted in the right subtree of the pivot node.
 In this case, after the insertion of a new element, the balance factor of pivot node
becomes –2.
 Now there can be two situations in this case:
 If a new node is inserted in the right subtree of the right child of the pivot
node.(LL)
 If the new node is inserted in the left subtree of the right child of the pivot
node.(LR)
 Consider the first case in which a new node is inserted in the right subtree of the right
child of the pivot node.
Single Rotation:

 The two trees in below Figure contain the same elements and are both binary search
trees.
 First of all, in both trees k1 < k2. Second, all elements in the subtree X are smaller
than k1 in both trees.
 Third, all elements in subtree Z are larger than k2. Finally, all elements in subtree Y
are in between k1 and k2. The conversion of one of the above trees to the other is
known as a rotation.
 A rotation involves only a few pointer changes (we shall see exactly how many later),
and changes the structure of the tree while preserving the search tree property.
 The rotation does not have to be done at the root of a tree; it can be done at any node
in the tree, since that node is the root of some subtree.
 It can transform either tree into the other.
 This gives a simple method to fix up an AVL tree if an insertion causes some node in
an AVL tree to lose the balance property: Do a rotation at that node.

[Figure: single rotation. On the left, k2 is the root with left child k1 and subtrees X, Y
(under k1) and Z (under k2); on the right, k1 is the root with right child k2. Subtree Y
changes parents while the search tree property is preserved.]
The basic algorithm is to start at the node inserted and travel up the tree, updating
the balance information at every node on the path. If we get to the root without having
found any badly balanced nodes, we are done. Otherwise, we do a rotation at the first bad
node found, adjust its balance, and are done (we do not have to continue going to the
root). In many cases, this is sufficient to rebalance the tree. For instance, in the below
Figure, after the insertion of 6 1/2 in the original AVL tree on the left, node 8
becomes unbalanced. Thus, we do a single rotation between 7 and 8, obtaining the tree on
the right.

[Figure: the mirror-image single rotation between k1 and k2, with subtrees A, B and C.]
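
The single rotation itself needs only two pointer changes. The following is a minimal sketch in C, assuming the node structure used earlier; rotate_with_left handles the left-heavy (LL) case and rotate_with_right is its mirror image (the function names are illustrative, not from the text):

struct tree *rotate_with_left (struct tree *k2)
{
struct tree *k1 = k2->lchild;
k2->lchild = k1->rchild; /* subtree Y changes parents */
k1->rchild = k2;
return k1; /* k1 becomes the new root of this subtree */
}

struct tree *rotate_with_right (struct tree *k1)
{
struct tree *k2 = k1->rchild;
k1->rchild = k2->lchild;
k2->lchild = k1;
return k2;
}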
Let us work through a rather long example. Suppose we start with an initially
empty AVL tree and insert the keys 1 through 7 in sequential order. The first problem
occurs when it is time to insert key 3, because the AVL property is violated at the root.
We perform a single rotation between the root and its right child to fix the problem. The
tree is shown in the following figure, before and after the rotation:

Figure 4.32 AVL property destroyed by insertion of 6 1/2, then fixed by a rotation

To make things clearer, a dashed line indicates the two nodes that are the subject
of the rotation. Next, we insert the key 4, which causes no problems, but the insertion of
5 creates a violation at node 3, which is fixed by a single rotation. Besides the local
change caused by the rotation, the programmer must remember that the rest of the tree
must be informed of this change. Here, this means that 2's right child must be reset to
point to 4 instead of 3. This is easy to forget to do and would destroy the tree (4 would be
inaccessible).
Next, we insert 6. This causes a balance problem for the root, since its left subtree
is of height 0, and its right subtree would be height 2. Therefore, we perform a single
rotation at the root between 2 and 4.

[Figure: before the rotation, the tree has root 2 with left child 1 and right subtree rooted at 4
(children 3 and 5, with 6 below 5); after the single rotation between 2 and 4, the tree has root 4
with children 2 (over 1 and 3) and 5 (with right child 6).]
The rotation is performed by making 2 a child of 4 and making 4's original left
subtree the new right subtree of 2. Every key in this subtree must lie between 2 and 4, so
this transformation makes sense. The next key we insert is 7, which causes another
rotation.

[Figure: inserting 7 causes a single rotation between 5 and 6; the result has root 4 with
children 2 and 6, where 2's children are 1 and 3 and 6's children are 5 and 7.]
Double Rotation:

The algorithm described in the preceding paragraphs has one problem. There is a
case where the rotation does not fix the tree. Continuing our example, suppose we insert
keys 8 through 15 in reverse order. Inserting 15 is easy, since it does not destroy the
balance property, but inserting 14 causes a height imbalance at node 7.

As the diagram shows, the single rotation has not fixed the height imbalance. The
problem is that the height imbalance was caused by a node inserted into the tree
containing the middle elements (tree Y in Fig. 4.31) at the same time as the other trees
had identical heights. The case is easy to check for, and the solution is called a double
rotation, which is similar to a single rotation but involves four subtrees instead of three.
In Figure 4.33, the tree on the left is converted to the tree on the right. By the way, the
effect is the same as rotating between k1 and k2 and then between k2 and k3.
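
Since the double rotation is just two single rotations, it can be sketched in terms of the routines above (again, the names are illustrative):

struct tree *double_with_left (struct tree *k3) /* left-right (LR) case */
{
k3->lchild = rotate_with_right (k3->lchild);
return rotate_with_left (k3);
}

struct tree *double_with_right (struct tree *k1) /* right-left (RL) case */
{
k1->rchild = rotate_with_left (k1->rchild);
return rotate_with_right (k1);
}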
In our example, the double rotation is a right-left double rotation and involves 7,
15, and 14. Here, k3 is the node with key 7, k1 is the node with key 15, and k2 is the node
with key 14. Subtrees A, B, C, and D are all empty.
Next we insert 13, which requires a double rotation. Here the double rotation is
again a right-left double rotation that will involve 6, 14, and 7 and will restore the tree. In
this case, k3 is the node with key 6, k1 is the node with key 14, and k2 is the node with
key 7. Subtree A is the tree rooted at the node with key 5, subtree B is the empty subtree
that was originally the left child of the node with key 7, subtree C is the tree rooted at the
node with key 13, and finally, subtree D is the tree rooted at the node with key 15.
If 12 is now inserted, there is an imbalance at the root. Since 12 is not between 4
and 7, we know that the single rotation will work.

Insertion of 11 will require a single rotation:

To insert 10, a single rotation needs to be performed, and the same is true for the
subsequent insertion of 9. We insert 8 without a rotation, creating the almost perfectly
balanced tree that follows.
Finally, we insert 8 1/2 to show the symmetric case of the double rotation. Notice
that 8 1/2 causes the node containing 9 to become unbalanced. Since 8 1/2 is between 9
and 8 (which is 9's child on the path to 8 1/2), a double rotation needs to be performed,
yielding the following tree.

Example:

 Let us consider another example to insert values in a binary search tree and restore its
balance whenever required.
50 40 30 60 55 80 10 35 32

Insert 50

Tree is balanced.

Insert 40

Tree is balanced.

Insert 30

Before rotation After rotation

Tree becomes unbalanced; a single right rotation (LL) restores the balance.

Insert 60: Insert 55:


Before Rotation After rotation
Now the tree is unbalanced; a double rotation (RL) restores the balance.

Before Rotation After Rotation

Insert 80 (single left rotation)

Insert 10 Insert 35
Insert 32 (double rotation)

Finally the Tree becomes balanced.

3.9 B-Trees

Definition of B-Trees
A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and
deletions in logarithmic amortized time. Unlike self-balancing binary search trees, it is
optimized for systems that read and write large blocks of data. It is most commonly used in
database and file systems.

The B-Tree Rules

Important properties of a B-tree:


 A B-tree node may have many more than two children.
 A B-tree node may contain more than just a single element.

The set formulation of the B-tree rules: Every B-tree depends on a positive constant integer
called MINIMUM, which is used to determine how many elements are held in a single node.

 Rule 1: The root can have as few as one element (or even no elements if it also has no
children); every other node has at least MINIMUM elements.
 Rule 2: The maximum number of elements in a node is twice the value of MINIMUM.
 Rule 3: The elements of each B-tree node are stored in a partially filled array, sorted
from the smallest element (at index 0) to the largest element (at the final used position of
the array).
 Rule 4: The number of subtrees below a nonleaf node is always one more than the
number of elements in the node.
o Subtree 0, subtree 1, ...
 Rule 5: For any nonleaf node:

1. An element at index i is greater than all the elements in subtree number i of the
node, and
2. An element at index i is less than all the elements in subtree number i + 1 of the
node.
 Rule 6: Every leaf in a B-tree has the same depth. This ensures that a B-tree avoids
the problem of an unbalanced tree.
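
A node layout that satisfies these rules can be sketched as follows in C. This is only an assumed declaration matching the data[], subset[], dataCount and childCount names used in the pseudocode below; the choice MINIMUM = 2 and the extra array slots are illustrative, leaving room for the "loose addition" described in the next section:

#define MINIMUM 2
#define MAXIMUM (2 * MINIMUM)

struct btnode
{
int data[MAXIMUM + 1]; /* sorted elements; one spare slot for loose addition */
struct btnode *subset[MAXIMUM + 2]; /* child pointers (Rule 4: childCount = dataCount + 1) */
int dataCount; /* number of elements currently in data[] */
int childCount; /* 0 for a leaf */
};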

Searching for a Target in a Set


The pseudocode:
1. Make a local variable, i, equal to the first index such that data[i] >= target. If there is no
such index, then set i equal to dataCount, indicating that none of the elements is greater than or
equal to the target.
2. if (the target was found at data[i])
return true;
else if (the root has no children)
return false;
else return subset[i].contains(target);
See the following example, try to search for 10.

We can implement a private method:


• private int firstGE(int target), which returns the first index i in the root such that
data[i] >= target. If there is no such index, then the return value is dataCount.
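
Since the elements of a node are kept sorted, firstGE is just a scan (or a binary search) over the used part of the array. A minimal C sketch, assuming the node layout sketched above:

int firstGE (const int data[], int dataCount, int target)
{
int i = 0;
while (i < dataCount && data[i] < target)
i++;
return i; /* equals dataCount if no element is >= target */
}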
3.10 Basic operations on B-trees- insertion and deletion
Adding an Element to a B-Tree
It is easier to add a new element to a B-tree if we relax one of the B-tree rules.
Loose addition allows the root node of the B-tree to have MAXIMUM + 1 elements. For
example, suppose we want to add 18 to the tree:

The above result is an illegal B-tree. Our plan is to perform a loose addition first, and then fix
the root's problem.
The Loose Addition Operation for a B-Tree:
private void looseAdd(int element)
{
1. i = firstGE(element) // find the first index such that data[i] >= element
2. if (we found the new element at data[i]) return; // since there's already a copy in the set
3. else if (the root has no children)
Add the new element to the root at data[i]. (shift array)
4. else {
subset[i].looseAdd(element);
if the root of subset[i] now has an excess element, then fix that problem before returning.
}
}
private void fixExcess(int i)
// precondition: (i < childCount) and the entire B-tree is valid except that subset[i] has
MAXIMUM + 1 elements.
// postcondition: the tree is rearranged to satisfy the loose addition rule
Fixing a Child with an Excess Element:
 To fix a child with MAXIMUM + 1 elements, the child node is split into two nodes that
each contain MINIMUM elements. This leaves one extra element, which is passed up to
the parent.
 It is always the middle element of the split node that moves upward.
 The parent of the split node gains one additional child and one additional element.
 The children of the split node have been equally distributed between the two smaller
nodes.

Fixing the Root with an Excess Element:


 Create a new root.
 fixExcess(0).
Removing an Element from a B-Tree
Loose removal rule:
Loose removal allows the root to be left with one element too few.
public boolean remove(int target)
{
answer = looseRemove(target);
if ((dataCount == 0) && (childCount == 1))
Fix the root of the entire tree so that it no longer has zero elements;
return answer;
}

private boolean looseRemove(int target)


{
1. i = firstGE(target)
2. Deal with one of these four possibilities:
2a. if (root has no children and target not found) return false.
2b. if( root has no children but target found) {
remove the target
return true
}
2c. if (root has children and target not found) {
answer = subset[i].looseRemove(target)
if (subset[i].dataCount < MINIMUM)
fixShortage(i)
return true
}
2d. if (root has children and target found) {
data[i] = subset[i].removeBiggest()
if (subset[i].dataCount < MINIMUM)
fixShortage(i)
return true
}
}

private void fixShortage(int i)


// Precondition: (i < childCount) and the entire B-tree is valid except that subset[i] has
MINIMUM - 1 elements.
// Postcondition: problem fixed based on the looseRemoval rule.

private int removeBiggest()


// Precondition: (dataCount > 0) and this entire B-tree is valid
// Postcondition: the largest element in this set has been removed and returned. The entire B-tree
is still valid based on the looseRemoval rule.

Fixing Shortage in a Child:


When fixShortage(i) is activated, we know that subset[i] has MINIMUM - 1 elements. There are
four cases that we need to consider:
Case 1: Transfer an extra element from subset[i-1]. Suppose subset[i-1] has more than the
MINIMUM number of elements.
a. Transfer data[i-1] down to the front of subset[i].data.
b. Transfer the final element of subset[i-1].data up to replace data[i-1].
c. If subset[i-1] has children, transfer the final child of subset[i-1] over to the front of
subset[i].
Case 2: Transfer an extra element from subset[i+1]. Suppose subset[i+1] has more than the
MINIMUM number of elements.
Case 3: Combine subset[i] with subset[i-1]. Suppose subset[i-1] has only MINIMUM
elements.
a. Transfer data[i-1] down to the end of subset[i-1].data.
b. Transfer all the elements and children from subset[i] to the end of subset[i-1].
c. Disconnect the node subset[i] from the B-tree by shifting subset[i+1], subset[i+2] and so
on leftward.

Case 4: Combine subset[i] with subset[i+1]. Suppose subset[i+1] has only MINIMUM
elements.
We may need to continue activating fixShortage() until the B-tree rules are satisfied.
Removing the Biggest Element from a B-Tree:
private int removeBiggest()
{
if (root has no children)
remove and return the last element
else {
answer = subset[childCount-1].removeBiggest()
if (subset[childCount-1].dataCount < MINIMUM)
fixShortage(childCount-1)
return answer
}
}
A more concrete example for node deletion:
2-3 Trees
A 2-3 tree is a type of B-tree where every node with children (internal node) has either two
children and one data element (a 2-node) or three children and two data elements (a 3-node). Leaf
nodes have no children and one or two data elements.

Trees-Time Analysis
The implementation of a B-tree is efficient since the depth of the tree is kept small.
Worst-case times for tree operations: the worst-case time performance for the following
operations are all O(d), where d is the depth of the tree:
1. Adding an element to a binary search tree (BST), a heap, or a B-tree.
2. Removing an element from a BST, a heap, or a B-tree.

3. Searching for a specified element in a BST or a B-tree.


Time Analysis for BST


Suppose a BST has n elements. What is the maximum depth the tree could have?
 A BST with n elements could have a depth as big as n-1.

Worst-Case Times for BSTs:


 Adding an element, removing an element, or searching for an element in a BST
with n elements is O(n).

Time Analysis for Heaps


Remember that a heap is a complete binary tree, so each level must be full before proceeding to the
next level.
The number of nodes needed for a heap to reach depth d is (1 + 2 + 4 + ... + 2^(d-1)) + 1 = 2^d = n.
Thus d = log2 n.
Worst-Case Times for Heap Operations:
 Adding or removing an element in a heap with n elements is O(log n).

Time Analysis for B-Tree


Suppose a B-tree has n elements and M is the maximum number of children a node can have.
What is the maximum depth the tree could have? What is the minimum depth the tree could
have?
 The worst-case depth (maximum depth) of a B-tree is log base (M/2) of n.
 The best-case depth (minimum depth) of a B-tree is log base M of n.

Worst-Case Times for B-Trees:


 Adding or removing an element in a B-tree with n elements is O(log n).
The idea, seen earlier, of putting multiple set (list, hash table) elements together
into large chunks that exploit locality can also be applied to trees. Binary search trees are not
good for locality because a given node of the binary tree probably occupies only a fraction of
any cache line. B-trees are a way to get better locality by putting multiple elements into each
tree node.

B-trees were originally invented for storing data structures on disk, where locality is even
more crucial than with memory. Accessing a disk location takes about 5ms = 5,000,000ns.
Therefore, if you are storing a tree on disk, you want to make sure that a given disk read is as
effective as possible. B-trees have a high branching factor, much larger than 2, which ensures
that few disk reads are needed to navigate to the place where data is stored. B-trees may also be
useful for in-memory data structures because these days main memory is almost as slow
relative to the processor as disk drives were to main memory when B-trees were first
introduced!

A B-tree of order m is a search tree in which each nonleaf node has up to m children. The
actual elements of the collection are stored in the leaves of the tree, and the nonleaf nodes
contain only keys. Each leaf stores some number of elements; the maximum number may be
greater or (typically) less than m. The data structure satisfies several invariants:

1. Every path from the root to a leaf has the same length

2. If a node has n children, it contains n−1 keys.


3. Every node (except the root) is at least half full

4. The elements stored in a given subtree all have keys that are between the keys in
the parent node on either side of the subtree pointer. (This generalizes the BST
invariant.)

5. The root has at least two children if it is not a leaf.

For example, the following is an order-5 B-tree (m=5) where the leaves have enough space
to store up to 3 data records:

Because the height of the tree is uniformly the same and every node is at least half full, we
are guaranteed that the asymptotic performance is O(lg n) where n is the size of the collection.
The real win is in the constant factors, of course. We can choose m so that the pointers to
the m children plus the m−1 elements fill out a cache line at the highest level of the memory
hierarchy where we can expect to get cache hits. For example, if we are accessing a large disk
database then our "cache lines" are memory blocks of the size that is read from disk.

Lookup in a B-tree is straightforward. Given a node to start from, we use a simple linear or
binary search to find whether the desired element is in the node, or if not, which child pointer
to follow from the current node.

Insertion and deletion from a B-tree are more complicated; in fact, they are notoriously
difficult to implement correctly. For insertion, we first find the appropriate leaf node into which
the inserted element falls (assuming it is not already in the tree). If there is already room in the
node, the new element can be inserted simply. Otherwise the current leaf is already full and
must be split into two leaves, one of which acquires the new element. The parent is then
updated to contain a new key and child pointer. If the parent is already full, the process ripples
upwards, eventually possibly reaching the root. If the root is split into two, then a new root is
created with just two children, increasing the height of the tree by one.

For example, here is the effect of a series of insertions. The first insertion (13) merely affects
a leaf. The second insertion (14) overflows the leaf and adds a key to an internal node. The
third insertion propagates all the way to the root.

Deletion works in the opposite way: the element is removed from the leaf. If the leaf
becomes empty, a key is removed from the parent node. If that breaks invariant 3, the keys of
the parent node and its immediate right (or left) sibling are reapportioned among them so that
invariant 3 is satisfied. If this is not possible, the parent node can be combined with that
sibling, removing a key another level up in the tree and possibly causing a ripple all the way to
the root. If the root has just two children, and they are combined, then the root is deleted and
the new combined node becomes the root of the tree, reducing the height of the tree by one.

4.1 Priority Queues (Heaps) – Model

Consider the jobs given to a printer. Although jobs sent to a line printer are generally
placed on a queue, this might not always be the best thing to do. For instance, one job might be
particularly important, so that it might be desirable to allow that job to be run as soon as the printer is
available. Conversely, if, when the printer becomes available, there are several one-page jobs and one
hundred-page job, it might be reasonable to make the long job go last, even if it is not the last job
submitted.

Similarly, in a multiuser environment, the operating system scheduler must decide which of
several processes to run. Generally a process is only allowed to run for a fixed period of time. One
algorithm uses a queue. Jobs are initially placed at the end of the queue. The scheduler will repeatedly
take the first job on the queue, run it until either it finishes or its time limit is up, and place it at the
end of the queue if it does not finish. This strategy is generally not appropriate, because very short
jobs will seem to take a long time because of the wait involved to run. Generally, it is important that
short jobs finish as fast as possible, so these jobs should have preference over jobs that have already
been running. Furthermore, some jobs that are not short are still very important and should also have
preference.

Model:

 A Priority Queue is a data structure that allows the following two operations: Insert, which does
the obvious thing of insertion, and Delete Min, which finds, returns and removes the minimum
element in the priority queue.
 The Insert operation is the equivalent of Enqueue, and Delete Min is the priority queue
equivalent of the Dequeue operation.
 The Delete Min function also alters its input.
 Priority queues have many applications besides the operating systems
 Priority queues are also important in the implementation of greedy algorithms, which operate by
repeatedly finding a minimum.
4.2 Simple Implementation:

 There are several obvious ways to implement a Priority Queue.


 We could use a simple linked list, performing insertions at the front in O(1) and traversing the
list, which requires O(n) time, to delete the minimum.
 We could insist that the list always be kept sorted; this makes insertions expensive (O(n)) and
delete_mins cheap (O(1)).
 Another way of implementing a priority queue would be to use a binary search tree (BST). This
gives an O(log N) average running time for both operations.
 This is true in spite of the fact that although the insertions are random, the deletions are not.
Repeatedly removing a node that is in the left subtree would seem to hurt the balance of the tree
by making the right subtree heavy.

4.3 Binary Heap

 The implementation we will use is known as a binary heap.


 Its use is so common for priority queue implementations that, when the word heap is used
without a qualifier, it is generally assumed to be referring to this implementation of the data
structure.
 As with AVL trees, an operation on a heap can destroy one of the properties, so a heap operation
must not terminate until all heap properties are in order.

Properties of Binary heap:

 Structure Property
 Heap order Property
Structure Property:

 A heap is a binary tree that is completely filled with the possible exception of the bottom level,
which is filled from left to right. Such a tree is known as a complete binary tree.
 It is easy to show that a complete binary tree of height h has between 2^h and 2^(h+1) - 1 nodes.
 An important observation is that a complete binary tree is so regular, it can be represented in an
array and no pointer is necessary.
 For any element in array position i, the left child is in position 2i, the right child is in the cell
after the left child (2i + 1), and the parent is in position i/2 (using integer division).

 Thus not only are pointers not required, but the operations required to traverse the tree are
extremely simple and likely to be very fast on most computers.
 The only problem with this implementation is that an estimate of the maximum heap size is
required in advance, but typically this is not a problem.

 A heap data structure will, then, consist of an array (of whatever type the key is) and integers
representing the maximum and current heap size. Figure 6.4 shows a typical priority queue
declaration, sketched below.
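
A sketch of such a declaration in C, together with the index arithmetic described above (slot 0 of the array is left unused so that the root sits at index 1):

struct heap
{
int capacity; /* maximum heap size */
int size; /* current heap size */
int *elements; /* elements[1..size] hold the heap */
};

#define LEFTCHILD(i) (2 * (i))
#define RIGHTCHILD(i) (2 * (i) + 1)
#define PARENT(i) ((i) / 2)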
Heap order Property

 The property that allows operations to be performed quickly is the heap order property.
 Since we want to be able to find the minimum quickly, it makes sense that the smallest element
should be at the root.
 If we consider that any subtree should also be a heap, then any node should be smaller than all of
its descendants.

 Applying this logic, we arrive at the heap order property. In a heap, for every node X, the key in
the parent of X is smaller than (or equal to) the key in X, with the exception of the root (Refer –
Heap sort).
 In Figure the tree on the left is a heap, but the tree on the right is not (the dashed line shows the
violation of heap order).
 Analogously, we can declare a (max) heap, which enables us to efficiently find and remove the
maximum element, by changing the heap order property. Thus, a priority queue can be used to
find either a minimum or a maximum, but this needs to be decided ahead of time.
 By the heap order property, the minimum element can always be found at the root. Thus, we get
the extra operation, find_min, in constant time.
Basic Heap Operations

 It is easy to perform the two required operations. All the work involves ensuring that the heap
order property is maintained.
 Insert
 Delete Min

INSERT

 To insert an element x into the heap, we create a hole in the next available location, since
otherwise the tree will not be complete.
 If x can be placed in the hole without violating heap order, then we do so and are done.
Otherwise we slide the element that is in the hole's parent node into the hole, thus bubbling the
hole up toward the root.
 We continue this process until x can be placed in the hole.
 Figure shows that to insert 14, we create a hole in the next available heap location.
 Inserting 14 in the hole would violate the heap order property, so 31 is slid down into the hole.
This strategy is continued in Figure 6.7 until the correct location for 14 is found.
 This general strategy is known as a percolate up; the new element is percolated up the heap until
the correct location is found.
 We could have implemented the percolation in the insert routine by performing repeated swaps
until the correct order was established, but a swap requires three assignment statements.
 If an element is percolated up d levels, the number of assignments performed by the swaps
would be 3d. Our method uses d + 1 assignments.
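
A minimal C sketch of insertion by percolate up, assuming the heap declaration above and that the heap is not already full (a real routine would check size against capacity first):

void heap_insert (struct heap *h, int x)
{
int hole = ++h->size; /* create a hole in the next available location */
while (hole > 1 && x < h->elements[hole / 2])
{
h->elements[hole] = h->elements[hole / 2]; /* slide the parent down */
hole /= 2; /* bubble the hole up toward the root */
}
h->elements[hole] = x; /* x can now be placed in the hole */
}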

Delete_min

 Delete_mins are handled in a similar manner as insertions. Finding the minimum is easy; the
hard part is removing it.
 When the minimum is removed, a hole is created at the root.
 Since the heap now becomes one smaller, it follows that the last element x in the heap must
move somewhere in the heap.
 If x can be placed in the hole, then we are done. This is unlikely, so we slide the smaller of the
hole's children into the hole, thus pushing the hole down one level.
 We repeat this step until x can be placed in the hole. Thus, our action is to place x in its correct
spot along a path from the root containing minimum children.
 In Figure 6.9 the left figure shows a heap prior to the delete_min. After 13 is removed, we must
now try to place 31 in the heap. 31 cannot be placed in the hole, because this would violate heap
order.
 Thus, we place the smaller child (14) in the hole, sliding the hole down one level (see Fig. 6.10).
We repeat this again, placing 19 into the hole and creating a new hole one level deeper.
 We then place 26 in the hole and create a new hole on the bottom level. Finally, we are able to
place 31 in the hole (Fig. 6.11). This general strategy is known as a percolate down.
 We use the same technique as in the insert routine to avoid the use of swaps in this routine.
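
A minimal C sketch of delete_min by percolate down, assuming the same heap declaration and a non-empty heap:

int heap_delete_min (struct heap *h)
{
int min = h->elements[1]; /* the minimum is at the root */
int last = h->elements[h->size--]; /* the last element must move somewhere */
int hole = 1, child;
while (2 * hole <= h->size)
{
child = 2 * hole; /* pick the smaller of the two children */
if (child != h->size && h->elements[child + 1] < h->elements[child])
child++;
if (h->elements[child] < last)
h->elements[hole] = h->elements[child]; /* push the hole down one level */
else
break;
hole = child;
}
h->elements[hole] = last;
return min;
}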

*******
EXERCISES
PART A
1. Define a tree
2. Define root
3. Define degree of the node
4. Define leaves
5. Define depth and height of a node
6. Define depth and height of a tree
7. Define a binary tree
8. Define a path in a tree
9. Define terminal nodes in a tree
10. Define non-terminal nodes in a tree
11. Define a full binary tree
12. Define a complete binary tree
13. Define a right-skewed binary tree
14. State the properties of a binary tree
15. What is meant by binary tree traversal?
16. What are the different binary tree traversal techniques?
17. What are the tasks performed while traversing a binary tree?
18. What are the tasks performed during preorder traversal?
19. What are the tasks performed during inorder traversal?
20. What are the tasks performed during postorder traversal?
21. State the merits of linear representation of binary trees.
22. State the demerit of linear representation of binary trees.
23. State the merit of linked representation of binary trees.
24. State the demerits of linked representation of binary trees.
25. Define a binary search tree
26. What do you mean by general trees?
27. Define ancestor and descendant
28. Why is searching for a node in a binary search tree said to be more efficient than in a
simple binary tree?
29. Define AVL Tree.
30. What do you mean by balanced trees?
31. What are the categories of AVL rotations?
32. What do you mean by balance factor of a node in AVL tree?
33. What is the minimum number of nodes in an AVL tree of height h?
34. Define B-tree of order M.
35. What do you mean by 2-3 tree?
36. What do you mean by 2-3-4 tree?
37. What are the applications of B-tree?
38. What is an expression tree?

PART – B
1. What is a binary search tree? Explain with example?
2. Explain binary tree traversals?
3. Explain the expression trees?
4. Write the procedure to convert general tree to binary tree.
5. Explain the various operations performed on a stack.
6. Write an algorithm to convert infix expression to postfix expression.
7. Explain how the "switch" statement is used in the programs instead of multiple "if else"
statements with suitable example program.
8. Explain AVL tree in detail
9. What is a binary tree? Explain binary tree traversal in “c”
10. Construct a binary tree to satisfy the following orders:
11. Explain representing lists as binary trees. Write an algorithm to find kth element and deleting
it.
12. Explain how the following "infix" expression is evaluated with the help of Stack :
UNIT – IV

HASHING AND SORTING


Hashing-Open addressing-Rehashing-Extendible hashing. Sorting-Bubble sort, insertion sort,
selection sort, shell sort, Heap sort, quick sort, Radix sort, Merge sort. Searching- Linear search,
Binary search.

4.1 Hash Tables and Direct Address Tables


 Define and describe what a hash table is
- Introduce key/value relationships
- Introduce concepts such as table size and other aspects of tables that are independent
of type and method of implementation.
- Define "n" for time-complexity.

 Iteration order for hash tables by augmenting the structure


- iterating over items in the order in which they were inserted
- iterating over the items based on most-recently-used
A hash table, or a hash map, is a data structure that associates keys with values. The
primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the
corresponding value (e.g. that person's telephone number). It works by transforming the key
using a hash function into a hash, a number that the hash table uses to locate the desired value.

Time complexity and common uses of hash tables


Hash tables are often used to implement associative arrays, sets and caches. Like arrays, hash
tables provide constant-time O(1) lookup on average, regardless of the number of items in the
table. However, the rare worst-case lookup time can be as bad as O(n). Compared to other
associative array data structures, hash tables are most useful when large numbers of records of
data are to be stored.

Hash tables may be used as in-memory data structures. Hash tables may also be adopted for use
with persistent data structures; database indexes commonly use disk-based data structures based
on hash tables.
Hash tables are used to speed up string searching in many implementations of data compression.

2.8 Hash Function-Open Addressing


A good hash function is essential for good hash table performance. A poor choice of hash
function is likely to lead to clustering, in which probability of keys mapping to the same hash
bucket (i.e. a collision) is significantly greater than would be expected from a random function.
Nonzero probability of collisions is inevitable in any hash implementation, but usually the
number of operations to resolve collisions scales linearly with the number of keys mapping to the
same bucket, so excess collisions will degrade performance significantly. In addition, some hash
functions are computationally expensive, so the amount of time (and, in some cases, memory)
taken to compute the hash may be burdensome.

Choosing a good hash function is tricky. The literature is replete with poor choices, at least when
measured by modern standards. However, since poor hashing merely degrades hash table
performance for particular input key distributions, such problems commonly go undetected.
The literature is similarly sparse on the criteria for choosing a hash function. Unlike most other
fundamental algorithms and data structures, there is no universal consensus on what makes a
"good" hash function. The remainder of this section is organized by three criteria: simplicity,
speed, and strength, and will survey algorithms known to perform well by these criteria.

Simplicity and speed are readily measured objectively (by number of lines of code and CPU
benchmarks, for example), but strength is a more slippery concept. Obviously, a cryptographic
hash function such as SHA-1 would satisfy the relatively lax strength requirements needed for
hash tables, but its slowness and complexity make it unappealing. In fact, even a
cryptographic hash does not provide protection against an adversary who wishes to degrade hash
table performance by choosing keys all hashing to the same bucket. For these specialized cases, a
universal hash function should be used instead of any one static hash, no matter how
sophisticated.

In the absence of a standard measure for hash function strength, the current state of the art is to
employ a battery of statistical tests to measure whether the hash function can be readily
distinguished from a random function. Arguably the most important such test is to determine
whether the hash function displays the avalanche effect, which essentially states that any single-
bit change in the input key should affect on average half the bits in the output. Bret Mulvey
advocates testing the strict avalanche condition in particular, which states that, for any single-bit
change, each of the output bits should change with probability one-half, independent of the other
bits in the key. Purely additive hash functions such as CRC fail this stronger condition miserably.

Clearly, a strong hash function should have a uniform distribution of hash values. Bret Mulvey
proposes the use of a chi-squared test for uniformity, based on power-of-two hash table sizes
ranging from 2^1 to 2^16. This test is considerably more sensitive than many others proposed for
measuring hash functions, and finds problems in many popular hash functions.

Fortunately, there are good hash functions that satisfy all these criteria. The simplest class all
consume one byte of the input key per iteration of the inner loop. Within this class, simplicity
and speed are closely related, as fast algorithms simply don't have time to perform complex
calculations. Of these, one that performs particularly well is the Jenkins One-at-a-time hash,
adapted here from an article by Bob Jenkins, its creator.

#include <stdint.h>
#include <stddef.h>

uint32_t joaat_hash(const unsigned char *key, size_t len)
{
uint32_t hash = 0;
size_t i;

for (i = 0; i < len; i++)
{
hash += key[i];
hash += (hash << 10);
hash ^= (hash >> 6);
}
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
return hash;
}
[Figure: avalanche behavior of the Jenkins One-at-a-time hash over 3-byte keys.]
The avalanche behavior of this hash is shown in the figure, which was made using Bret
Mulvey's AvalancheTest in his Hash.cs toolset. Each row corresponds to a single bit in the input,
and each column to a bit in the output. A green square indicates good mixing behavior, a yellow
square weak mixing behavior, and red would indicate no mixing. Only a few bits in the last byte
are weakly mixed, a performance vastly better than a number of widely used hash functions.

Many commonly used hash functions perform poorly when subjected to such rigorous avalanche
testing. The widely favored FNV hash, for example, shows many bits with no mixing at all,
especially for short keys. If speed is more important than simplicity, then the class of hash
functions which consume multibyte chunks per iteration may be of interest. One of the most
sophisticated is "lookup3" by Bob Jenkins, which consumes input in 12 byte (96 bit) chunks.
Note, though, that any speed improvement from the use of this hash is only likely to be useful for
large keys, and that the increased complexity may also have speed consequences such as
preventing an optimizing compiler from inlining the hash function. Bret Mulvey analyzed an
earlier version, lookup2, and found it to have excellent avalanche behavior.

One desirable property of a hash function is that conversion from the hash value (typically 32
bits) to a bucket index for a particular-size hash table can be done simply by masking,
preserving only the lower k bits for a table of size 2^k (an operation equivalent to computing the
hash value modulo the table size). This property enables the technique of incremental doubling
of the size of the hash table - each bucket in the old table maps to only two in the new table.
Because of its use of XOR-folding, the FNV hash does not have this property. Some older hashes
are even worse, requiring table sizes to be a prime number rather than a power of two, again
computing the bucket index as the hash value modulo the table size. In general, such a
requirement is a sign of a fundamentally weak function; using a prime table size is a poor
substitute for using a stronger function.
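
For a power-of-two table size the masking operation is a single AND. A minimal sketch in C, assuming <stdint.h>:

uint32_t index_for (uint32_t hash, uint32_t table_size) /* table_size must be a power of two */
{
return hash & (table_size - 1); /* same as hash % table_size */
}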

Collision resolution
If two keys hash to the same index, the corresponding records cannot be stored in the same
location. So, if the location is already occupied, we must find another location to store the new
record, and do it in such a way that we can find it when we look it up later on.
To give an idea of the importance of a good collision resolution strategy, consider the following
result, derived using the birthday paradox. Even if it assumes that our hash function outputs
random indices uniformly distributed over the array, and even for an array with 1 million entries,
there is a 95% chance of at least one collision occurring before it contains 2500 records.

There are a number of collision resolution techniques, but the most popular are chaining and
open addressing.

Chaining
Hash collision resolved by chaining.
In the simplest chained hash table technique, each slot in the array references a linked list of
inserted records that collide to the same slot. Insertion requires finding the correct slot, and
appending to either end of the list in that slot; deletion requires searching the list and removal.

Chained hash tables have advantages over open addressed hash tables in that the removal
operation is simple and resizing the table can be postponed for a much longer time because
performance degrades more gracefully even when every slot is used. Indeed, many chaining hash
tables may not require resizing at all since performance degradation is linear as the table fills.
For example, a chaining hash table containing twice its recommended capacity of data would
only be about twice as slow on average as the same table at its recommended capacity.

Chained hash tables inherit the disadvantages of linked lists. When storing small records, the
overhead of the linked list can be significant. An additional disadvantage is that traversing a
linked list has poor cache performance.

Alternative data structures can be used for chains instead of linked lists. By using a self-
balancing tree, for example, the theoretical worst-case time of a hash table can be brought down
to O(log n) rather than O(n). However, since each list is intended to be short, this approach is
usually inefficient unless the hash table is designed to run at full capacity or there are unusually
high collision rates, as might occur in input designed to cause collisions. Dynamic arrays can
also be used to decrease space overhead and improve cache performance when records are small.
Some chaining implementations use an optimization where the first record of each chain is stored
in the table. Although this can increase performance, it is generally not recommended: chaining
tables with reasonable load factors contain a large proportion of empty slots, and the larger slot
size causes them to waste large amounts of space.

Open addressing

Hash collision resolved by linear probing (interval=1).


Open addressing hash tables can store the records directly within the array. A hash collision is
resolved by probing, or searching through alternate locations in the array (the probe sequence)
until either the target record is found, or an unused array slot is found, which indicates that there
is no such key in the table. Well known probe sequences include:

1. Linear probing, in which the interval between probes is fixed (often at 1);


2. Quadratic probing, in which the interval between probes increases linearly (hence, the indices
are described by a quadratic function); and
3. Double hashing, in which the interval between probes is fixed for each record but is computed
by another hash function.
The main tradeoffs between these methods are that linear probing has the best cache performance
but is most sensitive to clustering, while double hashing has poor cache performance but exhibits
virtually no clustering; quadratic hashing falls in-between in both areas. Double hashing can also
require more computation than other forms of probing. Some open addressing methods, such as
last-come-first-served hashing and cuckoo hashing move existing keys around in the array to
make room for the new key. This gives better maximum search times than the methods based on
probing.

A critical influence on performance of an open addressing hash table is the load factor; that is,
the proportion of the slots in the array that are used. As the load factor increases towards 100%,
the number of probes that may be required to find or insert a given key rises dramatically. Once
the table becomes full, probing algorithms may even fail to terminate. Even with good hash
functions, load factors are normally limited to 80%. A poor hash function can exhibit poor
performance even at very low load factors by generating significant clustering. What causes hash
functions to cluster is not well understood, and it is easy to unintentionally write a hash function
which causes severe clustering.

Algorithm format
The following pseudocode is an implementation of an open addressing hash table with linear
probing and single-slot stepping, a common approach that is effective if the hash function is
good. Each of the lookup, set and remove functions uses a common internal function findSlot to
locate the array slot that either does or should contain a given key.

record pair { key, value }


var pair array slot[0..numSlots-1]

function findSlot(key)
i := hash(key) modulus numSlots
loop
if slot[i] is not occupied or slot[i].key = key
return i
i := (i + 1) modulus numSlots

function lookup(key)
i := findSlot(key)
if slot[i] is occupied // key is in table
return slot[i].value
else // key is not in table
return not found

function set(key, value)


i := findSlot(key)
if slot[i] is occupied
slot[i].value := value
else
if the table is almost full
rebuild the table larger (note 1)
i := findSlot(key)
slot[i].key := key
slot[i].value := value

Another example shows the open addressing technique. The presented function hashes the four
parts of an Internet Protocol address, where NOT is bitwise NOT, XOR is bitwise XOR, OR is
bitwise OR, AND is bitwise AND, and << and >> are shift-left and shift-right:

// key_1, key_2, key_3, key_4 are 3-digit numbers: the parts of the IP address xxx.xxx.xxx.xxx


function ip(key parts)
j := 1
do
key := (key_2 << 2)
key := (key + (key_3 << 7))
key := key + (j OR key_4 >> 2) * (key_4) * (j + key_1) XOR j
key := key AND _prime_ // _prime_ is a prime number
j := (j+1)
while collision
return key
note 1
Rebuilding the table requires allocating a larger array and recursively using the set operation to
insert all the elements of the old array into the new larger array. It is common to increase the
array size exponentially, for example by doubling the old array size.
function remove(key)
i := findSlot(key)
if slot[i] is unoccupied
return // key is not in the table
j := i
loop
j := (j+1) modulus numSlots
if slot[j] is unoccupied
exit loop
k := hash(slot[j].key) modulus numSlots
if (j > i and (k <= i or k > j)) or
(j < i and (k <= i and k > j)) (note 2)
slot[i] := slot[j]
i := j
mark slot[i] as unoccupied
note 2
For all records in a cluster, there must be no vacant slots between their natural hash position and
their current position (else lookups will terminate before finding the record). At this point in the
pseudocode, i is a vacant slot that might be invalidating this property for subsequent records in
the cluster. j is such a subsequent record. k is the raw hash where the record at j would naturally
land in the hash table if there were no collisions. This test is asking if the record at j is invalidly
positioned with respect to the required properties of a cluster now that i is vacant.
Another technique for removal is simply to mark the slot as deleted. However this eventually
requires rebuilding the table simply to remove deleted records. The methods above provide O(1)
updating and removal of existing records, with occasional rebuilding if the high water mark of
the table size grows.

The O(1) remove method above is only possible in linearly probed hash tables with single-slot
stepping. In the case where many records are to be deleted in one operation, marking the slots for
deletion and later rebuilding may be more efficient.

Open addressing versus chaining


Chained hash tables have the following benefits over open addressing:

 They are simple to implement effectively and only require basic data structures.
 From the point of view of writing suitable hash functions, chained hash tables are insensitive to clustering, only requiring minimization of collisions. Open addressing depends upon better hash functions to avoid clustering. This is particularly important if novice programmers can add their own hash functions, but even experienced programmers can be caught out by unexpected clustering effects.
 They degrade in performance more gracefully. Although chains grow longer as the table fills, a chained hash table cannot "fill up" and does not exhibit the sudden increases in lookup times that occur in a near-full table with open addressing.
 If the hash table stores large records, about 5 or more words per record, chaining uses less memory than open addressing.
 If the hash table is sparse (that is, it has a big array with many free array slots), chaining uses less memory than open addressing even for small records of 2 to 4 words per record, due to its external storage.

Benchmark graphs comparing the average number of cache misses required to look up elements in
tables with chaining and linear probing show that, as the table passes the 80%-full mark, linear
probing's performance drastically degrades.

For small record sizes (a few words or less) the benefits of in-place open addressing compared to chaining are:

 They can be more space-efficient than chaining since they don't need to store any pointers or allocate any additional space outside the hash table. Simple linked lists require a word of overhead per element.
 Insertions avoid the time overhead of memory allocation, and can even be implemented in the absence of a memory allocator.
 Because it uses internal storage, open addressing avoids the extra indirection required for chaining's external storage. It also has better locality of reference, particularly with linear probing. With small record sizes, these factors can yield better performance than chaining, particularly for lookups.
 They can be easier to serialize, because they don't use pointers.

On the other hand, normal open addressing is a poor choice for large elements, since these elements fill entire cache lines (negating the cache advantage), and a large amount of space is wasted on large empty table slots. If the open addressing table only stores references to elements (external storage), it uses space comparable to chaining even for large records but loses its speed advantage.

Normally open addressing is better used for hash tables with small records that can be stored
within the table (internal storage) and fit in a cache line. They are particularly suitable for
elements of one word or less. In cases where the tables are expected to have high load factors,
the records are large, or the data is variable-sized, chained hash tables often perform as well or
better.

Ultimately, used sensibly any kind of hash table algorithm is usually fast enough; and the
percentage of a calculation spent in hash table code is low. Memory usage is rarely considered
excessive. Therefore, in most cases the differences between these algorithms are marginal, and
other considerations typically come into play.

Coalesced hashing
A hybrid of chaining and open addressing, coalesced hashing links together chains of nodes
within the table itself. Like open addressing, it achieves space usage and (somewhat diminished)
cache advantages over chaining. Like chaining, it does not exhibit clustering effects; in fact, the
table can be efficiently filled to a high density. Unlike chaining, it cannot have more elements
than table slots.

Perfect hashing
If all of the keys that will be used are known ahead of time, and the number of keys does not
exceed the table size, perfect hashing can be used to create a perfect hash table, in which there will
be no collisions. If minimal perfect hashing is used, every location in the hash table can be used
as well.

Perfect hashing gives a hash table where the time to make a lookup is constant in the worst case.
This is in contrast to chaining and open addressing methods, where the time for lookup is low on
average, but may be arbitrarily large. There exist methods for maintaining a perfect hash function
under insertions of keys, known as dynamic perfect hashing. A simpler alternative, that also
gives worst case constant lookup time, is cuckoo hashing.

Probabilistic hashing
Perhaps the simplest solution to a collision is to replace the value that is already in the slot with
the new value, or slightly less commonly, drop the record that is to be inserted. In later searches,
this may result in a search not finding a record which has been inserted. This technique is
particularly useful for implementing caching.

An even more space-efficient solution, similar to this one, is to use a bit array (an array of one-bit
fields) for the table. Initially all bits are set to zero; when a key is inserted, the corresponding bit
is set to one. False negatives cannot occur, but false positives can: if the search finds a 1 bit, it will
claim that the value was found, even if it was just another value that hashed into the same array
slot by coincidence. In reality, such a hash table is merely a specific type of Bloom filter.
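
As a small illustration of this idea, here is a C sketch of such a bit-array "table" with a single hash function; all names are illustrative. As described above, a lookup can return a false positive but never a false negative.

/* Sketch: bit-array hash table (in effect a one-hash Bloom filter) */
#define TABLE_BITS 1024

static unsigned char bits[TABLE_BITS / 8];   /* all zero initially */

unsigned int bit_hash(unsigned int key)
{
    return key % TABLE_BITS;
}

void bit_insert(unsigned int key)
{
    unsigned int h = bit_hash(key);
    bits[h / 8] |= (unsigned char)(1 << (h % 8));   /* set the bit */
}

int bit_maybe_contains(unsigned int key)
{
    unsigned int h = bit_hash(key);
    return (bits[h / 8] >> (h % 8)) & 1;   /* 1 may be a false positive */
}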

Table resizing
With a good hash function, a hash table can typically contain about 70%–80% as many elements
as it does table slots and still perform well. Depending on the collision resolution mechanism,
performance can begin to suffer either gradually or dramatically as more elements are added. To
deal with this, when the load factor exceeds some threshold, we allocate a new, larger table, and
add all the contents of the original table to this new table.

This can be a very expensive operation, and the necessity for it is one of the hash table's
disadvantages. In fact, some naive methods for doing this, such as enlarging the table by one
each time you add a new element, reduce performance so drastically as to make the hash table
useless. However, if we enlarge the table by some fixed percent, such as 10% or 100%, it can be
shown using amortized analysis that these resizings are so infrequent that the average time per
lookup remains constant-time. To see why this is true, suppose a hash table using chaining begins
at the minimum size of 1 and is doubled each time it fills above 100%. If in the end it contains n
elements, then the total add operations performed for all the resizings is:

1 + 2 + 4 + ... + n = 2n - 1.
Because the costs of the resizings form a geometric series, the total cost is O(n). But we also
perform n operations to add the n elements in the first place, so the total time to add n elements
with resizing is O(n), an amortized time of O(1) per element.

On the other hand, some hash table implementations, notably in real-time systems, cannot pay
the price of enlarging the hash table all at once, because it may interrupt time-critical operations.
One simple approach is to initially allocate the table with enough space for the expected number
of elements and forbid the addition of too many elements. Another useful but more memory-
intensive technique is to perform the resizing gradually:

 Allocate the new hash table, but leave the old hash table and check both tables during lookups.
 Each time an insertion is performed, add that element to the new table and also move k elements from the old table to the new table.
 When all elements are removed from the old table, deallocate it.

To ensure that the old table will be completely copied over before the new table itself needs to
be enlarged, it is necessary to increase the size of the table by a factor of at least (k + 1)/k during
the resizing.
Linear hashing is a hash table algorithm that permits incremental hash table expansion. It is
implemented using a single hash table, but with two possible look-up functions.

Another way to decrease the cost of table resizing is to choose a hash function in such a way that
the hashes of most values do not change when the table is resized. This approach, called
consistent hashing, is prevalent in disk-based and distributed hashes, where resizing is
prohibitively costly.

Ordered retrieval issue


Hash tables store data in pseudo-random locations, so accessing the data in a sorted manner is a
very time consuming operation. Other data structures such as self-balancing binary search trees
generally operate more slowly (since their lookup time is O(log n)) and are rather more complex
to implement than hash tables, but maintain a sorted data structure at all times.

Problems with hash tables


Although hash table lookups use constant time on average, the time spent can be significant.
Evaluating a good hash function can be a slow operation. In particular, if simple array indexing
can be used instead, this is usually faster.

Hash tables in general exhibit poor locality of reference—that is, the data to be accessed is
distributed seemingly at random in memory. Because hash tables cause access patterns that jump
around, this can trigger microprocessor cache misses that cause long delays. Compact data
structures such as arrays, searched with linear search, may be faster if the table is relatively small
and keys are cheap to compare, such as with simple integer keys. According to Moore's Law,
cache sizes are growing exponentially and so what is considered "small" may be increasing. The
optimal performance point varies from system to system; for example, a trial on Parrot shows
that its hash tables outperform linear search in all but the most trivial cases (one to three entries).

More significantly, hash tables are more difficult and error-prone to write and use. Hash tables
require the design of an effective hash function for each key type, which in many situations is
more difficult and time-consuming to design and debug than the mere comparison function
required for a self-balancing binary search tree. In open-addressed hash tables it's even easier to
create a poor hash function.

Additionally, in some applications, a black hat with knowledge of the hash function may be able
to supply information to a hash which creates worst-case behavior by causing excessive
collisions, resulting in very poor performance (i.e., a denial of service attack). In critical
applications, either universal hashing can be used or a data structure with better worst-case
guarantees may be preferable.

Other hash table algorithms


Extendible hashing and linear hashing are hash algorithms that are used in the context of
database algorithms used for instance in index file structures, and even primary file organization
for a database. Generally, in order to make search scalable for large databases, the search time
should be proportional log N or near constant, where N is the number of records to search. Log N
searches can be implemented with tree structures, because the degree of fan out and the shortness
of the tree relates to the number of steps needed to find a record, so the height of the tree is the
maximum number of disc accesses it takes to find where a record is. However, hash tables are
also used, because the cost of a disk access can be counted in units of disc accesses, and often
that unit is a block of data. Since a hash table can, in the best case, find a key with one or two
accesses, a hash table index is regarded as generally faster when retrieving a collection of records
during a join operation e.g.

SELECT * from customer, orders where customer.cust_id = orders.cust_id and cust_id = X


i.e. If orders has a hash index on cust_id, then it takes constant time to locate the block that
contains record locations for orders matching cust_id = X. (although, it would be better if the
value type of orders was a list of order ids, so that hash keys are just one unique cust_id for each
batch of orders, to avoid unnecessary collisions).

Extendible hashing and linear hashing have certain similarities: collisions are accepted as
inevitable and are part of the algorithm, where blocks or buckets of collision space are added;
traditional good hash function ranges are required, but the hash value is transformed by a
dynamic address function. In extendible hashing, a bit mask is used to mask out unwanted bits,
but this mask length increases by one periodically, doubling the available addressing space. Also
in extendible hashing, there is an indirection through a directory address space, the directory entries
being paired with another address (a pointer) to the actual block containing the key-value pairs;
the entries in the directory correspond to the bit-masked hash value (so that the number of entries
is equal to the maximum bit mask value + 1; e.g. a bit mask of 2 bits can address a directory of 00 01
10 11, or 3 + 1 = 4 entries).

In linear hashing, the traditional hash value is also masked with a bit mask, but if the resultant
smaller hash value falls below a 'split' variable, the original hash value is masked with a bit mask
of one bit greater length, making the resultant hash value address recently added blocks. The split
variable ranges incrementally between 0 and the maximum current bit mask value: e.g. with a bit mask
of 2, or in the terminology of linear hashing, a "level" of 2, the split variable will range from
0 to 3. When the split variable reaches 4, the level increases by 1, so in the next round the split
variable will range from 0 to 7, and reset again when it reaches 8.

The split variable incrementally allows increased addressing space as new blocks are added; the
decision to add a new block occurs whenever a key-and-value is being inserted and overflows
the particular block the key-and-value's key hashes into. This overflow location may be
completely unrelated to the block that is going to be split, the one pointed to by the split variable. However, over
time, it is expected that given a good random hash function that distributes entries fairly evenly
amongst all addressable blocks, the blocks that actually require splitting because they have
overflowed get their turn in round-robin fashion as the split value ranges between 0 - N where N
has a factor of 2 to the power of Level, level being the variable incremented whenever the split
variable hits N.

New blocks are added one at a time with both extendible hashing, and with linear hashing.

In extendible hashing, a block overflow ( a new key-value colliding with B other key-values,
where B is the size of a block) is handled by checking the size of the bit mask "locally", called
the "local depth", an attribute which must be stored with the block. The directory structure, also
has a depth, the "global depth". If the local depth is less than the global depth, then the local
depth is incremented, and all the key values are rehashed and passed through a bit mask which is
one bit longer now, placing them either in the current block, or in another block. If the other
block happens to be the same block when looked up in the directory, a new block is added, and
the directory entry for the other block is made to point to the new block. Why does the directory
have entries where two entries point to the same block? This is because if the local depth is
equal to the global depth of the directory, the bit mask of the directory does not have
enough bits to deal with an increment in the bit mask length of the block, and so the directory
must have its bit mask length incremented, but this means the directory now doubles the number
of addressable entries. Since half the addressable entries don't exist yet, the directory simply copies
the pointers over to the new entries, e.g. if the directory had entries for 00, 01, 10, 11 (a 2-bit
mask) and it becomes a 3-bit mask, then 000 001 010 011 100 101 110 111 become the new
entries: 00's block address goes to 000 and 001; 01's pointer goes to 010 and 011; 10 goes to
100 and 101, and so on. And so this creates the situation where two directory entries point to the
same block. Although the block that was going to overflow can now add a new block by
redirecting the second pointer to a newly appended block, the other original blocks will have two
pointers to them. When it is their turn to split, the algorithm will check local vs global depth and
this time find that the local depth is less, and hence no directory doubling is required; only a new
block is appended, and the second directory pointer is moved from addressing the previous block
to addressing the new block.

In linear hashing, adding a similarly hashed block does not occur immediately when a block
overflows; instead, an overflow block is created and attached to the overflowing block.
However, a block overflow is a signal that more space will be required, and this happens by
splitting the block pointed to by the "split" variable, which is initially zero, and hence initially
points to block zero. The splitting is done by taking all the key-value pairs in the splitting block,
and its overflow block(s), hashing the keys again, but with a bit mask of length current level + 1.
This will result in two block addresses, some will be the old block number, and others will be

a2 = old block number + ( N times 2 ^ (level) )

Rationale
Let m = N times 2 ^ level. If h is the original hash value and old block number = h mod m, then the
new block number is h mod (m * 2), because m * 2 = N times 2 ^ (level+1). The new block number is
either h mod m, if (h / m) is even (so dividing h/m by 2 leaves a zero remainder and therefore
doesn't change the remainder), or (h mod m) + m, if h / m is odd (dividing h/m by 2 leaves an
excess remainder of m plus the original remainder). (The same rationale applies to extendible
hashing depth incrementing.)

As above, a new block is created with a number a2, which will usually occur at +1 the previous
a2 value. Once this is done, the split variable is incremented, so that the next a2 value will be
again old a2 + 1. In this way, each block is covered by the split variable eventually, so each block
is preemptively rehashed into extra space, and new blocks are added incrementally. Overflow
blocks that are no longer needed are discarded, for later garbage collection if needed, or put on
an available free block list by chaining.

When the split variable reaches ( N times 2 ^ level ), level is incremented and split variable is
reset to zero. In this next round, the split variable will now traverse from zero to ( N times 2 ^
(old_level + 1 ) ), which is exactly the number of blocks at the start of the previous round, but
including all the blocks created by the previous round.

A simple inference on file storage mapping of linear hashing and extendible hashing
As can be seen, extendible hashing requires space to store a directory which can double in size.

Since the space of both algorithms increases by one block at a time, if blocks have a known
maximum size or fixed size, then it is straightforward to map the blocks as blocks sequentially
appended to a file.

In extendible hashing, it would be logical to store the directory as a separate file, as doubling can
be accommodated by adding to the end of the directory file. The separate block file would not
have to change, other than have blocks appended to its end.

Header information for linear hashing doesn't increase in size: basically just the values for N,
level, and split need to be recorded, so these can be incorporated as a header into a fixed-block-size
linear hash storage file.
However, linear hashing requires space for overflow blocks, and these might best be stored in
another file; otherwise addressing blocks in the linear hash file is not as straightforward as
multiplying the block number by the block size and adding the space for N, level, and split.

4.4 SORTING

 Sorting is the systematic arrangement of data; that is, the data are arranged in order based on
some key.
 Types of sorting
 Internal sorting
 External sorting
 Internal Sorting:
Sorting is carried out entirely in main memory, so the number of elements must be relatively small (less
than a million).

 External Sorting
Sorts that cannot be performed in main memory and must be done on disk or tape are also
quite important. This type of sorting is known as external sorting.

Topics covered:

 Bubble sort
 Insertion sort
 Selection sort
 Shell sort
 Heap sort
 Merge sort
 Quick sort
 Radix sort
4.5 Bubble sort:

 Bubble sort is a simple and well-known sorting algorithm.

 It is rarely used in practice; its main application is as an introduction to sorting algorithms.

Algorithm:

1. Compare each pair of adjacent elements from the beginning of an array; if they are in
reversed order, swap them.
2. If at least one swap has been done, repeat step 1.
Example. Sort {5, 1, 12, -5, 16} using bubble sort.

Bubble Sort program:

/***** Program to Sort an Array using Bubble Sort *****/

#include <stdio.h>
#include<conio.h>
void bubble_sort();
int a[10], n;
void main()
{
int i;
printf("\n Enter size of an array: ");
scanf("%d", &n);
printf("\n Enter elements of an array:\n");
for(i=0; i<n; i++)
scanf("%d", &a[i]);
bubble_sort();
printf("\n\nAfter sorting:\n");
for(i=0; i<n; i++)
printf("\n%d", a[i]);
getch();
}

void bubble_sort()
{
int i, j, temp;
for(i=0; i<n; i++)
{
printf(" Pass -> %d", i+1);
for(j=0; j<(n-1)-i; j++)
{
if(a[j] > a[j+1])
{
temp = a[j];
a[j] = a[j+1];
a[j+1] = temp;
}
}
}
}
Efficiency of Bubble Sort

 The efficiency of a sorting algorithm is measured in terms of number of comparisons.


 In bubble sort, there are n–1 comparisons in Pass 1, n–2 comparisons in Pass 2, and so on.
 Total number of comparisons = (n–1) + (n –2) + (n –3) +… +3+2+1= n(n – 1)/2.
 n(n – 1)/2 is of O(n2) order. Therefore, the bubble sort algorithm is of the order O(n2).
 Average and worst case complexity of bubble sort is O(n2). Also, it makes O(n2) swaps in the
worst case.
 Avoid implementations that don't check on every pass whether any swaps were made (i.e.,
whether the array is already sorted). This check is necessary in order to preserve the adaptive
property; a sketch follows below.
 One more problem of bubble sort is that its running time depends heavily on the initial order of the
elements.
 Big elements ("rabbits") move to their final positions fast, while small ones ("turtles") move
down very slowly.
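
The early-exit check mentioned above needs only a flag recording whether the pass made any swap. A sketch, assuming the same global a[] and n as the program above (the function name is illustrative, to avoid clashing with the version already given):

void bubble_sort_adaptive()
{
    int i, j, temp, swapped;
    for(i = 0; i < n - 1; i++)
    {
        swapped = 0;                    /* no swaps seen in this pass yet */
        for(j = 0; j < (n - 1) - i; j++)
        {
            if(a[j] > a[j+1])
            {
                temp = a[j];
                a[j] = a[j+1];
                a[j+1] = temp;
                swapped = 1;
            }
        }
        if(!swapped)
            break;                      /* already sorted: stop early */
    }
}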

4.6 Selection sort:

The idea of the algorithm is quite simple. The array is imaginarily divided into two parts - a sorted
one and an unsorted one. At the beginning, the sorted part is empty, while the unsorted one contains
the whole array. At every step, the algorithm finds the smallest element in the unsorted part and adds
it to the end of the sorted one. When the unsorted part becomes empty, the algorithm stops.

When the algorithm sorts an array, it swaps the first element of the unsorted part with the minimal
element and then includes it in the sorted part. This implementation of selection sort is not stable.
If a linked list is sorted instead, and the minimal element is linked to the sorted part rather than
swapped, selection sort is stable.

The algorithm work as follows:

1. Set the first position as current position


2. Find the minimum value in the list
3. Swap it with the value in the current position
4. Set next position as current position
5. Repeat steps 2- 4 until you reach end of list.

Let us see an example of sorting an array to make the idea of selection sort clearer.

Example. Sort {5, 1, 12, -5, 16, 2, 12, 14} using selection sort.
Efficiency of selection Sort:
 Selection sort stops when the unsorted part becomes empty. As we know, on every step the
number of unsorted elements decreases by one.
 There are n–1 comparisons during Pass 1 to find the smallest element, n–2 comparisons
during Pass 2 to find the second smallest element, and so on.
 Total number of comparisons = (n – 1) + (n – 2) + (n – 3) + … + 3 + 2 + 1 = n(n – 1)/2
 n(n – 1)/2 is of O(n2) order. Therefore, the selection sort algorithm is of the order O(n2).
 But the fact that selection sort requires at most n - 1 swaps makes it very efficient in
situations where a write operation is significantly more expensive than a read operation.

Implementation Selection Sort :


/***** Program to Sort an Array using Selection Sort *****/

#include <stdio.h>
#include <conio.h>
void selection_sort();
int a[10], n;
void main()
{
int i;
printf("\nEnter size of an array: ");
scanf("%d", &n);
printf("\nEnter elements of an array:\n");
for(i=0; i<n; i++)
scanf("%d", &a[i]);
selection_sort();
printf("\n\nAfter sorting:\n");
for(i=0; i<n; i++)
printf("\n%d", a[i]);
getch();
}
void selection_sort()
{
int i, j, min, temp;
for (i=0; i<n; i++)
{
min = i;
for (j=i+1; j<n; j++)
{
if (a[j] < a[min])
min = j;
}
temp = a[i];
a[i] = a[min];
a[min] = temp;
}
}

4.7 Insertion Sort:

 Insertion sort algorithm somewhat resembles selection sort.


 Here, sorting takes place by inserting a particular element at the appropriate position that’s why
the name – insertion sorting
 Array is imaginary divided into two parts - sorted one and unsorted one.
 At the beginning, sorted part contains first element of the array and unsorted one contains the
rest.
 At every step, the algorithm takes the first element in the unsorted part and inserts it into the right place
in the sorted one.
 When the unsorted part becomes empty, the algorithm stops.
 This procedure is repeated for all the elements in the list.

Let us see an example of insertion sort routine to make the idea of algorithm clearer.

Example. Sort {7, -5, 2, 16, 4} using insertion sort.


The ideas of insertion

The main operation of the algorithm is insertion. The task is to insert a value into the
sorted part of the array. Let us see the variants of how we can do it.
Shifting instead of swapping

Instead of swapping at every comparison, the larger elements are shifted one position to the right,
and the element being inserted is written only once, to its final correct position.

It is the most commonly used modification of the insertion sort.

Using binary search

It is reasonable to use binary search algorithm to find a proper place for insertion. This
variant of the insertion sort is called binary insertion sort. After position for insertion is found,
algorithm shifts the part of the array and inserts the element. This version has lower number of
comparisons, but overall average complexity remains O(n2). From a practical point of view this
improvement is not very important, because insertion sort is used on quite small data sets.
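
A sketch of binary insertion sort in C, using the same global a[] and n convention as the other programs in this unit (the function name is illustrative): binary search finds the insertion point, then the sorted part is shifted right by one place.

void binary_insertion_sort()
{
    int i, j, low, high, mid, temp;
    for(i = 1; i < n; i++)
    {
        temp = a[i];
        low = 0;
        high = i;                    /* search within the sorted part a[0..i-1] */
        while(low < high)
        {
            mid = (low + high) / 2;
            if(a[mid] <= temp)
                low = mid + 1;
            else
                high = mid;
        }
        for(j = i; j > low; j--)     /* shift to make room */
            a[j] = a[j-1];
        a[low] = temp;               /* single write to the final position */
    }
}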

Algorithm:

Insert (Arr, N) where ARR is an array of N elements.


Step1: Repeat for p = 1, 2, 3, …, N
Step2: Assign temp = arr[p]
Step3: Repeat for j = p down to 1 while arr[j-1] > temp
arr[j] = arr[j-1]
otherwise go to step 6
Step4: Decrement j by 1
Step5: [End of step 3 loop]
Step6: Assign arr[j] = temp
Step7: [End of step 1 loop]
Step8: Print the sorted array arr
Insertion Sort Program:

/**** Program to Sort an Array using Insertion Sort ****/

#include <stdio.h>
#include<conio.h>
void insertion_sort();

int a[10],n;
void main()
{
int i;
printf("\nEnter size of an array: ");
scanf("%d", &n);
printf("\nEnter elements of an array:\n");
for(i=0; i<n; i++)
scanf("%d", &a[i]);
insertion_sort();
printf("\n\nAfter sorting:\n");
for(i=0; i<n; i++)
printf("\n%d", a[i]);
getch();
}

void insertion_sort()
{
int i, j, temp;
for(i=1; i<n; i++)
{
temp = a[i];
j = i-1;
while (j>=0 && a[j]>temp)
{
a[j+1] = a[j];
j--;
}
a[j+1] = temp;
}
}

4.8 Shell sort

 Shell sort is an extension of insertion sort


 Limitation of insertion sort
It compares only consecutive elements and moves elements by only one position at a time.
Small elements that are far from their correct position require many passes through the sort to be
properly inserted.

 Shell sort overcomes this limitation by comparing elements that are at a specific distance from
each other and interchanging them if necessary.
 Shell sort divides the list into smaller sublists and then sorts the sublists separately using insertion
sort.
 This method splits the input list into h independent sorted lists.
 The value of h is initially high and is repeatedly decremented until it reaches 1.
 When h is equal to 1, a regular insertion sort is performed on the list, but by then the list is
guaranteed to be almost sorted.
 In the last pass, it takes all the elements and sorts the entire list.
 Shell sort is also called diminishing increment sort, because the distance between compared
elements continuously decreases.

o Select the distance (increment) by which the elements in a group will be separated to form multiple
sublists.
o Apply insertion sort on each sublist to move the elements towards their correct positions.
o Repeat with a smaller increment (e.g., Increment = 2 in Pass 2), applying insertion sort on each
sublist, until the increment becomes 1.


 Shell sort improves insertion sort by comparing the elements separated by a distance of
several positions. This helps an element to take a bigger step towards its correct position,
thereby reducing the number of comparisons.

Shell sort routine:

#include <stdio.h>
#include<conio.h>
void shell_sort();
int a[10],n;
void main()
{
int i;
printf("\nEnter size of an array: ");
scanf("%d", &n);
printf("\nEnter elements of an array:\n");
for(i=0; i<n; i++)
scanf("%d", &a[i]);
shell_sort();
printf("\n\nAfter sorting:\n");
for(i=0; i<n; i++)
printf("\n%d", a[i]);
getch();
}
void shell_sort()
{
int i, j, k, temp;
for(i=(n+1)/2; i>=1; i/=2)
{
for(j=i; j<n; j++)
{
temp = a[j];
k = j - i;
while(k >= 0 && temp < a[k])
{
a[k+i] = a[k];
k = k - i;
}
a[k+i] = temp;
}
}
}

4.9 Radix sort:

 Radix sort is a clever and intuitive little sorting algorithm.


 Radix sort puts the elements in order by comparing the digits of the numbers.
Ex: Consider the following 9 numbers:
493, 812, 715, 710, 195, 437, 582, 340, 385
 We start sorting by comparing and ordering the one's digits:

Digit Numbers (sublist)

0 – 340, 710
1 –
2 – 812, 582
3 – 493
4 –
5 – 715, 195, 385
6 –
7 – 437
8 –
9 –

Now, we gather all the numbers (sublists) in order from the 0 sublist to the 9 sublist into the
main list again: 340, 710, 812, 582, 493, 715, 195, 385, 437

 Now, the sublists are created again, this time based on the ten's digit:

Digit Numbers (sublist)

0 –
1 – 710, 812, 715
2 –
3 – 437
4 – 340
5 –
6 –
7 –
8 – 582, 385
9 – 493, 195

 Now, the sublists (numbers) are gathered in order from 0 to 9:

710, 812, 715, 437, 340, 582, 385, 493, 195
 Finally, the sublists (numbers) are created according to the hundred's digit:

Digit Numbers (sublist)

0 –
1 – 195
2 –
3 – 340, 385
4 – 437, 493
5 – 582
6 –
7 – 710, 715
8 – 812

 At last, the list is gathered up again:
195, 340, 385, 437, 493, 582, 710, 715, 812
 And now we have a fully sorted array. Radix sort is very simple, and a computer can do it fast.
When it is programmed properly, radix sort is in fact one of the fastest sorting algorithms for
numbers or strings of letters.
 (The following illustrates merging two sorted arrays, the operation used later by merge sort.)
If array A contains 1, 13, 24, 26 and B contains 2, 15, 27, 38, then the algorithm proceeds as
follows: first a comparison is done between 1 and 2; 1 is added to C, and then 13 and 2 are
compared.
 Next 2 is added to C, and then 13 and 15 are compared.
 13 is added to C, and then 24 and 15 are compared. This proceeds until 26 and 27 are compared.
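
Returning to radix sort: the digit-bucket passes above can be implemented compactly with a counting pass per digit (least significant digit first). A minimal C sketch for non-negative integers, assuming the same global a[] and n as the other programs here; radix_sort and out[] are illustrative names.

void radix_sort()
{
    int i, d, exp, max, count[10], out[10];   /* out[] sized like a[] */
    max = a[0];
    for(i = 1; i < n; i++)
        if(a[i] > max)
            max = a[i];
    for(exp = 1; max / exp > 0; exp *= 10)    /* one pass per digit */
    {
        for(d = 0; d < 10; d++)
            count[d] = 0;
        for(i = 0; i < n; i++)
            count[(a[i] / exp) % 10]++;       /* bucket sizes */
        for(d = 1; d < 10; d++)
            count[d] += count[d-1];           /* bucket end positions */
        for(i = n - 1; i >= 0; i--)           /* stable placement */
            out[--count[(a[i] / exp) % 10]] = a[i];
        for(i = 0; i < n; i++)
            a[i] = out[i];                    /* gather for the next pass */
    }
}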

4.10 Heap Sort:

 Consider the input sequence 31, 41, 59, 26, 53, 58, 97.

 Heap sort can be implemented using two types of heaps.
 Max heap:
In a max heap, the root node is greater than its children nodes.
 Min heap:
In a min heap, the root node is smaller than its children nodes.
Max heap:
In heap sort, the root node (the maximum) is swapped with the rightmost leaf node (the last
element of the heap); the old root is written to the output and the remaining heap is re-adjusted.
Heap Sort
#include<stdio.h>
#include<conio.h>
int hsort[25],n,i;
void adjust(int,int);
void heapify();
void main()
{
int temp;
clrscr();
printf("\n\t\t\t\tHEAP SORT");
printf("\n\t\t\t\t**** ****\n\n\n");
printf("\nenter no of elements:");
scanf("%d",&n);
printf("\nenter elements to be sorted\n\n");
for(i=1;i<=n;i++)
scanf("%d",&hsort[i]);
heapify();
for(i=n;i>=2;i--)
{
temp=hsort[1];
hsort[1]=hsort[i];
hsort[i]=temp;
adjust(1,i-1);
}
printf("\nSORTED ELEMENT\n\n");
for(i=1;i<=n;i++)
printf("%d\n",hsort[i]);
getch();
}
void heapify()
{
int i;
for(i=n/2;i>=1;i--)
adjust(i,n);
}
void adjust(int i,int n)
{
int j,element;
j=2*i;
element=hsort[i];
while(j<=n)
{
if((j<n)&&(hsort[j]<hsort[j+1]))
j = j + 1;
if(element>=hsort[j])
break;
hsort[j/2]=hsort[j];
j=2*j;
}
hsort[j/2]=element;
}
4.11 Quicksort

Quicksort is a fast sorting algorithm, which is used not only for educational purposes, but
widely applied in practice. On the average, it has O(n log n) complexity, making quicksort suitable
for sorting big data volumes. The idea of the algorithm is quite simple and once you realize it, you
can write quicksort as fast as bubble sort.

Algorithm

The divide-and-conquer strategy is used in quicksort. Below the recursion step is described:
1. Choose a pivot value. We take the value of the middle element as the pivot value, but it can
be any value in the range of the sorted values, even one that is not present in the array.
2. Partition. Rearrange elements in such a way, that all elements which are lesser than the
pivot go to the left part of the array and all elements greater than the pivot, go to the right
part of the array. Values equal to the pivot can stay in any part of the array. Notice, that
array may be divided in non-equal parts.
3. Sort both parts. Apply quicksort algorithm recursively to the left and the right parts.

Partition algorithm in detail

There are two indices, i and j. At the very beginning of the partition algorithm, i points
to the first element in the array and j points to the last one. Then the algorithm moves i forward until
an element with value greater than or equal to the pivot is found. Index j is moved backward until an
element with value less than or equal to the pivot is found. If i ≤ j, the elements are swapped, i steps
to the next position (i + 1), and j steps to the previous one (j - 1). The algorithm stops when i becomes
greater than j. After partition, all values before the ith element are less than or equal to the pivot and all

values after the jth element are greater than or equal to the pivot.

Example. Sort {1, 12, 5, 26, 7, 14, 3, 7, 2} using quicksort.


Notice that we show here only the first recursion step, in order not to make the example too
long. In fact, {1, 2, 5, 7, 3} and {14, 7, 26, 12} are then sorted recursively.
Why does it work?
On the partition step the algorithm divides the array into two parts, and every element a from
the left part is less than or equal to every element b from the right part. Also, a and b satisfy the
inequality a ≤ pivot ≤ b. After completion of the recursive calls both of the parts become sorted and,
taking into account the arguments stated above, the whole array is sorted.
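
The section gives no program for quicksort, so here is a minimal C sketch of exactly the scheme described above (middle element as pivot, two indices moving toward each other, recursion on both parts); it is an illustration, not a definitive implementation.

/**** Sketch: Quicksort with middle-element pivot ****/
void quick_sort(int a[], int left, int right)
{
    int i = left, j = right, temp;
    int pivot = a[(left + right) / 2];
    while(i <= j)
    {
        while(a[i] < pivot)
            i++;                     /* find element >= pivot on the left */
        while(a[j] > pivot)
            j--;                     /* find element <= pivot on the right */
        if(i <= j)
        {
            temp = a[i];
            a[i] = a[j];
            a[j] = temp;
            i++;
            j--;
        }
    }
    if(left < j)
        quick_sort(a, left, j);      /* sort left part */
    if(i < right)
        quick_sort(a, i, right);     /* sort right part */
}

A call of quick_sort(a, 0, n-1) sorts the whole array.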

4.12 Merge Sort


Merge sort algorithm:

1. It is based on the divide and conquer approach


2. Divides the list into two sublists of sizes as nearly equal as possible
3. Sorts the two sublists separately by using merge sort
4. Merges the sorted sublists into one single list
 To understand the implementation of merge sort algorithm, consider an unsorted list of numbers
stored in an array.

 Let us sort this unsorted list.


 The first step to sort data by using merge sort is to split the list into two parts.

 The list has odd number of elements, therefore, the left sublist is longer than the right sublist by
one entry.

 There is a single element left in each sublist.


 Sublists with one element require no sorting.

 Start merging the sublists to obtain a sorted list.

 Further merge the sublists.

 Again, merge the sublists.

 To sort the list by using merge sort algorithm, you need to recursively divide the list into two
nearly equal sublists until each sublist contains only one element.
 To divide the list into sublists of size one requires log n passes.
 In each pass, a maximum of n comparisons are performed.
 Therefore, the total number of comparisons will be a maximum of n × log n.
 The efficiency of merge sort is equal to O(n log n)
 There is no distinction between best, average, and worst case efficiencies of merge sort because

all of them require the same amount of time.


MERGE SORT Program

/**** Program to Sort an Array using Merge Sort ****/

#include <stdio.h>
#include <conio.h>
void merge_sort(int [], int, int);
void merge_array(int [], int, int, int);
main()
{
int a[50], n, i;
printf("\nEnter size of an array: ");
scanf("%d", &n);
printf("\nEnter elements of an array:\n");
for(i=0; i<n; i++)
scanf("%d", &a[i]);
merge_sort(a, 0, n-1);
printf("\n\nAfter sorting:\n");
for(i=0; i<n; i++)
printf("\n%d", a[i]);
getch();
}

void merge_sort(int a[], int beg, int end)


{
int mid;
if (beg < end)
{
mid = (beg+end)/2;
merge_sort(a, beg, mid);
merge_sort(a, mid+1, end);
merge_array(a, beg, mid, end);
}
}

void merge_array(int a[], int beg, int mid, int end)


{
int i, left_end, num, temp, j, k, b[50];
for(i=beg; i<=end; i++)
b[i] = a[i];
i = beg;
j = mid+1;
k = beg;
while ((i<=mid) && (j<=end))
{
if (b[i] <= b[j])
{
a[k] = b[i];
i++;
k++;
}
else
{
a[k] = b[j];
j++;
k++;
}
}
if (i <= mid)
{
while (i <= mid)
{
a[k] = b[i];
i++;
k++;
}
}
else
{
while (j <= end)
{
a[k] = b[j];
j++;
k++;
}
}
}

4.13 Searching- Linear search, Binary search

One of the more important tasks performed by computers is the location and retrieval of data.

For data held in arrays there are a number of possibilities, and one of these is a simple technique
referred to as linear search.

A second approach, known as binary search, will be discussed in the next section.

These sections assume that there is no duplication of data within the data set, but the techniques
can be extended to cover data sets that do contain duplicates.

The item being searched for will be referred to as the target.

Linear Search

To perform a linear search of data held in an array, the search starts at one end (usually
the low-numbered element of the array) and examines each element in the array until one of two
conditions is met: either Condition 1: the target has been found, or Condition 2: the end of the
data has been reached (the target value is not in the data set).
Note that the algorithm requires that both tests are performed and that the search
terminates when one of the conditions becomes true. The second test is required to prevent the
algorithm from attempting to search past the end of the data. For illustration, consider the
following data set. Again, element 0 is leftmost:

To search for the value 7 in the array, we start by examining the first element of the array.
This does not match the target, so we increment the index counter, and try again. We now
examine the next element of the array, which has the value of 23. This does not match the target,
so we again increment the counter. This now means that we are examining the element
containing 7. This is what we are looking for, so the search is terminated, and the result of the
search is reported back to the calling function. It is usual to return the index of the element
containing the target, but there may be circumstances where a different return value may be
needed.

In Linear Search the list is searched sequentially and the position is returned if the key
element to be searched is available in the list, otherwise -1 is returned. The search in Linear
Search starts at the beginning of an array and move to the end, testing for a match at each item.

All the elements preceding the search element are traversed before the search element is
reached, i.e. if the element to be searched is in position 10, all elements from 1-9 are checked
before 10.

Algorithm : Linear search implementation


#include <stdbool.h>   /* for the bool type (C99) */

bool linear_search ( int *list, int size, int key, int *rec )
{
// Basic Linear search
bool found = false;
int i;
for ( i = 0; i < size; i++ )
{
if ( key == list[i] )
break;
}
if ( i < size )
{
found = true;
*rec = list[i];   /* store the found record for the caller */
}
return found;
}
The code searches for the element through a loop starting from 0 to n. The loop can
terminate in one of two ways. If the index variable i reach the end of the list, the loop condition
fails. If the current item in the list matches the key, the loop is terminated early with a break
statement. Then the algorithm tests the index variable to see if it is less than that size (thus the
loop was terminated early and the item was found), or not (and the item was not found).

Ex.
Assume the element 45 is searched from a sequence of sorted elements 12, 18, 25, 36, 45,
48, 50. The Linear search starts from the first element 12, since the value to be searched is not 12
(value 45), the next element 18 is compared and is also not 45, by this way all the elements
before 45 are compared and when the index is 5, the element 45 is compared with the search
value and is equal, hence the element is found and the element position is 5.
In a linear search the search is done over the entire list even if the element to be searched
is not available. Some of our improvements work to minimize the cost of traversing the whole
data set, but those improvements only cover up what is really a problem with the algorithm.

By thinking of the data in a different way, we can make speed improvements that are
much better than anything linear search can guarantee. Consider a list in sorted order. It would
work to search from the beginning until an item is found or the end is reached, but it makes more
sense to remove as much of the working data set as possible so that the item is found more
quickly.

If it started at the middle of the list it could determine which half the item is in (because
the list is sorted). This effectively divides the working range in half with a single test. This in
turn reduces the time complexity.

Algorithm:

bool Binary_Search ( int *list, int size, int key, int *rec )
{
bool found = false;
int low = 0, high = size - 1;
while ( high >= low )
{
int mid = ( low + high ) / 2;
if ( key < list[mid] )
high = mid - 1;
else if ( key > list[mid] )
low = mid + 1;
else
{
found = true;
*rec = list[mid];   /* store the found record for the caller */
break;
}
}
return found;
}

Binary Search

Binary search is also known as binary chop, as the data set is cut into two halves for each
step of the process. It is a very much faster search method than linear search, but to be effective
the data set must be in sorted order in the array. If the data set changes rapidly and requires
regular re-sorting then this will offset the speed gain offered by binary search over linear search.

To perform binary search, three index variables are required. By tradition these are called 'top',
'middle' and 'bottom'.

Top is initialized to one end of the array, often 0, and bottom is set to indicate the other end of the
array.

Once these two variables are set, the value of middle can be computed. Middle is set to the
midway value between top and bottom.

The value indexed by middle is compared with the target value. There are initially three possible
outcomes that we have to consider:
1: The value indexed by middle matches the target. In this case the search has found the
target and the function can return a value indicating that the search has succeeded.

2. The value is higher than the middle value in which case only the values from middle to
end need to be searched

3. The value is lower than the middle value in which case the search is carried out
between zero and the middle value.

Binary Search: Illustration

To illustrate this process, consider the following scenario - the data in the array is sorted
and the target is 29. We start by setting top to 9, bottom to 0, and calculating middle to be (0 +
9) / 2. This rounds down to 4 using C integer arithmetic, so middle is set to 4.

From this we can conclude that the target value is in the lower half of the table. This
means that top must be set to middle and a new value of middle calculated. In this case the value
of middle will be (4 + 0)/2 which C will deliver as 2. The contents of array element 2 match the
target, so in this case the search is successfully concluded.
Binary Search: Practical Issues

A couple of practical issues present themselves at this point:

 If top == bottom then the search has concluded. Unless the value at middle (and middle
must be the same as top and bottom) is the target, then the search has determined that the
target is not in the data set.

 Unless some care is taken then the search may end in a loop with top equal to (bottom +
1). There are circumstances where this loop does not terminate.
EXERCISES
PART A

1. Define Heap.
2. What is the need for Priority queue?
3. What are the properties of binary heap?
4. What do you mean by structure property in a heap?
5. What do you mean by heap order property?
6. What are the applications of priority queues?
7. What do you mean by the term “Percolate up”?
8. What do you mean by the term “Percolate down”?
9. What are the methods available in storing sequential files?
10. List some popular sorting methods.
11. What is the complexity of bubble sort?
12. What is insertion sort?
13. What is the complexity of insertion sort?
14. What is bucket sort?
15. Give the complexity of bucket sort.
16. What is merge sort?
17. What is quick sort algorithm?
18. What is the complexity of quick sort algorithm?
19. What is selection sort?
PART- B
1. What is a Binary heap? Explain binary heap?
2. Explain B-tree representation?
3. What is a Priority Queue? What are its types? Explain.
4. Sort the given values using Quick Sort?
65 70 75 80 85 60 55 50 45
5. Explain bubble sort.
6. Explain the procedure for insertion sort.
7. Write the algorithm for bucket sort.
8. Explain the procedure of merge sort.
9. Write the process of selection sort.
UNIT – IV & V

GRAPHS
Graph Terminologies - Representations of Graphs, Topological sort, Minimum Spanning
Trees- Growing a minimum spanning tree - The algorithms of Kruskal and Prim-Shortest
paths in directed acyclic graphs, Dijkstra's algorithm ,All Pairs Shortest Paths - The Floyd
- Warshall algorithm,Breadth-first search, Depth-first search, strongly connected
components.
5.1 Graphs and its terminologies

Graph is another important non-linear data structure. In a tree structure, there is a
hierarchical relationship between parent and children, that is, a one-to-many relationship. In a graph,
the relationship is less restricted: here, the relationship is many-to-many.

Application of Graph Structure in real world:-

 Airlines
 Source – destination network
 Transportation problem

Terminologies:
A graph has a set of vertices V, often labeled V1, V2 . . . etc and a set of edges E, labeled
e1, e2… Each edge is a pair (U,V) of vertices. In general, Graph G = (V, E) for the graph with
vertex set V and edge set E.
In applications, where pair (u, v) is distinct from pair (v, u) the graph is directed.
Otherwise the graph is undirected.

5.2 Representation of graphs

Digraph:

A digraph is also called a directed graph. It is a graph G = <V, E>, where E is a set of
ordered pairs of elements from V.

V = {V1, V2, V3, V4}


E={(V1, V2),(V1, V3),(V2, V3),(V3, V4),(V4, V1)}

Here if an ordered pair (vi, vj) in E, then there is an

edge directed from vi to vj.

Undirected Graph:
The pair (Vi, Vj) is unordered; that is, (Vi, Vj) and (Vj, Vi) are the same edge.
Weighted Graph:

A graph (or digraph) is termed weighted graph if all the edges in it are labeled with some
weights

Adjacent vertices:

A vertex Vi is adjacent to another vertex Vj if there is an edge from Vi to Vj.

Here, V2 is adjacent to V3 and V4.

Self loop:

If there is an edge, Whose starting and end vertices are the same, that is (V i,Vi) is an edge,
then it is called a self loop.

Parallel edges:-

If there is more than one edge between the same pair of vertices, then they are known as
parallel edges.

E.g.: In the above graph, V1 to V2 has two parallel edges.


A graph which has either self loop or parallel edges or both is called multigraph.

Simple graph:

A graph (or digraph) that does not have any self loop or parallel edges is called a simple graph.

Complete graph:

A graph G is said to be complete if each vertex Vi is adjacent to every other vertex Vj, that is,
there are edges from any vertex to all other vertices.

Acyclic graph:

If there is a path containing one or more edges which starts from a vertex vi and terminates
into the same vertex then the path is known as a cycle.

If a graph does not have any cycle, then it is called acyclic graph.

Isolated vertex:

A vertex is isolated if there is no edge connecting it to any other vertex.

Degree of vertex:

The number of edges connected with vertex vi is called the degree of vertex vi and is denoted

by degree (Vi).
For a digraph, there are two degrees:

In degree of Vi = number of edges incident to Vi.
Out degree of Vi = number of edges emanating from Vi.

(In the example figure: degree(Vi) = 3, indegree(V1) = 1 and outdegree(V1) = 2.)

Representation of Graphs:-

1. Set representation
2. Linked representation
3. Matrix representation

Set representation:

V = {V1, V2, V3, V4, V5, V6, V7}


E = { (V1,V2),(V1,V3), (V2,V4), (V3,V4),(V2,V5), (V5,V7), (V4,V7),(V6,V7),
(V3,V6) }

5.3 Topological sorting:-

Topological sorting is an ordering of the vertices of a graph, Such that if there is a path from
U to V in the graph then U appears before V in the ordering. A simple algorithm to find a topological
ordering is to find out any vertex with in degree zero, that is, a vertex without any predecessor. We
can then add this vertex in an ordering set (initially which is empty) and remove it along with its
edges from the graph. Then we repeat the same strategy on the remaining graph until it is empty.

Example:
Given a graph G:

Here V1 has no incoming edges. Visit V1 first and remove all the edges connected to it.

Here V2 has no in degree. Visit V2 next and remove all edges connected to V2.

Here V5 has no in degree. Visit V5 next.

Here V7 has no in degree. Visit V7 next

Here V6 has no in degree. Visit V6 next.

V4 has no in degree. Visit V4 next.

V3 is visited finally.
Visited vertices: V1, V2, V5, V7, V6, V4, V3.
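
The strategy above translates directly into C with an adjacency matrix and an in-degree array. A sketch (MAXV and the function name are illustrative):

/* Sketch: topological sort by repeatedly removing a zero in-degree vertex */
#include <stdio.h>
#define MAXV 10

void topological_sort(int adj[MAXV][MAXV], int nv)
{
    int indeg[MAXV], done[MAXV];
    int i, j, v, count;
    for(i = 0; i < nv; i++)
    {
        done[i] = 0;
        indeg[i] = 0;
    }
    for(i = 0; i < nv; i++)
        for(j = 0; j < nv; j++)
            indeg[j] += adj[i][j];          /* count incoming edges */
    for(count = 0; count < nv; count++)
    {
        v = -1;
        for(i = 0; i < nv; i++)             /* find an unvisited vertex */
            if(!done[i] && indeg[i] == 0)
            {
                v = i;
                break;
            }
        if(v == -1)
        {
            printf("\nGraph has a cycle - no topological order\n");
            return;
        }
        printf("V%d ", v + 1);
        done[v] = 1;
        for(j = 0; j < nv; j++)             /* remove v's outgoing edges */
            if(adj[v][j])
                indeg[j]--;
    }
}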

5.4 Minimum Spanning Tree –Growing Phase :

This is related to weighted graphs: we find a spanning tree such that the sum of the
weights of all the edges in the tree is minimum.

A Spanning Tree:

A Spanning Tree of a connected graph is its connected acyclic sub graph that contains all the
vertices of the graph
A Minimum Spanning Tree of a weighted connected graph is its spanning tree of the smallest
weight, where the weight of a tree is defined as the sum of the weights on all its edges.
The minimum spanning tree problem is the problem of finding a minimum spanning tree for a
given weighted connected graph.

Consider a graph:

The spanning Tree is:

Here T1 is the minimum spanning tree.



Drawback of Exhaustive Search Approach:-

1. The number of spanning trees grows exponentially with the graph size.
2. Generating all spanning trees for a given graph is not easy.
To overcome this drawback, we make use of some efficient algorithms such as

 Prim’s algorithm
 Kruskal’s algorithm
Prim's Algorithm:

Consider the graph.

Tree Vertices Remaining Vertices

a(-, -) b(a, 3), c(-, ∞), d(-, ∞), e(a, 6), f(a, 5)

b(a, 3) c(b, 1), d(-, ∞), e(a, 6), f(b, 4)

c(b, 1) d(c, 6), e(a, 6), f(b, 4)

f(b, 4) d(f, 5), e(f, 2)

e(f, 2) d(f, 5)

Minimum spanning tree:

According to Prim's algorithm, a minimum spanning tree grows in successive stages: at any
stage in the algorithm, we have a set of vertices that have already been included in the
tree; the rest of the vertices have not.
Prim's algorithm then finds a new vertex to add to the tree by choosing the edge
<Vi, Vj> with the smallest weight among all edges where Vi is in the tree and Vj is yet to be included in the tree.
The algorithm starts by selecting a vertex arbitrarily, and then in each stage, we add an edge (by
adding an associated vertex) to the tree.

5.5 The Algorithms of Kruskal and Prim


Prim's algorithm:

//Input: A weighted connected graph G = (V, E)
//Output: ET, the set of edges composing a minimum spanning tree of G.

VT ← {V0}
ET ← ∅
for i ← 1 to |V| - 1 do
Find a minimum-weight edge e* = (v*, u*) among all the edges (v, u) such that v is in VT and u is in V - VT.
VT ← VT ∪ {u*}
ET ← ET ∪ {e*}
return ET


Kruskal's algorithm:

To obtain a minimum spanning tree of a graph, a novel approach was devised by J. B.
Kruskal, known as Kruskal's algorithm.
1. List all the edges of the graph G in the increasing order of weights.
2. Select the smallest edge from the list and add it into the spanning tree (initially it is empty) if
the inclusion of this edge does not make a cycle.
3. If the selected edge with smallest weight forms a cycle, remove it from the list.
4. Repeat steps 2-3 until the tree contains n-1 edges or the list is empty.
5. If the tree T contains less than n-1 edges and the list is empty, no spanning tree is possible for
the graph; otherwise, return the minimum spanning tree T.

Consider the graph:

Edge weight Action

(V1,V4) 1 Accepted
(V6,V7) 1 Accepted
(V1,V2) 2 Accepted
(V3,V4) 2 Accepted
(V2,V4) 3 Rejected
(V1,V3) 4 Rejected
(V4,V7) 4 Accepted
(V3,V6) 5 Rejected
(V5,V7) 6 Accepted
Minimum spanning tree is:
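A C sketch of the steps above: the edge list is assumed to be already sorted by weight (step 1), and cycle detection is done with a simple union-find (disjoint-set) structure. MAXV and all function names are illustrative.

/* Sketch: Kruskal's algorithm with a simple union-find */
#include <stdio.h>
#define MAXV 20

int parent[MAXV];

int find_root(int x)            /* root of the set containing x */
{
    while(parent[x] != x)
        x = parent[x];
    return x;
}

/* u[], v[], w[]: the ne edges (endpoints and weight) sorted by weight */
void kruskal(int u[], int v[], int w[], int ne, int nv)
{
    int i, taken = 0, ru, rv;
    for(i = 0; i < nv; i++)
        parent[i] = i;          /* each vertex starts in its own set */
    for(i = 0; i < ne && taken < nv - 1; i++)
    {
        ru = find_root(u[i]);
        rv = find_root(v[i]);
        if(ru != rv)            /* no cycle: accept this edge */
        {
            parent[ru] = rv;    /* union the two sets */
            printf("(V%d,V%d) weight %d accepted\n", u[i]+1, v[i]+1, w[i]);
            taken++;
        }
        /* else: the edge would form a cycle and is rejected */
    }
}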

5.6 Shortest paths in directed acyclic graphs

Shortest path problem:-

This problem of a graph is about finding a path between two vertices in such a way that this path
will satisfy some criteria of optimization.

E.g. For non-weighted graph, the number of edges will be minimum and for a weighted graph, the
sum of weights on all its edges in the path will be minimum.

 Dijkstra's algorithm
 Warshall's algorithm
 Floyd's algorithm

5.7 Dijkstra's Algorithm: (Single source shortest path problem)


Here, there is a distinguished vertex, called the source vertex, and it is required to find the shortest
paths from this source vertex to all other vertices.

It is a single-source shortest path problem: for a given vertex called the source in a weighted
connected graph, find the shortest paths to all its other vertices. The best-known algorithm for the
single-source shortest path problem is called Dijkstra's algorithm.
Dijkstra's algorithm finds the shortest paths to a graph's vertices in order of their distance from a
given source. First, it finds the shortest path from the source to the vertex nearest to it, then to a second
nearest, and so on. Dijkstra's algorithm compares path lengths and therefore must add edge weights.

Routine for Dijkstra's algorithm:

void Dijkstra( Table T )
{
Vertex v, w;
for( ; ; )
{
v = smallest unknown distance vertex;
if( v == NotAVertex )
break;
T[v].Known = True;
for each w adjacent to v
if( !T[w].Known )
if( T[v].Dist + Cvw < T[w].Dist )
{
/* Update w */
Decrease( T[w].Dist to T[v].Dist + Cvw );
T[w].Path = v;
}
}
}
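
A self-contained C sketch of the same routine, using an adjacency matrix cost[][] in which INF marks a missing edge; dist[] and path[] play the roles of T[].Dist and T[].Path. All names and bounds are illustrative.

/* Sketch: Dijkstra's algorithm over an adjacency matrix */
#include <limits.h>
#define MAXV 10
#define INF (INT_MAX / 2)   /* halved so that INF + weight cannot overflow */

void dijkstra(int cost[MAXV][MAXV], int nv, int src, int dist[MAXV], int path[MAXV])
{
    int known[MAXV], i, v, w;
    for(i = 0; i < nv; i++)
    {
        dist[i] = INF;
        known[i] = 0;
        path[i] = -1;
    }
    dist[src] = 0;
    for(;;)
    {
        v = -1;                          /* smallest unknown distance vertex */
        for(i = 0; i < nv; i++)
            if(!known[i] && (v == -1 || dist[i] < dist[v]))
                v = i;
        if(v == -1 || dist[v] == INF)
            break;                       /* no reachable unknown vertex left */
        known[v] = 1;
        for(w = 0; w < nv; w++)          /* relax every edge out of v */
            if(!known[w] && cost[v][w] != INF && dist[v] + cost[v][w] < dist[w])
            {
                dist[w] = dist[v] + cost[v][w];
                path[w] = v;
            }
    }
}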
Example 1:
Consider the graph:

Tree vertices Remaining vertices

V1(-, -) V2(V1, 2), V3(-, ∞), V4(V1, 1), V5(-, ∞), V6(-, ∞), V7(-, ∞)

V4(V1, 1) V2(V1, 2), V3(V4, 2+1), V5(V4, 2+1), V6(V4, 8+1), V7(V4, 1+4)

V2(V1, 2) V3(V4, 2+1), V5(V4, 2+1), V6(V4, 8+1), V7(V4, 1+4)

V3(V4, 3) V5(V4, 2+1), V6(V3, 1+2+5), V7(V4, 1+4)

V5(V4, 3) V6(V3, 1+2+5), V7(V4, 1+4)

V7(V4, 5) V6(V7, 1+4+1)

V6(V7, 6)

Operations on a Graph:
Insertion:

a) To insert a vertex and hence establish connectivity with other vertices in the existing graph.
b) To insert an edge between two vertices in the graph.

Deletion:

a) To delete a vertex from the graph.


b) To delete an edge from the graph.

Merging:

To merge two graphs G1 and G2 into a single graph.

Traversal:

To visit all the vertices in the graph.


The implementation of the above operations depends on the way a graph is represented.
We will consider only the linked representation and the matrix representation of the graph.

Warshall's algorithm:

This is a classical algorithm by which we can determine whether there is a path from any
vertex Vi to another vertex Vj, either directly or through one or more intermediate vertices. In
other words, we can test the reachability of all pairs of vertices in a graph.
First, we find the adjacency matrix of the digraph. Then we compute the path matrix.

Consider the unweighted graph: find the reachability of all pairs of vertices.

Solution:
First find its adjacency matrix.

Algorithm:
Warshall(A[1...n, 1...n])
R(0) ← A
for k ← 1 to n do
    for i ← 1 to n do
        for j ← 1 to n do
            R(k)[i, j] ← R(k-1)[i, j] or (R(k-1)[i, k] and R(k-1)[k, j])
return R(n)
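In C, the k-indexed matrices can share one array that is updated in place. Below is a minimal
sketch; the matrix size N and the sample adjacency matrix are assumptions made for illustration:

#include <stdio.h>

#define N 4        /* number of vertices (an assumption) */

void warshall(int r[N][N])
{
    int i, j, k;
    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                /* R(k)[i,j] = R(k-1)[i,j] OR (R(k-1)[i,k] AND R(k-1)[k,j]) */
                r[i][j] = r[i][j] || (r[i][k] && r[k][j]);
}

int main(void)
{
    int r[N][N] = { {0,1,0,0},         /* sample adjacency matrix (an assumption) */
                    {0,0,0,1},
                    {0,0,0,0},
                    {1,0,1,0} };
    int i, j;
    warshall(r);
    for (i = 0; i < N; i++) {          /* print the path (reachability) matrix */
        for (j = 0; j < N; j++)
            printf("%d ", r[i][j]);
        printf("\n");
    }
    return 0;
}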
Taking the OR and AND operations as per the formula, the matrices R(0), R(1), ..., R(6) are
computed in succession; the final matrix R(6) is the path (reachability) matrix. (The matrices
are omitted here.)

Example 2:

Starting from the adjacency matrix of a second digraph, R(0) through R(4) are computed in the
same way. (Matrices omitted.)

5.8 Floyd's algorithm:- (All-pairs shortest path problem)

 The path matrix obtained using Warshall's algorithm shows only the presence or absence of a
path between a pair of vertices. It does not take the weights of the edges into account.
 If weights are to be taken into account and we are interested in the length of the shortest
path between any pair of vertices, then Floyd's algorithm is used.
 The goal is to find the distances from each vertex to all other vertices.
 A distance matrix D is an n-by-n matrix indicating the lengths of the shortest paths in a graph.
 The element dij in the ith row and the jth column of this matrix indicates the length of the
shortest path in the graph from the ith vertex to the jth vertex.
 The distance matrix can be computed using an algorithm called Floyd's algorithm, which is
applicable to both undirected and directed weighted graphs provided that they do not contain a
cycle of negative length.
Condition:

dij(k) = min{ dij(k-1), dik(k-1) + dkj(k-1) } for k ≥ 1, with dij(0) = wij
and dij(0) = 0 for i = j.
ALGORITHM: Floyd(W[1...n, 1...n])
// Implements Floyd's algorithm for the all-pairs shortest path problem.
// Input: The weight matrix W of the graph.
// Output: The distance matrix D of the shortest paths' lengths.
D ← W
for k ← 1 to n do
    for i ← 1 to n do
        for j ← 1 to n do
            D[i, j] ← min{ D[i, j], D[i, k] + D[k, j] }
return D
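A minimal C sketch follows; N, INF and the sample weight matrix (loosely modelled on the
example below, with assumed edge directions) are illustrative assumptions:

#include <stdio.h>

#define N 4        /* vertices a, b, c, d mapped to 0..3 (an assumption) */
#define INF 9999   /* sentinel for "no edge"; small enough that INF + INF fits in int */

void floyd(int d[N][N])
{
    int i, j, k;
    for (k = 0; k < N; k++)
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                /* D[i,j] = min(D[i,j], D[i,k] + D[k,j]) */
                if (d[i][k] + d[k][j] < d[i][j])
                    d[i][j] = d[i][k] + d[k][j];
}

int main(void)
{
    /* sample weight matrix for a, b, c, d; directions are an assumption */
    int d[N][N] = { {0,   INF, 3,   INF},
                    {2,   0,   INF, INF},
                    {INF, 7,   0,   1  },
                    {6,   INF, INF, 0  } };
    int i, j;
    floyd(d);
    for (i = 0; i < N; i++) {          /* print the distance matrix */
        for (j = 0; j < N; j++)
            printf("%5d", d[i][j]);
        printf("\n");
    }
    return 0;
}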
Example:

Consider the weighted digraph on the vertices a, b, c, d sketched below:

           2
      a ------- b
      |         |
    3 |    7    | 6
      |         |
      c ------- d
           1
Its distance matrix D is computed as follows. Perform an ADD operation on row and column
values to change the ∞ entries to finite distance values; moreover, where possible, change
existing values to smaller ones. The matrices D(0), D(1), D(2) and D(3) are computed in
succession (matrices omitted).

In the matrix D(4), the value 9 in row 3 is changed to 7 because there is a path which has the
distance 7.

5.9 Graph Traversal:

Traversing a graph means visiting all the vertices in the graph exactly once.
Several methods are known to traverse a graph systematically; two of them are accepted as
standard. They are:

 Depth first search (DFS)
 Breadth first search (BFS)

 Depth-first search traversal is similar to the preorder traversal of a binary tree.
 Starting from a given node, this traversal visits all the nodes up to the deepest level and so on.
 The other standard graph traversal method is BFS. This traversal is very similar to the level-by-
level traversal of a tree. Here, any vertex in level i will be visited only after the visit of all the
vertices in its preceding level, that is, at level i-1.

Algorithm: DFS (V)

1. Push the starting vertex, V into the stack.


2. Repeat until the stack becomes empty:
a. Pop a vertex from the stack
b. Visit the popped vertex.
c. Push all the unvisited vertices adjacent to the popped vertex into the stack.
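A C sketch of this stack-based traversal is given below. MAX and the sample graph in main are
assumptions; note that the order in which vertices are visited depends on the order in which
neighbours are pushed:

#include <stdio.h>

#define MAX 6      /* number of vertices (an assumption) */

void dfs(int adj[MAX][MAX], int start)
{
    int stack[MAX * MAX];              /* large enough for repeated pushes */
    int top = -1;
    int visited[MAX] = {0};
    int v, w;

    stack[++top] = start;              /* 1. push the starting vertex */
    while (top != -1) {                /* 2. repeat until the stack is empty */
        v = stack[top--];              /* 2a. pop a vertex */
        if (visited[v])
            continue;                  /* it may have been pushed more than once */
        visited[v] = 1;                /* 2b. visit it */
        printf("V%d ", v + 1);
        for (w = MAX - 1; w >= 0; w--) /* 2c. push unvisited adjacent vertices */
            if (adj[v][w] && !visited[w])
                stack[++top] = w;
    }
    printf("\n");
}

int main(void)
{
    int adj[MAX][MAX] = {              /* a sample undirected graph (an assumption) */
        {0,1,1,0,0,0},
        {1,0,0,1,0,1},
        {1,0,0,0,1,0},
        {0,1,0,0,0,0},
        {0,0,1,0,0,0},
        {0,1,0,0,0,0}
    };
    dfs(adj, 0);                       /* start the traversal at V1 */
    return 0;
}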

Consider the graph:

 Push the starting vertex V1 into the stack.


 Pop a vertex from stack
 Visit V1

 Push all unvisited vertices adjacent to V1 into the stack.


 Pop a vertex, V2 from the stack
 Visit V2

 Push all unvisited vertices adjacent to V2 into the stack.


 Pop a vertex, V6 from the stack.

 Visit V6
 Push all unvisited vertices adjacent to V6 into the stack.
 There are no unvisited vertices adjacent to V6
 Pop a vertex, V3 from the stack.

 Visit V3
 Push all unvisited vertices adjacent to V3 into the stack.
 Pop a vertex, V5 from the stack.

 Visit V5
 Push all unvisited vertices adjacent to V5 into the stack.
 There are no unvisited vertices adjacent to V5.
 Pop a vertex, V4 from the stack

 Visit V4
 Push all unvisited vertices adjacent to V4 into the stack.
 There are no unvisited vertices adjacent to V4.
 The stack is now empty.
 Therefore traversal is complete and visited vertices are:
V1, V2, V6, V3, V5, V4.

Algorithm:- BFS (V)

1. Visit the starting vertex, V, and insert it into a queue.
2. Repeat step 3 until the queue becomes empty.
3. Delete the front vertex from the queue, visit all its unvisited adjacent vertices, and insert them
into the queue.
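A C sketch of this queue-based traversal follows; MAX and the sample graph are illustrative
assumptions. Each vertex enters the queue at most once, so a queue of MAX slots suffices:

#include <stdio.h>

#define MAX 6      /* number of vertices (an assumption) */

void bfs(int adj[MAX][MAX], int start)
{
    int queue[MAX], front = 0, rear = 0;
    int visited[MAX] = {0};
    int v, w;

    visited[start] = 1;                /* 1. visit the starting vertex ... */
    printf("V%d ", start + 1);
    queue[rear++] = start;             /* ... and insert it into the queue */
    while (front != rear) {            /* 2. repeat until the queue is empty */
        v = queue[front++];            /* 3. delete the front vertex */
        for (w = 0; w < MAX; w++)
            if (adj[v][w] && !visited[w]) {
                visited[w] = 1;        /* visit each unvisited adjacent vertex */
                printf("V%d ", w + 1);
                queue[rear++] = w;     /* and insert it into the queue */
            }
    }
    printf("\n");
}

int main(void)
{
    int adj[MAX][MAX] = {              /* a sample undirected graph (an assumption) */
        {0,1,1,0,0,0},
        {1,0,0,1,0,1},
        {1,0,0,0,1,0},
        {0,1,0,0,0,0},
        {0,0,1,0,0,0},
        {0,1,0,0,0,0}
    };
    bfs(adj, 0);                       /* start the traversal at V1 */
    return 0;
}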

Example:

Visit V1:-
 Insert V1 into the queue.

 Remove a vertex V1 from the queue.

 Visit all unvisited vertices adjacent to V1 and insert them in the queue.

 Remove a vertex V2 from the queue.

 Visit all unvisited vertices adjacent to V2 and insert them in the queue.

 Remove a vertex V4 from the queue.

 Visit all unvisited vertices adjacent to V4 and insert them in the queue.

 Remove a vertex V3 from the queue.

 Visit all unvisited vertices adjacent to V3 and insert them in the queue.


 But V3 does not have any unvisited adjacent vertices.
 Remove a vertex V6 from the queue.

 Visit all unvisited vertices adjacent to V6 and insert them in the queue. V6 does not
have any unvisited adjacent vertices.
 Remove a vertex V5 from the queue.

 Visit all unvisited vertices adjacent to V5 and insert them in the queue.
 V5 does not have any unvisited adjacent vertices.

The queue is now empty. Therefore traversal is complete.


Visited vertices are:-
V1, V2, V4, V3, V6, V5.

Application of Depth-first search (DFS):

DFS is a generalization of preorder traversal. Starting at some vertex V, we process V and
then recursively traverse all vertices adjacent to V.
Template:
void DFS(Vertex V)
{
    Visited[V] = True;
    for each W adjacent to V
        if (!Visited[W])
            DFS(W);
}
If we perform this process, we need to be careful to avoid cycles. To do this, when we visit
a node we mark it as "visited" and recursively call DFS on all adjacent vertices that are not
already marked.
This DFS algorithm continues the procedure until no unmarked node can be found.

Undirected graph:

An undirected graph is connected if and only if a DFS starting from any node visits every node.

5.10 Strongly connected components:

 A connected undirected graph is biconnected if there are no vertices whose removal
disconnects the rest of the graph.
 If a computer network is biconnected, users always have an alternate route available in case
some terminal is disrupted.
 If a graph is not biconnected, the vertices whose removal would disconnect the graph are
known as articulation points.

Introduction to NP-Completeness:

 The Euler circuit problem, which finds a path that touches every edge exactly once, is solvable
in linear time.
 The Hamiltonian cycle problem asks for a simple cycle that contains every vertex. No
linear-time algorithm is known for this problem.
 The single-source unweighted shortest path problem for directed graphs is also solvable in
linear time. There is no known linear-time algorithm for the corresponding longest simple path
problem.
 For such problems, there are no known algorithms that are guaranteed to run in polynomial
time.
 Some important problems that are roughly equivalent in complexity form a class called the
NP-complete problems.
 NP stands for nondeterministic polynomial time.
NP-complete problems:
 Among all the problems known to be in NP, there is a subset, known as the NP-complete
problems, which contains the hardest.
 An NP-complete problem has the property that any problem in NP can be polynomially
reduced to it.

Example 1:

A problem P1 can be transformed into an instance of P2 as follows:

 Provide a mapping so that any instance of P1 can be transformed into an instance of P2.
 Solve P2, then map the answer back to the original.

Example 2:
The numbers entered into a pocket calculator are decimal. These decimal numbers are
converted to binary to perform the operations, and the final result is converted back to decimal
for display.

*******
EXERCISES
PART – A
1. Define Graph.
2. Define adjacent nodes.
3. What is a directed graph?
4. What is an undirected graph?
5. What is a loop?
6. What is a simple graph?
7. What is a weighted graph?
8. Define out degree of a graph?
9. Define indegree of a graph?
10. Define path in a graph?
11. What is a simple path?
12. What is a cycle or a circuit?
13. What is an acyclic graph?
14. What is meant by strongly connected in a graph?
15. When is a graph said to be weakly connected?
16. Name the different ways of representing a graph?
17. What is an undirected acyclic graph?
18. What are the two traversal strategies used in traversing a graph?
19. What is a minimum spanning tree?
20. Name two algorithms to find a minimum spanning tree.
21. Define graph traversals.
22. List the two important key points of depth first search.
23. What do you mean by breadth first search (BFS)?
24. Differentiate BFS and DFS.
25. What do you mean by tree edge?
26. What do you mean by back edge?
27. Define biconnectivity.
28. What do you mean by articulation point?
29. What do you mean by shortest path?
30. What is a spanning Tree?
31. Does the minimum spanning tree of a graph give the shortest distance between any
2 specified nodes?
32. Explain Dijkstra's algorithm.
33. What is a minimum spanning tree?
34. What does Kruskal's algorithm do?
35. Explain with neat steps to construct a forest.
PART – B
1. Explain shortest path algorithm with example.
2. Explain depth first and breadth first traversal?
3. Explain spanning and minimum spanning tree?
4. Explain Kruskal's and Prim's algorithms?
5. Explain topological sorting?
6. Give mode of operation in Dijkstra's algorithm.
7. Write a Program to find the minimum cost of a spanning tree.
