
UNIT-5

SEARCHING
Search is a process of finding a value in a list of values. In other words, searching is the process of
locating given value position in a list of values.
Linear Search (Sequential Search)
Linear search or sequential search is a method for finding a particular value in a list that consists of
checking every one of its elements, one at a time and in sequence, until the desired one is found.
Linear search is the simplest search algorithm; it is a special case of brute-force search. Its worst
case cost is proportional to the number of elements in the list; and so is its expected cost, if all list
elements are equally likely to be searched for. Therefore, if the list has more than a few elements,
other methods (such as binary search or hashing) will be faster, but they also impose
additional requirements.

How Linear Search works


Linear search in an array is usually programmed by stepping up an index variable until it reaches
the last index. This normally requires two comparisons for each list item: one to check whether the
index has reached the end of the array, and another one to check whether the item has the desired
value.
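As a small illustration of the two comparisons described above, here is a minimal sketch (our own helper function, not part of the original notes) of that loop; it returns the index of the item, or -1 if the item is absent.

/* A minimal sketch of the loop described above: for every position we   */
/* make two checks, one on the index and one on the value.               */
int linear_search(int a[], int n, int item)
{
    int i = 0;
    while (i < n)              /* comparison 1: has the index reached the end? */
    {
        if (a[i] == item)      /* comparison 2: is this the desired value?     */
            return i;          /* found: return the index                      */
        i++;
    }
    return -1;                 /* not found                                    */
}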
Linear Search Algorithm

1. Repeat For J = 1 to N
2.     If (ITEM == A[J]) Then
3.         Print: ITEM found at location J
4.         Return
       [End of If]
   [End of For Loop]
5. If (J > N) Then
6.     Print: ITEM doesn't exist
   [End of If]
7. Exit
//CODE

#include <stdio.h>

int main()
{
    int a[10], i, n, m, c = 0, x;

    printf("Enter the size of the array: ");
    scanf("%d", &n);

    printf("Enter the elements of the array: ");
    for(i = 0; i <= n-1; i++) {
        scanf("%d", &a[i]);
    }

    printf("Enter the number to be searched: ");
    scanf("%d", &m);

    for(i = 0; i <= n-1; i++) {
        if(a[i] == m) {        /* compare ITEM with each element in sequence */
            x = i;             /* remember the index at which it was found   */
            c = 1;
            break;
        }
    }
    if(c == 0)
        printf("The number is not in the list");
    else
        printf("The number is found at location %d", x + 1);   /* 1-based position */

    return 0;
}

Complexity of Linear Search


Linear search on a list of n elements: in the worst case, the search must visit every element once. This happens when the value being searched for is either the last element in the list, or is not in the list at all. On average, assuming the value searched for is in the list and each list element is equally likely to be the value searched for, the search visits only n/2 elements. In the best case the value searched for is at the first position, so only one comparison is made, i.e. O(1).

Algorithm        Worst Case    Average Case    Best Case

Linear Search    O(n)          O(n)            O(1)

Binary Search
A binary search or half-interval search algorithm finds the position of a specified input value (the
search "key") within an array sorted by key value. For binary search, the array should be
arranged in ascending or descending order. In each step, the algorithm compares the search
key value with the key value of the middle element of the array. If the keys match, then a
matching element has been found and its index is returned. Otherwise, if the search key is less
than the middle element's key, then the algorithm repeats its action on the sub-array to the left of
the middle element or, if the search key is greater, on the sub-array to the right. If the remaining
array to be searched is empty, then the key cannot be found in the array and a special "not found"
indication is returned.

How Binary Search Works

Searching a sorted collection is a common task. A dictionary is a sorted list of word definitions.
Given a word, one can find its definition. A telephone book is a sorted list of people's names,
addresses, and telephone numbers. Knowing someone's name allows one to quickly find their
telephone number and address.

Binary Search Algorithm

1. Set BEG = 1 and END = N
2. Set MID = (BEG + END) / 2
3. Repeat steps 4 to 8 While (BEG <= END) and (A[MID] ≠ ITEM)
4.     If (ITEM < A[MID]) Then
5.         Set END = MID - 1
6.     Else
7.         Set BEG = MID + 1
       [End of If]
8.     Set MID = (BEG + END) / 2
9. If (A[MID] == ITEM) Then
10.     Print: ITEM exists at location MID
11. Else
12.     Print: ITEM doesn't exist
    [End of If]
13. Exit

//CODE

#include <stdio.h>

int main()
{
    int ar[10], val, mid, low, high, size, i, found = 0;

    printf("\nEnter the number of elements you want to input in the array\n");
    scanf("%d", &size);
    for(i = 0; i < size; i++)
    {
        printf("Input element no %d\n", i + 1);   /* elements must be entered in ascending order */
        scanf("%d", &ar[i]);
    }
    printf("The array inputted is\n");
    for(i = 0; i < size; i++)
    {
        printf("%d\t", ar[i]);
    }
    low = 0;
    high = size - 1;
    printf("\nInput the number you want to search for\n");
    scanf("%d", &val);
    while(low <= high)
    {
        mid = (low + high) / 2;        /* index of the middle element */
        if(ar[mid] == val)
        {
            printf("Value found at position %d", mid + 1);
            found = 1;
            break;
        }
        if(val > ar[mid])
            low = mid + 1;             /* continue in the right half  */
        else
            high = mid - 1;            /* continue in the left half   */
    }
    if(!found)
        printf("Value not found");

    return 0;
}
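The same halving can also be written recursively. The sketch below is our own illustration (the function name binsearch and the -1 "not found" convention are assumptions, not part of the notes):

/* Recursive binary search over the sorted sub-array a[low..high]. */
int binsearch(int a[], int low, int high, int key)
{
    if (low > high)
        return -1;                               /* empty sub-array: key not present */
    int mid = (low + high) / 2;
    if (a[mid] == key)
        return mid;                              /* key found at index mid           */
    else if (key < a[mid])
        return binsearch(a, low, mid - 1, key);  /* search the left half             */
    else
        return binsearch(a, mid + 1, high, key); /* search the right half            */
}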
Complexity of Binary Search

A binary search halves the number of items to check with each iteration, so locating an item (or
determining its absence) takes logarithmic time.

What does the time complexity O(log n) actually mean?


Complexities like O(1) and O(n) are simple and straightforward. O(1) means an operation which is
done to reach an element directly (like a dictionary or hash table), O(n) means first we would have to
search it by checking n elements, but what could O(log n) possibly mean?
Since binary search has a best case efficiency of O(1) and a worst case (average case) efficiency of O(log n), we will look at an example of the worst case. Consider a sorted array of 16 elements, and for the worst case, let us say we want to search for the number 13.

We select the middle element (index length/2) as the pivot. Since 13 is less than the pivot, we discard the other half of the array and repeat the process of finding the middle element on the remaining sub-array.

You can see that after every comparison with the middle term, our searching range gets divided into half of the current range. So, for reaching one element from a set of 16 elements, we had to halve the array 4 times. We can say that

16 / 2^4 = 1, i.e. 2^4 = 16

Similarly, for n elements, if k halvings are needed to reach a single element,

n / 2^k = 1

Multiplying both sides by 2^k gives the final result

n = 2^k

Now, let us look at the definition of logarithm; it says that a logarithm is a quantity representing the power to which a fixed number (the base) must be raised to produce a given number. This turns our equation into

k = log2(n)

so at most log2(n) + 1 comparisons are needed, which is why the complexity is O(log n).

Algorithm Worst Case Average Case Best Case

Binary Search O(logn) O(logn) O(1)

INTRODUCTION TO SORTING
Sorting is the process of arranging a list of elements in a particular order (Ascending or Descending).
The importance of sorting lies in the fact that data searching can be optimized to a very high level, if
data is stored in a sorted manner. Sorting is also used to represent data in more readable formats.
Following are some of the examples of sorting in real-life scenarios −
 Telephone Directory − The telephone directory stores the telephone numbers of people sorted by their
names, so that the names can be searched easily.
 Dictionary − The dictionary stores words in an alphabetical order so that searching of any word becomes
easy.

Sorting Efficiency

If someone asks you how you would arrange a shuffled deck of cards in order, you would probably say that you will start by checking every card, building the ordered deck as you go. It can take hours to arrange the deck in order, but that's how you would do it. Computers, however, don't work like this.
Since the beginning of the programming age, computer scientists have been working on solving the problem of sorting by coming up with various different algorithms to sort data.
The two main criteria used to judge which algorithm is better than the other have been:
1. Time taken to sort the given data.
2. Memory space required to do so.
TYPES OF SORTING
Internal and External sorting
• An internal sort is any data sorting process that takes place entirely within the main
memory of a computer. This is possible whenever the data to be sorted is small enough to
all be held in the main memory. Eg. Bubble Sort, Insertion sort
• External sorting is a term for a class of sorting algorithms that can handle massive
amounts of data. External sorting is required when the data being sorted do not fit into
the main memory of a computing device (usually RAM) and instead they must reside in
the slower external memory (usually a hard drive). External sorting typically uses
a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in
main memory are read, sorted, and written out to a temporary file. In the merge phase, the
sorted sub files are combined into a single larger file. Eg. Merge Sort

In-place Sorting and Not-in-place Sorting


Sorting algorithms may require some extra space for comparison and temporary storage of a few data elements. Algorithms which do not require any extra space are said to sort in-place, for example within the array itself. This is called in-place sorting. Bubble sort is an example of in-place sorting.

However, in some sorting algorithms, the program requires space which is more than or equal to
the elements being sorted. Sorting which uses equal or more space is called not-in-place sorting.
Merge-sort is an example of not-in-place sorting.

Stable and Unstable Sorting


• We can say a sorting algorithm is stable if two objects with equal keys appear in the same order in the sorted output as they appear in the input unsorted array. (The figures labelled Stable Sort and Unstable Sort illustrate the two cases.)

Stability of an algorithm matters when we wish to maintain the original order of elements with equal keys, for example when each element is a tuple and we sort on only one field: sorting the pairs (4, a), (2, b), (4, c) by their first component, a stable sort keeps (4, a) before (4, c).
Adaptive and Non-Adaptive Sorting Algorithm
A sorting algorithm is said to be adaptive if it takes advantage of already 'sorted' elements in the list that is to be sorted. That is, while sorting, if the source list has some elements already sorted, an adaptive algorithm will take this into account and will try not to re-order them (presortedness of the input affects the running time). E.g. Insertion Sort, or Bubble Sort with the early-exit flag.
A non-adaptive algorithm is one which does not take into account the elements which are already sorted. It tries to force every single element to be re-ordered to confirm that they are sorted. E.g. Selection Sort.

Bubble Sort
Bubble Sort is an algorithm which is used to sort N elements that are given in memory, e.g. an array with N elements. Bubble Sort compares the elements one by one and sorts them based on their values.
It is known as bubble sort, because with every complete iteration the largest element in the given
array, bubbles up towards the last place or the highest index, just like a water bubble rises up to
the water surface. Sorting takes place by stepping through all the data items one-by-one in pairs
and comparing adjacent data items and swapping each pair that is out of order.

How Bubble Sort Works

Let us take the array of numbers "5 1 4 2 8", and sort the array from lowest number to greatest
number using bubble sort. In each step, elements written in bold are being compared. Three
passes will be required.

First Pass:
( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), Here, the algorithm compares the first two elements, and swaps since 5 > 1.
( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Swap since 5 > 4
( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), Swap since 5 > 2
( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), the algorithm does not swap them.

Second Pass:
( 1 4 2 5 8 ) → ( 1 4 2 5 8 )
( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), Swap since 4 > 2
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
Now, the array is already sorted, but our algorithm does not know if it is completed. The algorithm needs one whole pass without any swap to know it is sorted.

Third Pass:
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )
( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

Bubble Sort Algorithm

1. Repeat Steps 2 and 3 for k = 1 to n

2.     Set ptr = 1

3.     Repeat while ptr <= n - k

           a) If (A[ptr] > A[ptr+1]) Then
                  Interchange A[ptr] and A[ptr+1]
              [End of If]

           b) ptr = ptr + 1

       [End of Step 3 loop]

   [End of Step 1 loop]

4. Exit

//CODE

Let's consider an array with values {5, 1, 6, 2, 4, 3}


int a[6] = {5, 1, 6, 2, 4, 3};
int i, j, temp;
for(i = 0; i < 6; i++)
{
    for(j = 0; j < 6-i-1; j++)
    {
        if(a[j] > a[j+1])
        {
            /* swap the adjacent pair that is out of order */
            temp = a[j];
            a[j] = a[j+1];
            a[j+1] = temp;
        }
    }
}
Above is the code to sort an array using Bubble Sort. Although the above logic will sort an unsorted array, it isn't efficient and can be enhanced further, because as per the above logic the outer for loop will keep running for six iterations even if the array gets sorted after the second iteration.
Hence we can insert a flag and keep checking whether swapping of elements is taking place or not. If no swapping is taking place, that means the array is sorted and we can jump out of the for loop. So, the improved algorithm for bubble sort is:

1. Repeat Steps 2 and 3 for k = 1 to n

2.     Set ptr = 1 and flag = 0

3.     Repeat while ptr <= n - k

           a. If (A[ptr] > A[ptr+1]) Then
                  Interchange A[ptr] and A[ptr+1]
                  Set flag = 1
              [End of If]

           b. ptr = ptr + 1

       [End of Step 3 loop]

4.     If (flag == 0) Then
           break

   [End of Step 1 loop]

5. Exit

int a[6] = {5, 1, 6, 2, 4, 3};
int i, j, temp;
for(i = 0; i < 6; i++)
{
    int flag = 0;                  // taking a flag variable, reset at the start of every pass
    for(j = 0; j < 6-i-1; j++)
    {
        if(a[j] > a[j+1])
        {
            temp = a[j];
            a[j] = a[j+1];
            a[j+1] = temp;
            flag = 1;              // setting flag as 1, if swapping occurs
        }
    }
    if(!flag)                      // breaking out of the for loop if no swapping takes place
    {
        break;
    }
}

In the above code, if in a complete single cycle of the j iteration (inner for loop) no swapping takes place and flag remains 0, then we break out of the outer for loop, because the array has already been sorted.

Complexity of Bubble Sort Algorithm

In Bubble Sort, n-1 comparisons will be done in 1st pass, n-2 in 2nd pass, n-3 in 3rd pass and so
on. So the total number of comparisons will be

F(n) = (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = O(n²)

Algorithm      Worst Case           Average Case         Best Case    Space Complexity

Bubble Sort    n(n-1)/2 = O(n²)     n(n-1)/2 = O(n²)     O(n)         O(1)

Selection Sort
Selection sort is conceptually the simplest sorting algorithm. This algorithm first finds the smallest element in the array and exchanges it with the element in the first position, then finds the second smallest element and exchanges it with the element in the second position, and continues in this way until the entire array is sorted.

How Selection Sort works


In the first pass, the smallest element found is 1, so it is placed at the first position. Then, leaving the first element, the smallest element is searched for among the rest of the elements; 3 is the smallest, so it is placed at the second position. Then we leave 1 and 3, and from the rest of the elements we search for the smallest and put it at the third position. We keep doing this until the array is sorted.
Selection Sort Algorithm

1. Repeat For J = 0 to N-2

2.     Set MIN = J
3.     Repeat For K = J+1 to N-1
4.         If (A[K] < A[MIN]) Then
5.             Set MIN = K
           [End of If]
       [End of Step 3 For Loop]
6.     Interchange A[J] and A[MIN]
   [End of Step 1 For Loop]
7. Exit

//CODE

void selectionSort(int a[], int size)


{
int i, j, min, temp;
for(i=0; i < size-1; i++ )
{
    min = i;                       // setting min as i
    for(j = i+1; j < size; j++)
    {
        if(a[j] < a[min])          // if element at j is less than element at min position
        {
            min = j;               // then set min as j
        }
    }
    temp = a[i];                   // swap the smallest element into position i
    a[i] = a[min];
    a[min] = temp;
}
}
Complexity of Selection Sort Algorithm

The number of comparisons in the selection sort algorithm is independent of the original order of the elements. That is, there are n-1 comparisons during PASS 1 to find the smallest element, n-2 comparisons during PASS 2 to find the second smallest element, and so on. Accordingly,
F(n) = (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2 = O(n²)

Algorithm        Worst Case           Average Case         Best Case    Space Complexity

Selection Sort   n(n-1)/2 = O(n²)     n(n-1)/2 = O(n²)     O(n²)        O(1)

Insertion Sort
Consider you have 10 cards out of a deck of cards in your hand. And they are sorted, or arranged
in the ascending order of their numbers.
If I give you another card, and ask you to insert the card in just the right position, so that the
cards in your hand are still sorted. What will you do?
Well, you will have to go through each card from the starting or the back and find the right
position for the new card, comparing its value with each card. Once you find the right position,
you will insert the card there.
Similarly, if more new cards are provided to you, you can easily repeat the same process and
insert the new cards and keep the cards sorted too.
This is exactly how insertion sort works. It starts from index 1 (not 0), and each index starting from index 1 is like a new card that you have to place at the right position in the sorted subarray on the left.

It is a simple sorting algorithm that builds the final sorted array (or list) one item at a time. This
algorithm is less efficient on large lists than more advanced algorithms such
as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:

 Simple implementation
 Efficient for small datasets
 Stable; i.e., does not change the relative order of elements with equal keys
 In-place; i.e., only requires a constant amount O(1) of additional memory space.

How Insertion Sort Works
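The figure for this section is not reproduced here; as a small substitute, the sketch below (a driver we add purely for illustration, not part of the original notes) runs insertion sort on the same array {5, 1, 6, 2, 4, 3} used in the code further down and prints the array after each pass, so the growing sorted subarray on the left can be seen.

#include <stdio.h>

int main()
{
    int A[6] = {5, 1, 6, 2, 4, 3};
    int i, j, k, key;
    for(i = 1; i < 6; i++)
    {
        key = A[i];                  /* the "new card" to be placed            */
        j = i - 1;
        while(j >= 0 && key < A[j])  /* shift larger elements one step right   */
        {
            A[j+1] = A[j];
            j--;
        }
        A[j+1] = key;                /* insert the key at its correct position */
        printf("After pass %d: ", i);
        for(k = 0; k < 6; k++)
            printf("%d ", A[k]);
        printf("\n");
    }
    return 0;
}

With this array the printed passes are 1 5 6 2 4 3, then 1 5 6 2 4 3, 1 2 5 6 4 3, 1 2 4 5 6 3 and finally 1 2 3 4 5 6.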


Insertion Sort Algorithm

This algorithm sorts the array A with N elements.

1. Set A[0] = -12345 (a sentinel smaller than any element, i.e. -∞)

2. Repeat steps 3 to 5 for k = 2 to n

3.     Set key = A[k] and j = k - 1

4.     Repeat while key < A[j]

           a) Set A[j+1] = A[j]

           b) j = j - 1

5.     Set A[j+1] = key

6. Return

//CODE
int A[6] = {5, 1, 6, 2, 4, 3};
int i, j, key;
for(i=1; i<6; i++)
{
key = A[i];
j = i-1;
while(j>=0 && key < A[j])
{
A[j+1] = A[j];
j--;
}
A[j+1] = key;
}
Complexity of Insertion Sort

The number f(n) of comparisons in the insertion sort algorithm can be easily computed. First of
all, the worst case occurs when the array A is in reverse order and the inner loop must use the
maximum number K-1 of comparisons. Hence

F(n) = 1 + 2 + 3 + ... + (n-1) = n(n-1)/2 = O(n²)


Furthermore, one can show that, on average, there will be approximately (K-1)/2 comparisons in the inner loop. Accordingly, for the average case,
F(n) = n(n-1)/4 = O(n²)
Thus the insertion sort algorithm is a very slow algorithm when n is very large.

Algorithm        Worst Case           Average Case          Best Case    Space Complexity

Insertion Sort   n(n-1)/2 = O(n²)     n(n-1)/4 = O(n²)      O(n)         O(1)

Merge Sort
Merge Sort follows the rule of Divide and Conquer. The unsorted list is divided again and again until we get N sub-lists, each having one element, because a list of one element is considered sorted. Then, it repeatedly merges these sub-lists to produce new sorted sub-lists, and at last one sorted list is produced.
The concept of Divide and Conquer involves three steps:
1. Divide the problem into multiple small problems.
2. Conquer the subproblems by solving them. The idea is to break down the problem into atomic
subproblems, where they are actually solved.
3. Combine the solutions of the subproblems to find the solution of the actual problem.

Merge Sort is quite fast, and has a time complexity of O(n log n). It is also a stable sort, which
means the equal elements are ordered in the same order in the sorted list.
How Merge Sort Works

Suppose the array A contains 8 elements; each pass of the merge-sort algorithm will start at the beginning of the array A and merge pairs of sorted subarrays as follows.
PASS 1. Merge each pair of elements to obtain the list of sorted pairs.

PASS 2. Merge each pair of pairs to obtain the list of sorted quadruplets.

PASS 3. Merge each pair of sorted quadruplets to obtain the two sorted subarrays.

PASS 4. Merge the two sorted subarrays to obtain the single sorted array.

Merge Sort Algorithm

/* Sorting using Merge Sort Algorithm
   a[] is the array, p is the starting index, that is 0,
   and r is the last index of the array. */

Let's take a[5] = {32, 45, 67, 2, 7} as the array to be sorted.

void mergesort(int a[], int p, int r)
{
    int q;
    if(p < r)
    {
        q = (p + r) / 2;         /* integer division already gives the floor of the midpoint */
        mergesort(a, p, q);      /* sort the left half                                       */
        mergesort(a, q+1, r);    /* sort the right half                                      */
        merge(a, p, q, r);       /* merge the two sorted halves                              */
    }
}

void merge(int a[], int p, int q, int r)
{
    int b[5];                    /* temporary array, same size as a[] */
    int i, j, k;
    k = 0;
    i = p;
    j = q + 1;
    while(i <= q && j <= r)
    {
        if(a[i] < a[j])
        {
            b[k++] = a[i++];     /* same as b[k] = a[i]; k++; i++; */
        }
        else
        {
            b[k++] = a[j++];
        }
    }

    while(i <= q)                /* copy any remaining elements of the left half  */
    {
        b[k++] = a[i++];
    }

    while(j <= r)                /* copy any remaining elements of the right half */
    {
        b[k++] = a[j++];
    }

    for(i = r; i >= p; i--)
    {
        a[i] = b[--k];           /* copying back the sorted list to a[] */
    }
}
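As a quick check, here is a minimal driver (ours, not part of the original notes) that calls the functions above on the sample array:

#include <stdio.h>

/* prototypes for the functions defined above */
void mergesort(int a[], int p, int r);
void merge(int a[], int p, int q, int r);

int main()
{
    int a[5] = {32, 45, 67, 2, 7};
    int i;
    mergesort(a, 0, 4);            /* sort indices 0..4 */
    for(i = 0; i < 5; i++)
        printf("%d ", a[i]);       /* prints: 2 7 32 45 67 */
    return 0;
}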
Complexity of Merge Sort Algorithm
Merge Sort is quite fast, and has a time complexity of O(n*log n). It is also a stable sort, which means the "equal" elements are ordered in the same order in the sorted list.
In this section we will understand why the running time for merge sort is O(n*log n).
As we have already learned in Binary Search, whenever we divide a number into half in every step, it can be represented using a logarithmic function, which is log n, and the number of steps can be represented by log n + 1 (at most).
Also, we perform a single step operation to find out the middle of any subarray, i.e. O(1).
And to merge the subarrays, made by dividing the original array of n elements, a running time of O(n) will be required.
Hence the total time for the mergesort function will become n(log n + 1), which gives us a time complexity of O(n*log n).
 Time complexity of Merge Sort is O(n*log n) in all the 3 cases (worst, average and best) as merge sort always divides the array in two halves and takes linear time to merge the two halves.
 It requires an amount of additional space equal to the unsorted array. Hence it is not at all recommended for sorting very large arrays when memory is limited.

Algorithm    Worst Case    Average Case    Best Case     Space Complexity

Merge Sort   O(n log n)    O(n log n)      O(n log n)    O(n)

QuickSort
Quick Sort is also based on the concept of Divide and Conquer, just like merge sort. But in quick sort
all the heavy lifting(major work) is done while dividing the array into subarrays, while in case of merge
sort, all the real work happens during merging the subarrays. In case of quick sort, the combine step
does absolutely nothing.
It is also called partition-exchange sort. This algorithm divides the list into three main parts:
1. Elements less than the Pivot element
2. Pivot element(Central element)
3. Elements greater than the pivot element
Pivot element can be any element from the array, it can be the first element, the last element or
any random element. In this tutorial, we will take the leftmost element or the first element
as pivot.
Figure shows that 54 will serve as our first pivot value. The partition process will happen next. It
will find the split point and at the same time move other items to the appropriate side of the list,
either less than or greater than the pivot value.

Partitioning begins by locating two position markers—let’s call them leftmark and rightmark—at
the beginning and end of the remaining items in the list (positions 1 and 8 in Figure). The goal of
the partition process is to move items that are on the wrong side with respect to the pivot value
while also converging on the split point. Figure below shows this process as we locate the position
of 54.
We begin by incrementing leftmark until we locate a value that is greater than the pivot value. We
then decrement rightmark until we find a value that is less than the pivot value. At this point we
have discovered two items that are out of place with respect to the eventual split point. For our
example, this occurs at 93 and 20. Now we can exchange these two items and then repeat the
process again.

At the point where rightmark becomes less than leftmark, we stop. The position of rightmark is
now the split point. The pivot value can be exchanged with the contents of the split point and the
pivot value is now in place (Figure below). In addition, all the items to the left of the split point
are less than the pivot value, and all the items to the right of the split point are greater than the
pivot value. The list can now be divided at the split point and the quick sort can be invoked
recursively on the two halves.
Quick Sort Algorithm

Algo QUICKSORT (A, p, r)
{
    if p < r
    {
        then q ← PARTITION (A, p, r)
        QUICKSORT (A, p, q - 1)
        QUICKSORT (A, q + 1, r)
    }
}

The key to the algorithm is the PARTITION procedure, which rearranges the subarray A[p..r] in place.

Algo PARTITION (A, p, r)
{
    pivot = A[p]
    i = p, j = r
    while (i < j)
    {
        while (A[i] <= pivot && i < j)
        {
            i = i + 1
        }
        while (A[j] > pivot)
        {
            j = j - 1
        }
        if (i < j)
        {
            swap(A[i], A[j])
        }
    }
    A[p] = A[j]
    A[j] = pivot
    return j
}

//CODE
/* Sorting using Quick Sort Algorithm
   a[] is the array, p is the starting index, that is 0,
   and r is the last index of the array. */
void quicksort(int a[], int p, int r)
{
    if(p < r)
    {
        int q;
        q = partition(a, p, r);   /* the pivot ends up at index q  */
        quicksort(a, p, q-1);     /* sort the elements left of it  */
        quicksort(a, q+1, r);     /* sort the elements right of it */
    }
}

int partition(int a[], int p, int r)
{
    int i, j, pivot, temp;
    pivot = a[p];                 /* the first element is taken as pivot */
    i = p;
    j = r;
    while(1)
    {
        while(a[i] <= pivot && i < r)   /* move right past elements <= pivot */
            i++;
        while(a[j] > pivot)             /* move left past elements > pivot   */
            j--;
        if(i < j)
        {
            temp = a[i];                /* exchange the out-of-place pair    */
            a[i] = a[j];
            a[j] = temp;
        }
        else
        {
            temp = a[p];                /* put the pivot at the split point  */
            a[p] = a[j];
            a[j] = temp;
            return j;
        }
    }
}
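A minimal driver (our own addition, for illustration) that sorts the array used in the partitioning discussion above:

#include <stdio.h>

void quicksort(int a[], int p, int r);   /* defined above */

int main()
{
    int a[9] = {54, 26, 93, 17, 77, 31, 44, 55, 20};
    int i;
    quicksort(a, 0, 8);                  /* sort indices 0..8 */
    for(i = 0; i < 9; i++)
        printf("%d ", a[i]);             /* prints: 17 20 26 31 44 54 55 77 93 */
    return 0;
}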
Complexity of Quick Sort Algorithm
Worst Case: The worst case occurs when the partition process always picks the greatest or smallest element as pivot. If we consider the above partition strategy, where the first element is always picked as pivot, the worst case would occur when the array is already sorted in increasing or decreasing order.
F(n) = n + (n-1) + (n-2) + ... + 2 + 1 = n(n+1)/2 = O(n²)

Best Case: The best case occurs when the partition process always picks the middle element as pivot. Following is the recurrence for the best case.
T(n) = 2T(n/2) + O(n)
The solution of the above recurrence is O(n log n).

Algorithm    Worst Case          Average Case    Best Case

Quick Sort   n(n+1)/2 = O(n²)    O(n log n)      O(n log n)

Is QuickSort stable?
The default implementation is not stable.
Why Quick Sort is preferred over MergeSort for sorting Arrays?
Quick Sort in its general form is an in-place sort (i.e. it doesn’t require any extra storage) whereas
merge sort requires O(N) extra storage, N denoting the array size which may be quite expensive.
Allocating and de-allocating the extra space used for merge sort increases the running time of the
algorithm. Comparing average complexity, we find that both types of sort have O(N log N) average complexity, but the constants differ. For arrays, merge sort loses due to the use of the extra O(N) storage space.

Heap Sort
Heap sort is a comparison based sorting technique based on the Binary Heap data structure. So, we will first see what a heap tree is and how basic operations such as insertion and deletion are performed on it.
A Binary Heap is a Complete Binary Tree where items are stored in a special order such that the value in a parent node is greater (or smaller) than the values in its two children nodes. The former is called a max heap and the latter is called a min heap. The heap can be represented by a binary tree or an array.

Every heap data structure has the following properties...


Property #1 (Ordering): Nodes must be arranged in an order according to their values, based on max heap or min heap.
Property #2 (Structural): All levels in a heap must be full, except the last level, and nodes must be filled from left to right strictly.
Max Heap
The max heap data structure is a complete binary tree (only the last level may be partially filled, from left to right). In a max heap, nodes are arranged based on node value.

Max heap is defined as follows...

A max heap is a complete binary tree in which every parent node contains a value greater than or equal to the values of its child nodes.
Example

The above tree satisfies both the ordering property and the structural property of the Max Heap data structure.
Operations on Max Heap
The following operations are performed on a Max heap data structure...
1. Finding Maximum
2. Insertion
3. Deletion
Finding Maximum Value Operation in Max Heap
Finding the node which has the maximum value in a max heap is very simple. In a max heap, the root node has the maximum value among all the nodes in the heap. So, we can directly display the root node value as the maximum value in the max heap.
Insertion Operation in Max Heap
Insertion in a max heap is performed as follows...
 Step 1: Insert the newNode as the last leaf from left to right.
 Step 2: Compare the newNode value with its parent node value.
 Step 3: If the newNode value is greater than its parent, then swap both of them.
 Step 4: Repeat steps 2 and 3 until the newNode value is less than its parent node (or) the newNode reaches the root.
Example
Consider the above max heap. Insert a new node with value 85.
 Step 1: Insert the newNode with value 85 as the last leaf from left to right. That means the newNode is added as a right child of the node with value 75. After adding, the max heap is as follows...
 Step 2: Compare the newNode value (85) with its parent node value (75). That means 85 > 75.
 Step 3: Here the newNode value (85) is greater than its parent value (75), so swap both of them. After swapping, the max heap is as follows...
 Step 4: Now, again compare the newNode value (85) with its parent node value (89).

Here, the newNode value (85) is smaller than its parent node value (89). So, we stop the insertion process. Finally, the max heap after insertion of a new node with value 85 is as follows...
Deletion Operation in Max Heap
In a max heap, deleting the last node is very simple as it does not disturb the max heap properties.

Deleting the root node from a max heap is a little more difficult as it disturbs the max heap properties. We use the following steps to delete the root node from a max heap...
 Step 1: Swap the root node with the last node in the max heap.
 Step 2: Delete the last node.
 Step 3: Now, compare the root value with its left child value.
 Step 4: If the root value is smaller than its left child, then compare the left child with its right sibling. Else goto Step 6.
 Step 5: If the left child value is larger than its right sibling, then swap the root with the left child. Otherwise swap the root with its right child.
 Step 6: If the root value is larger than its left child, then compare the root value with its right child value.
 Step 7: If the root value is smaller than its right child, then swap the root with the right child. Otherwise stop the process.
 Step 8: Repeat the same until the root node is fixed at its exact position.
Example
Consider the above max heap. Delete root node (90) from the max heap.
 Step 1: Swap the root node (90) with the last node 75 in the max heap. After swapping, the max heap is as follows...

 Step 2: Delete last node. Here node with value 90. After deleting node with value 90 from heap, max
heap is as follows...
 Step 3: Compare root node (75) with its left child (89).

Here, root value (75) is smaller than its left child value (89). So, compare left child (89) with its right
sibling (70).

 Step 4: Here, left child value (89) is larger than its right sibling (70), So, swap root (75) with left child
(89).

 Step 5: Now, again compare 75 with its left child (36).


Here, the node with value 75 is larger than its left child. So, the node with value 75 is compared with its right child 85.

 Step 6: Here, node with value 75 is smaller than its right child (85). So, we swap both of them. After
swapping max heap is as follows...

 Step 7: Now, compare node with value 75 with its left child (15).

Here, node with value 75 is larger than its left child (15) and it does not have right child. So we stop the
process.
Finally, max heap after deleting root node (90) is as follows...

Heap Sort is one of the best sorting methods being in-place and with no quadratic worst-case
scenarios. Heap sort algorithm is divided into two basic parts:
1. Creating a Heap of the unsorted list.
2. Then a sorted array is created by repeatedly removing the largest/smallest element
from the heap, and inserting it into the array. The heap is reconstructed after each
removal.

How Heap Sort Works

Initially on receiving an unsorted list, the first step in heap sort is to create a Heap data structure
(Max-Heap or Min-Heap). Once heap is built, the first element of the Heap is either largest or
smallest (depending upon Max-Heap or Min-Heap), so we put the first element of the heap in our
array. Then we again make heap using the remaining elements, to again pick the first element of
the heap and put it into the array. We keep on doing the same repeatedly until we have the
complete sorted list in our array.
An Example of Heapsort:
Given an array of 6 elements: 15, 19, 10, 7, 17, 16, sort it in ascending order using heap sort.
Steps:
1. Consider the values of the elements as priorities and build the heap tree.
2. Start deleteMax operations, storing each deleted element at the end of the heap array.
After performing step 2, the order of the elements will be opposite to the order in the heap tree. Hence, if we want the elements to be sorted in ascending order, we need to build the heap tree in descending order - the greatest element will have the highest priority.
Note that we use only one array, treating its parts differently:
a. when building the heap tree, part of the array will be considered as the heap, and the rest part - the original array.
b. when sorting, part of the array will be the heap, and the rest part - the sorted array.
This will be indicated by colors: white for the original array, blue for the heap and red for the sorted array.
Here is the array: 15, 19, 10, 7, 17, 16
A. Building the heap tree
The array represented as a tree, complete but not ordered:
Start with the rightmost node at height 1 - the node at position 3 = Size/2.
It has one greater child and has to be percolated down:

After processing array[3] the situation is:

Next comes array[2]. Its children are smaller, so no percolation is needed.

The last node to be processed is array[1]. Its left child is the greater of the children.
The item at array[1] has to be percolated down to the left, swapped with array[2].

As a result the situation is:

The children of array[2] are greater, and item 15 has to be moved down further, swapped with
array[5].

Now the tree is ordered, and the binary heap is built.


B. Sorting - performing deleteMax operations:
1. Delete the top element 19.
1.1. Store 19 in a temporary place. A hole is created at the top
1.2. Swap 19 with the last element of the heap.
As 10 will be adjusted in the heap, its cell will no longer be a part of the heap.
Instead it becomes a cell from the sorted array

1.3. Percolate down the hole

1.4. Percolate once more (10 is less than 15, so it cannot be inserted in the previous hole)
Now 10 can be inserted in the hole

2. DeleteMax the top element 17


2.1. Store 17 in a temporary place. A hole is created at the top

2.2. Swap 17 with the last element of the heap.


As 10 will be adjusted in the heap, its cell will no longer be a part of the heap.
Instead it becomes a cell from the sorted array
2.3. The element 10 is less than the children of the hole, and we percolate the hole down:

2.4. Insert 10 in the hole

3. DeleteMax 16
3.1. Store 16 in a temporary place. A hole is created at the top
3.2. Swap 16 with the last element of the heap.
As 7 will be adjusted in the heap, its cell will no longer be a part of the heap.
Instead it becomes a cell from the sorted array

3.3. Percolate the hole down (7 cannot be inserted there - it is less than the children of the hole)

3.4. Insert 7 in the hole

4. DeleteMax the top element 15


4.1. Store 15 in a temporary location. A hole is created.
4.2. Swap 15 with the last element of the heap.
As 10 will be adjusted in the heap, its cell will no longer be a part of the heap.
Instead it becomes a position from the sorted array

4.3. Store 10 in the hole (10 is greater than the children of the hole)

5. DeleteMax the top element 10.


5.1. Remove 10 from the heap and store it into a temporary location.

5.2. Swap 10 with the last element of the heap.


As 7 will be adjusted in the heap, its cell will no longer be a part of the heap. Instead it becomes
a cell from the sorted array
5.3. Store 7 in the hole (as the only remaining element in the heap)

7 is the last element from the heap, so now the array is sorted

Heap Sort Algorithm

• HEAPSORT(A)
1. BUILD-MAX-HEAP(A)
2. for i ← length[A] down to 2
3. do exchange A[1] ↔ A[i]
4. heap-size[A] ← heap-size[A] –1
5. MAX-HEAPIFY(A,1)

• BUILD-MAX-HEAP(A)
1. heap-size[A] ← length[A]
2. for i ← length[A]/2 down to 1
3. do MAX-HEAPIFY(A, i)

• MAX-HEAPIFY(A, i)
1. l ← LEFT(i)
2. r ← RIGHT(i)
3. if l ≤ heap-size[A] and A[l] > A[i]
4. then largest←l
5. else largest←i
6. if r ≤ heap-size[A] and A[r] >A[largest]
7. then largest←r
8. if largest ≠ i
9. then exchange A[i ] ↔A[largest]
10. MAX-HEAPIFY(A,largest)

//CODE

In the algorithm below, initially the heapsort() function is called, which calls buildmaxheap() to build the heap, which in turn uses maxheap() to restore the heap property.

#include <stdio.h>

void heapsort(int [], int);
void buildmaxheap(int [], int);
void maxheap(int [], int, int);

int main()
{
    int a[10], i, size;
    printf("Enter size of list: ");    /* less than 10, because max size of the array is 10 */
    scanf("%d", &size);
    printf("Enter elements: ");
    for(i = 0; i < size; i++)
    {
        scanf("%d", &a[i]);
    }
    heapsort(a, size);
    return 0;
}

void heapsort(int a[], int length)
{
    int heapsize, i, temp;
    buildmaxheap(a, length);
    heapsize = length - 1;
    for(i = heapsize; i >= 0; i--)
    {
        temp = a[0];                   /* move the current maximum to the end of the heap */
        a[0] = a[heapsize];
        a[heapsize] = temp;
        heapsize--;
        maxheap(a, 0, heapsize);       /* restore the max-heap property at the root       */
    }
    for(i = 0; i < length; i++)
    {
        printf("\t%d", a[i]);
    }
}

void buildmaxheap(int a[], int length)
{
    int i, heapsize;
    heapsize = length - 1;
    for(i = (length/2) - 1; i >= 0; i--)   /* heapify every internal node, bottom-up */
    {
        maxheap(a, i, heapsize);
    }
}

void maxheap(int a[], int i, int heapsize)
{
    int l, r, largest, temp;
    l = 2*i + 1;                       /* left child index (array is 0-based)  */
    r = 2*i + 2;                       /* right child index (array is 0-based) */
    if(l <= heapsize && a[l] > a[i])
    {
        largest = l;
    }
    else
    {
        largest = i;
    }
    if(r <= heapsize && a[r] > a[largest])
    {
        largest = r;
    }
    if(largest != i)
    {
        temp = a[i];
        a[i] = a[largest];
        a[largest] = temp;
        maxheap(a, largest, heapsize);
    }
}

Complexity of Heap Sort Algorithm

The heap sort algorithm is applied to an array A with n elements. The algorithm has two phases,
and we analyze the complexity of each phase separately.
Phase 1. Suppose H is a heap. The number of comparisons to find the appropriate place of a new element item in H cannot exceed the depth of H. Since H is a complete tree, its depth is bounded by log2 m, where m is the number of elements in H. Accordingly, the total number g(n) of comparisons to insert the n elements of A into H is bounded as
g(n) ≤ n log2 n
Phase 2. If H is a complete tree with m elements, the left and right subtrees of H are heaps and L is the root of H. Reheaping uses 4 comparisons to move the node L one step down the tree H. Since the depth cannot exceed log2 m, it uses at most 4 log2 m comparisons to find the appropriate place of L in the tree H. Hence
h(n) ≤ 4n log2 n
Thus each phase requires time proportional to n log2 n, and the running time to sort the n-element array A is O(n log2 n).
Algorithm Worst Case Average Case Best Case

Heap Sort O(n logn) O(n logn) O(n logn)

Radix Sort
Radix sort is an integer sorting algorithm that sorts data with integer keys by grouping the keys by
individual digits that share the same significant position and value (place value). Radix sort
uses counting sort as a subroutine to sort an array of numbers. Because integers can be used to
represent strings (by hashing the strings to integers), radix sort works on data types other than just
integers. Because radix sort is not comparison based, it is not bounded by the Ω(n log n) lower bound for running time — in fact, radix sort can perform in linear time.
Radix sort incorporates the counting sort algorithm so that it can sort larger, multi-digit numbers
without having to potentially decrease the efficiency by increasing the range of keys the algorithm must
sort over (since this might cause a lot of wasted time).

Counting Sort
Counting sort can only sort one place value of a given base. For example, a counting sort for base-10
numbers can only sort digits zero through nine. To sort two-digit numbers, counting sort would need to
operate in base-100. Radix sort is more powerful because it can sort multi-digit numbers without having
to search over a wider range of keys (which would happen if the base was larger).
Where does counting sort fail?
When the elements are in the range from 1 to n², counting sort will take O(n²), which is worse than the above mentioned sorting algorithms.
Can we do better than O(n log n) for the range 1 to n²?
The answer is Radix Sort.
The Idea of Radix sort
The idea is to sort digit by digit starting from the least significant digit and moving to the most
significant digit. here counting-sort is used as a subroutine to sort.
The Radix sort Algorithm
1. For all i where i is from the least significant to the most significant digit of the number do the
following
o sort the input array using counting sort according to its i'th digit.

Example

Original, unsorted list:


170, 45, 75, 90, 802, 24, 2, 66

Sorting by least significant digit (1s place) gives: [Notice that we keep 802 before 2, because 802
occurred before 2 in the original list, and similarly for pairs 170 & 90 and 45 & 75.]

170, 90, 802, 2, 24, 45, 75, 66

Sorting by next digit (10s place) gives: [Notice that 802 again comes before 2 as 802 comes before 2 in
the previous list.]
802, 2, 24, 45, 66, 170, 75, 90

Sorting by most significant digit (100s place) gives: 2, 24, 45, 66, 75, 90, 170, 802

Example: Assume the input array is:


10,21,17,34,44,11,654,123
Based on the algorithm, we will sort the input array according to the one's digit (least significant
digit).
0: 10
1: 21 11
2:
3: 123
4: 34 44 654
5:
6:
7: 17
8:
9:
So, the array becomes 10, 21, 11, 123, 34, 44, 654, 17
Now, we'll sort according to the ten's digit:
0:
1: 10 11 17
2: 21 123
3: 34
4: 44
5: 654
6:
7:
8:
9:

Now, the array becomes : 10,11,17,21,123,34,44,654


Finally, we sort according to the hundred's digit (most significant digit):
0: 010 011 017 021 034 044
1: 123
2:
3:
4:
5:
6: 654
7:
8:
9:

The array becomes : 10,11,17,21,34,44,123,654 which is sorted. This is how our algorithm
works.

Implementation:

#define range 10      /* range for digits is 10, as digits go from 0-9 */

void countsort(int arr[], int n, int place)
{
    int i, freq[range] = {0};
    int output[n];
    for(i = 0; i < n; i++)
        freq[(arr[i]/place) % range]++;            /* count occurrences of each digit      */
    for(i = 1; i < range; i++)
        freq[i] += freq[i-1];                      /* prefix sums give the final positions */
    for(i = n-1; i >= 0; i--)                      /* traverse backwards to keep it stable */
    {
        output[freq[(arr[i]/place) % range] - 1] = arr[i];
        freq[(arr[i]/place) % range]--;
    }
    for(i = 0; i < n; i++)
        arr[i] = output[i];
}

void radixsort(int arr[], int n, int maxx)   /* maxx is the maximum element in the array */
{
    int mul = 1;
    while(maxx)
    {
        countsort(arr, n, mul);   /* sort by the digit at place value mul */
        mul *= 10;
        maxx /= 10;
    }
}
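A minimal driver (ours, added for illustration) that runs the functions above on the first example list:

#include <stdio.h>

void radixsort(int arr[], int n, int maxx);   /* defined above */

int main()
{
    int arr[8] = {170, 45, 75, 90, 802, 24, 2, 66};
    int i;
    radixsort(arr, 8, 802);                   /* 802 is the maximum element */
    for(i = 0; i < 8; i++)
        printf("%d ", arr[i]);                /* prints: 2 24 45 66 75 90 170 802 */
    return 0;
}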

What is the running time of Radix Sort?


Let there be d digits in the input integers. Radix Sort takes O(d*(n+b)) time where b is the base for representing numbers; for example, for the decimal system, b is 10. What is the value of d? If k is the maximum possible value, then d would be O(logb(k)). So the overall time complexity is O((n+b) * logb(k)), which looks worse than the time complexity of comparison based sorting algorithms for a large k. Let us first limit k. Let k <= n^c where c is a constant. In that case, the complexity becomes O(n logb(n)). But it still doesn't beat comparison based sorting algorithms.
What if we make the value of b larger? What should the value of b be to make the time complexity linear? If we set b as n, we get the time complexity as O(n). In other words, we can sort an array of integers with range from 1 to n^c if the numbers are represented in base n (or every digit takes log2(n) bits).
Is Radix Sort preferable to Comparison based sorting algorithms like Quick-Sort?
If we have log2 n bits for every digit, the running time of Radix Sort appears to be better than Quick Sort for a wide range of input numbers. However, the constant factors hidden in the asymptotic notation are higher for Radix Sort, and Quick Sort uses hardware caches more effectively. Also, Radix Sort uses counting sort as a subroutine, and counting sort takes extra space to sort the numbers.

Algorithm    Worst Case     Average Case    Best Case

Radix Sort   O(d*(n+b))     O(d*(n+b))      O(d*(n+b))

5.4 PRACTICAL CONSIDERATION FOR INTERNAL SORTING


Apart from radix sort, all the sorting methods require excessive data movement; i.e., as the result
of a comparison, records may be physically moved. This tends to slow down the sorting process
when records are large. In sorting files in which the records are large it is necessary to modify the
sorting methods so as to minimize data movement. Methods such as Insertion Sort and Merge
Sort can be easily modified to work with a linked file rather than a sequential file. In this case
each record will require an additional link field. Instead of physically moving the record, its link
field will be changed to reflect the change in position of that record in the file. At the end of the
sorting process, the records are linked together in the required order. In many applications (e.g.,
when we just want to sort files and then output them record by record on some external media in
the sorted order) this is sufficient. However, in some applications it is necessary to physically
rearrange the records in place so that they are in the required order. Even in such cases
considerable savings can be achieved by first performing a linked list sort and then physically
rearranging the records according to the order specified in the list. This rearranging can be
accomplished in linear time using some additional space.

If the file, F, has been sorted so that at the end of the sort P is a pointer to the first record in a
linked list of records then each record in this list will have a key which is greater than or equal to
the key of the previous record (if there is a previous record). To physically rearrange these
records into the order specified by the list, we begin by interchanging records R1 and RP. Now, the record in position R1 has the smallest key. If P ≠ 1 then there is some record in the list with link field = 1. If we could change this link field to indicate the new position of the record previously at position 1, then we would be left with records R2, ..., Rn linked together in non-decreasing order. Repeating the above process will, after n - 1 iterations, result in the desired rearrangement.

SEARCH TREES
Binary Search Tree, AVL Tree, m-way search tree, B-tree, B+ tree (Already Discussed)

HASHING

Suppose we want to design a system for storing employee records keyed using phone numbers. And we
want following queries to be performed efficiently:
1. Insert a phone number and corresponding information.
2. Search a phone number and fetch the information.
3. Delete a phone number and related information.

We can think of using the following data structures to maintain information about different phone
numbers.
1. Array of phone numbers and records.
2. Linked List of phone numbers and records.
3. Balanced binary search tree with phone numbers as keys.
4. Direct Access Table.

For arrays and linked lists, we need to search in a linear fashion, which can be costly in practice. If we use arrays and keep the data sorted, then a phone number can be searched in O(log n) time using Binary Search, but insert and delete operations become costly as we have to maintain sorted order.

With a balanced binary search tree, we get moderate search, insert and delete times. All of these operations can be guaranteed to be in O(log n) time.

Another solution that one can think of is to use a direct access table where we make a big array and use phone numbers as indices into the array. An entry in the array is NIL if the phone number is not present; otherwise the array entry stores a pointer to the record corresponding to the phone number. Time-complexity-wise this solution is the best among all: we can do all operations in O(1) time. For example, to insert a phone number, we create a record with the details of the given phone number, use the phone number as index and store the pointer to the created record in the table.
This solution has many practical limitations. The first problem is that the extra space required is huge. For example, if the phone number is n digits, we need O(m * 10^n) space for the table, where m is the size of a pointer to a record. Another problem is that an integer in a programming language may not be able to store n digits.
Due to above limitations Direct Access Table cannot always be used. Hashing is the solution that can be
used in almost all such situations and performs extremely well compared to above data structures
like Array, Linked List, Balanced BST in practice. With hashing we get O(1) search time on
average (under reasonable assumptions) and O(n) in worst case.
Hashing is an improvement over Direct Access Table. The idea is to use hash function that converts a
given phone number or any other key to a smaller number and uses the small number as index in a
table called hash table.

Hash Function:
A function that converts a given big phone number to a small practical integer value. The mapped
integer value is used as an index in hash table. In simple terms, a hash function maps a big number or
string to a small integer that can be used as index in hash table.

(Hash Function 'H' will be applied to a key K to generate a memory address L, i.e. H: K → L)
A good hash function should have following properties
1) Efficiently computable.
2) Should uniformly distribute the keys (Each table position equally likely for each key)
For example, for phone numbers a bad hash function is to take the first three digits. A better function is to consider the last three digits. Please note that this may not be the best hash function; there may be better ways.

Different Hash functions are:


 Division Method
A key is mapped into one of m slots using the function
h(k) = k mod m
Requires only a single division, hence fast
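A minimal sketch of the division method in C (the function name and the choice of table size are our own illustrative assumptions):

#define TABLE_SIZE 97          /* m: number of slots, a prime is a common choice */

/* Division method: map key k into one of TABLE_SIZE slots. */
unsigned int hash_division(unsigned int k)
{
    return k % TABLE_SIZE;     /* h(k) = k mod m */
}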

 Midsquare Method
Mid-Square hashing is a hashing technique in which unique keys are generated. In this technique, a seed
value is taken and it is squared. Then, some digits from the middle are extracted. These extracted digits
form a number which is taken as the new seed. This technique can generate keys with high randomness if
a big enough seed value is taken. However, it has a limitation: as the seed is squared, if a 6-digit number is taken, then the square will have up to 12 digits, which exceeds the range of the int data type. So, overflow must be taken care of; in case of overflow, use the long long int data type, or perform the multiplication on strings if overflow still occurs. The chances of a collision in mid-square hashing are low, but not zero. So, if a collision occurs, it is handled using some collision handling technique.
Example:
Suppose a 4-digit seed is taken. seed = 4765
Hence, square of seed is = 4765 * 4765 = 22705225
Now, from this 8-digit number, any four digits are extracted (Say, the middle four).
So, the new seed value becomes seed = 7052
Now, square of this new seed is = 7052 * 7052 = 49730704
Again, the same set of 4-digits is extracted.
So, the new seed value becomes seed = 7307
.
.
.
.
This process is repeated as many times as a key is required.
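A minimal sketch of one mid-square step in C, using the 4-digit example above (the function name is our own; extracting exactly the middle four digits of the 8-digit square is one possible choice):

/* Mid-square hashing sketch (illustrative only): square the seed and     */
/* extract the middle four digits of the 8-digit square as the new seed.  */
unsigned long mid_square(unsigned long seed)
{
    unsigned long sq = seed * seed;   /* e.g. 4765 * 4765 = 22705225 */
    return (sq / 100) % 10000;        /* middle four digits: 7052    */
}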

 Folding Method
The key is divided into several parts which are folded together. There are two types of folding: shift folding and boundary folding.

Shift folding
Divide the number into parts and add (or apply some other function to) them together. For example, for the key 123456789 split into three-digit pieces:

123 + 456 + 789 = 1368, then take 1368 mod Tsize.

Boundary folding
Alternate pieces are flipped (reversed) on the boundary. For the same key, either the even pieces or the odd pieces are reversed before adding:

123 + 654 + 789 = 1566      or      321 + 456 + 987 = 1764

and the result is again taken mod Tsize.
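A minimal sketch of shift folding in C (the function name, 3-digit pieces and the modulo step are our own illustrative choices):

/* Shift folding sketch: split the key into 3-digit groups, add the    */
/* groups, and reduce modulo the table size tsize.                     */
unsigned long shift_fold(unsigned long key, unsigned long tsize)
{
    unsigned long sum = 0;
    while (key > 0)
    {
        sum += key % 1000;     /* take the lowest three digits as one piece */
        key /= 1000;
    }
    return sum % tsize;        /* e.g. 123456789 -> 789 + 456 + 123 = 1368  */
}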

Hash Table: An array that stores pointers to records corresponding to a given phone number. An entry
in hash table is NIL if no existing phone number has hash function value equal to the index for the
entry.

Collision Handling: Since a hash function gets us a small number for a big key, there is possibility
that two keys result in same value. The situation where a newly inserted key maps to an
already occupied slot in hash table is called collision and must be handled using some collision
handling technique. Following are the ways to handle collisions:

 Chaining: The idea is to make each cell of hash table point to a linked list of records that have same
hash function value. Chaining is simple, but requires additional memory outside the table.
 Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table
entry contains either a record or NIL. When searching for an element, we one by one examine table
slots until the desired element is found or it is clear that the element is not in the table.

What is Collision?
Since a hash function gets us a small number for a key which is a big integer or string, there is possibility
that two keys result in same value. The situation where a newly inserted key maps to an already occupied
slot in hash table is called collision and must be handled using some collision handling technique.

H: K1 → L and H: K2 → L (two different keys K1 and K2 hash to the same address L)

 What are the chances of collisions with large table?


Collisions are very likely even if we have a big table to store keys. An important observation is the Birthday Paradox: with only 23 persons, the probability that two people have the same birthday is 50%.
How to handle Collisions?
There are mainly two methods to handle collision:
1) Separate Chaining
2) Open Addressing

Separate Chaining:
The idea is to make each cell of hash table point to a linked list of records that have same hash function value.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73, 101.
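A minimal sketch of separate chaining in C with the hash function "key mod 7" and the keys above (the structure and function names are our own illustration, not part of the notes):

#include <stdio.h>
#include <stdlib.h>

#define M 7                          /* number of slots: h(k) = k mod 7 */

struct node {
    int key;
    struct node *next;
};

struct node *table[M];               /* each slot points to a chain of records */

/* Insert a key at the head of the chain for its slot. */
void chain_insert(int key)
{
    int h = key % M;
    struct node *n = malloc(sizeof *n);
    n->key = key;
    n->next = table[h];
    table[h] = n;
}

/* Search the chain of the key's slot; returns 1 if found, 0 otherwise. */
int chain_search(int key)
{
    struct node *p;
    for (p = table[key % M]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}

int main(void)
{
    int keys[] = {50, 700, 76, 85, 92, 73, 101}, i;
    for (i = 0; i < 7; i++)
        chain_insert(keys[i]);
    printf("%d %d\n", chain_search(92), chain_search(11));   /* prints: 1 0 */
    return 0;
}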

 Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to chain
3) Less sensitive to the hash function or load factors.
4) It is mostly used when it is unknown how many and how frequently keys may be inserted or
deleted.
 Disadvantages:
1) Cache performance of chaining is not good as keys are stored using linked list. Open addressing
provides better cache performance as everything is stored in same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become O(n) in worst case.
4) Uses extra space for links.

Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is equally likely to be
hashed to any slot of table (simple uniform hashing).
m = Number of slots in hash table
n = Number of keys to be inserted in hash table
Load factor α = n/m
Expected time to search = O(1 + α)
Expected time to insert/delete = O(1 + α)
Time complexity of search insert and delete is
O(1) if α is O(1)
Open Addressing
Like separate chaining, open addressing is a method for handling collisions. In Open Addressing, all
elements are stored in the hash table itself. So at any point, size of the table must be greater than or
equal to the total number of keys (Note that we can increase table size by copying old data if
needed).
Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.
Search(k): Keep probing until the slot's key becomes equal to k or an empty slot is reached.
Delete(k): Delete operation is interesting. If we simply delete a key, then search may fail. So slots of deleted
keys are marked specially as “deleted”.
Insert can insert an item in a deleted slot, but the search doesn’t stop at a deleted slot.

Open Addressing is done in the following ways:

a) Linear Probing: In linear probing, we linearly probe for the next slot. For example, the typical gap between two probes is 1, as taken in the example below.
let hash(x) be the slot index computed using hash function and S be the table size

If slot hash(x) % S is full, then we try (hash(x) + 1) % S


If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S
If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S
..................................................
..................................................
 Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92,
73, 101.
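A minimal sketch of insertion with linear probing in C (table size 7, the EMPTY sentinel and the function name are our own illustrative choices):

#define S 7                    /* table size, hash(x) = x mod 7       */
#define EMPTY -1               /* sentinel marking an unused slot     */

int table[S] = {EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY};

/* Insert key x, probing slots hash(x), hash(x)+1, ... until a free one. */
/* Returns the slot used, or -1 if the table is full.                    */
int linear_probe_insert(int x)
{
    int i, slot;
    for (i = 0; i < S; i++)
    {
        slot = (x % S + i) % S;        /* (hash(x) + i) % S */
        if (table[slot] == EMPTY)
        {
            table[slot] = x;
            return slot;
        }
    }
    return -1;                          /* no empty slot found */
}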

Clustering: The main problem with linear probing is clustering, many consecutive elements form
groups and it starts taking time to find a free slot or to search an element.
b) Quadratic Probing We look for i2‘th slot in i’th iteration.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
..................................................
..................................................

c) Double Hashing: We use another hash function hash2(x) and look for the i*hash2(x) slot in the i'th iteration.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
..................................................
..................................................

Comparison of above three:


Linear probing has the best cache performance but suffers from clustering. One more advantage of linear probing is that it is easy to compute.
Quadratic probing lies between the two in terms of cache performance and clustering.
Double hashing has poor cache performance but no clustering. Double hashing requires more
computation time as two hash functions need to be computed.
Comparison of Separate Chaining and Open Addressing:

1. Chaining is simpler to implement; open addressing requires more computation.
2. In chaining, the hash table never fills up: we can always add more elements to a chain. In open addressing, the table may become full.
3. Chaining is less sensitive to the hash function or load factor. Open addressing requires extra care to avoid clustering and a high load factor.
4. Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. Open addressing is used when the frequency and number of keys is known.
5. Cache performance of chaining is not good, as keys are stored using linked lists. Open addressing provides better cache performance as everything is stored in the same table.
6. Chaining wastes space (some parts of the hash table are never used). In open addressing, a slot can be used even if no input maps to it.
7. Chaining uses extra space for links. Open addressing has no links.
Performance of Open Addressing:
Like Chaining, the performance of hashing can be evaluated under the assumption that each key is
equally likely to be hashed to any slot of the table (simple uniform hashing)
 m = Number of slots in the hash table
 n = Number of keys to be inserted in the hash table
 Load factor α = n/m ( < 1 )
 Expected time to search/insert/delete < 1/(1 - α)
 So Search, Insert and Delete take O(1/(1 - α)) time
5.6 STORAGE MANAGEMENT

5.6.1 Garbage Collection

Garbage collection (GC) is a form of automatic memory management. The garbage collector
attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the
program.

Garbage collection is the opposite of manual memory management, which requires the
programmer to specify which objects to deallocate and return to the memory system. Like other
memory management techniques, garbage collection may take a significant proportion of total
processing time in a program and can thus have significant influence on performance.

Resources other than memory, such as network sockets, database handles, user interaction
windows, and file and device descriptors, are not typically handled by garbage collection.
Methods used to manage such resources, particularly destructors, may suffice to manage memory
as well, leaving no need for GC. Some GC systems allow such other resources to be associated
with a region of memory that, when collected, causes the other resource to be reclaimed; this is
called finalization. The basic principles of garbage collection are:

 Find data objects in a program that cannot be accessed in the future.

 Reclaim the resources used by those objects.

Many programming languages require garbage collection, either as part of the language specification or effectively for practical implementation; these are said to be garbage-collected languages. Other languages were designed for use with manual memory management, but have garbage-collected implementations available (for example, C, C++). Integrating garbage collection into the language's compiler and runtime system enables a much wider choice of methods. The garbage collector will almost always be closely integrated with the memory allocator.

Advantages

Garbage collection frees the programmer from manually dealing with memory deallocation. As a
result, certain categories of bugs are eliminated or substantially reduced:

 Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is dereferenced. By then the memory may have been reassigned to another use, with unpredictable results.
 Double free bugs, which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again.
 Certain kinds of memory leaks, in which a program fails to free memory occupied by objects that have become unreachable, which can lead to memory exhaustion.
 Efficient implementations of persistent data structures
Disadvantages

Typically, garbage collection has certain disadvantages:

 Garbage collection consumes computing resources in deciding which memory to free, even though the programmer may have already known this information. The penalty for the convenience of not annotating object lifetime manually in the source code is overhead, which can lead to decreased or uneven performance. Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing.
 The moment when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs.

5.6.2 Compaction

The process of moving all marked nodes to one end of memory and all available memory to the other end is called compaction. An algorithm which performs compaction is called a compacting algorithm.
After repeated allocation and deallocation of blocks, the memory becomes fragmented. Compaction is a technique that joins the non-contiguous free memory blocks to form one large block so that the total free memory becomes contiguous.

All the memory blocks that are in use are moved towards the beginning of the memory i.e. these
blocks are copied into sequential locations in the lower portion of the memory.
When compaction is performed, all the user programs come to a halt. A problem can arise if any
of the used blocks that are copied contain a pointer value. Eg. Suppose inside block P5, the
location 350 contains address 310. After compaction the block P5 is moved from location 290 to
location 120, so now the pointer value 310 stored inside P5 should change to 140. So after
compaction the pointer values inside blocks should be identified and changed accordingly.
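A tiny sketch of the pointer adjustment described above (purely illustrative; the function and parameter names are ours). If a block is moved from old_base to new_base, a pointer value stored inside it that refers into the moved block must be shifted by the same displacement:

/* Adjust a pointer value stored inside a relocated block.                 */
/* Example from the text: block P5 moves from 290 to 120, so a stored      */
/* pointer value of 310 becomes 310 - 290 + 120 = 140.                     */
unsigned long relocate_pointer(unsigned long ptr,
                               unsigned long old_base,
                               unsigned long new_base)
{
    return ptr - old_base + new_base;   /* keep the same offset inside the moved block */
}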
