
UNIT-5

SEARCHING

Searching is the process of finding a value in a list of values. In other words, it is the process of locating the position of a given value in a list of values.

Linear Search (Sequential Search)

Linear search or sequential search is a method for finding a particular value in a list that consists of checking every one of its elements, one at a time and in sequence, until the desired one is found. Linear search is the simplest search algorithm; it is a special case of brute-force search. Its worst case cost is proportional to the number of elements in the list; and so is its expected cost, if all list elements are equally likely to be searched for. Therefore, if the list has more than a few elements, other methods (such as binary search or hashing) will be faster, but they also impose additional requirements.

How Linear Search works

Linear search in an array is usually programmed by stepping up an index variable until it reaches the last index. This normally requires two comparisons for each list item: one to check whether the index has reached the end of the array, and another one to check whether the item has the desired value.
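The index-stepping loop described above can be sketched as a small C function. The name linear_search and the convention of returning -1 when the item is absent are assumptions made for this illustration; they do not come from the notes.

```c
/* Returns the index of the first element equal to key, or -1 if absent.
   Each iteration performs the two comparisons described above:
   one on the index (i < n) and one on the value (a[i] == key). */
int linear_search(const int a[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (a[i] == key) {
            return i;
        }
    }
    return -1;
}
```

Called on the array {5, 1, 4, 2, 8} with key 4, this sketch returns index 2; with a key that is not present it returns -1 after visiting all five elements.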


Linear Search Algorithm

1. Repeat For J = 1 to N
2.     If (ITEM == A[J]) Then
3.         Print: ITEM found at location J
4.         Return
       [End of If]
   [End of For Loop]
5. If (J > N) Then
6.     Print: ITEM doesn’t exist
   [End of If]
7. Exit

//CODE

#include <stdio.h>

int main(void)
{
    int a[10], i, n, m, c = 0, x = 0;

    printf("Enter the size of the array (at most 10): ");
    scanf("%d", &n);
    printf("Enter the elements of the array: ");
    for (i = 0; i <= n - 1; i++) {
        scanf("%d", &a[i]);
    }
    printf("Enter the number to be searched: ");
    scanf("%d", &m);
    for (i = 0; i <= n - 1; i++) {
        if (a[i] == m) {
            x = i;   /* remember the index where the item was found */
            c = 1;   /* mark the search as successful */
            break;
        }
    }
    if (c == 0)
        printf("The number is not in the list");
    else
        printf("The number is found at location %d", x);
    return 0;
}

Complexity of Linear Search

Consider a linear search on a list of n elements. In the worst case, the search must visit every element once. This happens when the value being searched for is either the last element in the list, or is not in the list. However, on average, assuming the value searched for is in the list and each list element is equally likely to be the value searched for, the search visits only n/2 elements. In the best case the value is found at the first position, so a single comparison is enough, i.e. O(1).

Algorithm | Worst Case | Average Case | Best Case
Linear Search | O(n) | O(n) | O(1)

Binary Search

A binary search or half-interval search algorithm finds the position of a specified input value (the search "key") within an array sorted by key value. For binary search, the array should be arranged in ascending or descending order. In each step, the algorithm compares the search key value with the key value of the middle element of the array. If the keys match, then a matching element has been found and its index is returned. Otherwise, if the search key is less than the middle element's key, then the algorithm repeats its action on the sub-array to the left of the middle element or, if the search key is greater, on the sub-array to the right. If the remaining array to be searched is empty, then the key cannot be found in the array and a special "not found" indication is returned.
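The description above can be sketched as an iterative C function for an array sorted in ascending order. The function name binary_search and the use of -1 as the "not found" indication are assumptions chosen for this example.

```c
/* Iterative binary search on an array sorted in ascending order.
   Returns the index of key, or -1 once the remaining range is empty. */
int binary_search(const int a[], int n, int key)
{
    int low = 0;
    int high = n - 1;

    while (low <= high) {
        int mid = low + (high - low) / 2;  /* avoids overflow of low + high */
        if (a[mid] == key) {
            return mid;                    /* keys match */
        } else if (key < a[mid]) {
            high = mid - 1;                /* continue in the left sub-array */
        } else {
            low = mid + 1;                 /* continue in the right sub-array */
        }
    }
    return -1;
}
```

Writing the midpoint as low + (high - low) / 2 gives the same value as (low + high) / 2 for small arrays; it is used here only because it cannot overflow for very large indices.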

How Binary Search Works

Searching a sorted collection is a common task. A dictionary is a sorted list of word definitions. Given a word, one can find its definition. A telephone book is a sorted list of people's names, addresses, and telephone numbers. Knowing someone's name allows one to quickly find their telephone number and address.


Binary Search Algorithm

1. Set BEG = 1 and END = N
2. Set MID = (BEG + END) / 2
3. Repeat Steps 4 to 8 While (BEG <= END) and (A[MID] ≠ ITEM)
4.     If (ITEM < A[MID]) Then
5.         Set END = MID - 1
6.     Else
7.         Set BEG = MID + 1
       [End of If]
8.     Set MID = (BEG + END) / 2
   [End of Step 3 Loop]
9. If (BEG <= END and A[MID] == ITEM) Then
10.     Print: ITEM exists at location MID
11. Else
12.     Print: ITEM doesn’t exist
   [End of If]
13. Exit

//CODE

#include <stdio.h>

int main(void)
{
    int ar[10], val, mid, low, high, size, i, found = 0;

    printf("\nEnter the number of elements to input in the array (at most 10)\n");
    scanf("%d", &size);
    for (i = 0; i < size; i++)
    {
        printf("Input element no. %d\n", i + 1);
        scanf("%d", &ar[i]);
    }
    printf("The array entered is\n");
    for (i = 0; i < size; i++)
    {
        printf("%d\t", ar[i]);
    }
    low = 0;
    high = size - 1;
    printf("\nInput the number to search for\n");
    scanf("%d", &val);
    /* the elements must have been entered in ascending order */
    while (high >= low)
    {
        mid = (low + high) / 2;
        if (ar[mid] == val)
        {
            printf("Value found at position %d", mid + 1);
            found = 1;
            break;
        }
        if (val > ar[mid])
        {
            low = mid + 1;
        }
        else
        {
            high = mid - 1;
        }
    }
    if (!found)
    {
        printf("Value not found");
    }
    return 0;
}

Complexity of Binary Search

A binary search halves the number of items to check with each iteration, so locating an item (or determining its absence) takes logarithmic time.

What does the time complexity O(log n) actually mean?

Complexities like O(1) and O(n) are simple and straightforward. O(1) means an element can be reached directly in a constant number of steps (as in a dictionary or hash table lookup). O(n) means we may have to check up to n elements to find it. But what could O(log n) possibly mean?

Since binary search has a best case efficiency of O(1) and worst case (average case) efficiency of O(log n), we will look at an example of the worst case. Consider a sorted array of 16 elements.

[Figure: A sorted array of 16 elements]

[Figure: Selecting the middle element as pivot (length / 2)]

[Figure: Since 13 is less than pivot, we remove the other half of the array]

[Figure: Repeating the process for finding the middle element for every sub-array]

You can see that after every comparison with the middle term, our searching range gets divided into half of the current range.

So, for reaching one element from a set of 16 elements, we had to divide the array 4 times.

We can say that,

$16 \times \left(\tfrac{1}{2}\right)^{4} = 1$ (Simplified Formula)

Similarly, for n elements and k divisions,

$n \times \left(\tfrac{1}{2}\right)^{k} = 1$ (Generalization)

Separating the power for the numerator and denominator,

$n \times \tfrac{1^{k}}{2^{k}} = 1$

Multiplying both sides by 2^k,

$n = 2^{k}$ (Final result)

Now, let us look at the definition of logarithm, it says that

A quantity representing the power to which a fixed number (the base) must be raised to produce a given number.

Which makes our equation into

$\log_{2} n = k$

So the number of divisions k needed to reach a single element grows as log n, which is why the worst case of binary search is O(log n).

Algorithm | Worst Case | Average Case | Best Case
Binary Search | O(log n) | O(log n) | O(1)
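As a small numeric check of the logarithm argument, the sketch below counts how many times a range of n elements can be halved before a single element remains. The function name halving_steps is hypothetical and is used only for this illustration.

```c
/* Counts how many times a range of n elements can be halved
   (using integer division) before only one element remains.
   For n >= 1 this equals the integer part of log2(n). */
int halving_steps(int n)
{
    int k = 0;
    while (n > 1) {
        n /= 2;
        k++;
    }
    return k;
}
```

For n = 16 the function returns 4, matching the four divisions counted in the example, and for n = 1024 it returns 10.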

INTRODUCTION TO SORTING

Sorting is the process of arranging a list of elements in a particular order (Ascending or Descending). The importance of sorting lies in the fact that data searching can be optimized to a very high level, if data is stored in a sorted manner. Sorting is also used to represent data in more readable formats. Following are some of the examples of sorting in real-life scenarios −

Telephone Directory − The telephone directory stores the telephone numbers of people sorted by their names, so that the names can be searched easily.

Dictionary − The dictionary stores words in an alphabetical order so that searching of any word becomes easy.

Sorting Efficiency

If someone asks you how you would arrange a deck of shuffled cards in order, you would probably say that you would check every card one by one and build the ordered deck as you go. It could take hours to arrange the deck in order, but that is how you would do it. Computers, however, don't work like this.

Since the beginning of the programming age, computer scientists have been working on solving the problem of sorting by coming up with various different algorithms to sort data.

The two main criteria to judge which algorithm is better than the other have been:

1. Time taken to sort the given data.

2. Memory space required to do so.

TYPES OF SORTING

Internal and External sorting

An internal sort is any data sorting process that takes place entirely within the main memory of a computer. This is possible whenever the data to be sorted is small enough to all be held in the main memory. Eg. Bubble Sort, Insertion sort

External sorting is a term for a class of sorting algorithms that can handle massive amounts of data. External sorting is required when the data being sorted do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive). External sorting typically uses a hybrid sort-merge strategy. In the sorting phase, chunks of data small enough to fit in main memory are read, sorted, and written out to a temporary file. In the merge phase, the sorted sub files are combined into a single larger file. Eg. Merge Sort

In-place Sorting and Not-in-place Sorting

Sorting algorithms may require some extra space for comparison and temporary storage of a few data elements. Some algorithms need no extra space beyond this small, constant amount, and the sorting is said to happen in place, that is, within the array itself. This is called in-place sorting. Bubble sort is an example of in-place sorting.

However, in some sorting algorithms, the program requires space which is more than or equal to the elements being sorted. Sorting which uses equal or more space is called not-in-place sorting. Merge-sort is an example of not-in-place sorting.

Stable and Unstable Sorting

We can say a sorting algorithm is stable if two objects with equal keys appear in the same order in sorted output as they appear in the input unsorted array.

[Figure: Stable Sort]

[Figure: Unstable Sort]

Stability of an algorithm matters when we wish to maintain the sequence of original elements, like in a tuple for example.
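To make the idea of stability concrete, here is a minimal sketch that sorts records by key with insertion sort, which these notes later list as a stable algorithm. The struct record type, its tag field and the function name stable_sort_by_key are assumptions introduced only for this example.

```c
struct record {
    int key;    /* value the sort compares */
    char tag;   /* label used only to tell records with equal keys apart */
};

/* Insertion sort on key. The strict '<' comparison never moves a record
   past another record with an equal key, which keeps the sort stable. */
void stable_sort_by_key(struct record r[], int n)
{
    for (int i = 1; i < n; i++) {
        struct record cur = r[i];
        int j = i - 1;
        while (j >= 0 && cur.key < r[j].key) {
            r[j + 1] = r[j];   /* shift larger keys one place right */
            j--;
        }
        r[j + 1] = cur;
    }
}
```

Sorting the records (2,a) (1,b) (2,c) (1,d) with this sketch gives (1,b) (1,d) (2,a) (2,c): within each key, the tags keep their original relative order.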

Adaptive and Non-Adaptive Sorting Algorithm

A sorting algorithm is said to be adaptive if it takes advantage of already 'sorted' elements in the list that is to be sorted. That is, if the source list has some elements already sorted, adaptive algorithms will take this into account and will try not to re-order them. (Presortedness of the input affects the running time.) Eg. Insertion Sort

A non-adaptive algorithm is one which does not take into account the elements which are already sorted. They try to force every single element to be re-ordered to confirm their sortedness.
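The effect of presortedness can be observed by counting comparisons. The sketch below instruments insertion sort, a commonly cited adaptive algorithm; the function name insertion_sort_count and the counter are assumptions added for this illustration.

```c
/* Insertion sort that returns how many key comparisons it performed. */
long insertion_sort_count(int a[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0) {
            comparisons++;
            if (key >= a[j]) {
                break;           /* key is already in the right place */
            }
            a[j + 1] = a[j];     /* shift the larger element right */
            j--;
        }
        a[j + 1] = key;
    }
    return comparisons;
}
```

On the already sorted input {1, 2, 3, 4, 5} the function makes 4 comparisons, while on the reversed input {5, 4, 3, 2, 1} it makes 10, so the running time clearly depends on how sorted the input already is.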

Bubble Sort

Bubble Sort is an algorithm which is used to sort N elements that are given in memory, for example an array with N elements. Bubble Sort compares adjacent elements one pair at a time and sorts them based on their values. It is known as bubble sort because with every complete iteration the largest element in the unsorted part of the array bubbles up towards the last place or the highest index, just like a water bubble rises up to the water surface. Sorting takes place by stepping through all the data items in pairs, comparing adjacent data items and swapping each pair that is out of order.

How Bubble Sort Works

Let us take the array of numbers "5 1 4 2 8", and sort the array from lowest number to greatest number using bubble sort. In each step, one pair of adjacent elements is compared. Three passes will be required.

First Pass:

( 5 1 4 2 8 ) → ( 1 5 4 2 8 ), Here, algorithm compares the first two elements, and swaps since 5 > 1.

( 1 5 4 2 8 ) → ( 1 4 5 2 8 ), Swap since 5 > 4

( 1 4 5 2 8 ) → ( 1 4 2 5 8 ), Swap since 5 > 2

( 1 4 2 5 8 ) → ( 1 4 2 5 8 ), Now, since these elements are already in order (8 > 5), algorithm does not swap them.

Second Pass:

( 1 4 2 5 8 ) → ( 1 4 2 5 8 )

( 1 4 2 5 8 ) → ( 1 2 4 5 8 ), Swap since 4 > 2

( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

Now, the array is already sorted, but our algorithm does not know if it is completed. The algorithm needs one whole pass without any swap to know it is sorted.

Third Pass:

 

( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

( 1 2 4 5 8 ) → ( 1 2 4 5 8 )

Bubble Sort Algorithm

1. Repeat Steps 2 and 3 for k = 1 to n-1
2.     Set ptr = 1
3.     Repeat while ptr <= n-k
           a) If (A[ptr] > A[ptr+1]) Then
                  Interchange A[ptr] and A[ptr+1]
              [End of If]
           b) ptr = ptr+1
       [end of step 3 loop]
   [end of step 1 loop]
4. Exit

//CODE

Let's consider an array with values {5, 1, 6, 2, 4, 3}

int a[6] = {5, 1, 6, 2, 4, 3};
int i, j, temp;
for(i=0; i<6; i++)
{
    for(j=0; j<6-i-1; j++)
    {
        if( a[j] > a[j+1])   //adjacent pair is out of order
        {
            temp = a[j];
            a[j] = a[j+1];
            a[j+1] = temp;
        }
    }
}

Above is the algorithm to sort an array using Bubble Sort. Although the above logic will sort an unsorted array, it isn't efficient and can be enhanced further, because the outer for loop will keep going for six iterations even if the array gets sorted after the second iteration. Hence we can insert a flag and keep checking whether any swapping of elements takes place during a pass. If no swapping takes place, the array is sorted and we can jump out of the for loop. So, the improved algorithm for bubble sort is:

1. Repeat Steps 2 to 4 for k = 1 to n-1
2.     Set ptr = 1 and flag = 0
3.     Repeat while ptr <= n-k
           a. If (A[ptr] > A[ptr+1]) Then
                  Interchange A[ptr] and A[ptr+1]
                  flag = 1
              [End of If]
           b. ptr = ptr+1
       [end of step 3 loop]
4.     If (flag == 0) Then
           break
   [end of step 1 loop]
5. Exit

int a[6] = {5, 1, 6, 2, 4, 3};
int i, j, temp;
for(i=0; i<6; i++)
{
    int flag = 0;   //taking a flag variable, reset at the start of every pass
    for(j=0; j<6-i-1; j++)
    {
        if( a[j] > a[j+1])
        {
            temp = a[j];
            a[j] = a[j+1];
            a[j+1] = temp;
            flag = 1;   //setting flag as 1, if swapping occurs
        }
    }
    if(!flag)   //breaking out of for loop if no swapping takes place
    {
        break;
    }
}

In the above code, if in a complete single cycle of j iteration(inner for loop), no swapping takes place, and flag remains 0, then we will break out of the for loops, because the array has already been sorted.

Complexity of Bubble Sort Algorithm

In Bubble Sort, n-1 comparisons will be done in 1st pass, n-2 in 2nd pass, n-3 in 3rd pass and so on. So the total number of comparisons will be

F(n) = (n-1) + (n-2) + … + 2 + 1 = n(n-1)/2 = O(n^2)

Algorithm | Worst Case | Average Case | Best Case | Space Complexity
Bubble Sort | n(n-1)/2 = O(n^2) | n(n-1)/2 = O(n^2) | O(n) | O(1)
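The worst-case and best-case figures can be checked by counting comparisons in the flagged version of bubble sort. This is only a sketch; the function name bubble_sort_count and the comparison counter are assumptions added for the illustration.

```c
/* Bubble sort with the early-exit flag; returns the number of comparisons. */
long bubble_sort_count(int a[], int n)
{
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++) {
        int swapped = 0;
        for (int j = 0; j < n - i - 1; j++) {
            comparisons++;
            if (a[j] > a[j + 1]) {
                int temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
                swapped = 1;
            }
        }
        if (!swapped) {
            break;   /* a pass without swaps means the array is sorted */
        }
    }
    return comparisons;
}
```

For six elements in reverse order the function makes 15 = 6(6-1)/2 comparisons, while for six elements that are already sorted it stops after a single pass of 5 comparisons, which is the O(n) best case.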

Selection Sort

Selection sort is conceptually one of the simplest sorting algorithms. It first finds the smallest element in the array and exchanges it with the element in the first position, then finds the second smallest element and exchanges it with the element in the second position, and continues in this way until the entire array is sorted.

How Selection Sort works

In the first pass, the smallest element found is 1, so it is placed at the first position. Then, leaving the first element aside, the smallest element is searched for among the rest of the elements; 3 is the smallest, so it is placed at the second position. Then we leave 1 and 3, search the rest of the elements for the smallest, put it at the third position, and keep doing this until the array is sorted.

Selection Sort Algorithm

1. Repeat For J = 0 to N-2
2.     Set MIN = J
3.     Repeat For K = J+1 to N-1
4.         If (A[K] < A[MIN]) Then
5.             Set MIN = K
           [End of If]
       [End of Step 3 For Loop]
6.     Interchange A[J] and A[MIN]
   [End of Step 1 For Loop]
7. Exit

//CODE

void selectionSort(int a[], int size)
{
    int i, j, min, temp;
    for(i=0; i < size-1; i++ )
    {
        min = i;   //setting min as i
        for(j=i+1; j < size; j++)
        {
            if(a[j] < a[min])   //if element at j is less than element at min position
            {
                min = j;   //then set min as j
            }
        }
        temp = a[i];   //swap the smallest remaining element into position i
        a[i] = a[min];
        a[min] = temp;
    }
}

Complexity of Selection Sort Algorithm

The number of comparisons in the selection sort algorithm is independent of the original order of the elements. That is, there are n-1 comparisons during PASS 1 to find the smallest element, n-2 comparisons during PASS 2 to find the second smallest element, and so on. Accordingly

F(n) = (n-1) + (n-2) + … + 2 + 1 = n(n-1)/2 = O(n^2)

Algorithm | Worst Case | Average Case | Best Case | Space Complexity
Selection Sort | n(n-1)/2 = O(n^2) | n(n-1)/2 = O(n^2) | O(n^2) | O(1)
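That the comparison count does not depend on the input order can be checked with a small instrumented sketch. The function name selection_sort_count and the counter are assumptions made for this example.

```c
/* Selection sort that returns how many element comparisons it performed. */
long selection_sort_count(int a[], int n)
{
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++) {
            comparisons++;
            if (a[j] < a[min]) {
                min = j;
            }
        }
        int temp = a[i];   /* swap the smallest remaining element into place */
        a[i] = a[min];
        a[min] = temp;
    }
    return comparisons;
}
```

For five elements the function performs 10 = 5(5-1)/2 comparisons whether the input is already sorted or in reverse order.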

Insertion Sort

Consider you have 10 cards out of a deck of cards in your hand. And they are sorted, or arranged in the ascending order of their numbers.

If I give you another card, and ask you to insert the card in just the right position, so that the cards in your hand are still sorted. What will you do?

Well, you will have to go through each card, from the front or from the back, and find the right position for the new card by comparing its value with each card. Once you find the right position, you will insert the card there.

Similarly, if more new cards are provided to you, you can easily repeat the same process and insert the new cards and keep the cards sorted too.

This is exactly how insertion sort works. It starts from the index 1(not 0), and each index starting from index 1 is like a new card, that you have to place at the right position in the sorted subarray on the left.
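The single step of placing one new card into an already sorted hand can be sketched as follows. The function name insert_card and the requirement that the array has room for one more element are assumptions made for this illustration.

```c
/* Inserts card into hand[0..n-1], which must already be sorted in
   ascending order and have room for one more element.
   Returns the new number of cards. */
int insert_card(int hand[], int n, int card)
{
    int j = n - 1;
    while (j >= 0 && hand[j] > card) {
        hand[j + 1] = hand[j];   /* shift larger cards one place right */
        j--;
    }
    hand[j + 1] = card;          /* drop the new card into the gap */
    return n + 1;
}
```

Insertion sort amounts to repeating this step for every index starting from 1, treating the elements to the left of the index as the sorted hand.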

It is a simple sorting algorithm that builds the final sorted array (or list) one item at a time. This algorithm is less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, insertion sort provides several advantages:

Simple implementation

Efficient for small datasets

Stable; i.e., does not change the relative order of elements with equal keys

In-place; i.e., only requires a constant amount O(1) of additional memory space.

How Insertion Sort Works


Insertion Sort Algorithm

This algorithm sorts the array A with N elements.

1. Set A[0] = -12345 (a sentinel standing for minus infinity, i.e. any number smaller than every element)
2. Repeat Steps 3 to 5 for k = 2 to n
3.     Set key = A[k] and j = k-1
4.     Repeat while key < A[j]
           a) Set A[j+1] = A[j]
           b) j = j-1
5.     Set A[j+1] = key
6. Return

//CODE

int A[6] = {5, 1, 6, 2, 4, 3};
int i, j, key;
for(i=1; i<6; i++)
{
    key = A[i];
    j = i-1;
    while(j>=0 && key < A[j])
    {
        A[j+1] = A[j];
        j--;
    }
    A[j+1] = key;
}

Complexity of Insertion Sort

The number F(n) of comparisons in the insertion sort algorithm can be easily computed. First of all, the worst case occurs when the array A is in reverse order and the inner loop must use the maximum number k-1 of comparisons for each k. Hence

F(n) = 1 + 2 + 3 + … + (n-1) = n(n-1)/2 = O(n^2)

Furthermore, one can show that, on the average, there will be approximately (k-1)/2 comparisons in the inner loop. Accordingly, for the average case,

F(n) = 1/2 + 2/2 + … + (n-1)/2 = n(n-1)/4 = O(n^2)

Thus the insertion sort algorithm is a very slow algorithm when n is very large.

Algorithm | Worst Case | Average Case | Best Case | Space Complexity
Insertion Sort | n(n-1)/2 = O(n^2) | n(n-1)/4 = O(n^2) | O(n) | O(1)

Merge Sort

Merge Sort follows the rule of Divide and Conquer. It does not stop after dividing the list into two halves: the unsorted list is divided repeatedly until there are N sub lists, each having one element, because a list of one element is considered sorted. Then it repeatedly merges these sub lists to produce new sorted sub lists, until at last one sorted list is produced. The concept of Divide and Conquer involves three steps:

1. Divide the problem into multiple small problems.

2. Conquer the subproblems by solving them. The idea is to break down the problem into atomic subproblems, where they are actually solved.

3. Combine the solutions of the subproblems to find the solution of the actual problem.


Merge Sort is quite fast, and has a time complexity of O(n log n). It is also a stable sort, which means the equal elements are ordered in the same order in the sorted list.

How Merge Sort Works

Suppose the array A contains 16 elements. Each pass of the merge-sort algorithm will start at the beginning of the array A and merge pairs of sorted subarrays as follows.

PASS 1. Merge each pair of elements to obtain the list of sorted pairs.

PASS 2. Merge each pair of pairs to obtain the list of sorted quadruplets.

PASS 3. Merge each pair of sorted quadruplets to obtain the two sorted subarrays.

PASS 4. Merge the two sorted subarrays to obtain the single sorted array.
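The pass-by-pass merging listed above corresponds to what is usually called a bottom-up merge sort. The sketch below is one possible way to write it, not the recursive version used elsewhere in these notes; the names merge_runs and merge_sort_passes, and the choice to return the number of passes, are assumptions made for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Merges the sorted runs a[lo..mid-1] and a[mid..hi-1] into tmp[lo..hi-1]. */
static void merge_runs(const int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    }
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
}

/* Bottom-up merge sort. Returns the number of passes performed,
   or -1 if the temporary buffer could not be allocated. */
int merge_sort_passes(int a[], int n)
{
    int passes = 0;
    if (n < 2) {
        return 0;
    }
    int *tmp = malloc((size_t)n * sizeof *tmp);
    if (tmp == NULL) {
        return -1;
    }
    /* width is the length of the sorted runs merged in this pass */
    for (int width = 1; width < n; width *= 2) {
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = (lo + width < n) ? lo + width : n;
            int hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(a, tmp, lo, mid, hi);
        }
        memcpy(a, tmp, (size_t)n * sizeof *a);
        passes++;
    }
    free(tmp);
    return passes;
}
```

With this sketch an array of 16 elements is sorted in 4 passes (runs of length 1, 2, 4 and 8 are merged in turn), and an array of 8 elements in 3 passes.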


Merge Sort Algorithm

/* Sorting using Merge Sort Algorithm
   a[] is the array, p is starting index, that is 0,
   and r is the last index of array. */

Let's take a[5] = {32, 45, 67, 2, 7} as the array to be sorted.

void merge(int a[], int p, int q, int r);

void mergesort(int a[], int p, int r)
{
    int q;
    if(p < r)
    {
        q = (p+r) / 2;   // integer division already rounds down
        mergesort(a, p, q);
        mergesort(a, q+1, r);
        merge(a, p, q, r);
    }
}

void merge(int a[], int p, int q, int r)
{
    int b[r-p+1];   // temporary array, same size as a[p..r]
    int i, j, k;
    k = 0;
    i = p;
    j = q+1;
    while(i <= q && j <= r)
    {
        if(a[i] <= a[j])   // <= keeps equal elements in their original order
        {
            b[k++] = a[i++];   // same as b[k]=a[i]; k++; i++;
        }
        else
        {
            b[k++] = a[j++];
        }
    }
    while(i <= q)
    {
        b[k++] = a[i++];
    }
    while(j <= r)
    {
        b[k++] = a[j++];
    }
    for(i=r; i >= p; i--)
    {
        a[i] = b[--k];   // copying back the sorted list to a[]
    }
}

Complexity of Merge Sort Algorithm

Merge Sort is quite fast, and has a time complexity of O(n*log n). It is also a stable sort, which means the "equal" elements are ordered in the same order in the sorted list.

In this section we will understand why the running time for merge sort is O(n*log n). As we have already learned in Binary Search, whenever we divide a number in half at every step, the number of steps can be represented using a logarithmic function, log n, and is at most log n + 1. Also, we perform a single-step operation to find the middle of any subarray, i.e. O(1). And to merge the subarrays made by dividing the original array of n elements, a running time of O(n) is required at each level. Hence the total time for the mergeSort function becomes n(log n + 1), which gives us a time complexity of O(n*log n).

Time complexity of Merge Sort is O(n*Log n) in all the 3 cases (worst, average and best) as merge sort always divides the array in two halves and takes linear time to merge two halves.

It requires an amount of additional space equal to the size of the unsorted array. Hence it is not recommended for sorting very large arrays when memory is scarce.

Algorithm | Worst Case | Average Case | Best Case | Space Complexity
Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n)

QuickSort

Quick Sort is also based on the concept of Divide and Conquer, just like merge sort. But in quick sort all the heavy lifting(major work) is done while dividing the array into subarrays, while in case of merge sort, all the real work happens during merging the subarrays. In case of quick sort, the combine step does absolutely nothing. It is also called partition-exchange sort. This algorithm divides the list into three main parts:

1. Elements less than the Pivot element

2. Pivot element(Central element)

3. Elements greater than the pivot element

Pivot element can be any element from the array; it can be the first element, the last element or any random element. In this tutorial, we will take the leftmost element or the first element as pivot. The figure shows that 54 will serve as our first pivot value. The partition process will happen next. It will find the split point and at the same time move other items to the appropriate side of the list, either less than or greater than the pivot value.

Partitioning begins by locating two position markers, let's call them leftmark and rightmark, at the beginning and end of the remaining items in the list (positions 1 and 8 in the figure). The goal of the partition process is to move items that are on the wrong side with respect to the pivot value while also converging on the split point. The figure shows this process as we locate the position of 54.

We begin by incrementing leftmark until we locate a value that is greater than the pivot value. We then decrement rightmark until we find a value that is less than the pivot value. At this point we have discovered two items that are out of place with respect to the eventual split point. For our example, this occurs at 93 and 20. Now we can exchange these two items and then repeat the process again.

At the point where rightmark becomes less than leftmark, we stop. The position of rightmark is now the split point. The pivot value can be exchanged with the contents of the split point and the pivot value is now in place (Figure below). In addition, all the items to the left of the split point are less than the pivot value, and all the items to the right of the split point are greater than the pivot value. The list can now be divided at the split point and the quick sort can be invoked recursively on the two halves.


Quick Sort Algorithm

Algo QUICKSORT (A, p, r)
{
    if p < r then
    {
        q ← PARTITION (A, p, r)
        QUICKSORT (A, p, q - 1)
        QUICKSORT (A, q + 1, r)
    }
}

The key to the algorithm is the PARTITION procedure, which rearranges the subarray A[p..r] in place.

Algo PARTITION (A, p, r)
{
    pivot = A[p]
    i = p, j = r
    while (i < j)
    {
        while (A[i] <= pivot && i < j)
        {
            i = i + 1
        }
        while (A[j] > pivot)
        {
            j = j - 1
        }
        if (i < j)
        {
            swap(A[i], A[j])
        }
    }
    A[p] = A[j]
    A[j] = pivot
    Return j
}

//CODE

/* Sorting using Quick Sort Algorithm
   a[] is the array, p is starting index, that is 0,
   and r is the last index of array. */

int partition(int a[], int p, int r);

void quicksort(int a[], int p, int r)
{
    if(p < r)
    {
        int q;
        q = partition(a, p, r);
        quicksort(a, p, q-1);
        quicksort(a, q+1, r);
    }
}

int partition(int a[], int p, int r)
{
    int i, j, pivot, temp;
    pivot = a[p];   // first element is the pivot
    i = p;
    j = r;
    while(i < j)
    {
        while(a[i] <= pivot && i < j) i++;   // skip items that belong on the left
        while(a[j] > pivot) j--;             // skip items that belong on the right
        if(i < j)
        {
            temp = a[i]; a[i] = a[j]; a[j] = temp;
        }
    }
    a[p] = a[j];    // move the pivot to the split point
    a[j] = pivot;
    return j;
}

Complexity of Quick Sort Algorithm

Worst Case: The worst case occurs when the partition process always picks the greatest or smallest element as pivot. If we consider the above partition strategy, where the first element is always picked as pivot, the worst case would occur when the array is already sorted in increasing or decreasing order.

F(n) = n + (n-1) + (n-2) + … + 2 + 1 = n(n+1)/2 = O(n^2)

Best Case: The best case occurs when the partition process always picks the middle (median) element as pivot, so that the list is split into two halves. Following is the recurrence for the best case.

T(n) = 2T(n/2) + Θ(n)

The solution of the above recurrence is Θ(n log n).

Algorithm | Worst Case | Average Case | Best Case
Quick Sort | n(n+1)/2 = O(n^2) | O(n log n) | O(n log n)
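The difference between the worst case and a well-balanced case can be observed by counting how often the partition step runs when the first element is used as pivot. The sketch below follows the PARTITION procedure of these notes; the counter partition_calls and the names partition_first, quicksort_first and count_partitions are assumptions added only for this illustration.

```c
static int partition_calls;   /* counter used only for this illustration */

/* Partitions a[p..r] around the first element and returns its final index. */
static int partition_first(int a[], int p, int r)
{
    int pivot = a[p];
    int i = p, j = r;
    partition_calls++;
    while (i < j) {
        while (a[i] <= pivot && i < j) i++;
        while (a[j] > pivot) j--;
        if (i < j) {
            int temp = a[i];
            a[i] = a[j];
            a[j] = temp;
        }
    }
    a[p] = a[j];
    a[j] = pivot;
    return j;
}

static void quicksort_first(int a[], int p, int r)
{
    if (p < r) {
        int q = partition_first(a, p, r);
        quicksort_first(a, p, q - 1);
        quicksort_first(a, q + 1, r);
    }
}

/* Sorts a[0..n-1] and returns how many times the partition step ran. */
int count_partitions(int a[], int n)
{
    partition_calls = 0;
    quicksort_first(a, 0, n - 1);
    return partition_calls;
}
```

On the already sorted input {1, 2, ..., 8} every partition peels off only the pivot, so the partition step runs 7 times on subarrays of size 8, 7, ..., 2. On the input {4, 2, 6, 1, 3, 5, 7}, where the first pivot lands in the middle, it runs only 4 times.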

Is QuickSort stable?

The default implementation is not stable.

Why Quick Sort is preferred over MergeSort for sorting Arrays?

Quick Sort in its general form is an in-place sort (i.e. it doesn’t require any extra storage apart from the recursion stack) whereas merge sort requires O(N) extra storage, N denoting the array size, which may be quite expensive. Allocating and de-allocating the extra space used for merge sort increases the running time of the algorithm. Comparing average complexity, we find that both types of sort have O(N log N) average complexity but the constants differ. For arrays, merge sort loses due to the use of extra O(N) storage space.

Heap Sort

Heap sort is a comparison based sorting technique based on the Binary Heap data structure. So, we will first see what a heap tree is and how basic operations such as insertion and deletion are performed on it.

A Binary Heap is a Complete Binary Tree where items are stored in a special order such that the value in a parent node is greater (or smaller) than or equal to the values in its two children nodes. The former is called a max heap and the latter is called a min heap. The heap can be represented by a binary tree or an array.

Every heap data structure has the following properties

Property #1 (Ordering): Nodes must be arranged in an order according to their values, based on Max heap or Min heap.

Property #2 (Structural): All levels in a heap must be full, except possibly the last level, and nodes must be filled strictly from left to right.
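When a heap is stored in an array level by level, left to right, the structural property holds automatically and only the ordering property needs to be checked. The sketch below does this for a max heap using 0-based indexing, where the children of the node at index i sit at indices 2*i + 1 and 2*i + 2. The function name is_max_heap is an assumption made for this example.

```c
/* Returns 1 if h[0..n-1] satisfies the max heap ordering property,
   0 otherwise. With 0-based indexing, the children of node i are
   stored at indices 2*i + 1 and 2*i + 2. */
int is_max_heap(const int h[], int n)
{
    for (int i = 0; i < n; i++) {
        int left = 2 * i + 1;
        int right = 2 * i + 2;
        if (left < n && h[left] > h[i]) {
            return 0;   /* left child is larger than its parent */
        }
        if (right < n && h[right] > h[i]) {
            return 0;   /* right child is larger than its parent */
        }
    }
    return 1;
}
```

For instance, the array {90, 89, 70, 36, 75} passes the check, while {90, 89, 70, 36, 95} fails because 95 is larger than its parent 89.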

Max Heap

Max heap data structure is a specialized complete binary tree: every level is full except possibly the last one, where the last leaf node can be alone (without a sibling).

In a max heap nodes are arranged based on node value.

Max heap is defined as follows

Max heap is a specialized complete binary tree in which every parent node contains a value greater than or equal to the values of its child nodes, and in which the last leaf node can be alone.

Example


Above tree is satisfying both Ordering property and Structural property according to the Max Heap data structure.

Operations on Max Heap

The following operations are performed on a Max heap data structure

1. Finding Maximum

2. Insertion

3. Deletion

Finding Maximum Value Operation in Max Heap

Finding the node which has the maximum value in a max heap is very simple. In a max heap, the root node has a value greater than or equal to that of every other node. So, we can directly display the root node value as the maximum value in the max heap.

Insertion Operation in Max Heap

Insertion Operation in max heap is performed as follows

Step 1: Insert the newNode as last leaf from left to right.

Step 2: Compare newNode value with its Parent node.

Step 3: If newNode value is greater than its parent, then swap both of them.

Step 4: Repeat step 2 and step 3 until newNode value is less than its parent node value (or) newNode reaches the root.
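The four insertion steps can be sketched on an array-based max heap as follows. The function name heap_insert, the 0-based parent formula (i - 1) / 2 and the requirement that the array has room for one more element are assumptions of this sketch. The sample values in the usage note are illustrative and chosen so that 85 is added as the right child of 75.

```c
/* Inserts value into the max heap h[0..n-1], which must have room
   for one more element. Returns the new number of nodes. */
int heap_insert(int h[], int n, int value)
{
    int i = n;
    h[i] = value;                     /* Step 1: add as the last leaf */
    while (i > 0) {
        int parent = (i - 1) / 2;     /* Step 2: find the parent */
        if (h[i] <= h[parent]) {
            break;                    /* Step 4: parent is not smaller, stop */
        }
        int temp = h[i];              /* Step 3: swap with the parent */
        h[i] = h[parent];
        h[parent] = temp;
        i = parent;
    }
    return n + 1;
}
```

Starting from the assumed heap {90, 89, 70, 36, 75, 63, 65, 21, 18, 15}, inserting 85 places it at index 10, swaps it with its parent 75, and stops at the next parent 89, giving {90, 89, 70, 36, 85, 63, 65, 21, 18, 15, 75}.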

Example

Consider the above max heap. Insert a new node with value 85.

Step 1: Insert the newNode with value 85 as last leaf from left to right. That means newNode is added as a right child of node with value 75. After adding, max heap is as follows

Step 2: Compare newNode value (85) with its Parent node value (75). That means 85 > 75

Step 3: Here newNode value (85) is greater than its parent value (75), then swap both of them. After swapping, max heap is as follows

Step 4: Now, again compare newNode value (85) with its parent node value (89).

Here, newNode value (85) is smaller than its parent node value (89). So, we stop the insertion process. Finally, max heap after insertion of a new node with value 85 is as follows

Deletion Operation in Max Heap

In a max heap, deleting the last node is very simple as it does not disturb the max heap properties.

Deleting the root node from a max heap is a little difficult as it disturbs the max heap properties. We use the following steps to delete the root node from a max heap

Step 1: Swap the root node with last node in max heap

Step 2: Delete last node.

Step 3: Now, compare root value with its left child value.

Step 4: If root value is smaller than its left child, then compare left child with its right sibling. Else goto Step 6

Step 5: If left child value is larger than its right sibling, then swap root with left child. Otherwise swap root with its right child.

Step 6: If root value is larger than its left child, then compare root value with its right child value.

Step 7: If root value is smaller than its right child, then swap root with right child. Otherwise stop the process.

Step 8: Repeat the same until root node is fixed at its exact position.
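The deletion steps above amount to moving the last node to the root and then sifting it down toward the larger child. The following is a minimal sketch under the assumption of a 0-based array (children of index i at 2i + 1 and 2i + 2); the name heap_delete_max is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: remove and return the maximum of a max heap
   stored in heap[0..*size-1]. */
int heap_delete_max(int heap[], int *size)
{
    assert(*size > 0);
    int max = heap[0];
    heap[0] = heap[--(*size)];    /* Steps 1-2: last node replaces the root, last slot is dropped */

    int i = 0;
    for (;;) {                    /* Steps 3-8: sift the new root down */
        int left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left < *size && heap[left] > heap[largest])
            largest = left;
        if (right < *size && heap[right] > heap[largest])
            largest = right;
        if (largest == i)
            break;                /* no child is larger: position is final */
        int tmp = heap[i];
        heap[i] = heap[largest];
        heap[largest] = tmp;
        i = largest;
    }
    return max;
}
```

Picking the larger of the two children in one pass replaces the separate left/right comparisons of steps 3 to 7 but makes the same swaps.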

Example Consider the above max heap. Delete root node (90) from the max heap.

Step 1: Swap the root node (90) with the last node (75) in the max heap. After swapping, the max heap is as follows

Step 2: Delete the last node, here the node with value 90. After deleting the node with value 90 from the heap, the max heap is as follows

Step 3: Compare root node (75) with its left child (89).

Here, root value (75) is smaller than its left child value (89). So, compare left child (89) with its right sibling (70).

Step 4: Here, left child value (89) is larger than its right sibling (70). So, swap root (75) with left child (89).

Step 5: Now, again compare 75 with its left child (36).

Here, the node with value 75 is larger than its left child. So, we compare the node with value 75 with its right child (85).

Step 6: Here, the node with value 75 is smaller than its right child (85). So, we swap both of them. After swapping, the max heap is as follows

Step 7: Now, compare the node with value 75 with its left child (15).

Here, the node with value 75 is larger than its left child (15) and it does not have a right child. So we stop the process.

Finally, the max heap after deleting root node (90) is as follows

Heap Sort is one of the best sorting methods being in-place and with no quadratic worst-case scenarios. Heap sort algorithm is divided into two basic parts:

1. Creating a Heap of the unsorted list.

2. Then a sorted array is created by repeatedly removing the largest/smallest element from the heap, and inserting it into the array. The heap is reconstructed after each removal.

How Heap Sort Works

Initially on receiving an unsorted list, the first step in heap sort is to create a Heap data structure (Max-Heap or Min-Heap). Once heap is built, the first element of the Heap is either largest or smallest (depending upon Max-Heap or Min-Heap), so we put the first element of the heap in our array. Then we again make heap using the remaining elements, to again pick the first element of the heap and put it into the array. We keep on doing the same repeatedly until we have the complete sorted list in our array.

An Example of Heapsort:

Given an array of 6 elements: 15, 19, 10, 7, 17, 16, sort it in ascending order using heap sort.

Steps:

1. Consider the values of the elements as priorities and build the heap tree.

2. Start deleteMin operations, storing each deleted element at the end of the heap array.

After performing step 2, the order of the elements will be opposite to the order in the heap tree. Hence, if we want the elements to be sorted in ascending order, we need to build the heap tree in descending order - the greatest element will have the highest priority.

Note that we use only one array, treating its parts differently:

a. when building the heap tree, part of the array will be considered as the heap, and the rest part - the original array.

b. when sorting, part of the array will be the heap, and the rest part - the sorted array.

This will be indicated by colors: white for the original array, blue for the heap and red for the sorted array

Here is the array: 15, 19, 10, 7, 17, 16

A. Building the heap tree

The array represented as a tree, complete but not ordered:

Start with the rightmost node at height 1 - the node at position 3 = Size/2. It has one greater child and has to be percolated down:

After processing array[3] the situation is: 15, 19, 16, 7, 17, 10

Next comes array[2]. Its children are smaller, so no percolation is needed.

The last node to be processed is array[1]. Its left child is the greater of the children. The item at array[1] has to be percolated down to the left, swapped with array[2].

As a result the situation is: 19, 15, 16, 7, 17, 10

The greater child of array[2] (17) is larger than item 15, so 15 has to be moved down further, swapped with array[5].

Now the tree is ordered, and the binary heap is built: 19, 17, 16, 7, 15, 10

B. Sorting - performing deleteMax operations:

1. Delete the top element 19.

1.1. Store 19 in a temporary place. A hole is created at the top

1.2. Swap 19 with the last element of the heap. As 10 will be adjusted in the heap, its cell will no longer be a part of the heap. Instead it becomes a cell from the sorted array

1.3. Percolate down the hole

1.4. Percolate once more (10 is less than 15, so it cannot be inserted in the previous hole)

Now 10 can be inserted in the hole. The array is now 17, 15, 16, 7, 10 | 19 (heap | sorted array).

2. DeleteMax the top element 17

2.1. Store 17 in a temporary place. A hole is created at the top

2.2. Swap 17 with the last element of the heap. As 10 will be adjusted in the heap, its cell will no longer be a part of the heap. Instead it becomes a cell from the sorted array

2.3. The element 10 is less than the children of the hole, and we percolate the hole down:

2.4. Insert 10 in the hole. The array is now 16, 15, 10, 7 | 17, 19 (heap | sorted array).

3. DeleteMax 16

3.1. Store 16 in a temporary place. A hole is created at the top

3.2. Swap 16 with the last element of the heap. As 7 will be adjusted in the heap, its cell will no longer be a part of the heap. Instead it becomes a cell from the sorted array

3.3. Percolate the hole down (7 cannot be inserted there - it is less than the children of the hole)

3.4. Insert 7 in the hole. The array is now 15, 7, 10 | 16, 17, 19 (heap | sorted array).

4. DeleteMax the top element 15

4.1. Store 15 in a temporary place. A hole is created at the top

4.2. Swap 15 with the last element of the heap. As 10 will be adjusted in the heap, its cell will no longer be a part of the heap. Instead it becomes a position from the sorted array

4.3. Store 10 in the hole (10 is greater than the children of the hole). The array is now 10, 7 | 15, 16, 17, 19 (heap | sorted array).

5. DeleteMax the top element 10.

5.1. Remove 10 from the heap and store it into a temporary location.

5.2. Swap 10 with the last element of the heap. As 7 will be adjusted in the heap, its cell will no longer be a part of the heap. Instead it becomes a cell from the sorted array

5.3. Store 7 in the hole (as the only remaining element in the heap)

7 is the last element from the heap, so now the array is sorted

Heap Sort Algorithm

HEAPSORT(A)

1. BUILD-MAX-HEAP(A)

2. for i ← length[A] downto 2

3. do exchange A[1] ↔ A[i]

4. heap-size[A] ← heap-size[A] - 1

5. MAX-HEAPIFY(A, 1)

BUILD-MAX-HEAP(A)

1. heap-size[A] ← length[A]

2. for i ← ⌊length[A]/2⌋ downto 1

3. do MAX-HEAPIFY(A, i)

MAX-HEAPIFY(A, i)

1. l ← LEFT(i)

2. r ← RIGHT(i)

3. if l ≤ heap-size[A] and A[l] > A[i]

4. then largest ← l

5. else largest ← i

6. if r ≤ heap-size[A] and A[r] > A[largest]

7. then largest ← r

8. if largest ≠ i

9. then exchange A[i] ↔ A[largest]

10. MAX-HEAPIFY(A, largest)

//CODE

In the code below, main() calls heapsort(), which calls buildmaxheap() to build the heap; buildmaxheap() in turn uses maxheap() to restore the heap property at each node.

#include <stdio.h>

void heapsort(int[], int);
void buildmaxheap(int[], int);
void maxheap(int[], int, int);

int main(void)
{
    int a[10], i, size;

    printf("Enter size of list: ");   /* at most 10, because the array holds 10 elements */
    if (scanf("%d", &size) != 1 || size < 1 || size > 10)
        return 1;
    printf("Enter elements: ");
    for (i = 0; i < size; i++)
        scanf("%d", &a[i]);

    heapsort(a, size);
    return 0;
}

/* Sort a[0..length-1] in ascending order and print the result. */
void heapsort(int a[], int length)
{
    int heapsize, i, temp;

    buildmaxheap(a, length);
    heapsize = length - 1;            /* index of the last element in the heap */
    for (i = heapsize; i > 0; i--)
    {
        /* move the current maximum to the end, then shrink and repair the heap */
        temp = a[0]; a[0] = a[heapsize]; a[heapsize] = temp;
        heapsize--;
        maxheap(a, 0, heapsize);
    }
    for (i = 0; i < length; i++)
        printf("\t%d", a[i]);
}

void buildmaxheap(int a[], int length)
{
    int i, heapsize;

    heapsize = length - 1;
    for (i = length / 2 - 1; i >= 0; i--)   /* last non-leaf node down to the root */
        maxheap(a, i, heapsize);
}

/* Sift a[i] down within a[0..heapsize]. With 0-based indexing the
   children of node i are at 2*i + 1 and 2*i + 2. */
void maxheap(int a[], int i, int heapsize)
{
    int l, r, largest, temp;

    l = 2 * i + 1;
    r = 2 * i + 2;
    if (l <= heapsize && a[l] > a[i])
        largest = l;
    else
        largest = i;
    if (r <= heapsize && a[r] > a[largest])
        largest = r;
    if (largest != i)
    {
        temp = a[i]; a[i] = a[largest]; a[largest] = temp;
        maxheap(a, largest, heapsize);
    }
}

Complexity of Heap Sort Algorithm

The heap sort algorithm is applied to an array A with n elements. The algorithm has two phases, and we analyze the complexity of each phase separately.

Phase 1. Suppose H is a heap. The number of comparisons to find the appropriate place of a new element item in H cannot exceed the depth of H. Since H is a complete tree, its depth is bounded by log₂ m, where m is the number of elements in H. Accordingly, the total number g(n) of comparisons to insert the n elements of A into H is bounded as

g(n) ≤ n log₂ n

Phase 2. Suppose H is a complete tree with m elements, the left and right subtrees of H are heaps, and L is the root of H. Reheaping uses 4 comparisons to move the node L one step down the tree H. Since the depth cannot exceed log₂ m, reheaping uses at most 4 log₂ m comparisons to find the appropriate place of L in the tree H. Deleting the n elements of A from H requires reheaping n times, so the total number h(n) of comparisons is bounded as

h(n) ≤ 4n log₂ n

Thus each phase requires time proportional to n log₂ n, and the running time to sort an n-element array A is proportional to n log₂ n.

Algorithm | Worst Case | Average Case | Best Case
Heap Sort | O(n log n) | O(n log n) | O(n log n)

Radix Sort

Radix sort is an integer sorting algorithm that sorts data with integer keys by grouping the keys by individual digits that share the same significant position and value. Radix sort uses counting sort as a subroutine to sort an array of numbers. Because integers can be used to represent strings (by hashing the strings to integers), radix sort works on data types other than just integers. Because radix sort is not comparison based, it is not bounded by Ω(n log n) for running time; in fact, radix sort can perform in linear time.

Radix sort incorporates the counting sort algorithm so that it can sort larger, multi-digit numbers without having to potentially decrease the efficiency by increasing the range of keys the algorithm must sort over (since this might cause a lot of wasted time).

Counting Sort

Counting sort can only sort one place value of a given base. For example, a counting sort for base-10 numbers can only sort digits zero through nine. To sort two-digit numbers, counting sort would need to operate in base-100. Radix sort is more powerful because it can sort multi-digit numbers without having to search over a wider range of keys (which would happen if the base was larger).

Where does counting sort fail? When the elements range from 1 to n², counting sort takes O(n²) time, which is worse than the comparison-based sorting algorithms mentioned above.

Can we do better than O(n log n) for the range 1 to n²?

The answer is Radix sort.

The Idea of Radix sort

The idea is to sort digit by digit, starting from the least significant digit and moving to the most significant digit. Here counting sort is used as a subroutine to sort.

The Radix sort Algorithm

1. For all i, where i goes from the least significant to the most significant digit of the number, do the following:

   - Sort the input array using counting sort according to its i'th digit.

Example

Original, unsorted list:

170, 45, 75, 90, 802, 24, 2, 66

Sorting by least significant digit (1s place) gives: [Notice that we keep 802 before 2, because 802 occurred before 2 in the original list, and similarly for pairs 170 & 90 and 45 & 75.]

170, 90, 802, 2, 24, 45, 75, 66

Sorting by next digit (10s place) gives: [Notice that 802 again comes before 2 as 802 comes before 2 in the previous list.]

802, 2, 24, 45, 66, 170, 75, 90

Sorting by most significant digit (100s place) gives:

2, 24, 45, 66, 75, 90, 170, 802

Example: Assume the input array is:

10,21,17,34,44,11,654,123

Based on the algorithm, we will sort the input array according to the one's digit (least significant digit).

0: 10
1: 21 11
2:
3: 123
4: 34 44 654
5:
6:
7: 17
8:
9:

So, the array becomes 10, 21, 11, 123, 34, 44, 654, 17. Now, we'll sort according to the ten's digit:

0:
1: 10 11 17
2: 21 123
3: 34
4: 44
5: 654
6:
7:
8:
9:

Now, the array becomes: 10, 11, 17, 21, 123, 34, 44, 654. Finally, we sort according to the hundred's digit (most significant digit):

0: 010 011 017 021 034 044
1: 123
2:
3:
4:
5:
6: 654
7:
8:
9:

The array becomes: 10, 11, 17, 21, 34, 44, 123, 654, which is sorted. This is how our algorithm works.

Implementation:

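#define RANGE 10   /* digits range from 0 to 9 */

/* Stable counting sort of arr[0..n-1] by the digit at the given place
   (1, 10, 100, ...). Keys are assumed non-negative and n > 0. */
void countsort(int arr[], int n, int place)
{
    int i, freq[RANGE] = {0};
    int output[n];                       /* C99 variable-length array */

    for (i = 0; i < n; i++)
        freq[(arr[i] / place) % RANGE]++;
    for (i = 1; i < RANGE; i++)
        freq[i] += freq[i - 1];          /* freq[d] = number of keys with digit <= d */
    for (i = n - 1; i >= 0; i--)         /* walk backwards to keep the sort stable */
    {
        output[freq[(arr[i] / place) % RANGE] - 1] = arr[i];
        freq[(arr[i] / place) % RANGE]--;
    }
    for (i = 0; i < n; i++)
        arr[i] = output[i];
}

/* maxx is the maximum element in the array */
void radixsort(int arr[], int n, int maxx)
{
    int mul = 1;

    while (maxx)                         /* one pass per decimal digit of maxx */
    {
        countsort(arr, n, mul);
        mul *= 10;
        maxx /= 10;
    }
}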

What is the running time of Radix Sort?

Let there be d digits in the input integers. Radix Sort takes O(d*(n+b)) time, where b is the base for representing numbers; for example, for the decimal system, b is 10. What is the value of d? If k is the maximum possible value, then d would be O(log_b(k)). So the overall time complexity is O((n+b) * log_b(k)), which looks worse than the time complexity of comparison-based sorting algorithms for a large k. Let us first limit k. Let k <= n^c, where c is a constant. In that case, the complexity becomes O(n log_b(n)). But it still doesn't beat comparison-based sorting algorithms.

What if we make the value of b larger? What should be the value of b to make the time complexity linear? If we set b as n, we get the time complexity as O(n). In other words, we can sort an array of integers with range from 1 to n^c in linear time if the numbers are represented in base n (or every digit takes log₂(n) bits).

Is Radix Sort preferable to comparison-based sorting algorithms like Quick-Sort? If we have log₂ n bits for every digit, the running time of Radix Sort appears to be better than Quick Sort for a wide range of input numbers. However, the constant factors hidden in asymptotic notation are higher for Radix Sort, and Quick-Sort uses hardware caches more effectively. Also, Radix Sort uses counting sort as a subroutine, and counting sort takes extra space to sort numbers.
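The linear bound for base b = n follows from a short calculation, written out here as a worked step under the stated assumption k ≤ n^c with c constant:

```latex
d = \log_b k \;\le\; \log_n n^{c} = c
\quad\Longrightarrow\quad
T(n) = O\bigl(d\,(n + b)\bigr) = O\bigl(c\,(n + n)\bigr) = O(n).
```

With b fixed at 10 instead, the same bound on k gives d = O(log n), and the running time stays at O(n log n).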

Algorithm | Worst Case | Average Case | Best Case
Radix Sort | O(n²) | d*s*n | O(n log n)

5.4 PRACTICAL CONSIDERATION FOR INTERNAL SORTING

Apart from radix sort, all the sorting methods require excessive data movement; i.e., as the result of a comparison, records may be physically moved. This tends to slow down the sorting process when records are large. In sorting files in which the records are large it is necessary to modify the sorting methods so as to minimize data movement. Methods such as Insertion Sort and Merge Sort can be easily modified to work with a linked file rather than a sequential file. In this case each record will require an additional link field. Instead of physically moving the record, its link field will be changed to reflect the change in position of that record in the file. At the end of the sorting process, the records are linked together in the required order. In many applications (e.g., when we just want to sort files and then output them record by record on some external media in the sorted order) this is sufficient. However, in some applications it is necessary to physically rearrange the records in place so that they are in the required order. Even in such cases considerable savings can be achieved by first performing a linked list sort and then physically rearranging the records according to the order specified in the list. This rearranging can be accomplished in linear time using some additional space.

If the file, F, has been sorted so that at the end of the sort P is a pointer to the first record in a linked list of records, then each record in this list will have a key which is greater than or equal to the key of the previous record (if there is a previous record). To physically rearrange these records into the order specified by the list, we begin by interchanging records R_1 and R_P. Now, the record in the position R_1 has the smallest key. If P ≠ 1 then there is some record in the list with link field = 1. If we could change this link field to indicate the new position of the record previously at position 1, then we would be left with records R_2, ..., R_n linked together in nondecreasing order. Repeating the above process will, after n - 1 iterations, result in the desired rearrangement.
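The rearrangement described above can be sketched in C as follows. This is one possible way to realise the idea, not necessarily the exact procedure the notes have in mind: instead of searching for the record whose link field points at the vacated position, it leaves a forwarding link in each slot it has already filled. The record layout, the 0-based indexing, the -1 end marker and the name rearrange are all assumptions made for illustration.

```c
#include <assert.h>

/* Illustrative record type: a key plus a link to the next record in sorted order. */
struct record {
    int key;
    int link;   /* index of the next record in sorted order, -1 at the end */
};

/* Hypothetical helper: physically rearrange r[0..n-1] into sorted order,
   given that the links starting at index p chain the records in
   nondecreasing key order. Link fields are used as scratch space. */
void rearrange(struct record r[], int n, int p)
{
    assert(n > 0 && p >= 0 && p < n);
    for (int i = 0; i < n - 1; i++) {
        /* A record that was moved out of a slot below i left a forwarding link there. */
        while (p < i)
            p = r[p].link;
        int next = r[p].link;          /* successor in sorted order */
        if (p != i) {
            struct record tmp = r[i];  /* interchange the records at i and p */
            r[i] = r[p];
            r[p] = tmp;
            r[i].link = p;             /* forwarding: the old record at i now lives at p */
        }
        p = next;
    }
}
```

Each record is swapped at most once into its final slot, so the number of record moves is linear in n.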

SEARCH TREES

Binary Search Tree, AVL Tree, m-way search tree, B-tree, B+ tree (Already Discussed)

HASHING

Suppose we want to design a system for storing employee records keyed using phone numbers. And we want following queries to be performed efficiently:

1. Insert a phone number and corresponding information.

2. Search a phone number and fetch the information.

3. Delete a phone number and related information.

We can think of using the following data structures to maintain information about different phone numbers.

1. Array of phone numbers and records.

2. Linked List of phone numbers and records.

3. Balanced binary search tree with phone numbers as keys.

4. Direct Access Table.

For arrays and linked lists, we need to search in a linear fashion, which can be costly in practice. If we use arrays and keep the data sorted, then a phone number can be searched in O(Logn) time using Binary Search, but insert and delete operations become costly as we have to maintain sorted order.

With balanced binary search tree, we get moderate search, insert and delete times. All of these operations can be guaranteed to be in O(Logn) time.

Another solution that one can think of is to use a direct access table, where we make a big array and use phone numbers as indexes into the array. An entry in the array is NIL if the phone number is not present; otherwise the entry stores a pointer to the record corresponding to the phone number. In terms of time complexity this solution is the best of all: we can do all operations in O(1) time. For example, to insert a phone number, we create a record with the details of the given phone number, use the phone number as the index and store the pointer to the created record in the table.

This solution has many practical limitations. The first problem is that the extra space required is huge. For example, if a phone number has n digits, we need O(m * 10^n) space for the table, where m is the size of a pointer to a record. Another problem is that an integer in a programming language may not be able to store n digits.

Due to above limitations Direct Access Table cannot always be used. Hashing is the solution that can be used in almost all such situations and performs extremely well compared to above data structures like Array, Linked List, Balanced BST in practice. With hashing we get O(1) search time on average (under reasonable assumptions) and O(n) in worst case.

Hashing is an improvement over Direct Access Table. The idea is to use hash function that converts a given phone number or any other key to a smaller number and uses the small number as index in a table called hash table.

Hash Function: A function that converts a given big phone number to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a big number or string to a small integer that can be used as an index in the hash table.

(Hash function H is applied to key K to generate a memory address L, i.e., H: K → L)

A good hash function should have the following properties:

1) Efficiently computable.

2) Should uniformly distribute the keys (each table position equally likely for each key).

For example, for phone numbers a bad hash function is to take the first three digits. A better function is to consider the last three digits. Please note that this may not be the best hash function. There may be better ways.

Different Hash functions are:

Division Method

A key k is mapped into one of m slots using the function h(k) = k mod m. This requires only a single division, hence it is fast.
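As a small illustration, the division method is a one-line function in C. The name hash_division and the choice of unsigned types are assumptions made for this sketch.

```c
#include <assert.h>

/* Division-method hash: maps a non-negative key to a slot in 0..m-1. */
unsigned int hash_division(unsigned long key, unsigned int m)
{
    assert(m > 0);                 /* the table must have at least one slot */
    return (unsigned int)(key % m);
}
```

With m = 7, the keys 50, 700 and 76 map to slots 1, 0 and 6 respectively.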

Midsquare Method

Mid-square hashing is a hashing technique in which unique keys are generated. In this technique, a seed value is taken and it is squared. Then, some digits from the middle are extracted. These extracted digits form a number which is taken as the new seed. This technique can generate keys with high randomness if a big enough seed value is taken. However, it has a limitation. As the seed is squared, if a 6-digit number is taken, then the square will have 12 digits. This exceeds the range of a typical 32-bit int data type, so overflow must be taken care of. In case of overflow, use the long long int data type, or use string-based multiplication if overflow still occurs. The chances of a collision in mid-square hashing are low, but not zero. If a collision occurs, it is handled using some hash map.

Example:

Suppose a 4-digit seed is taken. seed = 4765

Hence, square of seed is = 4765 * 4765 = 22705225

Now, from this 8-digit number, any four digits are extracted (say, the middle four).

So, the new seed value becomes seed = 7052

Now, square of this new seed is = 7052 * 7052 = 49730704

Again, the same set of 4 digits is extracted.

So, the new seed value becomes seed = 7307

. . . .

This process is repeated as many times as a key is required.
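One step of the seed sequence in the example above can be sketched in C as follows. The sketch assumes a 4-digit seed whose square is treated as an 8-digit number (zero-padded on the left if shorter) and always extracts the middle four digits; the name midsquare_next is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: one mid-square step for a seed of at most 4 digits.
   Returns the middle four digits of the 8-digit square. */
unsigned int midsquare_next(unsigned int seed)
{
    assert(seed <= 9999);
    unsigned long sq = (unsigned long)seed * seed;   /* at most 99980001, fits in 32 bits */
    return (unsigned int)((sq / 100) % 10000);       /* drop the 2 low digits, keep the next 4 */
}
```

Starting from 4765, two calls give 7052 and then 7307, matching the worked example.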

Folding Method

The key is divided into several parts which are folded together. There are two types of folding: shift and boundary.

Shift folding: Divide the number into parts and add them together (or combine them with some other function). For example, the key 123456789 is split into 123, 456 and 789:

123 + 456 + 789 = 1368

The result is then taken mod Tsize.

Boundary folding: Alternate pieces are flipped on the boundary before adding.

123 + 654 + 789 = 1566

or

321 + 456 + 987 = 1764
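Both kinds of folding can be sketched in C as follows. The sketch assumes 3-digit groups taken from the low-order end of a decimal key, and for boundary folding it reverses every second group (the first variant shown above); the function names and the tsize parameter are illustrative assumptions.

```c
#include <assert.h>

/* Reverse the decimal digits of a 3-digit group, e.g. 456 -> 654. */
static unsigned int reverse3(unsigned int part)
{
    return (part % 10) * 100 + ((part / 10) % 10) * 10 + part / 100;
}

/* Shift folding: split the key into 3-digit groups and add them. */
unsigned int fold_shift(unsigned long key, unsigned int tsize)
{
    assert(tsize > 0);
    unsigned int sum = 0;
    while (key > 0) {
        sum += (unsigned int)(key % 1000);
        key /= 1000;
    }
    return sum % tsize;
}

/* Boundary folding: as above, but every second group is reversed first. */
unsigned int fold_boundary(unsigned long key, unsigned int tsize)
{
    assert(tsize > 0);
    unsigned int sum = 0;
    int flip = 0;
    while (key > 0) {
        unsigned int part = (unsigned int)(key % 1000);
        sum += flip ? reverse3(part) : part;
        flip = !flip;
        key /= 1000;
    }
    return sum % tsize;
}
```

For the key 123456789 and a table size of 1000, shift folding gives 1368 mod 1000 = 368 and boundary folding gives 1566 mod 1000 = 566.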

Hash Table: An array that stores pointers to records corresponding to a given phone number. An entry in hash table is NIL if no existing phone number has hash function value equal to the index for the entry.

Collision Handling: Since a hash function gets us a small number for a big key, there is possibility that two keys result in same value. The situation where a newly inserted key maps to an already occupied slot in hash table is called collision and must be handled using some collision handling technique. Following are the ways to handle collisions:

Chaining: The idea is to make each cell of the hash table point to a linked list of records that have the same hash function value. Chaining is simple, but requires additional memory outside the table.

Open Addressing: In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we examine table slots one by one until the desired element is found or it is clear that the element is not in the table.

What is Collision? Since a hash function gets us a small number for a key which is a big integer or string, there is possibility that two keys result in same value. The situation where a newly inserted key maps to an already occupied slot in hash table is called collision and must be handled using some collision handling technique.

H: K1 → L, H: K2 → L

What are the chances of collisions with a large table?

Collisions are very likely even if we have a big table to store keys. An important observation is the Birthday Paradox. With only 23 persons, the probability that two people have the same birthday is about 50%.

How to handle Collisions?

There are mainly two methods to handle collision:

1) Separate Chaining

2) Open Addressing

Separate Chaining:

The idea is to make each cell of hash table point to a linked list of records that have same hash function value. Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73, 101.
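The "key mod 7" example can be sketched in C as follows. This is a minimal sketch, assuming non-negative integer keys and chains that keep insertion order; the names SLOTS, chain_insert and chain_search are illustrative and do not come from these notes.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOTS 7   /* table size for the "key mod 7" example */

struct node {
    int key;
    struct node *next;
};

/* Append key to the chain of its slot.
   Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(struct node *table[SLOTS], int key)
{
    assert(key >= 0);
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = NULL;

    struct node **p = &table[key % SLOTS];
    while (*p != NULL)            /* walk to the end of the chain */
        p = &(*p)->next;
    *p = n;
    return 0;
}

/* Return 1 if key is present, 0 otherwise. */
int chain_search(struct node *table[SLOTS], int key)
{
    assert(key >= 0);
    for (struct node *n = table[key % SLOTS]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```

Inserting 50, 700, 76, 85, 92, 73, 101 into an empty table yields the chains 0: 700, 1: 50 → 85 → 92, 3: 73 → 101 and 6: 76; the remaining slots stay empty.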

Advantages:

1) Simple to implement.

2) The hash table never fills up; we can always add more elements to a chain.

3) Less sensitive to the hash function or load factors.

4) It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.

Disadvantages:

1) Cache performance of chaining is not good, as keys are stored using linked lists. Open addressing provides better cache performance, as everything is stored in the same table.

2) Wastage of space (some parts of the hash table are never used).

3) If a chain becomes long, then search time can become O(n) in the worst case.

4) Uses extra space for links.

Performance of Chaining:

Performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of the table (simple uniform hashing).

m = Number of slots in hash table

n = Number of keys to be inserted in hash table

Load factor α = n/m

Expected time to search = O(1 + α)

Expected time to insert/delete = O(1 + α)

Time complexity of search, insert and delete is O(1) if α is O(1).

Open Addressing

Like separate chaining, open addressing is a method for handling collisions. In open addressing, all elements are stored in the hash table itself. So at any point, the size of the table must be greater than or equal to the total number of keys (note that we can increase the table size by copying old data if needed).

Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.

Search(k): Keep probing until the slot's key becomes equal to k or an empty slot is reached.

Delete(k): The delete operation is interesting. If we simply delete a key, then a later search may fail. So slots of deleted keys are marked specially as “deleted”. Insert can insert an item in a deleted slot, but search doesn't stop at a deleted slot.

Open addressing is done in the following ways:

a) Linear Probing: In linear probing, we linearly probe for the next slot. For example, the typical gap between two probes is 1, as taken in the example below.

Let hash(x) be the slot index computed using the hash function and S be the table size.

If slot hash(x) % S is full, then we try (hash(x) + 1) % S

If (hash(x) + 1) % S is also full, then we try (hash(x) + 2) % S

If (hash(x) + 2) % S is also full, then we try (hash(x) + 3) % S

Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73, 101.

Clustering: The main problem with linear probing is clustering: many consecutive elements form groups, and it then takes longer to find a free slot or to search for an element.
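Linear probing on the "key mod 7" example can be sketched in C as follows. This is a minimal sketch, assuming non-negative keys, a fixed table of 7 slots and -1 as the marker for an empty slot; deletion (which would need a separate "deleted" marker) is left out, and the names lp_insert and lp_search are hypothetical.

```c
#include <assert.h>

#define TABLE_SIZE 7   /* S in the formulas above */
#define EMPTY (-1)     /* assumed marker for a free slot */

/* Insert key using linear probing.
   Returns the slot used, or -1 if the table is full. */
int lp_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (key % TABLE_SIZE + i) % TABLE_SIZE;   /* (hash(x) + i) % S */
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* Return the slot holding key, or -1 if it is absent. */
int lp_search(const int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (key % TABLE_SIZE + i) % TABLE_SIZE;
        if (table[slot] == EMPTY)
            return -1;            /* an empty slot ends the probe sequence */
        if (table[slot] == key)
            return slot;
    }
    return -1;
}
```

Inserting 50, 700, 76, 85, 92, 73, 101 in that order fills the slots as 700, 50, 85, 92, 73, 101, 76. The run from slot 1 to slot 5 is an instance of the clustering described above: 101 hashes to slot 3 but ends up in slot 5.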

b) Quadratic Probing: We look for the i²'th slot in the i'th iteration.

Let hash(x) be the slot index computed using the hash function.

If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S

If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S

If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S

c) Double Hashing: We use another hash function hash2(x) and look for the i*hash2(x)'th slot in the i'th iteration.

Let hash(x) be the slot index computed using the hash function.

If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S

If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S

If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
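The two probe sequences above differ only in the offset added on the i'th attempt, which the following C sketch makes explicit. The function names are hypothetical, and the caller is assumed to supply the already-computed values hash(x) and hash2(x); no particular second hash function is implied.

```c
#include <assert.h>

/* i-th probe position for quadratic probing: (hash(x) + i*i) % S. */
int probe_quadratic(int h, int i, int s)
{
    assert(h >= 0 && i >= 0 && s > 0);
    return (h + i * i) % s;
}

/* i-th probe position for double hashing: (hash(x) + i*hash2(x)) % S.
   h2 must be non-zero, otherwise every probe lands on the same slot. */
int probe_double(int h, int h2, int i, int s)
{
    assert(h >= 0 && h2 > 0 && i >= 0 && s > 0);
    return (h + i * h2) % s;
}
```

For hash(x) = 1 and S = 7, quadratic probing visits slots 1, 2, 5, 3 for i = 0, 1, 2, 3, while double hashing with hash2(x) = 3 visits 1, 4, 0, 3.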

Comparison of above three:

Linear probing has the best cache performance but suffers from clustering. One more advantage of linear probing is that it is easy to compute. Quadratic probing lies between the two in terms of cache performance and clustering. Double hashing has poor cache performance but no clustering. Double hashing requires more computation time, as two hash functions need to be computed.

S.No. | Separate Chaining | Open Addressing
1. | Chaining is simpler to implement. | Open addressing requires more computation.
2. | In chaining, the hash table never fills up; we can always add more elements to a chain. | In open addressing, the table may become full.
3. | Chaining is less sensitive to the hash function or load factors. | Open addressing requires extra care to avoid clustering and a high load factor.
4. | Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted. | Open addressing is used when the frequency and number of keys is known.
5. | Cache performance of chaining is not good, as keys are stored using linked lists. | Open addressing provides better cache performance, as everything is stored in the same table.
6. | Wastage of space (some parts of the hash table in chaining are never used). | In open addressing, a slot can be used even if an input doesn't map to it.
7. | Chaining uses extra space for links. | No links in open addressing.

Performance of Open Addressing:

As with chaining, the performance of open addressing can be evaluated under the assumption that each key is equally likely to be hashed to any slot of the table (simple uniform hashing).

m = Number of slots in the hash table

n = Number of keys to be inserted in the hash table

Load factor α = n/m

Expected time to search/insert/delete < 1/(1 - α), where α < 1

So search, insert and delete take O(1/(1 - α)) time.
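To get a feel for how the bound grows as the table fills, the two formulas above can be evaluated directly. The helper names `load_factor` and `probe_bound` are hypothetical; the sketch assumes m > 0 and n < m so that α < 1.

```c
#include <assert.h>

/* Load factor alpha = n / m, for n keys in a table of m slots. */
double load_factor(int n, int m) {
    assert(m > 0 && n >= 0);
    return (double)n / (double)m;
}

/* Upper bound 1 / (1 - alpha) on the expected cost of an operation.
   Only meaningful while alpha < 1, i.e. the table is not full. */
double probe_bound(int n, int m) {
    double alpha = load_factor(n, m);
    assert(alpha < 1.0);
    return 1.0 / (1.0 - alpha);
}
```

For a table with m = 10 slots, n = 5 keys gives α = 0.5 and a bound of 2, while n = 9 keys gives α = 0.9 and a bound of about 10. The cost rises sharply as α approaches 1.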

5.6 STORAGE MANAGEMENT

5.6.1 Garbage Collection

Garbage collection (GC) is a form of automatic memory management. The garbage collector attempts to reclaim garbage, or memory occupied by objects that are no longer in use by the program.

Garbage collection is the opposite of manual memory management, which requires the programmer to specify which objects to deallocate and return to the memory system. Like other memory management techniques, garbage collection may take a significant proportion of total processing time in a program and can thus have significant influence on performance.

Resources other than memory, such as network sockets, database handles, user interaction windows, and file and device descriptors, are not typically handled by garbage collection. Methods used to manage such resources, particularly destructors, may suffice to manage memory as well, leaving no need for GC. Some GC systems allow such other resources to be associated with a region of memory that, when collected, causes the other resource to be reclaimed; this is called finalization.

The basic principles of garbage collection are:

Find data objects in a program that cannot be accessed in the future.

Reclaim the resources used by those objects.
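One common way to realize these two principles is mark-and-sweep, which the text does not name; it is used here only as an illustrative technique. The sketch below treats "cannot be accessed in the future" as "not reachable from a set of root pointers", which is an approximation. All names (`Object`, `pool`, `gc_alloc`, `gc_collect`) and the fixed sizes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8   /* assumed number of objects the toy heap can hold */
#define MAX_REFS  2   /* assumed number of outgoing pointers per object */

typedef struct Object {
    bool in_use;                    /* slot currently allocated? */
    bool marked;                    /* reached during the mark phase? */
    struct Object *refs[MAX_REFS];  /* pointers to other objects */
} Object;

static Object pool[POOL_SIZE];

/* Hand out a free object from the pool, or NULL if the pool is exhausted. */
Object *gc_alloc(void) {
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].marked = false;
            for (int r = 0; r < MAX_REFS; r++)
                pool[i].refs[r] = NULL;
            return &pool[i];
        }
    }
    return NULL;
}

/* Mark phase: flag every object reachable from obj. */
static void mark(Object *obj) {
    if (obj == NULL || obj->marked)
        return;
    obj->marked = true;
    for (int r = 0; r < MAX_REFS; r++)
        mark(obj->refs[r]);
}

/* Mark from the roots, then sweep: reclaim every allocated object that
   was not marked. Returns the number of objects reclaimed. */
int gc_collect(Object *roots[], int root_count) {
    assert(root_count >= 0);
    for (int i = 0; i < root_count; i++)
        mark(roots[i]);
    int reclaimed = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].in_use && !pool[i].marked) {
            pool[i].in_use = false;
            reclaimed++;
        }
        pool[i].marked = false;   /* reset for the next collection */
    }
    return reclaimed;
}
```

If objects a, b and c are allocated, a points to b, and only a is a root, then a call to `gc_collect` reclaims c and keeps a and b.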

Many programming languages require garbage collection, either as part of the language specification or effectively for practical implementation; these are said to be garbage collected languages. Other languages were designed for use with manual memory management, but have garbage collected implementations available (for example, C, C++). Integrating garbage collection into the language's compiler and runtime system enables a much wider choice of methods. The garbage collector will almost always be closely integrated with the memory allocator.

Advantages

Garbage collection frees the programmer from manually dealing with memory deallocation. As a result, certain categories of bugs are eliminated or substantially reduced:

Dangling pointer bugs, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is dereferenced. By then the memory may have been reassigned to another use, with unpredictable results.

Double free bugs, which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again.

Certain kinds of memory leaks, in which a program fails to free memory occupied by objects that have become unreachable, which can lead to memory exhaustion.

In addition, garbage collection supports efficient implementations of persistent data structures.

Disadvantages

Typically, garbage collection has certain disadvantages:

Garbage collection consumes computing resources in deciding which memory to free, even though the programmer may have already known this information. The penalty for the convenience of not annotating object lifetime manually in the source code is overhead, which can lead to decreased or uneven performance. Interaction with memory hierarchy effects can make this overhead intolerable in circumstances that are hard to predict or to detect in routine testing.

The moment when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs.

5.6.2 Compaction

The process of moving all marked nodes to one end of memory and all available memory to the other end is called compaction. An algorithm that performs compaction is called a compacting algorithm. After repeated allocation and deallocation of blocks, the memory becomes fragmented. Compaction is a technique that joins the non-contiguous free memory blocks to form one large block, so that the total free memory becomes contiguous.

All the memory blocks that are in use are moved towards the beginning of the memory, i.e. these blocks are copied into sequential locations in the lower portion of the memory. When compaction is performed, all the user programs come to a halt. A problem can arise if any of the used blocks that are copied contains a pointer value. For example, suppose that inside block P5, location 350 contains the address 310. After compaction, block P5 is moved from location 290 to location 120. The address 310 lies 20 locations past the old start of P5, so the pointer value stored inside P5 should change to 120 + 20 = 140. After compaction, therefore, the pointer values inside blocks must be identified and changed accordingly.
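The pointer adjustment in the P5 example can be sketched in C as an address translation. This sketch only computes the new addresses and does not copy any data. The `Block` type, the function names and the sizes of the blocks used in the usage note are hypothetical; only P5's move from 290 to 120 and the pointer value 310 come from the text.

```c
#include <assert.h>

typedef struct {
    int old_start;   /* start address before compaction */
    int size;        /* number of locations the block occupies */
    int new_start;   /* start address after compaction, set by compact() */
} Block;

/* Assign new start addresses so that the used blocks (given in order of
   old_start) sit back to back from address base. Returns the first free
   address after the compacted blocks. */
int compact(Block blocks[], int count, int base) {
    assert(count >= 0);
    int next = base;
    for (int i = 0; i < count; i++) {
        blocks[i].new_start = next;
        next += blocks[i].size;
    }
    return next;
}

/* Translate an address that pointed into a used block before compaction
   into the corresponding address afterwards. Returns -1 if the address
   was not inside any used block. */
int relocate(const Block blocks[], int count, int old_addr) {
    for (int i = 0; i < count; i++) {
        int offset = old_addr - blocks[i].old_start;
        if (offset >= 0 && offset < blocks[i].size)
            return blocks[i].new_start + offset;
    }
    return -1;
}
```

As a usage example, assume three used blocks: one at 0 of size 50, one at 100 of size 70, and P5 at 290 of size 80. After `compact(blocks, 3, 0)`, P5 starts at 120, `relocate(blocks, 3, 310)` gives 140 as in the text, and the location holding that pointer moves from 350 to 180.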