
QEEE DSA05

DATA STRUCTURES AND ALGORITHMS

G VENKATESH AND MADHAVAN MUKUND
LECTURE 2, 5 AUGUST 2014

Analysis of algorithms
Measuring efficiency of an algorithm

Time: How long the algorithm takes (running time)

Space: Memory requirement

Example 1: Sorting
Sorting an array with n elements

Naïve algorithms: time proportional to n²

Best algorithms: time proportional to n log n

How important is this distinction?

Typical CPUs process up to 10¹⁰ operations per second

Probably an overestimate, but useful for approximate calculations

Example 1: Sorting
Telephone directory for mobile phone users in India

India has about 1 billion = 10⁹ phones

Naïve n² algorithm requires 10¹⁸ operations

10¹⁰ operations per second → 10⁸ seconds

≈ 27780 hours

≈ 1157 days

≈ 3 years!

Smart n log n algorithm takes only about 3 × 10¹⁰ operations

About 3 seconds
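A quick back-of-the-envelope check of these figures, as a minimal Python sketch (the 10¹⁰ operations per second rate is the slide's own assumption):

  import math

  n = 10**9                    # phones in India
  ops_per_sec = 10**10         # assumed processing speed from the slides

  naive = n**2                 # ~10^18 operations
  smart = n * math.log2(n)     # ~3 x 10^10 operations

  print(naive / ops_per_sec)   # 1e8 seconds, roughly 3 years
  print(smart / ops_per_sec)   # about 3 seconds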

Example 2: Video game

Several objects on screen

Basic step: find closest pair of objects

Given n objects, naïve algorithm is again n²

For each pair of objects, compute their distance

Report minimum distance over all such pairs

There is a clever algorithm that takes time n log n
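The naïve step can be sketched as follows (a minimal Python illustration; the function name closest_pair_naive is ours, and only the minimum distance is reported, as the slide describes):

  import math

  def closest_pair_naive(points):
      # Compare every pair of objects: n(n-1)/2 distance computations, so O(n^2)
      best = math.inf
      for i in range(len(points)):
          for j in range(i + 1, len(points)):
              best = min(best, math.dist(points[i], points[j]))
      return best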

Example 2: Video game

High resolution monitor has 2500 × 1500 pixels

3.75 million points

Suppose we have 500,000 = 5 × 10⁵ objects

Naïve algorithm takes 25 × 10¹⁰ steps = 25 seconds

25 second response time is unacceptable!

Smart n log n algorithm takes a thousandth of a second
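Checking these numbers, again under the assumed 10¹⁰ operations per second:

  import math

  n = 5 * 10**5
  print(n**2 / 10**10)              # 25.0 seconds for the naive method
  print(n * math.log2(n) / 10**10)  # ~0.001 seconds, a thousandth of a second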

Time and space

Time depends on processing speed

Impossible to change for given hardware

Space is a function of available memory

Easier to reconfigure, augment

Traditionally, algorithm analysis concentrates on time, not space

Input size
Running time depends on input size

Larger arrays will take longer to sort

Measure time efficiency as a function of input size

Input size n

Running time t(n)

Different inputs of size n may each take a different amount of time

Typically t(n) is a worst case estimate

Input size
How do we fix input size?

Typically a natural parameter

For sorting and other problems on arrays: array size

For combinatorial problems: number of objects

For graphs, two parameters: number of vertices and number of edges

Measuring running time

Analysis independent of underlying hardware

Don't use actual time

Measure in terms of basic operations

Typical basic operations

Compare two values

Assign a value to a variable

Other operations may be basic, depending on context

Exchange values of a pair of variables

Orders of magnitude
When comparing t(n) across problems, focus on orders of magnitude

Ignore constants

f(n) = n³ eventually grows faster than g(n) = 5000n²

For small values of n, f(n) is smaller than g(n)

At n = 5000, f(n) overtakes g(n)

What happens in the limit, as n increases: asymptotic complexity
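A quick numeric check of the crossover at n = 5000 (a minimal sketch):

  f = lambda n: n**3
  g = lambda n: 5000 * n**2

  print(f(4999) < g(4999))   # True: just below n = 5000, g is still larger
  print(f(5001) > g(5001))   # True: just beyond n = 5000, f has overtaken g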

Choice of basic operations

Flexibility in identifying basic operations

Swapping two variables involves three assignments

  tmp ← x
  x ← y
  y ← tmp

Number of assignments is 3 times the number of swaps

If we ignore constants, t(n) is of the same order of magnitude even if swapping values is treated as a basic operation
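A small illustration of this (the counting scheme is ours): whichever operation we count, the totals differ only by the constant factor 3.

  def swap(a, i, j, counts):
      # One swap is exactly three assignments
      tmp = a[i]; a[i] = a[j]; a[j] = tmp
      counts["swaps"] += 1
      counts["assignments"] += 3

  counts = {"swaps": 0, "assignments": 0}
  swap([3, 1, 2], 0, 1, counts)
  print(counts)   # {'swaps': 1, 'assignments': 3}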

Typical functions

We are interested in orders of magnitude

Is t(n) proportional to log n, …, n², n³, …, 2ⁿ?

Logarithmic, polynomial, exponential

Typical functions t(n)
[chart of typical growth rates, not reproduced]

Feasibility limit
Even n² is infeasible for inputs of size 1 million (10 lakhs): at 10¹⁰ operations per second, 10¹² steps already take 100 seconds

Worst case complexity

Running time on input of size n varies across inputs

Search for K in an unsorted array A

  i ← 0
  while i < n and A[i] != K do
    i ← i+1
  if i < n return i
  else return -1
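The same scan in runnable form, as a minimal Python sketch of the pseudocode above:

  def search(A, K):
      # Scan left to right; the worst case examines all n elements
      i = 0
      while i < len(A) and A[i] != K:
          i = i + 1
      return i if i < len(A) else -1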

Worst case complexity

For each n, worst case input forces algorithm to take the maximum amount of time

If K not in A, search scans all elements

Upper bound for the overall running time

Here worst case is O(n) for array size n

Can construct worst case inputs by examining the algorithm

Average case complexity

Worst case may be very rare: pessimistic

Compute average time taken over all inputs

Difficult to compute

Average over what?

Are all inputs equally likely?

Need probability distribution over inputs

Comparing time efficiency

We measure time efficiency only up to an order of magnitude

Ignore constants

How do we compare functions with respect to orders of magnitude?

Upper bounds, big O

t(n) is said to be O(g(n)) if we can find suitable constants c and n₀ so that c·g(n) is an upper bound for t(n) beyond n₀:

  t(n) ≤ c·g(n) for every n ≥ n₀

Examples: Big O
100n + 5 is O(n²)

100n + 5 ≤ 100n + n, for n ≥ 5
         = 101n ≤ 101n², so n₀ = 5, c = 101

Alternatively

100n + 5 ≤ 100n + 5n, for n ≥ 1
         = 105n ≤ 105n², so n₀ = 1, c = 105

n₀ and c are not unique!

Of course, by the same argument, 100n + 5 is also O(n)
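A numeric spot-check of the first pair of witnesses (a minimal sketch; sampling a finite range only illustrates the bound, it does not prove it):

  t = lambda n: 100 * n + 5
  g = lambda n: n * n

  # c = 101, n0 = 5: t(n) <= c*g(n) holds for every sampled n >= n0
  print(all(t(n) <= 101 * g(n) for n in range(5, 10**5)))   # True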

Examples: Big O
100n² + 20n + 5 is O(n²)

100n² + 20n + 5 ≤ 100n² + 20n² + 5n², for n ≥ 1
                = 125n²

n₀ = 1, c = 125

What matters is the highest term

20n + 5 is dominated by 100n²

Examples: Big O

n³ is not O(n²)

No matter what c we choose, c·n² will be dominated by n³ for n > c

Useful properties
If

f₁(n) is O(g₁(n))

f₂(n) is O(g₂(n))

then f₁(n) + f₂(n) is O(max(g₁(n), g₂(n)))

Why is this important?

Algorithm has two phases

Phase A takes time O(gA(n))

Phase B takes time O(gB(n))

Algorithm as a whole takes time O(max(gA(n), gB(n)))

For an algorithm with many phases, the least efficient phase is an upper bound for the whole algorithm
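As a concrete instance (a minimal sketch; the duplicate-finding task is our own illustration, not from the slides):

  def has_duplicate(a):
      a = sorted(a)                  # Phase A: sorting, O(n log n)
      for i in range(len(a) - 1):    # Phase B: linear scan, O(n)
          if a[i] == a[i + 1]:
              return True
      return False

  # Overall: O(max(n log n, n)) = O(n log n); the least efficient phase dominates.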

Binary search
Searching for K in unsorted list A takes time O(n)

What if A is sorted?

Compare K with midpoint of A

If midpoint is K, the value is found

If K < midpoint, search left half of A

If K > midpoint, search right half of A

Binary Search
How long does this take?

Each step halves the interval to search

Initially, interval is size n

For interval of size 0 or 1, answer is immediate

After j steps, interval is size n/2ʲ

After j = log₂ n steps, the interval has size n/2ʲ = 1, so we are done: O(log n)

Worst case is when K is not found in A

Binary search
bsearch(K, A, left, right)

// A sorted; search for K from A[left] to A[right-1]

  if (right - left == 0) return(false)
  if (right - left == 1) return(K == A[left])
  mid = (left + right) div 2   // integer division
  if (K == A[mid]) return(true)
  if (K < A[mid]) return(bsearch(K, A, left, mid))
  else return(bsearch(K, A, mid+1, right))
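The same procedure in runnable form (a minimal Python sketch of the pseudocode above, searching A[left:right]):

  def bsearch(K, A, left, right):
      # A sorted; search for K from A[left] to A[right-1]
      if right - left == 0:
          return False
      if right - left == 1:
          return K == A[left]
      mid = (left + right) // 2      # integer division
      if K == A[mid]:
          return True
      if K < A[mid]:
          return bsearch(K, A, left, mid)
      return bsearch(K, A, mid + 1, right)

  print(bsearch(7, [1, 3, 5, 7, 9], 0, 5))   # True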

Summary
Measure worst case time complexity

Asymptotic: t(n) as n becomes large

Only orders of magnitude are important

O( ) notation compares orders of magnitude

Complexity of an algorithm limits its range of operation

Search in an unsorted list is O(n)

Binary search in a sorted list is O(log n)
