CS200: Algorithms Analysis

Mathematical Foundations for this class are found in appendices - begin reviewing Appendix A.

Introduction to Algorithms - CH1
What is an Algorithm? A well-defined general computational process that takes a set of values as input and produces a set of values as output, {process is finite, output is correct}. A function that maps an input instance to a correct output instance and halts, f(a) = b.

What is algorithm analysis?
Application of mathematical techniques to determine the relative efficiency of an algorithm.
Why analyze algorithms?
• Programmer maturity
• Select the best algorithm for the job
• Identify intractable problems (e.g., NP-complete problems, for which no efficient algorithm is known)
• Computers are not infinitely fast, nor is memory unlimited

Example: Two Fibonacci algorithms, which is more efficient and why?
• How to measure efficiency? What efficiency metric should be used? How is the metric quantified?
• The recursive algorithm is elegant, but its time efficiency is exponential in n and its space efficiency is linear in n, because it recomputes repeated sub-problems (more later).
• The loop algorithm has linear time efficiency in n and uses a constant amount of space; it is a simple dynamic programming algorithm (more later).
• Recursion is still a powerful algorithmic tool.

// Fib1 (recursive)
// pre: n > 0
// post: fib(n) = nth Fibonacci number
int fib(int n) {
    if (n <= 2) return 1;
    return fib(n-1) + fib(n-2);
}

// Fib2 (loop)
// pre: n > 0
// post: fib(n) = nth Fibonacci number
int fib(int n) {
    int f, f1, f2;
    f = f1 = f2 = 1;
    for (int i = 3; i <= n; i++) {
        f = f1 + f2;   // next Fibonacci number
        f2 = f1;
        f1 = f;
    }
    return f;          // returned after the loop, not inside it
}
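One way to make "which is more efficient" concrete is to count the work each version does. The sketch below is an illustration, not part of the original slides: the names `fib_recursive`, `fib_loop`, `recursive_calls`, and `loop_iterations` are hypothetical, and the counters simply tally function calls and loop iterations.

```c
/* Hypothetical counters (not in the slides): they tally the work done. */
static unsigned long recursive_calls = 0;
static unsigned long loop_iterations = 0;

/* pre: n > 0; post: returns the nth Fibonacci number (recursive version). */
unsigned long fib_recursive(int n) {
    recursive_calls++;                 /* one unit of work per call */
    if (n <= 2) return 1;
    return fib_recursive(n - 1) + fib_recursive(n - 2);
}

/* pre: n > 0; post: returns the nth Fibonacci number (loop version). */
unsigned long fib_loop(int n) {
    unsigned long f = 1, f1 = 1, f2 = 1;
    for (int i = 3; i <= n; i++) {
        loop_iterations++;             /* one unit of work per iteration */
        f = f1 + f2;
        f2 = f1;
        f1 = f;
    }
    return f;
}
```

For n = 20 both functions return 6765, but the recursive version makes 13529 calls while the loop runs 18 iterations. The call count grows as fast as the Fibonacci numbers themselves, which is the exponential behavior noted above.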

Should hardware and software differences be considered when analyzing algorithm efficiency?
i.e., how important are factors such as clock rate, programming language, OS, compiler, etc.?
• Fib1 executes $2 \cdot 2^n$ instructions and runs on Machine A ($10^9$ instr/sec).
• Fib2 executes $1000n$ instructions (the count implied by the timings below) and runs on Machine B ($10^4$ instr/sec).
• If n = 30, then Fib1 runs in 2.15 sec and Fib2 runs in 3 sec.
• But if n = 100, then Fib1 runs in about $80 \times 10^{12}$ years ($2^{101} / 10^9 \approx 2.5 \times 10^{21}$ seconds), while Fib2 runs in 10 sec.
WE ARE INTERESTED IN LARGE N.
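The arithmetic behind these timings can be checked with a short sketch. It assumes Fib1 executes 2 · 2^n instructions and Fib2 executes 1000 · n instructions; the second count is an assumption inferred from the quoted 3 sec and 10 sec figures on a 10^4 instr/sec machine. The function names are hypothetical.

```c
/* Seconds needed by Fib1, assuming it executes 2 * 2^n instructions. */
double fib1_seconds(int n, double instr_per_sec) {
    double instructions = 2.0;
    for (int i = 0; i < n; i++) {
        instructions *= 2.0;           /* builds up 2 * 2^n */
    }
    return instructions / instr_per_sec;
}

/* Seconds needed by Fib2, assuming it executes 1000 * n instructions
   (an inferred count, see the lead-in). */
double fib2_seconds(int n, double instr_per_sec) {
    return 1000.0 * n / instr_per_sec;
}
```

With these assumptions, `fib1_seconds(30, 1e9)` is about 2.15 and `fib2_seconds(30, 1e4)` is 3, matching the slide. At n = 100 the slow machine needs 10 seconds, while the fast machine needs more than 10^21 seconds: a machine 100,000 times faster cannot rescue the exponential algorithm.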

Does the choice of a data structure impact algorithm efficiency? Give an example.
• Find the median of a sorted sequence stored in an array versus a linked list: impacts time efficiency, no difference in space efficiency.
• Search for a key stored in a sorted array versus a hash map: impacts time efficiency, no difference in space efficiency.
• etc.
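The median example can be sketched as follows. This is a minimal illustration under stated assumptions: the sequence is already sorted, its length n is at least 1 and known, and "median" means the lower median. The `struct node` type and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical singly linked list node, used only for this illustration. */
struct node {
    int key;
    struct node *next;
};

/* Sorted array: the lower median is one index computation away, O(1). */
int median_array(const int *a, size_t n) {
    return a[(n - 1) / 2];
}

/* Sorted linked list: reaching the same element takes (n - 1) / 2
   pointer hops, so the time is O(n). */
int median_list(const struct node *head, size_t n) {
    for (size_t i = 0; i < (n - 1) / 2; i++) {
        head = head->next;
    }
    return head->key;
}
```

Both functions return the same key; only the number of steps needed to reach it differs.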

The Basics - CH2
Goals:
• Start using frameworks for describing and analyzing algorithms.
• Examine two algorithms for sorting: insertion sort and merge sort.
• See how to describe algorithms in pseudo code.
• Begin using asymptotic notation to express running-time analysis.
• Learn the technique of “divide and conquer” in the context of merge sort.

Example: General Sort Algorithm
Input: a sequence of values $A = \langle a_1, a_2, \ldots, a_n \rangle$
Output: a permutation $A' = \langle a'_1, a'_2, \ldots, a'_n \rangle$ of A such that $a'_1 \le a'_2 \le \cdots \le a'_n$
The sequences are typically stored in arrays. We also refer to the numbers as keys. Along with each key may be additional information, known as satellite data. We will see several ways to solve the sorting problem.

Insertion Sort Pseudo Code Example - on board
InsertionSort(A)
    for j = 2 to n do
        key = A[j]
        i = j - 1
        while (i > 0) and (A[i] > key) do
            A[i+1] = A[i]
            i = i - 1
        A[i+1] = key
A good algorithm for sorting a small number of elements. It works the way you might sort a hand of playing cards.
Pseudo code conventions:
• Data structures are represented in upper case and passed by reference. The size of a data structure is n.
• Scalars are lower case and passed by value.
• Local variables are implicitly declared.
• Indentation indicates block structure.
• Loop control variable is defined outside the loop.
• Authors use <- for assignment.
• Arrays are indexed from 1 … n. Use … for a range of values in a data structure.
• "And" and "or" are short-circuiting.
• Pseudo code is similar to C, C++, Pascal, and Java.
• Pseudo code is designed for expressing algorithms to humans. Software engineering issues of data abstraction, modularity, and error handling are often ignored.
• We sometimes embed English statements into pseudo code.
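The pseudo code above translates almost line for line into C. The sketch below is one possible rendering, not the course's reference implementation. It assumes an array of `int` and shifts the indices to C's 0-based arrays, so `i` here is one larger than the pseudo code's `i` would be after the same shift.

```c
#include <stddef.h>

/* Sorts a[0..n-1] into nondecreasing order, in place. */
void insertion_sort(int *a, size_t n) {
    for (size_t j = 1; j < n; j++) {
        int key = a[j];                /* the "card" being inserted */
        size_t i = j;                  /* a[0..j-1] is already sorted */
        /* Shift elements greater than key one slot to the right.
           && short-circuits, so a[i - 1] is never read when i == 0. */
        while (i > 0 && a[i - 1] > key) {
            a[i] = a[i - 1];
            i--;
        }
        a[i] = key;                    /* drop key into the gap */
    }
}
```

Calling `insertion_sort` on the array {5, 2, 4, 6, 1, 3} with n = 6 leaves it as {1, 2, 3, 4, 5, 6}.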

Algorithm Execution Description.
Instance of Insertion Sort, A = <5, 2, 4, 6, 1, 3>, traced.
It works the way you might sort a hand of playing cards:
• Start with an empty left hand and the cards face down on the table.
• Then remove one card at a time from the table, and insert it into the correct position in the left hand.
• To find the correct position for a card, compare it with each of the cards already in the hand, from right to left.
• At all times, the cards held in the left hand are sorted, and these cards were originally the top cards of the pile on the table.
[Figure: trace of insertion sort on A.] Each part shows what happens for a particular iteration with the value of j indicated. j indexes the “current card” being inserted into the hand. Elements to the left of A[j] that are greater than A[j] move one position to the right, and A[j] moves into the evacuated position. The heavy vertical lines separate the part of the array in which an iteration works, A[1 . . j], from the part of the array that is unaffected by this iteration, A[j+1 . . n]. The last part of the figure shows the final sorted array.

Analyzing Algorithms 1
We want to predict the resources that the algorithm requires. Usually, running time.
In order to predict resource requirements, we need a computational model.
Random-access machine (RAM) model:
• Instructions are executed one after another. No concurrent operations.
• It’s too tedious to define each of the instructions and their associated time costs.
• Instead, we recognize that we will use instructions commonly found in real computers:

Analyzing Algorithms 2
• Arithmetic: add, subtract, multiply, divide, remainder, floor, ceiling.
• Data movement: load, store, copy.
• Control: conditional/unconditional branch, subroutine call and return.
Each of these instructions takes a constant amount of time.

Run-Time Analysis of Algorithms
(predicting the time resource requirements of an algorithm). This requires determining two quantitative measures:
1. A count of the number of primitive operations. View taken: each line of pseudo-code is a primitive operation and takes a constant amount of time.
2. The input instance:
   • Input size (6 elements vs elements)
   • Input structure (partially sorted vs. reverse order)

In analysis we are most interested in the UPPER-BOUND on run-time: the maximum number of primitive operations that are executed on an input of size n.
Types of analysis:
• Worst-Case: T(n) = maximum run-time on any input of size n.
• Average-Case: T(n) = average run-time over all inputs of size n. This type of analysis assumes a statistical distribution of inputs; e.g., for insertion sort, this would require determining the average run-time over all possible permutations of A. Typically, average-case behavior is roughly as bad as worst-case behavior.
• Best-Case: T(n) = best run-time on any input of size n. This type of analysis is cheating, as a slow algorithm appears fast on a special case of its input. Useful to show that a bad lower-bound on run-time has been determined for an algorithm.

What is the Worst-Case run-time of Insertion Sort?
Depends on the speed of the primitive operations in the algorithm:
• relative speed (on same machines)
• absolute speed (on different machines)

ASYMPTOTIC ANALYSIS
• Ignore machine dependent run-time constants.
• Look at the growth of T(n) as $n \to \infty$.
• Use asymptotic notation: drop low order terms and ignore leading constants.
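A small worked example may help show why low order terms and leading constants can be dropped. The running time $T(n) = 3n^2 + 10n + 5$ below is a made-up illustration, not a formula from the slides.

```latex
\[
\frac{T(n)}{n^2} = \frac{3n^2 + 10n + 5}{n^2}
                 = 3 + \frac{10}{n} + \frac{5}{n^2},
\qquad
\lim_{n \to \infty} \frac{T(n)}{n^2} = 3 .
\]
```

At n = 1000 the low order terms contribute 10,005 out of 3,010,005 operations, about 0.3% of the total, so the $n^2$ term alone determines the growth. The leading constant 3 depends on machine-level costs, which is why it is ignored as well: $T(n) = \Theta(n^2)$.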

Intuition Behind Asymptotic Notation
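Insertion Sort Analysis
Line                                        Cost    Times
1. for j = 2 to n do                        $c_1$   $n$
2.     key = A[j]                           $c_2$   $n - 1$
4.     i = j - 1                            $c_4$   $n - 1$
5.     while (i > 0) and (A[i] > key) do    $c_5$   $\sum_{j=2}^{n} t_j$
6.         A[i+1] = A[i]                    $c_6$   $\sum_{j=2}^{n} (t_j - 1)$
7.         i = i - 1                        $c_7$   $\sum_{j=2}^{n} (t_j - 1)$
8.     A[i+1] = key                         $c_8$   $n - 1$
Here $t_j$ is the number of times the while-loop test on line 5 is executed for that value of j.
What are the bounds on the summations?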

Collecting Terms (proof)
$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 \sum_{j=2}^{n} t_j + c_6 \sum_{j=2}^{n} (t_j - 1) + c_7 \sum_{j=2}^{n} (t_j - 1) + c_8 (n-1)$$
The bounds on each summation are j = 2 … n.
Worst case occurs when the array is in reverse sorted order: $t_j = j$ for j = 2, 3, ... , n, because each A[j] must be compared to each element in the sorted sub-array.
Simplify T(n) by finding the closed form for the summations and gathering terms.
$T(n) = an^2 + bn + c = \Theta(n^2)$   Worst Case
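The simplification step can be written out as follows. This is a sketch of the standard derivation, using the cost constants from the analysis table and the worst-case value $t_j = j$.

```latex
\[
\sum_{j=2}^{n} j = \frac{n(n+1)}{2} - 1,
\qquad
\sum_{j=2}^{n} (j - 1) = \frac{n(n-1)}{2}
\]
\begin{align*}
T(n) &= c_1 n + c_2 (n-1) + c_4 (n-1)
        + c_5 \left( \frac{n(n+1)}{2} - 1 \right)
        + c_6 \, \frac{n(n-1)}{2}
        + c_7 \, \frac{n(n-1)}{2}
        + c_8 (n-1) \\
     &= \left( \frac{c_5}{2} + \frac{c_6}{2} + \frac{c_7}{2} \right) n^2
        + \left( c_1 + c_2 + c_4 + \frac{c_5}{2} - \frac{c_6}{2} - \frac{c_7}{2} + c_8 \right) n
        - \left( c_2 + c_4 + c_5 + c_8 \right)
\end{align*}
```

Reading off the coefficients gives the constants a, b, and c in $T(n) = an^2 + bn + c$.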

Average-case run time for insertion sort occurs when all permutations of elements are equally likely: $t_j = j/2$, because on average half of the elements in A[1..j-1] are less than A[j] and half are greater than A[j].
Simplify T(n) by finding the closed form for the summations and gathering terms.
$T(n) = an^2 + bn + c = \Theta(n^2)$   Average Case

Best-case run time occurs when the array is already sorted: $t_j = 1$.
Simplify T(n) by finding the closed form for the summations and gathering terms.
$$T(n) = c_1 n + c_2 (n-1) + c_4 (n-1) + c_5 (n-1) + c_8 (n-1) = (c_1 + c_2 + c_4 + c_5 + c_8) n - (c_2 + c_4 + c_5 + c_8)$$
$T(n) = an + b = \Theta(n)$   Best Case
Is this a fast sorting algorithm?
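The best-case and worst-case values of the sum of $t_j$ can be checked empirically. The sketch below is an illustration under stated assumptions: it uses 0-based C arrays, and the function name `count_while_tests` is hypothetical. It runs insertion sort while counting every evaluation of the while-loop test.

```c
#include <stddef.h>

/* Sorts a[0..n-1] with insertion sort and returns how many times the
   while-loop test is evaluated, i.e. the sum of t_j over the outer loop. */
unsigned long count_while_tests(int *a, size_t n) {
    unsigned long tests = 0;
    for (size_t j = 1; j < n; j++) {
        int key = a[j];
        size_t i = j;
        for (;;) {
            tests++;                   /* one evaluation of the loop test */
            if (!(i > 0 && a[i - 1] > key)) {
                break;                 /* test failed: stop shifting */
            }
            a[i] = a[i - 1];
            i--;
        }
        a[i] = key;
    }
    return tests;
}
```

For n = 6, an already sorted array gives 5 tests (n - 1, the best case), while a reverse-sorted array gives 20 tests (n(n+1)/2 - 1, the worst case).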

Summary
• What is an algorithm?
• Why do analysis?
• Why ignore system dependent issues?
• Types of analysis?
• Know the closed form for simple summations!
• Review Appendix A
