Algorithms & Cost

Presentation transcript:

1 Algorithms & Cost

2 9.2 Algorithms & Cost - Algorithms

- Algorithms are designed to solve problems.
- A given problem may have many possible solutions.
- How can we decide which solution is the most efficient for a given problem?
- One approach is to measure the execution time.
- The amount of data to be processed will have an impact on the execution time.
- Definition: an algorithm is a finite sequence of instructions that specifies exactly how an operation will be performed.

3 9.2 Algorithms & Cost - Algorithms

What other factors can have an impact?
- Type of hardware?
- Whether other processes are running on the machine?
- Choice of programming language (compiled vs. interpreted)?

4 9.2 Algorithms & Cost - Measuring Cost: Experimental Study

Approach:
- Implement the algorithms in a programming language.
- Run the programs.
- Measure their performance, e.g. running time (see the timing sketch below).
- Compare the results.

Issues:
- How are the algorithms coded?
- What computer/hardware is used?
- What data should the program use?
- Did external factors interfere while testing, e.g. background processes?
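A minimal sketch of this experimental approach in Python, using the standard-library timeit module; the two candidate functions are illustrative stand-ins, not algorithms from the slides:

    import timeit

    def sum_loop(data):
        # Candidate 1: explicit accumulation loop.
        total = 0
        for x in data:
            total += x
        return total

    def sum_builtin(data):
        # Candidate 2: the built-in sum().
        return sum(data)

    data = list(range(100_000))
    for fn in (sum_loop, sum_builtin):
        # timeit repeats the call to reduce (but not eliminate) the noise
        # from background processes and other external factors.
        t = timeit.timeit(lambda: fn(data), number=100)
        print(f"{fn.__name__}: {t:.3f}s for 100 runs")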

5 9.2 Algorithms & Cost - Measuring Cost: Experimental Study

Limitations:
- The algorithm must be implemented and tested in order to determine its running time.
- Experiments can be done only on a limited set of inputs.
- In order to compare two algorithms, the same hardware and software environments should be used.
- We may not assume that running times on one set of inputs are indicative of all inputs.

6 9.2 Algorithms & Cost - Measuring Cost: Complexity Analysis

Examine the algorithm and determine which instructions most critically impact the execution time. We could count:
- the number of logical comparisons,
- the number of data interchanges,
- the number of arithmetic operations.

7 Computational Complexity

- Compares the growth of two functions
- Independent of constant multipliers and lower-order effects
- Metrics: "Big O" notation O(), "Big Omega" notation Ω(), "Big Theta" notation Θ()

Notes: The notion of computational complexity was introduced in the prerequisite math courses, so you should be somewhat familiar with it. We will re-introduce the ideas in this chapter. A more thorough and rigorous treatment of computational complexity is a central theme of COP 4531, the course that follows COP 4530 in our curriculum. Computational complexity provides a language for comparing the growth of functions that are defined on (all but finitely many) non-negative integers and which have non-negative real number values. The important class of examples that we will use are functions whose values are the run time or run space of an algorithm, and whose input is a number representing the size of the data set on which the algorithm operates. Complexity of a function depends only on its eventual behavior; that is, complexity is independent of any finite number of initial values. The practical effect of this is to ignore the initialization phases of algorithms. Complexity of a function is independent of any constant multiplier. The practical effect of this is to ignore differences in such things as processor speed when comparing the performance of two algorithms. The properties just stated come directly from the definition of computational complexity. A more subtle property, deducible from the definition, is that complexity is independent of so-called "lower order" effects. We will not attempt to make this last statement precise, but we will give some examples to illustrate the concept. The key measures of computational complexity are known as "Big O" notation, "Big Omega" notation, and "Big Theta" notation. You should be somewhat familiar with at least some of these. The definitions appear on the next slides.

8 Big "O" Notation

f(n) = O(g(n)) if and only if there exist constants c > 0 and n0 > 0 such that 0 ≤ f(n) ≤ c*g(n) for all n ≥ n0.

f(n) is eventually upper-bounded by g(n).

Notes: The Big O class of a function f consists of all functions that are "asymptotically bounded above by f". The official definition from [Cormen] follows: g(n) is in O(f(n)) if and only if there exist positive constants c and n0 such that 0 ≤ g(n) ≤ c*f(n) for all n ≥ n0. Notice that f(n) is in O(f(n)), as is 100*f(n). Notice also that the condition defining O(f(n)) is an upper-bound condition. Big Omega is defined using the corresponding lower-bound condition: g(n) is in Ω(f(n)) if and only if there exist positive constants c and n0 such that 0 ≤ c*f(n) ≤ g(n) for all n ≥ n0. Big Theta is defined using both at once: g(n) is in Θ(f(n)) if and only if there exist positive constants c1, c2, and n0 such that 0 ≤ c1*f(n) ≤ g(n) ≤ c2*f(n) for all n ≥ n0. These facts about computational complexity are not extremely difficult to prove; nevertheless, the proofs are left to COP 4531.

9 Big "Omega" Notation

f(n) = Ω(g(n)) if and only if there exist constants c > 0 and n0 > 0 such that 0 ≤ c*g(n) ≤ f(n) for all n ≥ n0.

f(n) is eventually lower-bounded by g(n).

10 Big "Theta" Notation

f(n) = Θ(g(n)) if and only if there exist constants c1, c2, n0 > 0 such that 0 ≤ c1*g(n) ≤ f(n) ≤ c2*g(n) for all n ≥ n0.

f(n) has the same long-term rate of growth as g(n).

11 Examples

For f(n) = 3n^2 + 17:
- Ω(1), Ω(n), Ω(n^2): lower bounds
- O(n^2), O(n^3), ...: upper bounds
- Θ(n^2): exact bound
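As a quick check of these claims (a worked verification, not on the slide): for the upper bound, take c = 4 and n0 = 5; then 3n^2 + 17 ≤ 4n^2 for all n ≥ 5, because n^2 ≥ 25 > 17 there. For the lower bound, 3n^2 + 17 ≥ 3n^2 for all n ≥ 1. Together these give the exact bound Θ(n^2).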

12 9.2 Algorithms & Cost - Measuring Cost: Big-Oh Notation

- Rather than count the number of operations precisely, we can look at the order of magnitude.
- This way we get an approximation of the time or resources required to solve a problem.
- Big-Oh notation is used to express these approximations, specifying an algorithm's classification as being 'on the order of ...'.

13 9.2 Algorithms & Cost - Measuring Cost: Big-Oh Notation

- Assume we have some function T(n) = n^2 + n giving an approximation of the number of steps.
- Also assume there is some function f(n) such that, for some constant c and some constant m, T(n) ≤ c*f(n) for all sufficiently large values n ≥ m.
- We say in this case that the algorithm has a time-complexity of f(n) relative to the number of operations it requires.
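As a worked instance of this definition (an illustrative check, not from the slides): for T(n) = n^2 + n, take f(n) = n^2, c = 2 and m = 1. Then T(n) = n^2 + n ≤ n^2 + n^2 = 2*n^2 for all n ≥ 1, since n ≤ n^2 there; so the algorithm is O(n^2).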

14 9.2 Algorithms & Cost - Measuring Cost: Big-Oh Notation

- The function f(n) indicates the rate of growth at which the run time of an algorithm increases as the input size n increases.
- To specify the time-complexity of an algorithm which runs on the order of f(n), we write O(f(n)).
- In the matrix sum examples (slides 32-35) both algorithms are O(n^2).

15 9.2 Algorithms & Cost - Measuring Cost: Constructing T(n)

- We assume that each basic operation or statement takes the same amount of time, called constant time.
- The total number of operations can be computed as a sum of the time required to perform each step.
- The steps requiring constant time are generally omitted, since they eventually become part of the constant of proportionality.

16 9.2 Algorithms & Cost - Measuring Cost: Constructing T(n)

For a basic operation, the time taken does not depend on the specific values of the data used or manipulated by the instruction.

[The slide annotates a code listing line by line with the costs 1, 1, 1, n, 1, n, 1, 1: the basic operations are marked with a constant time, and the loops are marked with the number of iterations. The listing itself is not in the transcript; a sketch of the idea follows.]
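A minimal sketch of this style of annotation, assuming a simple summation routine (the slide's actual listing is unknown, so this code is hypothetical); each comment records the cost assigned to that line:

    def total(data, n):
        s = 0                # 1 (basic operation: constant time)
        i = 0                # 1
        while i < n:         # n (the loop test and body run n times)
            s = s + data[i]  # n (one constant-time step per iteration)
            i = i + 1        # n
        return s             # 1

    print(total([4, 5, 6], 3))  # 15

Summing the marks gives T(n) = 3n + 3, so this routine runs in O(n) time.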

17 9.2 Algorithms & Cost - Measuring Cost

- To simplify the running-time estimation for a function f(n), we ignore the constants and lower-order terms.
- When we have a polynomial that describes the time requirements of an algorithm, we simplify it by:
  - throwing out all but the highest-order term, and
  - throwing out all the constants.
- E.g., if an algorithm takes C*n^2 + D*n + E time, we simplify this formula to just n^2, and we say the algorithm requires O(n^2) in terms of Big-O.

Exercise: calculate the Big-O notation for the following:
1. f(n) = 7*n - 2
2. f(n) = 3*n^3 + 20*n^2 + 5
3. f(n) = 3*log n + 10

18 9.2 Algorithms & Cost - Measuring Cost

How do we determine the running time of a piece of code? It depends on what kinds of statements are used.

Sequence statements:

    statement 1
    statement 2
    ...
    statement k

- Total running time = the sum of the times for all statements: time(statement 1) + time(statement 2) + ... + time(statement k).
- If each statement is "simple" (only involves basic operations), the time for each statement is constant, O(1), and the total time is also constant: O(1).
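A tiny illustration (not from the slides): three simple statements in sequence, each a basic operation costing O(1), so the whole block is O(1):

    x = 5        # O(1)
    y = x * 2    # O(1)
    z = x + y    # O(1)
    # Total time = O(1) + O(1) + O(1) = O(1)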

19 9.2 Algorithms & Cost - Measuring Cost

Selection statements:

    if cond:
        sequence of statements 1
    else:
        sequence of statements 2

- Either sequence of statements 1 will execute, or sequence of statements 2 will execute.
- The worst-case time is the slower of the two possibilities: total running time = max(time(sequence 1), time(sequence 2)).
- For example, if sequence 1 is O(n) and sequence 2 is O(1), the worst-case time for the whole if-else statement is O(n).
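For instance (an illustrative sketch, not from the slides):

    data = [3, 1, 4]
    if data:               # O(1) test
        total = sum(data)  # this branch is O(n)
    else:
        total = 0          # this branch is O(1)
    # Worst case for the if-else: max(O(n), O(1)) = O(n)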

20 9.2 Algorithms & Cost - Measuring Cost

Repetition statements:

    for i in range(n):
        sequence of statements

- The loop executes n times, so the sequence of statements also executes n times.
- Total running time = n * time(sequence of statements).
- If we assume the statements are O(1), the total time for the loop is n * O(1), which is O(n) overall.

Statements with function calls: when a statement involves a function call, the complexity of the statement includes the complexity of the function call, e.g.

    f(k)  # O(1)
    g(k)  # O(n)
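Putting these rules together, a sketch in which a loop body makes calls of different costs (f and g are placeholder names, as on the slide; their bodies here are assumptions for illustration):

    def f(k):
        return k + 1          # constant work: O(1)

    def g(k):
        return sum(range(k))  # work proportional to its argument: O(n) for k = n

    n = 1000
    for i in range(n):  # n iterations
        f(i)            # O(1) per iteration
        g(n)            # O(n) per iteration
    # Each iteration costs O(1) + O(n) = O(n), and the loop runs n times,
    # so this fragment is n * O(n) = O(n^2) overall.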

21 Typical Growth Rates

    Function          Name
    f(x) = c, c ∈ R   Constant
    log(N)            Logarithmic
    log^2(N)          Log-squared
    N                 Linear
    N log(N)          Log-linear
    N^2               Quadratic
    N^3               Cubic
    2^N               Exponential

22 Some Rules of Thumb

- If f(n) is a polynomial of degree k, then f(n) = Θ(N^k).
- log^k(N) = O(N) for any k: logarithms grow very slowly compared to even linear growth.

23 Maximum Subsequence Problem

- Given a sequence of integers A1, A2, ..., AN, find the maximum subsequence sum Ai + Ai+1 + ... + Ak, where 1 ≤ i ≤ k ≤ N.
- Many algorithms of differing complexity can be found:

    Algorithm    Time
    1            O(N^3)
    2            O(N^2)
    3            O(N log N)
    4            O(N)

[The slide also tabulates measured running times for input sizes N = 10 through N = 100,000; the individual cell values (e.g. an N.A. entry and the isolated timings 1.2329 and 135) did not survive extraction in a usable layout.]
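A sketch of the O(N) solution, commonly known as Kadane's algorithm (the slide gives only the complexity, so this implementation is an assumption; it allows the empty subsequence, whose sum is 0):

    def max_subsequence_sum(a):
        best = 0          # best sum seen so far (empty subsequence allowed)
        ending_here = 0   # best sum of a subsequence ending at the current element
        for x in a:
            ending_here = max(0, ending_here + x)  # drop a negative-sum prefix
            best = max(best, ending_here)
        return best

    print(max_subsequence_sum([-2, 11, -4, 13, -5, -2]))  # 20 (11 - 4 + 13)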

24 Maximum Subsequence Problem: How Complexity Affects Running Times

[Chart-only slide; the figure did not survive in the transcript.]

25 Complexity Analysis

- Estimate n = size of input.
- Isolate each atomic activity to be counted.
- Find f(n) = the number of atomic activities done for an input of size n.
- Complexity of the algorithm = complexity of f(n).

Notes: To apply the notation and theory of computational complexity to algorithms is a four-step process. First, discover a measure of the size of input to the algorithm. Second, decide on a notion of atomic computational activity that captures the work of the algorithm. Third, find the function f(n) = the number of atomic computations performed on input of size n. Finally, the complexity of the algorithm is the complexity of f(n). We illustrate this process in several examples as we conclude this chapter, as well as in various places throughout the remainder of the course.

26 Running Time Calculations - Loops

    for (j = 0; j < n; ++j) {
        // 3 atomics
    }

Complexity = Θ(3n) = Θ(n)

Notes: Despite the facetious remarks we made earlier about how "obvious" it is that a simple fixed-bound loop terminates, it actually is true that simple loops are straightforward to analyze. Usually it is clear when and why they terminate and what computational work is accomplished in each iteration of the loop body. If, for example, we define an atomic computation as a call to a comparison operator, there might be three such calls in the loop body; that situation is depicted on the slide. The complexity of the loop is defined to be the complexity of the function f(n) = (no. of atomics in loop body) x (no. of iterations of the loop) = 3 x n = 3n. Thus, the complexity of this loop is Θ(f(n)) = Θ(3n) = Θ(n). The situation is often not quite this simple, however.

27 Loops with Break

    for (j = 0; j < n; ++j) {
        // 3 atomics
        if (condition) break;
    }

Upper bound = O(4n) = O(n)
Lower bound = Ω(4) = Ω(1)
Complexity = O(n)

Why don't we have a Θ(...) bound here?

Notes: The case of a loop with a conditional breakout is shown on this slide. In the cases where the loop runs to normal termination, the run time of the loop is correctly modelled by the same function as for the simple loop above. But in other cases, the loop may terminate sooner. These cases are data dependent; that is, the runtime of the loop varies from an upper bound of O(n) down to a lower bound of Ω(1), depending on the specific input to the loop. We cannot conclude that the algorithm has complexity Θ(n) because the lower-bound condition Ω(n) does not hold. Therefore, the best we can conclude is that the loop has complexity O(n).

28 Loops in Sequence

    for (j = 0; j < n; ++j) {
        // 3 atomics
    }
    for (k = 0; k < n; ++k) {
        // 5 atomics
    }

Complexity = Θ(3n + 5n) = Θ(n)

Notes: This slide shows two loops, one following the other in the source code. These are sometimes referred to as concatenated loops. Concatenated program blocks execute in sequence, one after another; therefore the runtime of two concatenated blocks is the sum of the runtimes of the individual blocks. For the situation depicted on the slide, the runtime is bounded above by O(3n + 5n) = O(n).

29 Nested Loops

    for (j = 0; j < n; ++j) {
        // 2 atomics
        for (k = 0; k < n; ++k) {
            // 3 atomics
        }
    }

Complexity = Θ((2 + 3n)n) = Θ(n^2)

Notes: This slide shows two loops, one inside the other in the source code. These are sometimes referred to as composed loops. The runtime of two composed blocks is the product of the runtimes of the individual blocks. Thus, the runtime of the composed loops depicted on the slide is bounded above by O((2 + 3n)n) = O(2n + 3n^2) = O(n^2).

30 Consecutive Statements

    for (i = 0; i < n; ++i) {
        // 1 atomic
        if (condition) break;
    }
    for (j = 0; j < n; ++j) {
        // 2 atomics
        for (k = 0; k < n; ++k) {
            // 3 atomics
        }
    }

Complexity = O(2n) + O((2 + 3n)n) = O(n) + O(n^2) = O(n^2)

31 9.2 Algorithms & Cost - Measuring Cost: Big-O Summary

- Ignore low-order terms: e.g. O(n^3 + 4*n^2 + 3*n) = O(n^3).
- Ignore multiplicative constants: e.g. O(5*n^3) = O(n^3).
- Combine growth-rate functions: O(f(n)) + O(g(n)) = O(f(n) + g(n)).

32 9.2 Algorithms & Cost - Measuring Cost

Example (version 1): compute the sum of each row of an n x n matrix, and an overall sum of the entire matrix. How many additions are performed? (The slide's listing is not in the transcript; a sketch follows.)
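A hedged reconstruction of version 1, consistent with the count on the next slide (both additions sit in the inner loop; the function name is an invention for illustration):

    def row_and_total_sums_v1(matrix, n):
        total = 0
        row_sums = [0] * n
        for i in range(n):      # outer loop: n iterations
            for j in range(n):  # inner loop: n iterations
                row_sums[i] = row_sums[i] + matrix[i][j]  # addition 1
                total = total + matrix[i][j]              # addition 2
        return row_sums, total

    print(row_and_total_sums_v1([[1, 2], [3, 4]], 2))  # ([3, 7], 10)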

33 9.2 Algorithms & Cost - Measuring Cost

How many additions are performed (version 1)?
- There are 2 addition operations specified in the code.
- There are 2 loops, one nested in the other.
- The inner loop is executed n times, so there are 2n addition operations associated with the inner loop.
- The outer loop is also executed n times, so there is a total of n x 2n = 2n^2 additions.

34 9.2 Algorithms & Cost - Measuring Cost

Example (version 2): compute the same row sums and overall matrix sum, restructured so that each loop contains a single addition. How many additions are performed now? (Again, a sketch follows.)
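A hedged reconstruction of version 2, consistent with the count on the next slide (one addition per loop, with the row sum added to the total once per row):

    def row_and_total_sums_v2(matrix, n):
        total = 0
        row_sums = [0] * n
        for i in range(n):
            for j in range(n):
                row_sums[i] = row_sums[i] + matrix[i][j]  # n additions per row
            total = total + row_sums[i]                   # 1 addition per row
        return row_sums, total

    print(row_and_total_sums_v2([[1, 2], [3, 4]], 2))  # ([3, 7], 10)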

35 9.2 Algorithms & Cost - Measuring Cost

How many additions are performed (version 2)?
- There are 2 loops, one nested in the other.
- There are 2 addition operations specified in the code; this time there is one addition operation associated with each loop.
- The inner loop is executed n times, so there are n addition operations associated with the inner loop.
- For each iteration of the outer loop there will be n + 1 additions.
- The outer loop is executed n times, so there are n x (n + 1) = n^2 + n additions in total.

36 9.2 Algorithms & Cost - Measuring Cost

- Which version performs fewer additions? Version 2, with n^2 + n additions against 2n^2.
- Is this significant? As n gets larger, the difference in execution times will NOT be significant: both counts grow as O(n^2).

37 9.2 Algorithms & Cost - Measuring Cost

[Chart-only slide; the figure did not survive in the transcript.]

38 9.3 Summary: Algorithms & Cost

