1 Potential for Parallel Computation Chapter 2 – Part 2 Jordan & Alaghband

2 Behavior of Algorithms
Small problems:
- Can calculate actual performance measures
- Size (operations); time units
- Often can generalize, but need proof
For large problems, characterize the asymptotic behavior:
- i.e. Big Oh!
- An upper bound

3 Big Oh Definition
Given functions f(n) and g(n): f(n) is of order g(n), that is, f(n) = O(g(n)), iff there exist constants C and N such that for all n > N, |f(n)| ≤ C |g(n)|.
That is, f(n) grows no faster than g(n); g(n) is an upper bound.
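For example, f(n) = 3n² + 5n is O(n²): taking C = 4 and N = 5, we have 3n² + 5n ≤ 4n² for all n > 5, since 5n ≤ n² once n ≥ 5.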

4 Omega (Ω) -- lower bound
As in Big Oh, except f(n) = Ω(g(n)) means there exist constants C and N such that for all n > N, |f(n)| ≥ C |g(n)|.
g(n) is a lower bound.

5 Exact Bound
If both f(n) = O(g(n)) and f(n) = Ω(g(n)), then f(n) = Θ(g(n)), and we say g(n) is an exact bound for f(n) (aka a tight bound).
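Continuing the example above, f(n) = 3n² + 5n is also Ω(n²), since 3n² + 5n ≥ 3n² for all n ≥ 1 (take C = 3). Having both bounds, f(n) = Θ(n²): n² is a tight bound.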

6 Speedup and Efficiency of Algorithms
For any given computation (algorithm):
Let T_P be the time to perform the computation with P processors (arithmetic units, or PEs). We assume that any P independent operations can be done simultaneously.
Note: the depth of an algorithm is T_∞, the minimum execution time (with an unlimited number of processors).
The speedup with P processors is S_P = T_1 / T_P, and the efficiency is E_P = S_P / P.
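As a minimal illustration (the timings below are made-up values, not measurements of any particular algorithm or machine), speedup and efficiency follow directly from these definitions:

```python
# Minimal sketch: speedup and efficiency from measured times.
# T1 and the per-P times below are illustrative values only.

def speedup(t1, tp):
    """S_P = T_1 / T_P"""
    return t1 / tp

def efficiency(t1, tp, p):
    """E_P = S_P / P"""
    return speedup(t1, tp) / p

t1 = 100.0                             # time on 1 processor
times = {2: 55.0, 4: 30.0, 8: 20.0}    # hypothetical times T_P

for p, tp in times.items():
    print(f"P={p}: S_P = {speedup(t1, tp):.2f}, E_P = {efficiency(t1, tp, p):.2f}")
```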

7 These numbers, S_P and E_P, refer to an algorithm and not to a machine. Similar numbers can be defined for specific hardware.
The time T_1 can be chosen in different ways: to evaluate how good an algorithm is, it should be the time for the "best" sequential algorithm.

8 The Minimum Number of Processors Giving the Maximum Speedup:
Let P̂ be the minimum number of processors such that T_P̂ = T_∞, i.e. P̂ = min { P | T_P = T_∞ }.
Then T_P̂, S_P̂, and E_P̂ are the best known time, speedup, and efficiency, respectively.
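As a hypothetical illustration (the numbers are made up): if T_1 = 12 and the depth T_∞ = 4 is first reached with 4 processors, then P̂ = 4, S_P̂ = 12/4 = 3, and E_P̂ = 3/4.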

9 E = A + B(CDE + F + G) + H

10 By also applying the distributive law, we can get an even smaller depth, but the number of operations will increase.
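As a worked illustration (one possible operation schedule; the text may use a different tree): evaluating E = A + B(CDE + F + G) + H as written takes 7 operations and, with independent operations done simultaneously, a depth of 5 levels (level 1: C·D, F + G, A + H; level 2: (CD)·E; level 3: CDE + (F + G); level 4: B·(CDE + F + G); level 5: the final addition), so T_1 = 7, T_∞ = 5, and S_∞ = 7/5. Distributing B gives A + BCDE + BF + BG + H, which needs 9 operations but only 4 levels (level 1: B·C, D·E, B·F, B·G, A + H; level 2: BC·DE, BF + BG; level 3: BCDE + (BF + BG); level 4: the final addition), so the depth drops to 4 at the cost of two extra multiplications.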

11 Performance
Consider small problems:
- e.g. evaluation of an arithmetic expression
- Not much improvement possible
- Even auto-optimization probably takes longer than sequential processing
Gains: problems that are "compute bound", i.e. processor bound / computation intensive.

12 Factors Influencing Performance
Level            Notes
Hardware         Establishes fundamental speed scale
Architecture     Both individual unit & system level
Operating Sys.   As an extension to hardware
Language         Compiler & run-time support
Program          Control structure & synchronization
Algorithm        Data dependence structure

13 Impacting Performance
Hardware (user has no influence):
- Digital logic
- Clock speed
- Circuit interconnection
Architecture:
- Sequential vs. degree of parallelism
- ALU, CU, memory, cache
- Synchronization among processors

14 Impacting Performance
Operating system:
- Shared resources
- Process control, synchronization, data movement
- I/O
Programming language:
- Operations available & the implementation
- Compiler / optimizations

15 Impacting Performance
Program:
- Organization & style; structure
- Data structures
- Study of compiler design is helpful
Algorithm:
- Often a tradeoff of memory vs. speed
- Use "reasonable" algorithms; they often have only minimal impact

16 Amdahl's Law
Let T(P) be the execution time with hardware parallelism P. Let S be the time to do the sequential part of the work, and let Q be the time to do the parallel part of the work sequentially; i.e., S and Q are the sequential and parallel amounts of work, measured by time on one processor.
The total time is then T(P) = S + Q/P.

17 Amdahl's Law
Expressing this in terms of the fraction of serial work, f = S / (S + Q), Amdahl's law states that
Speedup:    S_P = T(1) / T(P) = 1 / (f + (1 - f)/P)
Efficiency: E_P = S_P / P = 1 / (f·P + (1 - f))
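As a sketch (the serial fraction f = 0.05 and the processor counts below are arbitrary illustrative choices, not values from the text), these formulas are easy to evaluate directly:

```python
# Minimal sketch of Amdahl's law: speedup and efficiency as functions
# of the serial fraction f and the processor count P.  The value of f
# and the processor counts are arbitrary illustrative choices.

def amdahl_speedup(f, p):
    """S_P = 1 / (f + (1 - f)/P)"""
    return 1.0 / (f + (1.0 - f) / p)

def amdahl_efficiency(f, p):
    """E_P = S_P / P = 1 / (f*P + (1 - f))"""
    return amdahl_speedup(f, p) / p

f = 0.05    # assume 5% of the work is inherently serial
for p in (2, 8, 32, 128, 1024):
    print(f"P={p:5d}: S_P = {amdahl_speedup(f, p):7.2f}, "
          f"E_P = {amdahl_efficiency(f, p):5.2f}")
```

Note that as P grows, S_P approaches 1/f (here 20), no matter how many processors are added.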

18 There are several consequences of this simple performance model.
In order to achieve at least 50% efficiency on a program with hardware parallelism P, f can be no larger than 1/(P - 1): requiring E_P = 1/(f·P + 1 - f) ≥ 1/2 gives f·(P - 1) ≤ 1, so for P = 1001, for example, f must be at most 0.1%. This becomes harder and harder to achieve as P becomes large.
Amdahl used this result to argue that sequential processing was best, but it has several useful interpretations in different parallel environments:

19
- A very small amount of unparallelized code can have a very large effect on efficiency if the parallelism is large.
- A fast vector processor must also have a fast scalar processor in order to achieve a sizeable fraction of its peak performance.
- Effort in parallelizing a small fraction of code that is currently executed sequentially may pay off in large performance gains.
- Hardware that allows even a small fraction of new things to be done in parallel may be considerably more efficient.

20 Although Amdahl's law is a simple performance model, it need not be taken simplistically. The behavior of the sequential fraction, f, for example, can be quite important.
System sizes, especially the number P of processors, are often increased for the purpose of running larger problems. Increasing the problem size often does not increase the absolute amount of sequential work significantly. In this case, f is a decreasing function of problem size, and if the problem size is increased with P, the somewhat pessimistic implications of the equations above look much more favorable. See Problem 2.16 for a specific example.
The behavior of performance as both problem and system size increase is called scalability.
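A minimal sketch of this effect, under an assumed toy cost model in which the serial work is fixed at 10 time units while the parallelizable work grows linearly with the problem size n and P is scaled with n (all of these numbers are assumptions for illustration only):

```python
# Minimal sketch of scaled behavior: the serial work stays fixed while
# the parallel work grows with the problem size n, so the serial
# fraction f shrinks and the Amdahl speedup at P = n keeps improving.
# The cost model (serial = 10 units, parallel work = n units) is an
# assumption for illustration only.

def serial_fraction(serial_work, parallel_work):
    return serial_work / (serial_work + parallel_work)

def amdahl_speedup(f, p):
    return 1.0 / (f + (1.0 - f) / p)

serial_work = 10.0
for n in (100, 1_000, 10_000, 100_000):
    f = serial_fraction(serial_work, float(n))   # parallel work grows with n
    print(f"n = P = {n:6d}: f = {f:.5f}, S_P = {amdahl_speedup(f, n):10.1f}")
```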

21 Homework – Chapter 2 (page 46 and following): 2.1, 2.2, 2.13, 2.14, 2.16