CSE2021: Computer Organization
Instructor: Dr. Amir Asif, Department of Computer Science, York University
Handout # 2: Measuring Performance

Topics:
1. Performance: definition
2. Performance metrics: CPU execution time and throughput
3. Benchmarks: SPEC'95
4. Alternative performance metrics: MIPS and FLOPS
Reading: Patterson, Sections 1.4 – 1.9.

2 Analogy with Commercial Airplanes
- To decide which of the four airplanes has the best performance, we first need to define a criterion for measuring "performance".

| Criterion | Winner |
|---|---|
| Speed | BAC/Sud Concorde |
| Capacity | Boeing 747 |
| Range | Douglas DC-8-50 |
| Throughput | Boeing 747 |

| Airplane | Passenger capacity | Cruising speed (mph) | Passenger throughput (passengers × mph) |
|---|---|---|---|
| Boeing 777 | 375 | 610 | 375 × 610 = 228,750 |
| Boeing 747 | 470 | 610 | 470 × 610 = 286,700 |
| BAC/Sud Concorde | 132 | 1350 | 132 × 1350 = 178,200 |
| Douglas DC-8-50 | 146 | 544 | 146 × 544 = 79,424 |

- What does it mean to say that one computer is better than another?

3 Computer Performance (1)
- The performance of a computer is judged by two criteria:
  1. Response (execution) time: the time elapsed between the start and the end of one task.
  2. Throughput: the total number of tasks completed in a given interval of time.
- An IT manager is typically interested in higher overall throughput, whereas an individual user wants a lower execution time for their own task.
- Using execution time as the criterion, the performance of a machine X is defined as
$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$
- The performance ratio n between two machines X and Y is defined as
$$n = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X}$$
  i.e., X is n times faster than Y.

4 Computer Performance (2)
Activity 1: If machine X runs a program in 30 seconds and machine Y runs the same program in 45 seconds, how much faster is X than Y?
Activity 2: Which of the following two options better enhances performance from a user's perspective?
(a) Upgrading the machine to a faster CPU.
(b) Adding processors to the machine so that multiple processors work on different tasks.
Repeat the question from an IT manager's perspective.
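A minimal Python sketch of the performance definitions from the previous slide, applied to Activity 1. The function names are illustrative, not part of the handout.

```python
def performance(exec_time_s: float) -> float:
    """Performance of a machine: the reciprocal of its execution time."""
    return 1.0 / exec_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Ratio n = Performance_X / Performance_Y = Time_Y / Time_X."""
    return performance(time_x_s) / performance(time_y_s)


# Activity 1: machine X takes 30 s, machine Y takes 45 s for the same program.
n = times_faster(30.0, 45.0)
print(f"X is {n:.2f} times faster than Y")  # X is 1.50 times faster than Y
```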

5 What is Execution Time (1)?
1. Elapsed time (response time, wall-clock time): the clock time from the start to the end of a program or task.
   - Because a computer is typically time-shared (running several programs concurrently), elapsed time depends on the number and complexity of the other programs running on the machine.
2. CPU time: the time the CPU actually spends working on the task.
   - CPU time excludes time spent running other programs and time spent waiting for I/O.
   - CPU time can be further broken down into:
     (a) User CPU time: CPU time spent executing the measured program itself.
     (b) System CPU time: CPU time spent in the operating system performing tasks on behalf of the measured program.
3. The Unix time command reports the elapsed time and the CPU time of a program.
   Syntax: time <program>
   The output lists the user CPU time, system CPU time, and elapsed time of the run.
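The distinction between elapsed time and user/system CPU time can be observed from inside a program. This is a minimal Python sketch, assuming a hypothetical CPU-bound function `busy_work` and a deliberate sleep that stands in for waiting on I/O; it uses the standard-library `time.perf_counter()` for wall-clock time and `os.times()` for user and system CPU time.

```python
import os
import time


def busy_work(n: int) -> int:
    # Hypothetical CPU-bound loop: accumulates user CPU time
    total = 0
    for i in range(n):
        total += i * i
    return total


wall_start = time.perf_counter()   # wall-clock (elapsed) timer
cpu_start = os.times()             # snapshot of user/system CPU time

busy_work(2_000_000)
time.sleep(0.2)  # waiting: adds elapsed time but almost no CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_end = os.times()
user_cpu = cpu_end.user - cpu_start.user
system_cpu = cpu_end.system - cpu_start.system

print(f"elapsed: {wall_elapsed:.3f} s")
print(f"user CPU: {user_cpu:.3f} s, system CPU: {system_cpu:.3f} s")
```

Expect the elapsed time to exceed user + system CPU time by roughly the 0.2 s spent sleeping, since the sleep consumes wall-clock time but not CPU time.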

6 What is Execution Time (2)?
4. Performance based on user CPU time is called CPU performance.
5. Performance based on elapsed time on an unloaded system is called system performance.
6. Vendors often quote the speed of a computer in terms of its clock rate. For example, a 1 GHz Pentium is generally believed to be faster than a 500 MHz Pentium. The clock is defined formally next.
Multiples used: kilo (10^3), mega (10^6), giga (10^9); milli (10^-3), micro (10^-6), nano (10^-9), pico (10^-12).

7 Clock (1)
- All events in a computer are synchronized to the clock signal.
- The clock signal is therefore received by every hardware component in the computer.
- Clock cycle time: the duration of one cycle of the clock signal.
- Clock rate: the inverse of the clock cycle time.
- CPU execution time is therefore
$$\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$
[Figure: clock signal alternating between binary 0 and binary 1, with the leading edge, the trailing edge, and one cycle marked.]
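The CPU execution time formula above in code: a small sketch assuming a hypothetical program that needs 2 × 10^9 clock cycles, run at the two clock rates from the Pentium example on the previous slide.

```python
def cpu_time_s(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU execution time = clock cycles x cycle time = clock cycles / clock rate."""
    cycle_time_s = 1.0 / clock_rate_hz
    return clock_cycles * cycle_time_s


# Hypothetical program needing 2 x 10^9 clock cycles
for rate_hz in (500e6, 1e9):
    print(f"{rate_hz / 1e6:.0f} MHz: cycle time {1e9 / rate_hz:.1f} ns, "
          f"CPU time {cpu_time_s(2e9, rate_hz):.1f} s")
```

This holds only if the cycle count stays fixed; the next slides show why CPI (and therefore the cycle count) can differ between machines.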

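8 Clock (2)
- Timing programs generally report the average number of clock cycles needed per instruction, denoted CPI:
$$\text{CPU clock cycles} = \sum_{i} \text{CPI}_i \times C_i, \qquad \text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}$$
  where C_i is the number of instructions executed from instruction class i.
Activity 3: For a CPU, instructions from a high-level language are classified into 3 classes. Two software implementations with the following instruction counts are being considered. Which implementation executes more instructions? Which runs faster? What is the CPI of each implementation?

| Instruction class | A | B | C |
|---|---|---|---|
| CPI for the instruction class | 1 | 2 | 3 |

| Instruction counts per class | A | B | C |
|---|---|---|---|
| Implementation 1 | 2 | 1 | 2 |
| Implementation 2 | 4 | 1 | 1 |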
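A sketch of the Activity 3 calculation, using the CPI and instruction-count tables above; the dictionary names are illustrative.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
INSTRUCTION_COUNTS = {
    "Implementation 1": {"A": 2, "B": 1, "C": 2},
    "Implementation 2": {"A": 4, "B": 1, "C": 1},
}


def clock_cycles(counts: dict) -> int:
    # CPU clock cycles = sum over classes of CPI_i * C_i
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


results = {}
for name, counts in INSTRUCTION_COUNTS.items():
    instructions = sum(counts.values())
    cycles = clock_cycles(counts)
    results[name] = (instructions, cycles, cycles / instructions)
    print(f"{name}: {instructions} instructions, {cycles} cycles, "
          f"CPI = {cycles / instructions:.2f}")
```

Implementation 1 executes 5 instructions in 10 cycles (CPI 2.0); Implementation 2 executes 6 instructions in 9 cycles (CPI 1.5). On the same clock, fewer cycles means less time, so Implementation 2 executes more instructions yet runs faster.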

9 Performance Comparison (1)
To compare the performance of two computers:
1. Select a set of programs that represents the workload.
2. Run these programs on each computer.
3. Compare the average execution time of each computer.
4. At times, the geometric mean is used instead of the arithmetic mean.
Activity 4: Based on the arithmetic-mean and geometric-mean execution times, which of the two computers is faster?

| Execution time | Computer A | Computer B |
|---|---|---|
| Program 1 | 2 | 10 |
| Program 2 | | |
| Program 3 | | |
| Program 4 | 25 | 75 |
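The two averages can rank machines differently. This sketch uses hypothetical execution times, not the Activity 4 data, chosen so that the arithmetic and geometric means disagree.

```python
import math


def arithmetic_mean(times):
    return sum(times) / len(times)


def geometric_mean(times):
    # n-th root of the product, computed via logarithms to avoid overflow
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical execution times in seconds (NOT the Activity 4 data)
computer_a = [2.0, 50.0]
computer_b = [10.0, 20.0]

for name, times in (("A", computer_a), ("B", computer_b)):
    print(f"{name}: AM = {arithmetic_mean(times):.2f} s, "
          f"GM = {geometric_mean(times):.2f} s")
```

Here the arithmetic mean favours B (15 s versus 26 s), while the geometric mean favours A (10 s versus about 14.14 s).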

10 Performance Comparison: Benchmarks (2)
- Benchmarks: standard programs chosen to compare performance across different computers.
- Benchmarks are generally chosen from the applications a user would typically run on the computer.
- Benchmarks fall into three categories:
  1. Real applications reflecting the expected workload, e.g., multimedia, computer visualization, database, or Macromedia Director applications.
  2. Small benchmarks: specialized code segments containing a mixture of different types of instructions.
  3. Benchmark suites: a standard set of real programs and applications. A commonly used suite is SPEC (originally the System Performance Evaluation Cooperative, now the Standard Performance Evaluation Corporation), available in several versions, e.g., SPEC89, SPEC92, SPEC95, SPEChpc96, and SPEC CPU2006 with its CINT2006 and CFP2006 suites.

11 Performance Comparison: SPEC Suite (2)
- The SPEC'95 suite has a total of 18 programs (integer and floating point); SPEC CPU2000 has 26 (12 integer and 14 floating point).
- The SPEC ratio for a program is the ratio of its execution time on the reference machine, a Sun Ultra 5/10 (300 MHz processor), to its execution time on the measured machine.
- CINT2000 is the geometric mean of the SPEC ratios obtained from the integer programs. The geometric mean of n ratios is defined as
$$\text{Geometric mean} = \left(\prod_{i=1}^{n} \text{SPEC ratio}_i\right)^{1/n}$$
- CFP2000 is the geometric mean of the SPEC ratios from the floating-point programs.
Activity 5: Complete the following table to predict the performance of machines A and B.

| | Time on A (s) | Time on B (s) | Normalized to A: A | Normalized to A: B | Normalized to B: A | Normalized to B: B |
|---|---|---|---|---|---|---|
| Program 1 | 5 | 25 | | | | |
| Program 2 | | | | | | |
| Arithmetic mean | | | | | | |
| Geometric mean | | | | | | |
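Why the geometric mean is used for normalized results: with hypothetical times for two programs (not the Activity 5 data), the arithmetic mean of normalized times declares a different winner depending on the reference machine, while the geometric mean gives the same answer either way. Here "normalized" means time on the machine divided by time on the reference, i.e., the inverse of a SPEC ratio; the consistency property holds for either orientation.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical execution times (seconds) of two programs on machines A and B
times = {"A": [1.0, 1000.0], "B": [10.0, 100.0]}


def normalized(machine, reference):
    # Each program's time on `machine` divided by its time on `reference`
    return [t / r for t, r in zip(times[machine], times[reference])]


summary = {}
for ref in ("A", "B"):
    for m in ("A", "B"):
        ratios = normalized(m, ref)
        summary[(m, ref)] = (arithmetic_mean(ratios), geometric_mean(ratios))
        print(f"{m} normalized to {ref}: AM = {summary[(m, ref)][0]:.2f}, "
              f"GM = {summary[(m, ref)][1]:.2f}")
```

Normalized to A, the arithmetic mean makes B look about 5× slower; normalized to B, it makes A look about 5× slower. The geometric mean is 1.00 for both machines under either reference.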

12 Performance Comparison: CINT2006 for Opteron X4 (3)
For each benchmark, the table gives the instruction count (×10^9), CPI, clock cycle time (ns), execution time (s), reference time (s), and SPECratio = reference time / execution time.

| Name | Description |
|---|---|
| perl | Interpreted string processing |
| bzip2 | Block-sorting compression |
| gcc | GNU C Compiler |
| mcf | Combinatorial optimization |
| go | Go game (AI) |
| hmmer | Search gene sequence |
| sjeng | Chess game (AI) |
| libquantum | Quantum computer simulation |
| h264avc | Video compression |
| omnetpp | Discrete event simulation |
| astar | Games/path finding |
| xalancbmk | XML parsing |

Geometric mean of the SPECratios: 11.7.
Slide annotation: high cache miss rates.
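The columns of the CINT2006 table are linked by two formulas: execution time = instruction count × CPI × clock cycle time, and SPECratio = reference time / execution time. The overall score is the geometric mean of the SPECratios. A sketch with hypothetical numbers (not taken from the table):

```python
import math


def exec_time_s(instr_count: float, cpi: float, cycle_time_ns: float) -> float:
    # Execution time = instruction count x CPI x clock cycle time
    return instr_count * cpi * cycle_time_ns * 1e-9


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    # SPECratio = reference time / execution time (higher is better)
    return reference_time_s / measured_time_s


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical benchmark: 2 x 10^12 instructions, CPI 1.0, 0.4 ns cycle time
t = exec_time_s(2e12, 1.0, 0.4)        # about 800 s
r = spec_ratio(9600.0, t)              # about 12.0
overall = geometric_mean([r, 8.0, 18.0])
print(f"time = {t:.1f} s, SPECratio = {r:.1f}, overall = {overall:.1f}")
```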

13 Improving Performance
The performance of a CPU can be improved by:
1. Increasing the clock rate (decreasing the clock cycle time).
2. Improving the compiler to decrease the instruction count of a program.
3. Improving the CPU to decrease the clock cycles per instruction (CPI).
Unfortunately, factors 1 to 3 are not independent, because all three enter the same product:
$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$
For example, increasing the clock frequency may also increase the CPI.
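How a higher clock rate can be offset by a higher CPI: a sketch with entirely hypothetical numbers (10^9 instructions, baseline CPI 2.0 at 1 GHz, then a 1.5 GHz design with two possible CPI penalties).

```python
def cpu_time_s(instr_count: float, cpi: float, clock_rate_hz: float) -> float:
    # CPU time = instruction count x CPI / clock rate
    return instr_count * cpi / clock_rate_hz


baseline = cpu_time_s(1e9, 2.0, 1.0e9)     # 2.00 s
small_cpi_hit = cpu_time_s(1e9, 2.5, 1.5e9)  # about 1.67 s: still faster
large_cpi_hit = cpu_time_s(1e9, 3.2, 1.5e9)  # about 2.13 s: slower than baseline

for label, t in (("baseline", baseline),
                 ("1.5 GHz, CPI 2.5", small_cpi_hit),
                 ("1.5 GHz, CPI 3.2", large_cpi_hit)):
    print(f"{label}: {t:.2f} s")
```

The 50% clock-rate increase only pays off if CPI grows by less than 50%.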

14 Power Trends (1)
- One way of enhancing performance is to increase the clock rate (i.e., reduce the clock cycle time).
- A consequence of increasing the clock rate is an increase in power dissipation.

15 Power Trends (2)
- Dynamic power dissipation is computed from
$$\text{Power}_{\text{dynamic}} \propto \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$
- In the previous figure, note that while clock rates increased by a factor of about 100 (25 MHz to 2667 MHz), power dissipation increased only by a factor of about 20 (4.9 W to 95 W).
- This is largely a consequence of the operating voltage of CMOS technology dropping from 5 V to about 1 V.
Activity: What is the impact on dynamic power dissipation if a new processor reduces the capacitive load, the voltage, and the clock frequency each by 15%?
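A sketch of the activity using the proportionality above; values are in relative units, and any constant factor (some texts include 1/2) cancels in the ratio. It also checks the growth factors quoted from the figure.

```python
def dynamic_power(cap_load: float, voltage: float, freq: float) -> float:
    # Dynamic power is proportional to C * V^2 * f (constant factor omitted)
    return cap_load * voltage ** 2 * freq


old = dynamic_power(1.0, 1.0, 1.0)
new = dynamic_power(0.85, 0.85, 0.85)  # each quantity reduced by 15%
ratio = new / old                      # equals 0.85 ** 4
print(f"new/old dynamic power = {ratio:.3f}")  # new/old dynamic power = 0.522

clock_growth = 2667 / 25   # about 107x
power_growth = 95 / 4.9    # about 19x
```

So the new processor dissipates about 52% of the old dynamic power, a reduction of roughly 48%, because voltage enters squared.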

16 Uniprocessor Performance
Constrained by power, instruction-level parallelism, and memory latency.

17 SPEC Power Benchmark
- SPECpower reports the power consumption of servers at different workloads, in 10% load increments, over a period of time.

| Target load % | Performance (ssj_ops/sec) | Average power (W) |
|---|---|---|
| 0% | 0 | 141 |
| Overall sum | 1,283,590 | 2,605 |

∑ssj_ops / ∑power = 493
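The overall SPECpower figure is a ratio of sums (total ssj_ops over total power), not an average of per-load ratios. A minimal sketch with the slide's overall sums; the function name is illustrative.

```python
def ssj_ops_per_watt(ssj_ops: list, avg_power_w: list) -> float:
    # Overall metric: sum of ssj_ops divided by sum of average power
    return sum(ssj_ops) / sum(avg_power_w)


# Overall sums from the slide: 1,283,590 ssj_ops and 2,605 W
overall = ssj_ops_per_watt([1_283_590], [2_605])
print(round(overall))  # 493
```

In a full run the lists would hold one entry per target load (100% down to 0%); the idle (0%) level contributes power but no operations, which lowers the overall score.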

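18 PITFALL: Amdahl's Law (1)
- Pitfall: improving one aspect of a computer and expecting a proportional improvement in overall performance. Amdahl's law gives the execution time after the improvement:
$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{Amount of improvement}} + T_{\text{unaffected}}$$
- Suppose a program runs in 100 s on a computer, with multiply operations responsible for 80 s (80%) of that time. By how much must the speed of multiplication be improved to make the program run twice as fast? Five times as fast?
Solution: To run twice as fast, 50 = 80/n + 20, so multiplication must be improved by a factor of n = 8/3. Running five times as fast would require 20 = 80/n + 20, which no finite n satisfies, so it is not possible.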
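The Amdahl's-law example as a sketch: `required_speedup` solves the equation above for the improvement factor and returns `None` when the target is unreachable. Both function names are illustrative.

```python
from typing import Optional


def improved_time(total_s: float, affected_s: float, speedup: float) -> float:
    # Amdahl's law: only the affected part of the execution time shrinks
    return affected_s / speedup + (total_s - affected_s)


def required_speedup(total_s: float, affected_s: float,
                     target: float) -> Optional[float]:
    unaffected_s = total_s - affected_s
    target_time_s = total_s / target
    if target_time_s <= unaffected_s:
        return None  # unreachable: the unaffected part alone is too slow
    return affected_s / (target_time_s - unaffected_s)


print(required_speedup(100.0, 80.0, 2))  # about 8/3
print(required_speedup(100.0, 80.0, 5))  # None
```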

19 PITFALL: MIPS as a Performance Metric (2)
- Do not use MIPS (million instructions per second) as a performance metric:
$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
Activity 6: For a CPU, instructions from a high-level language are classified into 3 classes. Two software implementations with the following instruction counts are being considered. Assuming a clock rate of 500 MHz, which code sequence runs faster based on (a) MIPS and (b) execution time?

| Instruction class | A | B | C |
|---|---|---|---|
| CPI for the instruction class | 1 | 2 | 3 |

| Instruction counts (billions) | A | B | C |
|---|---|---|---|
| Implementation 1 | 5 | 1 | 1 |
| Implementation 2 | 10 | 1 | 1 |
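A sketch of Activity 6 using the tables and the 500 MHz clock rate above, showing how MIPS and execution time can disagree.

```python
CLOCK_RATE_HZ = 500e6
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
COUNTS_BILLIONS = {
    "Implementation 1": {"A": 5, "B": 1, "C": 1},
    "Implementation 2": {"A": 10, "B": 1, "C": 1},
}

results = {}
for name, counts in COUNTS_BILLIONS.items():
    instructions = sum(counts.values()) * 1e9
    cycles = sum(CPI_BY_CLASS[c] * n for c, n in counts.items()) * 1e9
    time_s = cycles / CLOCK_RATE_HZ          # execution time
    mips = instructions / (time_s * 1e6)     # MIPS rating
    results[name] = (time_s, mips)
    print(f"{name}: time = {time_s:.0f} s, MIPS = {mips:.0f}")
```

Implementation 2 has the higher MIPS rating (400 versus 350), yet Implementation 1 finishes first (20 s versus 30 s), because MIPS ignores how much work each instruction does.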

20 PITFALLS: Do Not's (3)
4. Do not use MFLOPS (million floating-point operations per second) as a performance metric.
5. Do not use peak performance as a performance metric.
Example: one machine has a peak rating of 100 MFLOPS and 150 MOPS; an R3000 machine has a peak of only 16 MFLOPS and 33 MOPS. Yet on the SPEC benchmarks, the R3000 was 15% faster in execution time, based on the geometric mean.

21 PITFALLS: Do Not's (4)
6. Do not use synthetic benchmarks (vendor created) to predict performance.
   - Commonly used synthetic benchmarks are Whetstone and Dhrystone.
   - Whetstone was originally written in Algol for an engineering environment and later converted to Fortran.
   - Dhrystone was written in Ada for systems-programming environments and later converted to C.
   - Such synthetic benchmarks do not reflect the applications typically run by a user.
7. Do not use the arithmetic mean of normalized execution times to predict performance; the geometric mean gives consistent results regardless of which machine is used as the reference.