Lecture 3. Performance
Prof. Taeweon Suh
Computer Science & Engineering, Korea University
COSE222, COMP212, CYDF210 Computer Architecture

Response Time and Throughput
How do we measure the performance of a computer?
- Response time (execution time, latency): the time between the start and the completion of a task. It matters most to individual users; embedded computers and PCs focus mainly on response time.
- Throughput: the total number of tasks completed in a given time. It matters most to datacenter and supercomputer managers; servers focus mainly on throughput.
Different machine types and usages call for different performance metrics.

Response Time and Throughput: Laundry Example
- Ann, Brian, Cathy, and Dave (loads A, B, C, D) each have one load of clothes to wash, dry, and fold.
- The washer takes 30 minutes.
- The dryer takes 40 minutes.
- The folder takes 20 minutes.

Sequential Laundry
[Figure: loads A to D processed one after another, from 6 PM to midnight]
- Response time: 90 mins per load (30 + 40 + 20)
- Throughput: 0.67 tasks/hr (= 90 mins/task, 6 hours for 4 loads)

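Pipelined Laundry
[Figure: loads A to D overlapped across washer, dryer, and folder, starting at 6 PM]
- Response time: each load still needs at least 90 mins (30 + 40 + 20); later loads also wait for the 40-minute dryer.
- Throughput: 1.14 tasks/hr (= 52.5 mins/task, 3.5 hours for 4 loads)
The 3.5 hours come from 30 + 4 × 40 + 20 = 210 mins: once the pipeline fills, the 40-minute dryer sets the pace.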
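A minimal sketch that reproduces both laundry timelines, assuming one machine per stage, loads entering in order A to D, and a finished load being allowed to wait for the next machine (these rules are our assumptions; the helper names are hypothetical):

```python
def sequential_finish_times(stage_minutes, num_loads):
    """Each load runs through all stages before the next load starts."""
    per_load = sum(stage_minutes)
    return [per_load * (i + 1) for i in range(num_loads)]


def pipelined_finish_times(stage_minutes, num_loads):
    """Loads overlap: a load enters a stage once it has left the previous
    stage AND the previous load has left this stage (one machine per stage)."""
    num_stages = len(stage_minutes)
    # done[i][s] = minute at which load i leaves stage s
    done = [[0] * num_stages for _ in range(num_loads)]
    for i in range(num_loads):
        for s in range(num_stages):
            ready_load = done[i][s - 1] if s > 0 else 0
            ready_machine = done[i - 1][s] if i > 0 else 0
            done[i][s] = max(ready_load, ready_machine) + stage_minutes[s]
    return [row[-1] for row in done]


LAUNDRY = [30, 40, 20]  # washer, dryer, folder (minutes)
LOADS = 4               # Ann, Brian, Cathy, Dave

for name, finish in [("sequential", sequential_finish_times(LAUNDRY, LOADS)),
                     ("pipelined", pipelined_finish_times(LAUNDRY, LOADS))]:
    total_hours = finish[-1] / 60
    print(f"{name}: all loads done after {total_hours} h, "
          f"throughput {LOADS / total_hours:.2f} loads/h")
```

The pipelined loads finish at minutes 90, 130, 170, and 210 while entering the washer at minutes 0, 30, 60, and 90, so their individual response times are 90, 100, 110, and 120 minutes: throughput improves, but no single load gets faster.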

Pipelining Lessons
[Figure: pipelined laundry timeline, loads A to D from 6 PM]
- Pipelining doesn't help the latency (response time) of a single task.
- Pipelining helps the throughput of the entire workload.
- Multiple tasks operate simultaneously.
- Unbalanced lengths of pipeline stages reduce speedup.
- Potential speedup = number of pipeline stages (ideally, with balanced stages).
We will discuss pipelining in detail in Chapter 4. The term project is to implement a CPU with pipelining.
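To see why unbalanced stages hurt, here is a sketch of the speedup, assuming one machine per stage and that finished tasks may wait for the next stage. Under those assumptions the first task finishes after the sum of the stage times, and each later task finishes one slowest-stage time after the previous one. The balanced 30/30/30 split is a hypothetical comparison, not part of the slides.

```python
def pipelined_total(stage_times, num_tasks):
    # First task traverses every stage; after that the slowest stage
    # (the bottleneck) releases one finished task per max(stage_times).
    return sum(stage_times) + (num_tasks - 1) * max(stage_times)


def speedup(stage_times, num_tasks):
    sequential_total = num_tasks * sum(stage_times)
    return sequential_total / pipelined_total(stage_times, num_tasks)


unbalanced = [30, 40, 20]  # the laundry stages
balanced = [30, 30, 30]    # hypothetical: same 90-minute total, split evenly

for n in (4, 1000):
    print(f"{n} loads: unbalanced {speedup(unbalanced, n):.2f}x, "
          f"balanced {speedup(balanced, n):.2f}x")
```

With many loads, the balanced split approaches the ideal speedup of 3 (the number of stages), while the unbalanced split levels off at 90/40 = 2.25 because the 40-minute dryer sets the pace.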

Let’s focus on response time for now…

Relative Performance
To maximize the performance of a computer, we want to minimize the execution time (response time) of a task. Thus, we relate performance and execution time for a computer X as:
$$\text{performance}_X = \frac{1}{\text{execution\_time}_X}$$
If computer X is n times faster than computer Y:
$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution\_time}_Y}{\text{execution\_time}_X} = n$$

Example
Computer A runs a program in 10 seconds, and computer B runs the same program in 15 seconds. How much faster is A than B?
The performance ratio is
$$n = \frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = \frac{15\text{ s}}{10\text{ s}} = 1.5$$
So, A is 1.5 times faster than B.
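The relative-performance rule can be written as a tiny helper; the function name `times_faster` is our own, hypothetical choice, and it simply encodes performance = 1 / execution time:

```python
def times_faster(exec_time_x, exec_time_y):
    """Return n such that X is n times faster than Y.

    performance = 1 / execution_time, so
    performance_X / performance_Y = execution_time_Y / execution_time_X.
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


n = times_faster(exec_time_x=10.0, exec_time_y=15.0)  # computer A vs. computer B
print(f"A is {n} times faster than B")  # A is 1.5 times faster than B
```

Note the argument order: the slower machine's time goes in the numerator, so a result below 1 means X is actually slower than Y.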

Measuring Execution Time
- Execution time (elapsed time or wall-clock time) is measured in seconds per program.
  - Total execution time includes all aspects: disk access, memory access, I/O activities, and OS overhead.
  - It determines system performance.
- CPU time
  - The time the CPU spends processing a given job.
  - It does not include time spent waiting for I/O or running other programs.
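The difference between wall-clock time and CPU time is easy to observe with Python's standard `time` module: `time.perf_counter()` tracks elapsed time, while `time.process_time()` counts only CPU time used by the current process. In this sketch, `time.sleep` is an assumed stand-in for waiting on I/O, and the exact numbers will vary by machine.

```python
import time


def measure(task):
    """Return (wall_clock_seconds, cpu_seconds) spent running task()."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # CPU time of this process only
    task()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def busy_loop():
    # Keeps the CPU busy, so CPU time is typically close to wall-clock time.
    total = 0
    for i in range(2_000_000):
        total += i
    return total


def wait_for_io():
    # time.sleep stands in for waiting on I/O: the CPU is idle meanwhile.
    time.sleep(0.2)


for name, task in [("busy loop", busy_loop), ("waiting", wait_for_io)]:
    wall, cpu = measure(task)
    print(f"{name}: wall-clock {wall:.3f} s, CPU {cpu:.3f} s")
```

The "waiting" task takes about 0.2 s of wall-clock time but almost no CPU time, which is exactly the portion of execution time that CPU time leaves out.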

CPU Clock
Let’s use CPU time to measure performance, for simplicity.
Virtually all computers are constructed in sync with a clock.
- The discrete time intervals are called clock cycles.
[Figure: clock waveform showing clock cycles 0 through 6]
- Clock period (T): the duration of a clock cycle, e.g. $T = 500\text{ ps} = 0.5\text{ ns} = 500 \times 10^{-12}\text{ s}$
- Clock frequency (f): clock cycles per second (1/T), e.g. $f = 1/T = 1/0.5\text{ ns} = 2.0\text{ GHz} = 2.0 \times 10^{9}\text{ Hz}$
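A small sketch of the period/frequency conversion from the slide; the helper names and the choice of picoseconds as the unit for T are our own assumptions:

```python
PS_PER_SECOND = 1e12  # 1 s = 10^12 ps


def frequency_hz(period_ps):
    """f = 1 / T, with the clock period T given in picoseconds."""
    return PS_PER_SECOND / period_ps


def period_ps(freq_hz):
    """T = 1 / f, returned in picoseconds."""
    return PS_PER_SECOND / freq_hz


print(frequency_hz(500) / 1e9, "GHz")   # 2.0 GHz for T = 500 ps
print(period_ps(2.0e9), "ps")           # 500.0 ps for f = 2.0 GHz
```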

Reminder: Clock Oscillators

Reminder: Clock Oscillators in Digital Systems
Virtually all digital systems are synchronous to a clock.

Where Are the Clock Oscillators?

CPU Time
Express CPU time in terms of the clock:
$$\text{CPU time} = \text{CPU clock cycles} \times T = \frac{\text{CPU clock cycles}}{f}$$
So, performance is improved by
- reducing the number of clock cycles, or
- increasing the clock frequency.

Example
Computer A, running at 2 GHz, requires 10 seconds of CPU time to run your program. Let’s design a new computer B:
- Aim for 6 seconds of CPU time to run the same program,
- but B requires 1.2 × as many clock cycles as computer A.
How fast should computer B’s clock (frequency) be?
- Computer B must run the program in 6 seconds:
$$6\text{ s} = \frac{1.2 \times \text{CPU clock cycles}_A}{f_B}$$
- How many clock cycles does computer A need?
$$10\text{ s} = \frac{\text{CPU clock cycles}_A}{2\text{ GHz}} \;\Rightarrow\; \text{CPU clock cycles}_A = 10\text{ s} \times 2\text{ GHz} = 20\text{G cycles}$$
- Plugging this into the first equation:
$$6\text{ s} = \frac{1.2 \times 20\text{G cycles}}{f_B} \;\Rightarrow\; f_B = \frac{24\text{G cycles}}{6\text{ s}} = 4\text{ GHz}$$
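The same calculation as a short script, rearranging CPU time = cycles / f into f = cycles / CPU time; the helper name `required_frequency` is hypothetical:

```python
def required_frequency(cycles, target_seconds):
    """Rearranged CPU-time equation: f = CPU clock cycles / CPU time."""
    return cycles / target_seconds


f_a = 2e9                       # computer A: 2 GHz
cycles_a = 10 * f_a             # 10 s x 2 GHz = 20G cycles
cycles_b = 1.2 * cycles_a       # B needs 1.2x as many cycles as A
f_b = required_frequency(cycles_b, target_seconds=6)

print(f"cycles_A = {cycles_a:.3g}, f_B = {f_b / 1e9:.1f} GHz")
```

Doubling the clock only buys a 10 s to 6 s improvement here, because part of the gain is eaten by the 20% increase in clock cycles.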

#Instructions and CPI
The performance equation so far makes no reference to the number of instructions needed to run a program. Since a computer executes instructions to run programs, the execution time must depend on the number of instructions executed: execution time is the number of instructions executed multiplied by the average time per instruction.
$$\text{CPU time} = \text{CPU clock cycles} \times T$$
$$\text{CPU clock cycles} = \#\text{insts} \times \text{average clock cycles per instruction (CPI)}$$
$$\text{CPU time} = \#\text{insts} \times \text{CPI} \times T = \frac{\#\text{insts} \times \text{CPI}}{f}$$

#Instructions and CPI
$$\text{CPU time} = \#\text{insts} \times \text{CPI} \times T = \frac{\#\text{insts} \times \text{CPI}}{f}$$
- #insts is determined by
  - how efficient your program is,
  - how good the ISA is, and
  - how efficient the machine code generated by the compiler is.
- CPI is determined by your CPU design (microarchitecture),
  - for example: sequential vs. pipelined implementations.
- f is determined by your CPU design (microarchitecture) and semiconductor technology.
  - The critical path between flip-flops determines the clock frequency.
  - Advanced semiconductor technology (45 nm, 32 nm, 22 nm, etc.) tends to increase the clock frequency.

CPI Example
There are two computers, A and B. Their CPUs implement the same ISA and use the same compiler to compile application programs, but their microarchitectures differ.
- Computer A has a clock cycle time of 250 ps and a CPI of 2.0 when running a program.
- Computer B has a clock cycle time of 500 ps and a CPI of 1.2 when running the same program.
Which is faster, and by how much?
$$\text{CPU time} = \#\text{insts} \times \text{CPI} \times T = \frac{\#\text{insts} \times \text{CPI}}{f}$$
- Execution time on computer A: $\#\text{insts} \times 2.0 \times 250\text{ ps} = \#\text{insts} \times 500\text{ ps}$
- Execution time on computer B: $\#\text{insts} \times 1.2 \times 500\text{ ps} = \#\text{insts} \times 600\text{ ps}$
So, A is faster. By how much?
$$\frac{\text{performance}_A}{\text{performance}_B} = \frac{\text{execution\_time}_B}{\text{execution\_time}_A} = \frac{600\text{ ps}}{500\text{ ps}} = 1.2$$
Computer A is 1.2 times (20%) faster than computer B.
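A quick check of this comparison in code. The instruction count `INSTS` is a hypothetical value; since both machines run the same program, it cancels out of the ratio.

```python
def cpu_time(insts, cpi, cycle_time):
    """CPU time = #insts x CPI x T."""
    return insts * cpi * cycle_time


PS = 1e-12         # one picosecond in seconds
INSTS = 1_000_000  # hypothetical count; it cancels out in the ratio

time_a = cpu_time(INSTS, cpi=2.0, cycle_time=250 * PS)
time_b = cpu_time(INSTS, cpi=1.2, cycle_time=500 * PS)
print(f"A is {time_b / time_a:.2f} times faster than B")
```

Computer B has the lower CPI, yet A wins because its clock period is half as long; neither CPI nor clock rate alone decides performance.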

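CPI in More Detail
If different instructions take different numbers of cycles (assume there are n different instruction classes, where class i has CPI_i and instruction count IC_i):
$$\text{CPU time} = \text{clock cycles} \times T, \qquad \text{clock cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{IC}_i\right)$$
$$\text{average CPI} = \frac{\text{clock cycles}}{\text{instruction count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{IC}_i}{\text{instruction count}}\right)$$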

CPI Example
Suppose there is one computer (the hardware designer supplied the CPIs below), and there are two compilers to compile an application program:
- Compiler A generated the machine code of sequence 1.
- Compiler B generated the machine code of sequence 2.
Which compiler is better for the application program?

Instruction class                  A   B   C
CPI                                1   2   3
Instruction count in sequence 1    2   1   2
Instruction count in sequence 2    4   1   1

- Sequence 1: clock cycles = 2×1 + 1×2 + 2×3 = 10; average CPI = 10/5 = 2.0
- Sequence 2: clock cycles = 4×1 + 1×2 + 1×3 = 9; average CPI = 9/6 = 1.5
Sequence 2 executes more instructions (6 vs. 5) but needs fewer clock cycles (9 vs. 10), so on the same computer compiler B’s code runs faster.
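The weighted-CPI calculation for both sequences, as a sketch; the dictionary layout and the helper name `clock_cycles` are our own choices:

```python
CPI = {"A": 1, "B": 2, "C": 3}  # per-class CPIs from the hardware designer

sequences = {
    "sequence 1 (compiler A)": {"A": 2, "B": 1, "C": 2},
    "sequence 2 (compiler B)": {"A": 4, "B": 1, "C": 1},
}


def clock_cycles(counts, cpi):
    """Sum over instruction classes i of CPI_i x IC_i."""
    return sum(cpi[cls] * n for cls, n in counts.items())


for name, counts in sequences.items():
    cycles = clock_cycles(counts, CPI)
    insts = sum(counts.values())
    print(f"{name}: {insts} insts, {cycles} cycles, avg CPI {cycles / insts:.1f}")
```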

Performance Summary
$$\text{CPU time} = \#\text{insts} \times \text{CPI} \times T = \frac{\#\text{insts} \times \text{CPI}}{f}$$
Performance depends on:
- Algorithm: affects the instruction count
- Programming language: affects the instruction count and CPI
- Compiler: affects the instruction count and CPI
- Instruction set architecture: affects the instruction count, CPI, and T (f)
- Microarchitecture (hardware implementation): affects CPI and T (f)
- Semiconductor technology: affects T (f)

SPEC CPU Benchmark
Benchmarks are programs used to measure performance.
- They are intended to be typical of the actual workload.
The Standard Performance Evaluation Corporation (SPEC) is an effort funded and supported by a number of computer vendors to create standard sets of benchmarks for modern computer systems.
- SPEC89: in 1989, SPEC created its original benchmark set, focusing on processor performance.
- SPEC CPU2006 was the latest suite when these slides were written (it has since been succeeded by SPEC CPU2017):
  - CINT2006 (integer) measures and compares compute-intensive integer performance.
  - CFP2006 (floating point) measures and compares compute-intensive floating-point performance.

Backup Slides

Some Basics
- Kilobyte (KB): 2^10 or 1,024 bytes
- Megabyte (MB): 2^20 or 1,048,576 bytes
- Gigabyte (GB): 2^30 or 1,073,741,824 bytes
- Terabyte (TB): 2^40 or 1,099,511,627,776 bytes
- Petabyte (PB): 2^50 or 1,024 terabytes
- Exabyte (EB): 2^60 or 1,024 petabytes
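The table above follows the binary (power-of-two) convention, where each unit is 2^10 = 1,024 times the previous one. A short sketch that regenerates it under that convention (the names `UNITS` and `SIZES` are our own):

```python
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB"]

# Binary convention used on this slide: unit k is 2^(10k) bytes.
SIZES = {unit: 2 ** (10 * power) for power, unit in enumerate(UNITS, start=1)}

for power, unit in enumerate(UNITS, start=1):
    print(f"1 {unit} = 2^{10 * power} = {SIZES[unit]:,} bytes")
```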