1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.

Slides:



Advertisements
Similar presentations
1 Lecture 2: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM Video 1: Using AM.
Advertisements

Computer Abstractions and Technology
Power Reduction Techniques For Microprocessor Systems
Lecture 2: Modern Trends 1. 2 Microprocessor Performance Only 7% improvement in memory performance every year! 50% improvement in microprocessor performance.
Chapter1 Fundamental of Computer Design Dr. Bernard Chen Ph.D. University of Central Arkansas.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.5 – 1.10)  Power/Energy examples  Performance summaries  Measuring cost and dependability.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Chapter 1 CSF 2009 Computer Performance. Defining Performance Which airplane has the best performance? Chapter 1 — Computer Abstractions and Technology.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Quantitative principles of computer design  Measuring cost.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Performance summaries  Quantitative principles of computer.
1 Lecture 11: Digital Design Today’s topics:  Evaluating a system  Intro to boolean functions.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
1 Lecture 1: CS/ECE 3810 Introduction Today’s topics:  logistics  why computer organization is important  modern trends.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
1 Lecture 2: Metrics to Evaluate Systems Topics: Power and technology trends wrap-up, benchmark suites, performance equation, summarizing performance with.
CPU Performance Assessment As-Bahiya Abu-Samra *Moore’s Law *Clock Speed *Instruction Execution Rate - MIPS - MFLOPS *SPEC Speed Metric *Amdahl’s.
1 Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 1 Fundamentals of Quantitative Design and Analysis Computer Architecture A Quantitative.
Lecture 2: Technology Trends and Performance Evaluation Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI.
Computer Performance Computer Engineering Department.
Lecture 03: Fundamentals of Computer Design - Trends and Performance Kai Bu
1 Lecture 1: CS/ECE 3810 Introduction Today’s topics:  Why computer organization is important  Logistics  Modern trends.
Lecture 1: Performance EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2013, Dr. Rozier.
Recap Technology trends Cost/performance Measuring and Reporting Performance What does it mean to say “computer X is faster than computer Y”? E.g. Machine.
Sogang University Advanced Computing System Chap 1. Computer Architecture Hyuk-Jun Lee, PhD Dept. of Computer Science and Engineering Sogang University.
1 Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 5810/6810: Hennessy.
1 CS/EE 6810: Computer Architecture Class format:  Most lectures on YouTube *BEFORE* class  Use class time for discussions, clarifications, problem-solving,
MS108 Computer System I Lecture 2 Metrics Prof. Xiaoyao Liang 2014/2/28 1.
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
Advanced Computer Architecture Fundamental of Computer Design Instruction Set Principles and Examples Pipelining:Basic and Intermediate Concepts Memory.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Morgan Kaufmann Publishers
Microprocessor Microarchitecture Introduction Lynn Choi School of Electrical Engineering.
1 Lecture 2: Performance, MIPS ISA Today’s topics:  Performance equations  MIPS instructions Reminder: canvas and class webpage:
Computer Science and Engineering Power-Performance Considerations of Parallel Computing on Chip Multiprocessors Jian Li and Jose F. Martinez ACM Transactions.
4. Performance 4.1 Introduction 4.2 CPU Performance and Its Factors
1 Lecture: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM  Video 1: Using AM.
1 Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with.
Computer Organization Yasser F. O. Mohammad 1. 2 Lecture 1: Introduction Today’s topics:  Why computer organization is important  Logistics  Modern.
1 Lecture: Static ILP Topics: predication, speculation (Sections C.5, 3.2)
CS203 – Advanced Computer Architecture
CS203 – Advanced Computer Architecture Performance Evaluation.
VU-Advanced Computer Architecture Lecture 1-Introduction 1 Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 1.
History of Computers and Performance David Monismith Jan. 14, 2015 Based on notes from Dr. Bill Siever and from the Patterson and Hennessy Text.
1 Lecture: Benchmarks, Pipelining Intro Topics: Performance equations wrap-up, Intro to pipelining.
CS203 – Advanced Computer Architecture
Microprocessor Microarchitecture Introduction
CS203 – Advanced Computer Architecture
Lecture 2: Performance Today’s topics:
Lecture 2: Performance Evaluation
Lecture 3: MIPS Instruction Set
Green cloud computing 2 Cs 595 Lecture 15.
Lecture 1: CS/ECE 3810 Introduction
Lecture: Branch Prediction
Lecture: Branch Prediction
Morgan Kaufmann Publishers
Architecture & Organization 1
Lecture 1: CS/ECE 3810 Introduction
Lecture 2: Performance Today’s topics: Technology wrap-up
Architecture & Organization 1
CS/EE 6810: Computer Architecture
Performance of computer systems
The University of Adelaide, School of Computer Science
Lecture 3: MIPS Instruction Set
The University of Adelaide, School of Computer Science
Performance of computer systems
The University of Adelaide, School of Computer Science
Utsunomiya University
Presentation transcript:

1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and Patterson’s Computer Architecture, A Quantitative Approach, 5 th Edition Topics  Measuring performance/cost/power  Instruction level parallelism, dynamic and static  Memory hierarchy  Multiprocessors  Storage systems and networks

2 Organizational Issues Office hours, MEB 3414, by appointment TA: Ali Shafiei, office hours and contact info: TBA Special accommodations, add/drop policies (see class webpage) Class web-page, slides, notes, and class mailing list at Grades:  Two midterms, 25% each  Homework assignments, 50%, you may skip one  No tolerance for cheating

3 Lecture 1: Computing Trends, Metrics Topics: (Sections , )  Technology trends  Performance summaries  Performance equations

4 Historical Microprocessor Performance Source: H&P textbook

5 Points to Note The 52% growth per year is because of faster clock speeds and architectural innovations (led to 25x higher speed) Clock speed increases have dropped to 1% per year in recent years The 22% growth includes the parallelization from multiple cores Moore’s Law: transistors on a chip double every months

6 Clock Speed Increases Source: H&P textbook

7 Processor Technology Trends Transistor density increases by 35% per year and die size increases by 10-20% per year… more cores! Transistor speed improves linearly with size (complex equation involving voltages, resistances, capacitances)… can lead to clock speed improvements! The power wall: it is not possible to consistently run at higher frequencies without hitting power/thermal limits (Turbo Mode can cause occasional frequency boosts) Wire delays do not scale down at the same rate as logic delays

8 Recent Microprocessor Trends Source: Micron University Symp. Transistors: 1.43x / year Cores: x Performance: 1.15x Frequency: 1.05x Power: 1.04x

9 What Helps Performance? Note: no increase in clock speed In a clock cycle, can do more work -- since transistors are faster, transistors are more energy-efficient, and there’s more of them Better architectures: finding more parallelism in one thread, better branch prediction, better cache policies, better memory organizations, more thread-level parallelism, etc. Core design is undergoing little change, but more cores available per chip; most future innovations will likely be in multi-threaded prog models and memory hierarchies

10 Where Are We Headed? Modern trends:  Clock speed improvements are slowing  power constraints  Difficult to further optimize a single core for performance  Multi-cores: each new processor generation will accommodate more cores  Need better programming models and efficient execution for multi-threaded applications  Need better memory hierarchies  Need greater energy efficiency

11 Modern Processor Today Intel Core i7  Clock frequency: 3.2 – 3.33 GHz  45nm and 32nm products  Cores: 4 – 6  Power: 95 – 130 W  Two threads per core  3-level cache, 12 MB L3 cache  Price: $300 - $1000

12 Power Consumption Trends Dyn power  activity x capacitance x voltage 2 x frequency Capacitance per transistor and voltage are decreasing, but number of transistors is increasing at a faster rate; hence clock frequency must be kept steady Leakage power is also rising; is a function of transistor count, leakage current, and supply voltage Power consumption is already between W in high-performance processors today Energy = power x time = (dynpower + lkgpower) x time

13 Power Vs. Energy Energy is the ultimate metric: it tells us the true “cost” of performing a fixed task Power (energy/time) poses constraints; can only work fast enough to max out the power delivery or cooling solution If processor A consumes 1.2x the power of processor B, but finishes the task in 30% less time, its relative energy is 1.2 X 0.7 = 0.84; Proc-A is better, assuming that 1.2x power can be supported by the system

14 Reducing Power and Energy Can gate off transistors that are inactive (reduces leakage) Design for typical case and throttle down when activity exceeds a threshold DFS: Dynamic frequency scaling -- only reduces frequency and dynamic power, but hurts energy DVFS: Dynamic voltage and frequency scaling – can reduce voltage and frequency by (say) 10%; can slow a program by (say) 8%, but reduce dynamic power by 27%, reduce total power by (say) 23%, reduce total energy by 17% (Note: voltage drop  slow transistor  freq drop)

15 Other Technology Trends DRAM density increases by 40-60% per year, latency has reduced by 33% in 10 years (the memory wall!), bandwidth improves twice as fast as latency decreases Disk density improves by 100% every year, latency improvement similar to DRAM Emergence of NVRAM technologies that can provide a bridge between DRAM and hard disk drives

16 Measuring Performance Two primary metrics: wall clock time (response time for a program) and throughput (jobs performed in unit time) To optimize throughput, must ensure that there is minimal waste of resources Performance is measured with benchmark suites: a collection of programs that are likely relevant to the user  SPEC CPU 2006: cpu-oriented programs (for desktops)  SPECweb, TPC: throughput-oriented (for servers)  EEMBC: for embedded processors/workloads

17 Summarizing Performance Consider 25 programs from a benchmark set – how do we capture the behavior of all 25 programs with a single number? P1 P2 P3 Sys-A Sys-B Sys-C  Total (average) execution time  Total (average) weighted execution time or Average of normalized execution times  Geometric mean of normalized execution times

18 AM Example We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y and the execution times for each program are 0.8, 1.1, 0.5, 2 With AM of normalized execution times, we can conclude that Y is 1.1 times slower than X – perhaps, not for all workloads, but definitely for one specific workload (where all programs run on the ref-machine for an equal #cycles) With GM, you may find inconsistencies

19 GM Example Computer-A Computer-B Computer-C P1 1 sec 10 secs 20 secs P secs 100 secs 20 secs Conclusion with GMs: (i) A=B (ii) C is ~1.6 times faster For (i) to be true, P1 must occur 100 times for every occurrence of P2 With the above assumption, (ii) is no longer true Hence, GM can lead to inconsistencies

20 Summarizing Performance GM: does not require a reference machine, but does not predict performance very well  So we multiplied execution times and determined that sys-A is 1.2x faster…but on what workload? AM: does predict performance for a specific workload, but that workload was determined by executing programs on a reference machine  Every year or so, the reference machine will have to be updated

21 Normalized Execution Times Advantage of GM: no reference machine required Disadvantage of GM: does not represent any “real entity” and may not accurately predict performance Disadvantage of AM of normalized: need weights (which may change over time) Advantage: can represent a real workload

22 CPU Performance Equation Clock cycle time = 1 / clock speed CPU time = clock cycle time x cycles per instruction x number of instructions Influencing factors for each:  clock cycle time: technology and pipeline  CPI: architecture and instruction set design  instruction count: instruction set design and compiler CPI (cycles per instruction) or IPC (instructions per cycle) can not be accurately estimated analytically

23 Title Bullet