1 Lecture 2: Measuring Performance/Cost/Power Today’s topics (Sections 1.6, 1.4, 1.7, 1.8): Quantitative principles of computer design, Measuring cost.

Presentation transcript:

1 Lecture 2: Measuring Performance/Cost/Power
Today’s topics (Sections 1.6, 1.4, 1.7, 1.8):
- Quantitative principles of computer design
- Measuring cost
- Real industrial examples

2 Summarizing Performance
Recall the discussion on AM versus GM.
- GM: does not require a reference machine, but does not predict performance very well
  - So we multiplied execution times and determined that sys-A is 1.2x faster... but on what workload?
- AM: does predict performance for a specific workload, but that workload was determined by executing programs on a reference machine
  - Every year or so, the reference machine will have to be updated

3 Example
- We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second
- The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y, and the execution times for the four programs are 0.8, 1.1, 0.5, and 2 seconds
- With the AM of normalized execution times, (0.8 + 1.1 + 0.5 + 2) / 4 = 1.1, we can conclude that Y is 1.1 times slower than X: perhaps not for all workloads, but definitely for one specific workload (where all programs run on the reference machine for an equal number of cycles)
- With GM, you may find inconsistencies
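The arithmetic in this example can be checked with a short script. This is only a sketch: the variable names are mine, and the GM is computed alongside the AM purely to show how the two summaries can point in different directions for the same measurements.

```python
from statistics import mean, geometric_mean

# Execution times in seconds, taken from the slide:
# every program runs for 1 s on the reference machine X.
times_x = [1.0, 1.0, 1.0, 1.0]
times_y = [0.8, 1.1, 0.5, 2.0]

# Normalize Y's times to the reference machine X.
normalized = [y / x for x, y in zip(times_x, times_y)]

# AM of normalized times equals the ratio of total execution times
# for this specific workload (equal time per program on X).
am = mean(normalized)

# GM of normalized times does not correspond to a particular workload.
gm = geometric_mean(normalized)

print(f"AM of normalized times: {am:.3f}")
print(f"GM of normalized times: {gm:.3f}")
```

An AM above 1 says Y is slower on this workload, while a GM below 1 would suggest Y is faster, which is the kind of inconsistency the slide warns about.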

4 GM Example

| | Computer-A | Computer-B | Computer-C |
|---|---|---|---|
| P1 | 1 sec | 10 secs | 20 secs |
| P2 | 1000 secs | 100 secs | 20 secs |

- Conclusion with GMs: (i) A = B (ii) C is ~1.6 times faster
- For (i) to be true, P1 must occur 100 times for every occurrence of P2
- With the above assumption, (ii) is no longer true
- Hence, GM can lead to inconsistencies
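A quick numeric check of this slide, as a sketch. The P2 time on Computer-A is garbled in the transcript; the value of 1000 seconds used here is an assumption, chosen because it is the only value consistent with both conclusion (i) and the 100:1 workload statement. The helper names `gm` and `workload_time` are mine.

```python
from math import prod

# Execution times in seconds. The 1000 s entry for P2 on Computer-A is an
# assumption inferred from the slide's conclusions, not read from the table.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

def gm(values):
    """Geometric mean of a collection of positive numbers."""
    values = list(values)
    return prod(values) ** (1 / len(values))

def workload_time(machine, p1_runs, p2_runs):
    """Total time for a workload with the given number of runs of each program."""
    t = times[machine]
    return p1_runs * t["P1"] + p2_runs * t["P2"]

gms = {m: gm(t.values()) for m, t in times.items()}
print("GMs:", {m: round(v, 2) for m, v in gms.items()})
print("GM ratio A/C:", round(gms["A"] / gms["C"], 2))

# Workload implied by conclusion (i): 100 runs of P1 per run of P2.
totals = {m: workload_time(m, 100, 1) for m in times}
print("Totals for 100 x P1 + 1 x P2:", totals)
```

Under the workload that makes A and B equal, C has the largest total time, so the GM ranking and the workload-based ranking disagree.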

5 CPU Performance Equation
- CPU time = clock cycle time x cycles per instruction x number of instructions
- Influencing factors for each:
  - clock cycle time: technology and organization
  - CPI: organization and instruction set design
  - instruction count: instruction set design and compiler
- CPI (cycles per instruction) or IPC (instructions per cycle) cannot be accurately estimated analytically
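The equation can be expressed as a one-line function. The sketch below uses hypothetical numbers (2 billion instructions, CPI of 1.5, a 2 GHz clock) that do not come from the lecture; they only illustrate how the three factors trade off.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_s):
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s

# Hypothetical baseline: 2e9 instructions, CPI 1.5, 0.5 ns cycle (2 GHz).
base = cpu_time(2e9, 1.5, 0.5e-9)

# Hypothetical compiler change: 10% fewer instructions, but 5% higher CPI.
tuned = cpu_time(2e9 * 0.9, 1.5 * 1.05, 0.5e-9)

print(f"base:  {base:.4f} s")
print(f"tuned: {tuned:.4f} s")
print(f"speedup: {base / tuned:.3f}x")
```

Because the factors multiply, a change that helps one factor only pays off if it does not hurt the others by more.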

6 Measuring System CPI
- Assume that an architectural innovation only affects CPI
- For 3 programs, base CPIs: 1.2, 1.8, 2.5
- CPIs for proposed model: 1.4, 1.9, 2.3
- What is the best way to summarize performance with a single number? AM, HM, or GM of CPIs?

7 Example
- AM of CPI for base case = (1.2 cyc/instr + 1.8 cyc/instr + 2.5 cyc/instr) / 3
- 5.5 cycles is the execution time if each program ran for one instruction; therefore, AM of CPI defines a workload where every program runs for an equal number of instructions
- HM of CPI = 1 / AM of IPC; defines a workload where every program runs for an equal number of cycles
- GM of CPI: warm fuzzy number, not necessarily representing any workload
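The three summaries can be computed for the base and proposed CPIs with the Python standard library (`statistics.geometric_mean` needs Python 3.8 or later). This is a sketch; the `summarize` helper is my own name, and the CPI lists are the ones given on the previous slide.

```python
from statistics import mean, harmonic_mean, geometric_mean

base_cpis = [1.2, 1.8, 2.5]
proposed_cpis = [1.4, 1.9, 2.3]

def summarize(cpis):
    """Return the AM, HM, and GM of a list of per-program CPIs."""
    return {
        "AM": mean(cpis),            # equal instruction count per program
        "HM": harmonic_mean(cpis),   # equal cycle count per program
        "GM": geometric_mean(cpis),  # no specific workload
    }

base = summarize(base_cpis)
proposed = summarize(proposed_cpis)

for name in ("AM", "HM", "GM"):
    print(f"{name}: base {base[name]:.3f}, proposed {proposed[name]:.3f}")

# HM of CPI is the reciprocal of the AM of IPC.
am_of_ipc = mean(1 / c for c in base_cpis)
print(f"1 / AM of IPC (base): {1 / am_of_ipc:.3f}")
```

For these particular numbers all three means rank the base model ahead of the proposed one, but the size of the gap differs with the workload each mean implies.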

8 Amdahl’s Law
- Architecture design is very bottleneck-driven: make the common case fast, and do not waste resources on a component that has little impact on overall performance/power
- Amdahl’s Law: the performance improvement from an enhancement is limited by the fraction of time the enhancement comes into play
- Example: a web server spends 40% of its time in the CPU and 60% of its time doing I/O. A new processor that is ten times faster reduces execution time to 0.6 + 0.4/10 = 0.64 of the original, a 36% reduction (speedup of 1.56). Amdahl’s Law states that the maximum execution time reduction is 40% (max speedup of 1.67)
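The web server example can be reproduced with the usual form of Amdahl’s Law, speedup = 1 / ((1 - f) + f / s), where f is the fraction of time the enhancement applies and s is the speedup of that fraction. A minimal sketch; the function name is mine.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is accelerated."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Web server from the slide: 40% CPU time, processor 10x faster.
speedup = amdahl_speedup(0.4, 10)
reduction = 1.0 - 1.0 / speedup

# Upper bound: an infinitely fast processor removes the CPU time entirely.
max_speedup = amdahl_speedup(0.4, float("inf"))

print(f"speedup: {speedup:.2f}, time reduction: {reduction:.0%}")
print(f"max speedup: {max_speedup:.2f}")
```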

9 Principle of Locality
- Most programs are predictable in terms of instructions executed and data accessed
- The Rule: a program spends 90% of its execution time in only 10% of the code
- Temporal locality: a program will shortly re-visit X
- Spatial locality: a program will shortly visit X+1

10 Exploit Parallelism
- Most operations do not depend on each other; hence, execute them in parallel
- At the circuit level, simultaneously access multiple ways of a set-associative cache
- At the organization level, execute multiple instructions at the same time
- At the system level, execute a different program while one is waiting on I/O

11 Factors Determining Cost
- Cost: amount spent by the manufacturer to produce a finished good
- High volume -> faster learning curve, increased manufacturing efficiency (10% lower cost if volume doubles), lower R&D cost per produced item
- Commodities: identical products sold by many vendors in large volumes (keyboards, DRAMs); low cost because of high volume and competition among suppliers
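The "10% lower cost if volume doubles" rule of thumb can be sketched as a function of volume. The compounding form used here (cost multiplied by 0.9 for every doubling) is my reading of the slide, and the $100 starting cost is a hypothetical value.

```python
import math

def unit_cost(base_cost, base_volume, volume, reduction_per_doubling=0.10):
    """Unit cost after scaling volume, assuming a fixed reduction per doubling."""
    doublings = math.log2(volume / base_volume)
    return base_cost * (1.0 - reduction_per_doubling) ** doublings

# Hypothetical product: $100 per unit at 10,000 units.
for volume in (10_000, 20_000, 40_000, 80_000):
    print(f"{volume:>6} units: ${unit_cost(100.0, 10_000, volume):.2f}")
```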

12 Wafers and Dies An entire wafer is produced and chopped into dies that undergo testing and packaging

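13 Integrated Circuit Cost
- $\text{Cost of an integrated circuit} = \dfrac{\text{cost of die} + \text{cost of packaging and testing}}{\text{final test yield}}$
- $\text{Cost of die} = \dfrac{\text{cost of wafer}}{\text{dies per wafer} \times \text{die yield}}$
- $\text{Dies per wafer} = \dfrac{\text{wafer area}}{\text{die area}} - \dfrac{\pi \times \text{wafer diameter}}{\text{die diagonal}}$, where the die diagonal is $\sqrt{2 \times \text{die area}}$ for a square die
- $\text{Die yield} = \text{wafer yield} \times \left(1 + \dfrac{\text{defect rate} \times \text{die area}}{\alpha}\right)^{-\alpha}$
- Thus, die yield depends on die area and on the complexity arising from multiple manufacturing steps ($\alpha \approx 4.0$)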
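The die-cost formulas can be put into code as follows. This is a sketch under stated assumptions: the formulas are written in their standard textbook form (square dies, die diagonal = sqrt(2 x die area), exponent and divisor alpha = 4.0), and the defect density of 0.6 per cm^2, the 100% wafer yield, and the $5,500 wafer cost are illustrative values that do not appear on this slide.

```python
import math

ALPHA = 4.0             # process-complexity parameter from the slide
DEFECTS_PER_CM2 = 0.6   # assumed defect density, not given on the slide
WAFER_COST = 5500.0     # assumed wafer cost in dollars

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Whole dies on a round wafer, subtracting partial dies along the edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    die_diagonal = math.sqrt(2 * die_area_cm2)  # square die assumed
    return math.floor(wafer_area / die_area_cm2
                      - math.pi * wafer_diameter_cm / die_diagonal)

def die_yield(die_area_cm2, defects_per_cm2, alpha=ALPHA, wafer_yield=1.0):
    """Fraction of dies that are good."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def cost_of_die(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
    """Wafer cost spread over the good dies only."""
    good = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
            * die_yield(die_area_cm2, defects_per_cm2))
    return wafer_cost / good

for area in (1.0, 0.49):
    total = dies_per_wafer(30, area)
    good = total * die_yield(area, DEFECTS_PER_CM2)
    cost = cost_of_die(WAFER_COST, 30, area, DEFECTS_PER_CM2)
    print(f"{area} cm^2: {total} dies, ~{good:.0f} good, ~${cost:.2f} per good die")
```

With these assumed inputs, a 30 cm wafer gives good-die counts close to the 366 and 1014 quoted for 1 cm^2 and 0.49 cm^2 dies, which is why the 0.6 defect density was chosen.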

14 Integrated Circuit Cost Examples
- A 30 cm diameter wafer cost $5-6K in 2001
- Such a wafer yields about 366 good 1 cm² dies and 1014 good 0.49 cm² dies (note the effect of area and yield)
- Die sizes: Alpha cm², Itanium 3.0 cm², embedded processors are between 0.1 and 0.25 cm²

15 Contribution of IC Costs to Total System Cost

| Subsystem | Fraction of total cost |
|---|---|
| Cabinet: sheet metal, plastic, power supply, fans, cables, nuts, bolts, manuals, shipping box | 6% |
| Processor | 22% |
| DRAM (128 MB) | 5% |
| Video card | 5% |
| Motherboard | 5% |
| Processor board subtotal | 37% |
| Keyboard and mouse | 3% |
| Monitor | 19% |
| Hard disk (20 GB) | 9% |
| DVD drive | 6% |
| I/O devices subtotal | 37% |
| Software (OS + Office) | 20% |

16 Cost and Price A $1000 increase in cost may result in a $3000 increase in price – hence, important to understand the relationship The relationship is complex – for example, a company may underprice a product that has heavy competition and overprice a product that has no competition

17 Computing Price
- Component costs: developing wafers, testing, packaging
- Direct costs: 10-30% of component costs: labor, warranty
- Gross margin (indirect costs): 10-45% of the sum of these three (the average selling price): R&D, marketing, sales, building rental, profits
- Retail mark-up: the sum of all the above gives the list price
- Low-end PCs may have low gross margins: low R&D, low cost for sales, low profits
- R&D costs are only 4-12%
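The price build-up can be sketched numerically. This assumes one reading of the slide: direct costs are a fraction of component costs, and gross margin is a fraction of the average selling price (so the price is solved for rather than added up). The $1,000 component cost and the 20% and 30% fractions are hypothetical values picked from within the slide's ranges; the retail mark-up is left out because the slide gives no percentage for it.

```python
def average_selling_price(component_cost, direct_fraction, gross_margin_fraction):
    """Average selling price when gross margin is a fraction of that price."""
    direct_cost = component_cost * direct_fraction
    return (component_cost + direct_cost) / (1.0 - gross_margin_fraction)

# Hypothetical system: $1,000 of components, 20% direct costs, 30% gross margin.
component = 1000.0
asp = average_selling_price(component, 0.20, 0.30)
gross_margin = asp * 0.30

print(f"component cost:        ${component:.2f}")
print(f"direct costs:          ${component * 0.20:.2f}")
print(f"gross margin:          ${gross_margin:.2f}")
print(f"average selling price: ${asp:.2f}")
```

Under this reading, each extra dollar of component cost raises the average selling price by more than a dollar, which fits the earlier point that a cost increase can produce a much larger price increase.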

18 Desktop Prices
All systems have similar configurations; price variations are due to expandability, expensive disks/memory/processor/OS, and commoditization.

| Vendor | Model | Processor | Clock speed (MHz) | Price |
|---|---|---|---|---|
| Compaq | Presario 7000 | AMD Athlon | 1,400 | $2,091 |
| Dell | Precision 420 | Intel Pentium III | 1,000 | $3,834 |
| Dell | Precision 530 | Intel Pentium 4 | 1,700 | $4,175 |
| HP | Workstation c3600 | PA | | $12,631 |
| IBM | RS P/170 | IBM III-2 | 450 | $13,889 |
| Sun | Sunblade 100 | UltraSPARC II-e | 500 | $2,950 |
| Sun | Sunblade 1000 | UltraSPARC III | 750 | $9,950 |

19 Desktop Price-Performance (SPEC CPU2k)

20 Server Prices

| System | CPUs | Price |
|---|---|---|
| IBM xSeries 370 c/s | 280 Pentium III | $15.5 M |
| Compaq AlphaServer GS | Alpha 21264 | $10.3 M |
| Fujitsu PRIMEPOWER | SPARC64 GP | $9.7 M |
| IBM pSeries S85 | 24 IBM RS64-IV | $7.5 M |
| HP 9000 Enterprise Server | 48 HP PA-RISC 8600 | $8.5 M |
| IBM iSeries | iSeries400 Model 840 | $8.4 M |
| Dell PowerEdge | Pentium III | $131 K |
| IBM xSeries 250 c/s | 4 Pentium III | $297 K |
| Compaq Proliant ML570 | 4 Pentium III | $375 K |
| HP NetServer LH | Pentium III | $373 K |
| NEC Express 5800/180 | 8 Pentium III | $683 K |
| HP 9000/L2000 | 4 PA-RISC 8500 | $368 K |

21 Server Performance Using the OLTP benchmark TPC-C

22 Server Price-Performance

23 Embedded Prices
- Does not include the prices and power of support chips
- High variance in functionality, price, performance, power
- The IBM and AMD processors are used in network switches and laptops, the NEC VR 5432 is used in laser printers, the NEC VR 4122 is used in PDAs

| Processor | Issue rate | Typical power (mW) | Price |
|---|---|---|---|
| AMD Elan SC | | | $38 |
| AMD K6-2E | | | $78 |
| IBM PowerPC 750CX | 4 | 6000 | $94 |
| NEC VR | | | $25 |
| NEC VR | | | $33 |

24 Embedded Price-Performance-Power
