CS252/Kubiatowicz Lec 1.1 8/25/03 CS252 Graduate Computer Architecture Lecture 1 Review of Technology Trends and Cost/Performance August 25, 2003 Prof.


CS252 Graduate Computer Architecture, Lecture 1: Review of Technology Trends and Cost/Performance. August 25, 2003. Prof. John Kubiatowicz

Original Big Fishes Eating Little Fishes

Computer Food Chain
[Figure: the computer food chain: PC, workstation, minicomputer, mainframe, mini-supercomputer, supercomputer, massively parallel processors]

Computer Food Chain: Now Who Is Eating Whom?
[Figure: the same classes (PC, workstation, mainframe, supercomputer, mini-supercomputer, massively parallel processors, minicomputer) redrawn with servers added]

Why Such Change in 10 Years?
– Performance
  » Technology advances: CMOS VLSI dominates older technologies (TTL, ECL) in cost AND performance
  » Computer architecture advances improve the low end: RISC, superscalar, RAID, …
– Price: lower costs due to
  » Simpler development: CMOS VLSI means smaller systems with fewer components
  » Higher volumes: with CMOS VLSI, the same development cost is spread over 10,000,000 units instead of 10,000
  » Lower margins by class of computer, due to fewer services
– Function
  » Rise of networking/local interconnection technology

Amazing Underlying Technology Change
"Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965

Technology Trends: Microprocessor Capacity (Moore's Law)
CMOS improvements:
– Die size: 2X every 3 years
– Line width: halves every 7 years
[Figure: transistors per microprocessor over time. Labeled points: Pentium 4: 55 million; Alpha 21264: 15 million; Pentium Pro: 5.5 million; PowerPC 620: 6.9 million; Alpha 21164: 9.3 million; Sparc Ultra: 5.2 million]

Memory Capacity (Single-Chip DRAM)
[Table: year, size (Mb), and cycle time (ns) for successive DRAM generations]

Technology => Dramatic Change
– Processor
  » Logic capacity: about 30% per year
  » Clock rate: about 20% per year
– Memory
  » DRAM capacity: about 60% per year (4x every 3 years)
  » Memory speed: about 10% per year
  » Cost per bit: improves about 25% per year
– Disk
  » Capacity: about 60% per year
  » Total use of data: 100% per 9 months!
– Network bandwidth
  » Bandwidth increasing more than 100% per year!
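
The growth rates above compound. This minimal Python sketch converts an annual rate into a multi-year factor and a doubling time. The function names are ours, chosen for illustration. It confirms that 60% per year is about 4x every 3 years, while memory speed at 10% per year needs roughly 7 years to double.

```python
import math

def growth_factor(annual_rate: float, years: float) -> float:
    """Total improvement after `years` of compounding at `annual_rate` (0.60 means 60%/yr)."""
    return (1.0 + annual_rate) ** years

def doubling_time(annual_rate: float) -> float:
    """Years needed for a quantity growing at `annual_rate` to double."""
    return math.log(2.0) / math.log(1.0 + annual_rate)

# Annual rates quoted on the slide.
rates = {
    "DRAM capacity": 0.60,
    "logic capacity": 0.30,
    "clock rate": 0.20,
    "memory speed": 0.10,
}

for name, rate in rates.items():
    print(f"{name:15s} {rate:4.0%}/yr -> x{growth_factor(rate, 3):.2f} in 3 yrs, "
          f"doubles every {doubling_time(rate):.1f} yrs")
```

Because the rates differ, the gaps widen every year. Processor logic outgrows memory speed, which is why later lectures spend so much time on the memory hierarchy.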

Computers in the News: New IBM Transistor Announced 12/10/02
– 6 nm gate length!!!
– Details: still to be announced

Processor Performance Trends
[Figure: performance vs. year for microprocessors, minicomputers, mainframes, and supercomputers]

Processor Performance (1.35X per year before, 1.55X per year now)
[Figure: processor performance over time; trend line labeled 1.54X/yr]

Computer Architecture Is …
"… the attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
Amdahl, Blaauw, and Brooks, 1964

Computer Architecture's Changing Definition
– 1950s to 1960s: Computer Architecture Course: Computer Arithmetic
– 1970s to mid 1980s: Computer Architecture Course: Instruction Set Design, especially ISA appropriate for compilers
– 1990s: Computer Architecture Course: Design of CPU, memory system, I/O system, Multiprocessors, Networks
– 2010s: Computer Architecture Course: Self-adapting systems? Self-organizing structures? DNA Systems/Quantum Computing?

Instruction Set Architecture (ISA)
[Figure: the instruction set as the interface between software above and hardware below]

Evolution of Instruction Sets
– Single Accumulator (EDSAC 1950)
– Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
– Separation of Programming Model from Implementation
  » High-level Language Based (B…)
  » Concept of a Family (IBM…)
– General Purpose Register Machines
  » Complex Instruction Sets (Vax, Intel…)
  » Load/Store Architecture (CDC 6600, Cray…)
    » RISC (Mips, Sparc, HP-PA, IBM RS6000, …)

Interface Design
A good interface:
– Lasts through many implementations (portability, compatibility)
– Is used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
[Figure: one interface in use over time, outlasting successive implementations imp 1, imp 2, imp 3]

Virtualization: One of the Lessons of RISC
– Integrated systems approach
  » What really matters is the functioning of the complete system, i.e., hardware, runtime system, compiler, and operating system
  » In networking, this is called the "End to End argument"
  » Programmers care about high-level languages, debuggers, source-level object-oriented programming
– Computer architecture is not just about transistors, individual instructions, or particular implementations
  » The original RISC projects replaced complex instructions with a compiler + simple instructions
– Logical extension => genetically adaptive runtime systems enhanced by dynamic compilation running on reconfigurable hardware? Perhaps.

Computer Architecture Topics
– Instruction Set Architecture: addressing, protection, exception handling
– Pipelining and Instruction-Level Parallelism: pipelining, hazard resolution, superscalar, reordering, prediction, speculation, vector, dynamic compilation
– Memory Hierarchy: L1 cache, L2 cache, DRAM; coherence, bandwidth, latency; interleaving, bus protocols
– Input/Output and Storage: disks, WORM, tape; RAID; emerging technologies
– VLSI
– Network communication (to other processors)

Sample Organization: It's All About Communication
[Figure: processor with caches, busses, memory, and I/O devices (controllers and adapters for disks, displays, keyboards, networks), illustrated with a Pentium III chipset]

Computer Architecture Topics: Multiprocessors, Networks and Interconnections
[Figure: processor-memory (P, M) nodes attached through switches (S) to an interconnection network]
– Interconnection network: topologies, routing, bandwidth, latency, reliability
– Network interfaces
– Programming models: shared memory, message passing, data parallelism
– Processor-memory-switch organization

CS 252 Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century
[Figure: computer architecture (instruction set design, organization, hardware/software boundary) at the center, surrounded by technology, programming languages, operating systems, history, applications, interface design (ISA), measurement and evaluation, parallelism, and compilers]

Topic Coverage
Textbook: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 3rd Ed.
Research papers: handed out in class
– 1.5 weeks: Review: Fundamentals of Computer Architecture (Ch. 1), Instruction Set Architecture (Ch. 2), Pipelining (Ch. 3)
– 2.5 weeks: Pipelining, Interrupts, and Instruction-Level Parallelism (Ch. 4), Vector Processors (Appendix B)
– 1.5 weeks: Dynamic Compilation, Data Speculation (papers); Complexity, design via genetic algorithms
– 1 week: Memory Hierarchy (Ch. 5)
– 1.5 weeks: Fault Tolerance, Input/Output and Storage (Ch. 6)
– 1.5 weeks: Networks and Interconnection Technology (Ch. 7)
– 1.5 weeks: Multiprocessors (Ch. 8 + research papers + Culler book draft Chapter 1)
– 1 week: Quantum Computing, DNA Computing

CS252: Information
– Instructor: Prof. John D. Kubiatowicz
  » Office: 673 Soda Hall
  » Office hours: Wed 3:30-5:00 or by appointment (contact Veronique Richards, 676 Soda)
– TA: TBA
– Class: Mon/Wed, 1:00-2:30 pm, 310 Soda Hall
– Text: Computer Architecture: A Quantitative Approach, Third Edition (2002)
– Web page: …; lectures available online before 11:30 AM on the day of lecture
– Newsgroup: ucb.class.cs252

Lecture Style
– 1-minute review
– 20-minute lecture/discussion
– 5-minute administrative matters
– 25-minute lecture/discussion
– 5-minute break (water, stretch)
– 25-minute lecture/discussion
– Instructor will come to class early and stay after to answer questions
[Figure: student attention over time, annotated "20 min.", "Break", and "In conclusion, ..."]

Grading
– 10% Homeworks (work in pairs)
– 40% Examinations (2 midterms)
– 40% Research Project (work in pairs)
  » Transition from undergrad to grad student
  » Berkeley wants you to succeed, but you need to show initiative
  » Pick a topic
  » Meet 3 times with faculty/TA to see progress
  » Give an oral presentation
  » Give a poster session
  » Written report like a conference paper
  » 3 weeks of full-time work for 2 people
  » Opportunity to do "research in the small" to help make the transition from good student to research colleague
– 10% Class Participation

Quizzes
– Reduce the pressure of taking quizzes
  » Only 2 graded quizzes. Tentative: Wed Oct 13th and Wed Dec 1st
  » Our goal: test knowledge vs. speed writing
  » 3 hrs to take a 1.5-hr test (5:30-8:30 PM, location TBA)
  » You can bring a summary sheet to both midterm quizzes: transfer ideas from book to paper
  » Last-chance Q&A: during class time on the day of the exam
– Students/staff meet over free pizza/drinks at La Vals: Wed Oct 13th (8:30 PM) and Wed Dec 1st (8:30 PM)

Research Paper Reading
– As graduate students, you are now researchers
– Most information of importance to you will be in research papers
– The ability to rapidly scan and understand research papers is key to your success
– So: you will read lots of papers in this course!
  » Quick 1-paragraph summaries will be due in class
  » Important supplement to the book
  » We will discuss papers in class
– Papers will be scanned and posted on the web page

More Course Info
– Everything is on the course Web page
– Notes:
  » Not sure of the state of textbooks at the Student Center
  » The course Web page includes a pointer to last term's 152 home page. The "handout" page includes pointers to old 152 quizzes.
– Schedule:
  » 2 graded quizzes: Mon Oct 13th and Mon Dec 1st
  » Veterans Day: Friday Nov 5th
  » Thanksgiving vacation: Thur Nov 27th - Sun Nov 28th
  » Oral presentations: Tue/Wed Dec 9/10th
  » 252 last lecture: Fri Dec 3rd
  » 252 poster session: ???
  » Project papers/URLs due: Fri Dec 12th
– Project suggestions: TBA

Related Courses
– CS 152: how to build it; implementation details
– CS 252: why; analysis, evaluation
– CS 258: parallel architectures, languages, systems
– CS 250: integrated circuit technology from a computer-organization viewpoint
Strong prerequisite: basic knowledge of the organization of a computer is assumed!

Coping with CS 252
– Too many students with too varied a background?
  » Next Wednesday: prerequisite exam
– Limiting the number of students
  » First priority: CS/EECS grad students taking prelims
  » Second priority: N-th year CS/EECS grad students (breadth)
  » Third priority: College of Engineering grad students
  » Fourth priority: CS/EECS undergraduate seniors (Note: 1 graduate course unit = 2 undergraduate course units)
  » All other categories
– If not this semester, 252 is offered regularly

Coping with CS 252
– Students with too varied a background?
  » In the past, CS grad students took written prelim exams on undergraduate material in hardware, software, and theory
  » The first 5 weeks reviewed background, which helped 252, 262, and 270
  » Prelims were dropped, so are some students unprepared for CS 252?
– In-class exam on Wednesday, September 3rd
  » Doesn't affect your grade, only admission into the class
  » 2 grades: admitted, or audit/take CS 152 first
  » You will get more out of the course if you first recapture the common background
– Review: Chapters 1-3, the CS 152 home page, maybe "Computer Organization and Design (COD)" 2/e
  » Chapters 1 to 8 of COD if you never took the prerequisite
  » If you took a class, be sure COD Chapters 2, 6, and 7 are familiar
  » Copies in Bechtel Library on 2-hour reserve
– Last exam on the previous year's web site (~kubitron/courses/cs252-F00)

Building Hardware that Computes

Finite State Machines
– System state is explicit in the representation
– Transitions between states are represented as arrows, with inputs on the arcs
– Output may be either part of the state or on the arcs
[Figure: "Mod 3 Machine" with states Alpha/0, Beta/1, and Delta/2; input read MSB first]
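
A table-driven Python sketch of the Mod 3 machine. We assume, consistent with its name and the state outputs Alpha/0, Beta/1, and Delta/2, that each state holds the remainder of the binary number read so far. Reading one more bit b turns a value v into 2v + b, and that fixes every transition. The dictionary and function names are illustrative, not taken from the slide.

```python
# Moore-style outputs: each state's output is the remainder it represents.
OUTPUT = {"Alpha": 0, "Beta": 1, "Delta": 2}

# Transition table: (current state, input bit) -> next state.
# Reading bit b after value v gives 2*v + b, so the remainder r becomes (2*r + b) mod 3.
NEXT = {
    ("Alpha", 0): "Alpha", ("Alpha", 1): "Beta",
    ("Beta", 0): "Delta",  ("Beta", 1): "Alpha",
    ("Delta", 0): "Beta",  ("Delta", 1): "Delta",
}

def mod3(bits: str) -> int:
    """Run the FSM over a binary string, MSB first, and return the final state's output."""
    state = "Alpha"  # start state: nothing read yet, remainder 0
    for ch in bits:
        state = NEXT[(state, int(ch))]
    return OUTPUT[state]

print([mod3(format(n, "b")) for n in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```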

Implementation as Combinational Logic + Latch
[Figure: "Mealy machine" and "Moore machine" versions of the Mod 3 machine (states Alpha/0, Beta/1, Delta/2; arc labels 0/0, 1/0, 1/1, 0/1, 0/0, 1/1), implemented as combinational logic feeding a latch]
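
The same Mod 3 machine, built as combinational logic feeding a latch. The 2-bit state encoding and the sum-of-products equations are our own derivation, an assumption rather than something read off the figure. The loop plays the role of the clock, and the latched state bits decode directly into the Moore output.

```python
from typing import Tuple

# Assumed 2-bit state encoding (s1, s0): Alpha = 00, Beta = 01, Delta = 10; 11 is unused.

def next_state(s1: int, s0: int, x: int) -> Tuple[int, int]:
    """Combinational logic: next-state bits as sum-of-products of the state bits and input x."""
    nx = 1 - x  # inverted input
    n1 = (s0 & nx) | (s1 & x)
    n0 = ((1 - s1) & (1 - s0) & x) | (s1 & nx)
    return n1, n0

def run(bits: str) -> int:
    """Clock the latch once per input bit (MSB first) and decode the final state."""
    s1, s0 = 0, 0  # latch reset to Alpha
    for ch in bits:
        # The latch captures the combinational logic's output on each clock edge.
        s1, s0 = next_state(s1, s0, int(ch))
    return 2 * s1 + s0  # Moore output: Alpha/0, Beta/1, Delta/2

print([run(format(n, "b")) for n in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```

State 11 never occurs, so its next-state values are don't-cares. That is what keeps each equation down to two product terms.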

Microprogrammed Controllers
– State machine in which part of the state is a "micro-PC"
  » Explicit circuitry for incrementing or changing the PC
– Includes a ROM with "microinstructions"
  » Controlled logic implements at least branches and jumps
[Figure: the micro-PC addresses the ROM; a MUX chooses the next address from PC + 1 or the branch target; the ROM's instruction field drives the controlled machine (combinational logic with state)]
ROM contents (address: instruction, branch target):
0: forw 35 xxx
1: b_no_obstacles 000
2: back 10 xxx
3: rotate 90 xxx
4: goto 001
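
A minimal Python sketch of a micro-sequencer running this ROM. We assume `b_no_obstacles` branches to its target when the path is clear and `goto` always jumps. The obstacle sensor trace is hypothetical, since the slide does not say where that input comes from.

```python
# Microprogram from the slide: (operation, argument, next-address field).
# "xxx" on the slide marks an unused next-address field; None stands in for it here.
ROM = [
    ("forw", 35, None),               # 0: move forward 35
    ("b_no_obstacles", None, 0b000),  # 1: if no obstacle, branch to address 0
    ("back", 10, None),               # 2: back up 10
    ("rotate", 90, None),             # 3: rotate 90 degrees
    ("goto", None, 0b001),            # 4: jump to address 1
]

def run(obstacle_readings, max_steps):
    """Step the micro-PC through ROM for `max_steps` microinstructions.

    `obstacle_readings` is a hypothetical sensor trace, consumed once per
    b_no_obstacles test (True = obstacle ahead); once it runs out, the path reads as clear.
    Returns the datapath actions issued, in order.
    """
    sensor = iter(obstacle_readings)
    upc = 0  # the micro-PC: the explicit part of the controller's state
    actions = []
    for _ in range(max_steps):
        op, arg, target = ROM[upc]
        if op == "b_no_obstacles":
            take_branch = not next(sensor, False)  # conditional branch
        elif op == "goto":
            take_branch = True                     # unconditional jump
        else:
            actions.append(f"{op} {arg}")          # ordinary microinstruction drives the datapath
            take_branch = False
        # Next-address MUX: branch target, or micro-PC + 1.
        upc = target if take_branch else upc + 1
    return actions

print(run([True, False], max_steps=7))  # ['forw 35', 'back 10', 'rotate 90', 'forw 35']
```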

Execution Cycle
– Instruction Fetch: obtain the instruction from program storage
– Instruction Decode: determine the required actions and instruction size
– Operand Fetch: locate and obtain operand data
– Execute: compute the result value or status
– Result Store: deposit results in storage for later use
– Next Instruction: determine the successor instruction
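
To make the six steps concrete, here is a toy Python interpreter that walks through them once per instruction. The one-address accumulator ISA (`LOAD`, `ADD`, `STORE`, `JUMP`, `HALT`) is hypothetical and exists only to give each step something to do.

```python
# Hypothetical one-address accumulator ISA, for illustration only:
#   ("LOAD", a): acc = mem[a]     ("ADD", a): acc += mem[a]
#   ("STORE", a): mem[a] = acc    ("JUMP", t): pc = t      ("HALT", 0): stop
def run(program, mem):
    """Interpret `program` against data memory `mem`; one execution cycle per instruction."""
    pc, acc = 0, 0
    while True:
        # Instruction fetch: obtain the instruction from program storage.
        instr = program[pc]
        # Instruction decode: determine the required action (every instruction is one slot here).
        op, addr = instr
        if op == "HALT":
            return mem
        # Operand fetch: locate and obtain operand data.
        operand = mem[addr] if op in ("LOAD", "ADD") else None
        # Execute: compute the result value.
        if op == "LOAD":
            acc = operand
        elif op == "ADD":
            acc += operand
        # Result store: deposit results in storage for later use.
        if op == "STORE":
            mem[addr] = acc
        # Next instruction: determine the successor instruction.
        pc = addr if op == "JUMP" else pc + 1

print(run([("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", 0)], [2, 3, 0]))  # [2, 3, 5]
```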

What's a Clock Cycle?
– Old days: 10 levels of gates
– Today: determined by numerous time-of-flight issues + gate delays
  » Clock propagation, wire lengths, drivers
[Figure: combinational logic between latches or registers]

Pipelined Instruction Interpretation
[Figure: instruction address, instruction register, operand registers, and result registers separating the stages Next Instruction (NI), Instruction Fetch (IF), Decode & Operand Fetch (D), Execute (E), and Store Results (W) to registers or memory; successive instructions pass through NI IF D E W staggered in time, so several are in flight at once]
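
A short Python sketch that prints the staggered diagram. It assumes an ideal pipeline with no stalls, where each instruction starts one cycle after its predecessor. Once the pipeline fills, one instruction completes every cycle.

```python
STAGES = ["NI", "IF", "D", "E", "W"]  # stage labels from the slide's diagram

def pipeline_chart(n_instructions):
    """Return one text row per instruction for an ideal pipeline (no stalls):
    instruction i enters the first stage at cycle i and advances one stage per cycle."""
    n_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        cells = []
        for cycle in range(n_cycles):
            stage = cycle - i
            cells.append(STAGES[stage] if 0 <= stage < len(STAGES) else ".")
        rows.append(" ".join(f"{c:>2}" for c in cells))
    return rows

for row in pipeline_chart(5):
    print(row)
```

With five stages, five instructions finish in 9 cycles instead of the 25 stage-times needed without overlap.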

Sequential Laundry
– Sequential laundry takes 6 hours for 4 loads
– If they learned pipelining, how long would laundry take?
[Figure: loads A, B, C, D run one after another, in task order, from 6 PM to midnight]

Pipelined Laundry: Start Work ASAP
– Pipelined laundry takes 3.5 hours for 4 loads
[Figure: loads A, B, C, D overlapped in task order, starting at 6 PM]

Pipelining Lessons
– Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
– Pipeline rate is limited by the slowest pipeline stage
– Multiple tasks operate simultaneously
– Potential speedup = number of pipe stages
– Unbalanced lengths of pipe stages reduce speedup
– Time to "fill" the pipeline and time to "drain" it reduce speedup
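
The laundry slides give only totals: 6 hours sequential and 3.5 hours pipelined for 4 loads. This Python sketch assumes the classic stage times from Patterson and Hennessy's laundry analogy (wash 30 min, dry 40 min, fold 20 min), which reproduce both totals. It shows three of the lessons at once: fill and drain time, the slowest stage setting the rate, and unbalanced stages falling short of the ideal speedup.

```python
def sequential_time(stage_minutes, n_loads):
    """Each load finishes every stage before the next load starts."""
    return n_loads * sum(stage_minutes)

def pipelined_time(stage_minutes, n_loads):
    """Start each load as soon as the stage it needs is free (a load may wait between stages).
    The first load takes one full pass; every later load finishes one slowest-stage time after
    the previous one, because the slowest stage sets the pipeline rate."""
    return sum(stage_minutes) + (n_loads - 1) * max(stage_minutes)

# Assumed stage times (minutes): wash 30, dry 40, fold 20.
stages = [30, 40, 20]
seq = sequential_time(stages, 4)   # 360 min = 6 hours
pipe = pipelined_time(stages, 4)   # 210 min = 3.5 hours
print(f"sequential {seq / 60} h, pipelined {pipe / 60} h, speedup {seq / pipe:.2f}")

# Balanced stages approach the ideal speedup of 3 (the number of stages) as the load count grows.
balanced = [30, 30, 30]
print(f"{sequential_time(balanced, 1000) / pipelined_time(balanced, 1000):.3f}")
```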

The Process of Design
Architecture is an iterative process:
– Searching the space of possible designs
– At all levels of computer systems
[Figure: creativity generates ideas; cost/performance analysis sorts them into good, mediocre, and bad ideas]

Measurement Tools
– Benchmarks, traces, mixes
– Hardware: cost, delay, area, power estimation
– Simulation (many levels)
  » ISA, RT, gate, circuit
– Queuing theory
– Rules of thumb
– Fundamental "laws"/principles

The Bottom Line: Performance (and Cost)
– Time to run the task (ExTime)
  » Execution time, response time, latency
– Tasks per day, hour, week, sec, ns, … (Performance)
  » Throughput, bandwidth

Plane | Speed | DC to Paris | Passengers | Throughput (pmph)
Boeing 747 | 610 mph | 6.5 hours | 470 | 286,700
BAC/Sud Concorde | 1350 mph | 3 hours | 132 | 178,200
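
The two views of performance can disagree, and this Python sketch computes both for the two planes. The passenger counts (470 for the 747, 132 for the Concorde) are the figures from Hennessy and Patterson's version of this table. They reproduce the throughputs 610 × 470 = 286,700 and 1350 × 132 = 178,200 passenger-miles per hour.

```python
# (DC-to-Paris time in hours, speed in mph, passengers)
planes = {
    "Boeing 747": (6.5, 610, 470),
    "BAC/Sud Concorde": (3.0, 1350, 132),
}

def throughput_pmph(speed_mph, passengers):
    """Passenger-miles per hour: transport work done per hour of flying."""
    return speed_mph * passengers

hours_747, mph_747, pax_747 = planes["Boeing 747"]
hours_conc, mph_conc, pax_conc = planes["BAC/Sud Concorde"]

# Latency view: ratio of trip times (bigger means the Concorde wins).
latency_ratio = hours_747 / hours_conc
# Throughput view: ratio of passenger-miles per hour (bigger means the 747 wins).
throughput_ratio = throughput_pmph(mph_747, pax_747) / throughput_pmph(mph_conc, pax_conc)
print(f"Concorde is {latency_ratio:.2f}x faster by latency; "
      f"747 is {throughput_ratio:.2f}x faster by throughput")
```

Which plane is "faster" depends on whether you care about one passenger's trip (about 2.17x for the Concorde) or the total work moved (about 1.61x for the 747).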

Definitions
– Performance is in units of things per second: bigger is better
– If we are primarily concerned with response time:
$\text{Performance}(X) = \dfrac{1}{\text{Execution\_time}(X)}$
– "X is n times faster than Y" means
$n = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)} = \dfrac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$
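
The definitions translate directly into code. In this Python sketch, the run times are hypothetical.

```python
def performance(execution_time_s):
    """Performance as the reciprocal of execution time: bigger is better."""
    return 1.0 / execution_time_s

def times_faster(exec_time_x, exec_time_y):
    """n in "X is n times faster than Y": ExTime(Y) / ExTime(X)."""
    return exec_time_y / exec_time_x

# Hypothetical: X runs a task in 10 s, Y needs 15 s.
n = times_faster(10.0, 15.0)
# The performance ratio gives the same n.
assert abs(n - performance(10.0) / performance(15.0)) < 1e-12
print(f"X is {n:.1f} times faster than Y")  # X is 1.5 times faster than Y
```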

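Amdahl's Law
$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times \left[(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}\right]$
$\text{Speedup}_{\text{overall}} = \dfrac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \dfrac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$
Best you could ever hope to do:
$\text{Speedup}_{\text{maximum}} = \dfrac{1}{1 - \text{Fraction}_{\text{enhanced}}}$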
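
Amdahl's Law, in the form given in this lecture's summary, is Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). It is easy to explore numerically. In this Python sketch, the 40% fraction and the 10x enhancement are hypothetical.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original time runs `speedup_enhanced` times faster."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

def amdahl_limit(fraction_enhanced):
    """Best you could ever hope to do: the enhanced part takes no time at all."""
    return 1.0 / (1.0 - fraction_enhanced)

# Hypothetical: 40% of execution time is in code that becomes 10x faster.
print(f"{amdahl_speedup(0.4, 10):.4f}")  # 1.5625
print(f"{amdahl_limit(0.4):.4f}")        # 1.6667, no matter how large the enhancement
```

Even an infinitely fast enhancement cannot beat 1/(1 - 0.4). The untouched 60% of the time dominates.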

Metrics of Performance
[Figure: system levels (application, programming language, compiler, ISA, datapath and control, function units, transistors/wires/pins), each with typical metrics]
– Answers per month
– Operations per second
– (Millions of) instructions per second: MIPS
– (Millions of) (FP) operations per second: MFLOP/s
– Megabytes per second
– Cycles per second (clock rate)

Computer Performance
$\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$
(instruction count × CPI × cycle time)

 | Inst Count | CPI | Clock Rate
Program | X | | 
Compiler | X | (X) | 
Inst. Set | X | X | 
Organization | | X | X
Technology | | | X
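
A minimal Python sketch of the CPU time equation, together with native MIPS from the metrics slide (clock rate / (CPI × 10^6)). All inputs are hypothetical. The second call shows why a compiler change that affects both instruction count and CPI must be judged on their product.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle

def native_mips(cpi, clock_rate_hz):
    """Millions of instructions per second: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Hypothetical program: 10^9 instructions, CPI 1.5, 2 GHz clock.
base = cpu_time(1e9, 1.5, 2e9)
# Hypothetical compiler change: 10% fewer instructions, but CPI rises 5%.
tuned = cpu_time(0.9e9, 1.5 * 1.05, 2e9)
print(f"base {base:.3f} s, tuned {tuned:.4f} s, MIPS {native_mips(1.5, 2e9):.0f}")
```

Here the tuned version wins (0.9 × 1.05 = 0.945 of the original time), even though its MIPS rating falls. That is one reason MIPS alone can mislead.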

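Cycles Per Instruction (Throughput)
"Average Cycles per Instruction":
$\text{CPI} = \dfrac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \dfrac{\text{Cycles}}{\text{Instruction Count}}$
$\text{CPU time} = \text{Cycle Time} \times \sum_{j=1}^{n} \text{CPI}_j \times I_j$
$\text{CPI} = \sum_{j=1}^{n} \text{CPI}_j \times F_j$, where $F_j = \dfrac{I_j}{\text{Instruction Count}}$ is the "Instruction Frequency"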

Example: Calculating CPI Bottom Up
Typical mix of instruction types in a program, base machine (Reg/Reg):

Op | Freq | Cycles | CPI(i) | (% Time)
ALU | 50% | 1 | .5 | (33%)
Load | 20% | 2 | .4 | (27%)
Store | 10% | 2 | .2 | (13%)
Branch | 20% | 2 | .4 | (27%)
Total CPI | | | 1.5 |
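
The bottom-up calculation, CPI = Σ F_i × CPI_i, can be written in Python using this slide's mix: ALU 50% at 1 cycle; loads 20%, stores 10%, and branches 20% at 2 cycles each. The sketch also reports each class's share of execution time, F_i × CPI_i / CPI.

```python
# Instruction mix from the slide: op -> (frequency, cycles)
MIX = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}

def average_cpi(mix):
    """CPI = sum over instruction classes of F_i * CPI_i."""
    return sum(freq * cycles for freq, cycles in mix.values())

def time_fractions(mix):
    """Share of execution time spent in each class: F_i * CPI_i / CPI."""
    cpi = average_cpi(mix)
    return {op: freq * cycles / cpi for op, (freq, cycles) in mix.items()}

print(f"CPI = {average_cpi(MIX):.2f}")
for op, share in time_fractions(MIX).items():
    print(f"{op:6s} {share:.0%}")
```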

Example: Branch Stall Impact
– Assume CPI = 1.0 ignoring branches (ideal)
– Assume the solution was stalling for 3 cycles
– If 30% of instructions are branches, stall 3 cycles on 30%

Op | Freq | Cycles | CPI(i) | (% Time)
Other | 70% | 1 | .7 | (37%)
Branch | 30% | 4 | 1.2 | (63%)

=> New CPI = 1.9
The new machine is 1/1.9 ≈ 0.53 times as fast as the ideal machine (i.e., slower!)
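
The same arithmetic for the branch-stall case, as a Python sketch using the slide's numbers: base CPI 1.0, 30% branches, and a 3-cycle stall per branch.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Every branch pays `stall_cycles` extra cycles on top of the base CPI."""
    return base_cpi + branch_fraction * stall_cycles

ideal_cpi = 1.0
new_cpi = cpi_with_branch_stalls(ideal_cpi, 0.30, 3)  # 1.0 + 0.3 * 3 = 1.9
# Same instruction count and clock rate, so relative speed is just the CPI ratio.
relative_speed = ideal_cpi / new_cpi
print(f"new CPI = {new_cpi:.1f}, speed relative to ideal = {relative_speed:.3f}")
```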

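Speed Up Equation for Pipelining
$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$
$\text{Speedup} = \dfrac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \dfrac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$
For a simple RISC pipeline, CPI = 1:
$\text{Speedup} = \dfrac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \dfrac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$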
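
This Python sketch implements the pipelining speedup equation in its usual Hennessy and Patterson form: Speedup = (Ideal CPI × Pipeline depth) / (Ideal CPI + Pipeline stall CPI) × (Cycle Time_unpipelined / Cycle Time_pipelined). With ideal CPI = 1, it reduces to Pipeline depth / (1 + Pipeline stall CPI) times the cycle-time ratio. The stall rate and cycle times below are hypothetical.

```python
def pipeline_speedup(depth, stall_cpi, cycle_unpipelined, cycle_pipelined, ideal_cpi=1.0):
    """(Ideal CPI x depth) / (Ideal CPI + stall CPI) x (unpipelined / pipelined cycle time).

    Assumes the unpipelined machine takes `depth` cycles per instruction (one per stage)."""
    cpi_pipelined = ideal_cpi + stall_cpi
    return (ideal_cpi * depth) / cpi_pipelined * (cycle_unpipelined / cycle_pipelined)

# Ideal case: no stalls, no latch overhead -> speedup equals pipeline depth.
print(pipeline_speedup(5, 0.0, 10.0, 10.0))            # 5.0
# Hypothetical: 0.5 stall cycles per instruction; latch overhead stretches the cycle 10 ns -> 11 ns.
print(round(pipeline_speedup(5, 0.5, 10.0, 11.0), 2))  # 3.03
```

Stalls and clock overhead both pull the result below the "potential speedup = number of pipe stages" from the laundry lessons.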

SPEC: System Performance Evaluation Cooperative
– First round, 1989: 10 programs yielding a single number ("SPECmarks")
– Second round, 1992: SPECInt92 (6 integer programs) and SPECfp92 (14 floating-point programs)
  » Compiler flags unlimited. March 93 flags for the DEC 4000 Model 610:
    spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)= memcpy(b,a,c)"
    wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
    nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
– Third round, 1995: new set of programs, SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs)
  » "Benchmarks useful for 3 years"
  » Single flag setting for all programs: SPECint_base95, SPECfp_base95
– Fourth round, 2000: 26 apps
  » Analysis and simulation programs
  » Compression: bzip2, gzip
  » Integrated circuit layout, ray tracing, lots of others

How to Summarize Performance
– Arithmetic mean (weighted arithmetic mean) tracks execution time: $\frac{1}{n}\sum_{i=1}^{n} T_i$ or $\sum_{i=1}^{n} W_i \times T_i$
– Harmonic mean (weighted harmonic mean) of rates (e.g., MFLOPS) tracks execution time: $\dfrac{n}{\sum_{i=1}^{n} 1/R_i}$ or $\dfrac{1}{\sum_{i=1}^{n} W_i / R_i}$ (weights summing to 1, as in the weighted arithmetic mean)
– Normalized execution time is handy for scaling performance (e.g., X times faster than a SPARCstation 10)
– But do not take the arithmetic mean of normalized execution times; use the geometric mean: $\left(\prod_{j=1}^{n} \dfrac{T_j}{N_j}\right)^{1/n}$
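
A Python sketch of the three summaries. It includes a hypothetical two-program, two-machine example that shows why the slide warns against the arithmetic mean of normalized times.

```python
import math

def arithmetic_mean(times, weights=None):
    """Tracks execution time: sum(T_i) / n, or sum(W_i * T_i) with weights summing to 1."""
    if weights is None:
        return sum(times) / len(times)
    return sum(w * t for w, t in zip(weights, times))

def harmonic_mean(rates, weights=None):
    """Mean of rates (e.g., MFLOPS) that tracks execution time:
    n / sum(1 / R_i), or 1 / sum(W_i / R_i) with weights summing to 1."""
    if weights is None:
        return len(rates) / sum(1.0 / r for r in rates)
    return 1.0 / sum(w / r for w, r in zip(weights, rates))

def geometric_mean_normalized(times, reference_times):
    """(prod(T_j / N_j)) ** (1/n), computed through logs to avoid overflow."""
    logs = [math.log(t / n) for t, n in zip(times, reference_times)]
    return math.exp(sum(logs) / len(logs))

# Hypothetical run times (seconds) of programs P1 and P2 on machines A and B.
a = [1.0, 1000.0]
b = [10.0, 100.0]
for ref_name, ref in (("A", a), ("B", b)):
    am = [arithmetic_mean([t / r for t, r in zip(m, ref)]) for m in (a, b)]
    gm = [geometric_mean_normalized(m, ref) for m in (a, b)]
    print(f"normalized to {ref_name}: arithmetic A={am[0]:.2f} B={am[1]:.2f}, "
          f"geometric A={gm[0]:.2f} B={gm[1]:.2f}")
```

Normalized to A, the arithmetic mean makes A look about 5x faster; normalized to B, it makes B look 5x faster. The geometric mean gives the same answer whichever machine is the reference. Only the arithmetic mean of the raw times (A: 500.5 s, B: 55 s) tracks total execution time.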

SPEC First Round
– One program spent 99% of its time in a single line of code
– A new front-end compiler could improve its result dramatically

Performance Evaluation
– "For better or worse, benchmarks shape a field"
– Good products are created when you have:
  » Good benchmarks
  » Good ways to summarize performance
– Because sales depend in part on performance relative to the competition, companies invest in improving the product as reported by the performance summary
– If the benchmarks or summary are inadequate, they must choose between improving the product for real programs and improving it to get more sales; sales almost always win!
– Execution time is the measure of computer performance!

Integrated Circuits Costs
Die cost goes roughly with $(\text{die area})^4$
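
The slide states only the rule of thumb, so this Python sketch assumes the standard Hennessy and Patterson cost model:
- dies per wafer = π(d/2)²/A - πd/√(2A)
- die yield = wafer yield × (1 + D·A/α)^(-α)
- die cost = wafer cost / (dies per wafer × die yield)

We take α = 3 as an assumption. With that value, yield falls roughly as A^(-3) once D·A is large, and dies per wafer falls as 1/A; together these give the "(die area)^4" behavior. Wafer cost, diameter, and defect density are hypothetical.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Gross dies: wafer area / die area, minus partial dies lost along the round edge."""
    return (math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

def die_yield(defects_per_cm2, die_area_cm2, alpha=3.0, wafer_yield=1.0):
    """Fraction of good dies: wafer_yield * (1 + defects * area / alpha) ** -alpha."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2, alpha=3.0):
    """Wafer cost spread over the good dies on the wafer."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, alpha))
    return wafer_cost / good_dies

# Hypothetical process: $1,500 wafers, 20 cm diameter, 1.0 defect per cm^2.
small = die_cost(1500, 20, 1.0, 1.0)
large = die_cost(1500, 20, 2.0, 1.0)
print(f"1 cm^2 die: ${small:.2f}; 2 cm^2 die: ${large:.2f}; ratio {large / small:.1f}x")
```

At this moderate defect density, doubling the die area raises die cost by about 4.2x, well above the 2x that area alone would suggest. The full fourth-power behavior appears only when defects per die (D·A) are large compared with α.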

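Real World Examples

Chip | Metal layers | Line width (µm) | Wafer cost | Defects/cm² | Area (mm²) | Dies/wafer | Yield | Die cost
386DX | 2 | 0.90 | $… | … | … | … | …% | $4
486DX2 | 3 | 0.80 | $… | … | … | … | …% | $12
PowerPC | … | … | $… | … | … | … | …% | $53
HP PA | … | … | $… | … | … | … | …% | $73
DEC Alpha | 3 | 0.70 | $… | … | … | … | …% | $149
SuperSPARC | 3 | 0.70 | $… | … | … | … | …% | $272
Pentium | 3 | 0.80 | $… | … | … | … | …% | $417

From "Estimating IC Manufacturing Costs," by Linley Gwennap, Microprocessor Report, August 2, 1993, p. 15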

Summary, #1
Designing to last through trends:

 | Capacity | Speed
Logic | 2x in 3 years | 2x in 3 years
SPEC rating | | 2x in 1.5 years
DRAM | 4x in 3 years | 2x in 10 years
Disk | 4x in 3 years | 2x in 10 years

– 6 years to graduate => 16X CPU speed, DRAM/disk size
– Time to run the task
  » Execution time, response time, latency
– Tasks per day, hour, week, sec, ns, …
  » Throughput, bandwidth
– "X is n times faster than Y" means $n = \dfrac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)}$

Summary, #2
– Amdahl's Law:
$\text{Speedup}_{\text{overall}} = \dfrac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \dfrac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$
– CPI Law:
$\text{CPU time} = \dfrac{\text{Seconds}}{\text{Program}} = \dfrac{\text{Instructions}}{\text{Program}} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$
– Execution time is the REAL measure of computer performance!
– Good products are created when you have good benchmarks and good ways to summarize performance
– Die cost goes roughly with $(\text{die area})^4$