ARCHITECTURE PERFORMANCE EVALUATION Matthew Jacob SERC, Indian Institute of Science, Bangalore

© MJT, IISc 2 Architecture Performance Evaluation 1. Introduction: Modeling, Simulation 2. Benchmark programs and suites 3. Fast simulation techniques 4. Analytical modeling

© MJT, IISc 3 Evaluating Computer Systems: When?
Designer: During design
Administrator: Before purchase
Administrator: While tuning/configuring
User: In deciding which system to use
(In some of these cases the system is not yet available; in others it is.)

© MJT, IISc 4 Performance Evaluation 1. Performance measurement 2. Performance modeling

© MJT, IISc 5 Performance Evaluation. 1. Performance measurement • Time, space, power … • Using hardware or software probes • Example: Pentium hardware performance counters 2. Performance modeling

© MJT, IISc 6 Performance Evaluation.. 1. Performance measurement • Time, space, power … • Using hardware or software probes • Example: Pentium hardware performance counters 2. Performance modeling • Model • Representation of the system under study • A simplifying set of assumptions about how it behaves: how it interacts with the outside world, and how it changes with time through the interactions between its own components

© MJT, IISc 7 Performance Evaluation…. 1. Performance measurement • Time, space, power … • Using hardware or software probes • Example: Pentium hardware performance counters 2. Performance modeling • Kinds of Models: 1. Physical or scale model 2. Analytical model: using mathematical equations 3. Simulation model: computer-based approach, using a computer program to mimic the behaviour of the system. We will first look at Simulation, then at Analytical Modeling

© MJT, IISc 8 Simulation Imitation of some real thing, state of affairs, or process (Wikipedia). Using a system model instead of the actual physical system. The act of simulating something generally entails representing certain key characteristics or behaviours of a selected physical or abstract system. State of the system.

© MJT, IISc 9 State State of a system • at a moment in time • a function of the values of the attributes of the objects that comprise the system. Example: Consider a coffee shop, where there is a cashier and a coffee dispenser • State can be described by (Number of customers at Cashier, Number of customers at Coffee dispenser)

© MJT, IISc 10 State Transition Diagram Change of state occurs due to 2 kinds of events • Arrival or Departure of a customer • Can label each state transition arc as A or D. [Diagram: states (0,0), (1,0), (2,0), (3,0), (0,1), (1,1), … with arcs labelled A (new customer arrives) and D (customer departs from the system)]

© MJT, IISc 11 Event An incident or situation which occurs in a particular place during a particular interval of time • Example: Cashier is busy between times t1 and t2

© MJT, IISc 12 Discrete Event An incident or situation which occurs at a particular instant in time • Example: Cashier becomes busy at time t1 • System state only changes instantaneously at such moments in time. Discrete Event System Model • States • Discrete events and corresponding state changes

© MJT, IISc 13 Discrete Event Simulation Involves keeping track of 1. System state 2. Pending events Each event has an associated time (Event type, Time) 3. Simulated time (Simulation Clock)

© MJT, IISc 14 The DES Algorithm
Variables: SystemState, SimnClock, PendingEventList
Initialize variables
Insert first event into PendingEventList
while (not done) {
    Delete event E with lowest time t from PendingEventList
    Advance SimnClock to that time t
    Update SystemState by calling event handler of event E
}

© MJT, IISc 15 Example: Cashier at Coffee Shop Events? State? Event Handlers?

© MJT, IISc 16 Example: Cashier at Coffee Shop Events? • Arrival of customer, Departure of customer. State? • boolean CashierBusy? • queue CashQueue: info in each queue item is the arrival time of that customer; operations: EnQueue, DeQueue, IsEmpty • Keeping track of properties of interest, e.g., cashier utilization, average wait time in cash queue. Event Handlers?

© MJT, IISc 17 Example: Handler for Arrival (time t)
if (CashierBusy?) {
    EnQueue(CashQueue, t)
} else {
    CashierBusy? = TRUE
    TimeCashierBecameBusy = t
    NumThroughQueue++
    ScheduleEvent(D, t + SERVICETIME)
}

© MJT, IISc 18 Example: Handler for Departure (time t)
if (IsEmpty(CashQueue)) {
    CashierBusy? = FALSE
    TotalCashierBusyTime += (t - TimeCashierBecameBusy)
} else {
    next = DeQueue(CashQueue)
    NumThroughQueue++
    TotalTimeInQueue += (t - next.arrivaltime)
    ScheduleEvent(D, t + SERVICETIME)
}
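
The DES loop and the two handlers above fit together into a complete simulator. The following is a minimal, self-contained C sketch of the cashier model; the constant service and interarrival times, the sorted-linked-list event list, and the way new arrivals are scheduled are assumptions added for illustration, not part of the slides.

/* Minimal discrete event simulation of the cashier queue.
 * SERVICETIME, INTERARRIVAL and the arrival process are assumed values. */
#include <stdio.h>
#include <stdlib.h>

#define SERVICETIME   4.0
#define INTERARRIVAL  5.0
#define NUM_CUSTOMERS 1000

typedef enum { ARRIVAL, DEPARTURE } EventType;
typedef struct Event { double time; EventType type; struct Event *next; } Event;

static Event  *pending   = NULL;   /* PendingEventList, kept sorted by time */
static double  sim_clock = 0.0;    /* SimnClock */

/* SystemState */
static int     cashier_busy = 0;
static double  cash_queue[NUM_CUSTOMERS];   /* arrival times of waiting customers */
static int     q_head = 0, q_tail = 0;
static double  busy_since = 0.0, total_busy = 0.0, total_wait = 0.0;
static int     num_served = 0, num_arrived = 0;

static void schedule(EventType type, double time) {
    Event *e = malloc(sizeof *e), **p = &pending;
    e->time = time; e->type = type;
    while (*p && (*p)->time <= time) p = &(*p)->next;   /* insert in time order */
    e->next = *p; *p = e;
}

static void handle_arrival(double t) {
    if (++num_arrived < NUM_CUSTOMERS)          /* assumed arrival process */
        schedule(ARRIVAL, t + INTERARRIVAL);
    if (cashier_busy) {
        cash_queue[q_tail++] = t;               /* EnQueue(CashQueue, t) */
    } else {
        cashier_busy = 1;
        busy_since   = t;                       /* TimeCashierBecameBusy */
        num_served++;                           /* NumThroughQueue++ */
        schedule(DEPARTURE, t + SERVICETIME);
    }
}

static void handle_departure(double t) {
    if (q_head == q_tail) {                     /* IsEmpty(CashQueue) */
        cashier_busy = 0;
        total_busy  += t - busy_since;          /* TotalCashierBusyTime */
    } else {
        double arrived = cash_queue[q_head++];  /* DeQueue(CashQueue) */
        num_served++;
        total_wait  += t - arrived;             /* TotalTimeInQueue */
        schedule(DEPARTURE, t + SERVICETIME);
    }
}

int main(void) {
    schedule(ARRIVAL, 0.0);                     /* insert first event */
    while (pending) {                           /* the DES loop from slide 14 */
        Event *e = pending;
        pending   = e->next;                    /* event with lowest time */
        sim_clock = e->time;                    /* advance SimnClock */
        if (e->type == ARRIVAL) handle_arrival(sim_clock);
        else                    handle_departure(sim_clock);
        free(e);
    }
    printf("customers served: %d\n", num_served);
    printf("cashier utilization: %.2f\n", total_busy / sim_clock);
    printf("average wait in cash queue: %.2f\n", total_wait / num_served);
    return 0;
}

A sorted linked list is the simplest pending-event list; production simulators normally use a priority queue (binary heap or calendar queue) for the same role.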

© MJT, IISc 19 The DES Algorithm
Variables: SystemState, SimnClock, PendingEventList
Initialize variables
Insert first event into PendingEventList
while (not done) {
    Delete event E with lowest time t from PendingEventList
    Advance SimnClock to that time t
    Update SystemState by calling event handler of event E
}

© MJT, IISc 20 Architectural Simulation Example: Simulation of memory system behaviour during execution of a given program • Objective: Average memory access time, number of cache hits, etc. There are at least 3 different ways to do this.

© MJT, IISc 21 Architectural Simulation. 1. Trace Driven Simulation 2. Stochastic Simulation 3. Execution Driven Simulation

© MJT, IISc 22 Architectural Simulation. 1. Trace Driven Simulation • Trace: A log or record of all the relevant events that must be simulated. Example: (R, 0x1279E, 1B), (R, 0xAB7800, 4B),…

© MJT, IISc 23 Architectural Simulation… 1. Trace Driven Simulation • Trace: A log or record of all the relevant events that must be simulated. Example: (R, 0x1279E, 1B), (R, 0xAB7800, 4B),… 2. Stochastic Simulation • Driven by random number generators. Example: Addresses are uniformly distributed over the address range; 45% of memory operations are Reads.

© MJT, IISc 24 Architectural Simulation…. 1. Trace Driven Simulation • Trace: A log or record of all the relevant events that must be simulated. Example: (R, 0x1279E, 1B), (R, 0xAB7800, 4B),… 2. Stochastic Simulation • Driven by random number generators. Example: Addresses are uniformly distributed over the address range; 45% of memory operations are Reads. 3. Execution Driven Simulation • Interleaves the execution of the program (whose execution is being simulated) with the simulation of the target architecture.
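
As an illustration of the trace-driven approach, here is a minimal C sketch of a direct-mapped cache simulator driven by a trace in the spirit of the slide's example records (R, 0x1279E, 1B). The textual trace format, the cache geometry, and reading from standard input are assumptions made for the sketch.

/* Sketch of a trace-driven simulation of a direct-mapped cache.
 * Trace lines are assumed to look like "R 0x1279E 1"; the operation and
 * size fields are parsed but not used in this simple hit/miss model. */
#include <stdio.h>
#include <stdint.h>

#define LINE_SIZE  64        /* bytes per cache block (assumed) */
#define NUM_LINES  512       /* 32 KB direct-mapped cache (assumed) */

int main(void) {
    uint64_t tags[NUM_LINES];
    int      valid[NUM_LINES] = {0};
    long hits = 0, misses = 0;
    char op; unsigned long long addr; int size;

    /* each access: the index selects a line, the tag identifies the block */
    while (scanf(" %c %llx %d", &op, &addr, &size) == 3) {
        uint64_t block = addr / LINE_SIZE;
        uint64_t index = block % NUM_LINES;
        uint64_t tag   = block / NUM_LINES;
        if (valid[index] && tags[index] == tag) {
            hits++;
        } else {
            misses++;
            valid[index] = 1;
            tags[index]  = tag;
        }
    }
    printf("hits %ld  misses %ld  miss ratio %.3f\n",
           hits, misses, (double)misses / (hits + misses));
    return 0;
}

For the stochastic approach, the input loop would instead draw addresses and operation types from the stated distributions; for execution-driven simulation, the addresses would come from an instruction emulator running the program, which is what SimpleScalar does (next slide).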

© MJT, IISc 25 Example: SimpleScalar A widely used execution-driven architecture simulator. Tool set: compiler, assembler, linker, simulation and visualization tools. Facilitates simulation of real programs on a range of modern processors. How fast are they? • Fast functional simulator: ~10 MIPS • Detailed out-of-order issue processor with non-blocking caches, speculative execution, branch prediction, etc.: ~1 MIPS

© MJT, IISc 26 SimpleScalar. [Figure from Austin, Larsen, Ernst, IEEE Computer, Feb 2002: structure of the simulator. It emulates execution of the instructions of the program whose execution is being simulated (target ISA, e.g., MIPS); system calls are executed on the host system where the simulation is running (e.g., P4 Linux); instruction emulation is interleaved with updating architectural state and statistics.]

© MJT, IISc 27 What programs are used? Performance can vary substantially from program to program. To compare architectural alternatives, it would be good if a standard set of programs were used. This has led to some degree of consensus on what programs to use in architectural studies: benchmark programs.

© MJT, IISc 28 Kinds of Benchmark Programs 1. Toy Benchmarks • Factorial, Quicksort, Hanoi, Ackermann, Sieve 2. Synthetic Benchmarks • Dhrystone, Whetstone 3. Benchmark Kernels • DAXPY, Livermore Loops 4. Benchmark Suites • SPEC benchmarks

© MJT, IISc 29 Synthetic Benchmarks: Whetstone Created in Whetstone Lab, UK, 1970s. Synthetic, originally in Algol 60. Floating point, math libraries.
Synthetic Benchmarks: Dhrystone Pun on Whetstone; Weicker (1984). Integer performance. "Typical" application mix of mathematical and other operations (string handling).

© MJT, IISc 30 Kernel Benchmarks: Livermore Loops
Fortran DO loops extracted from frequently used programs at Lawrence Livermore National Labs, USA, used to assess floating point arithmetic performance.
1. Hydro fragment
      DO 1 L = 1, Loop
      DO 1 k = 1, n
    1 X(k) = Q + Y(k) * (R * ZX(k+10) + T * ZX(k+11))
2. ICCG excerpt (Incomplete Cholesky Conjugate Gradient)
3. Inner product
4. Banded linear equations
5. Tri-diagonal elimination, below diagonal
6. General linear recurrence equations

© MJT, IISc 31 SPEC Benchmark Suites Standard Performance Evaluation Corporation • "Non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers" • "Develops suites of benchmarks and also reviews and publishes submitted results from member organizations and other benchmark licensees"

© MJT, IISc 32 SPEC Consortium Members Acer Inc, Action S.A., AMD, Apple Inc, Azul Systems, Inc, BEA Systems, BlueArc, Bull S.A., Citrix Online, CommuniGate Systems, Dell, EMC, Fujitsu Limited, Fujitsu Siemens, Hewlett-Packard, Hitachi Data Systems, Hitachi Ltd., IBM, Intel, ION Computer Systems, Itautec S/A, Microsoft, NEC – Japan, NetEffect, Network Appliance, NVIDIA, Openwave Systems, Oracle, Panasas, Pathscale, Principled Technologies, QLogic Corporation, The Portland Group, Rackable Systems, Red Hat, SAP AG, Scali, SGI, Sun Microsystems, Super Micro Computer, Inc., SWsoft, Symantec Corporation, Trigence, Unisys

© MJT, IISc 33 SPEC Benchmark Suites … 1. CPU 2. Enterprise Services 3. Graphics/Applications 4. High Performance Computing 5. Java Client/Server 6. Mail Servers 7. Network File System 8. Web Servers

© MJT, IISc 34 Example: SPEC CPU2000 Programs with source code, input data sets, makefiles
CINT2000:
1. gzip (C) Compression
2. vpr (C) FPGA Circuit Placement and Routing
3. gcc (C) C Programming Language Compiler
4. mcf (C) Combinatorial Optimization
5. crafty (C) Game Playing: Chess
6. parser (C) Word Processing
7. eon (C++) Computer Visualization
8. perlbmk (C) PERL Programming Language
9. gap (C) Group Theory, Interpreter
10. vortex (C) Object-oriented Database
11. bzip2 (C) Compression
12. twolf (C) Place and Route Simulator

© MJT, IISc 35 SPEC CPU2000 …
CFP2000:
1. wupwise (Fortran 77) Quantum Chromodynamics
2. swim (Fortran 77) Shallow Water Modeling
3. mgrid (Fortran 77) Multi-grid Solver: 3D Potential Field
4. applu (Fortran 77) Parabolic/Elliptic PDEs
5. mesa (C) 3-D Graphics Library
6. galgel (Fortran 90) Computational Fluid Dynamics
7. art (C) Image Recognition / Neural Networks
8. equake (C) Seismic Wave Propagation Simulation
9. facerec (Fortran 90) Face Recognition
10. ammp (C) Computational Chemistry
11. lucas (Fortran 90) Number Theory / Primality Testing
12. fma3d (Fortran 90) Finite-element Crash Simulation
13. sixtrack (Fortran 77) High Energy Physics Accelerator Design
14. apsi (Fortran 77) Meteorology: Pollutant Distribution

© MJT, IISc 36 More Recently: SPEC CINT2006
1. perlbench (C) PERL Programming Language
2. bzip2 (C) Compression
3. gcc (C) C Compiler
4. mcf (C) Combinatorial Optimization
5. gobmk (C) Artificial Intelligence: Go
6. hmmer (C) Search Gene Sequence
7. sjeng (C) Artificial Intelligence: Chess
8. libquantum (C) Physics: Quantum Computing
9. h264ref (C) Video Compression
10. omnetpp (C++) Discrete Event Simulation
11. astar (C++) Path-finding Algorithms
12. xalancbmk (C++) XML Processing

© MJT, IISc 37 SPEC CFP2006
1. bwaves (Fortran) Fluid Dynamics
2. gamess (Fortran) Quantum Chemistry
3. milc (C) Physics: Quantum Chromodynamics
4. zeusmp (Fortran) Physics/CFD
5. gromacs (C/Fortran) Biochemistry/Molecular Dynamics
6. cactusADM (C/Fortran) Physics/General Relativity
7. leslie3d (Fortran) Fluid Dynamics
8. namd (C++) Biology/Molecular Dynamics
9. dealII (C++) Finite Element Analysis
10. soplex (C++) Linear Programming, Optimization
11. povray (C++) Image Ray-tracing
12. calculix (C/Fortran) Structural Mechanics
13. GemsFDTD (Fortran) Computational Electromagnetics
14. tonto (Fortran) Quantum Chemistry
15. lbm (C) Fluid Dynamics
16. wrf (C/Fortran) Weather Prediction
17. sphinx3 (C) Speech Recognition

© MJT, IISc 38 Problem: SPEC program execution duration In terms of instructions executed • CPU2000 average: ~300 billion • Simulated at a speed of 1 MIPS, this would take about 4 days. Programs to be simulated are getting larger • SPEC CPU2006: increase in program execution length by an order of magnitude. Even more detailed simulation is needed • System-level simulation, which takes the operating system into account, is 1000 times slower than SimpleScalar.

© MJT, IISc 39 Approaches to Address this Problem Purpose of simulation: to estimate program CPI 1. Use (small) input data so that execution time is reduced 2. Don't simulate the entire program execution • Example: Skip the initial 1 billion instructions and then estimate CPI by simulating only the next 1 billion instructions 3. Simulate (carefully) selected parts of program execution on the regular input data • Example: SimPoint, SMARTS

© MJT, IISc 40 Reference: Wunderlich, Wenisch, Falsafi and Hoe, "SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling", 30th ISCA (ACM/IEEE International Symposium on Computer Architecture), 2003. The Problem: A lot of computer architecture research is done through simulation; microarchitecture simulation is extremely time consuming.

© MJT, IISc 41 Architecture Conferences 1. ISCA: International Symposium on Computer Architecture 2. ASPLOS: ACM Symposium on Architectural Support for Programming Languages and Operating Systems 3. HPCA: International Symposium on High Performance Computer Architecture 4. MICRO: International Symposium on Microarchitecture

© MJT, IISc 42 SMARTS Framework [Figure from Wunderlich et al., 30th ISCA 2003: the complete program execution, drawn not as a time line but as an instruction line]

© MJT, IISc 43 SMARTS Framework Must simulate more than 1 instruction to estimate CPI. U, the Sampling Unit size: the number of instructions that are simulated in detail in each sampling unit. (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 44 SMARTS Framework. Must simulate more than 1 instruction to estimate CPI. U, the Sampling Unit size: the number of instructions that are simulated in detail in each sampling unit. N, the benchmark length in terms of Sampling Units (of size U). (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 45 SMARTS Framework.. Systematic Sampling: Every k-th sampling unit is simulated in detail. (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 47 SMARTS Framework …. Systematic Sampling: Every k-th sampling unit is simulated in detail. W, the number of instructions of detailed warming done before each sampling unit is measured. (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 49 SMARTS Framework …… Systematic Sampling: Every k-th sampling unit is simulated in detail. W, the number of instructions of detailed warming done before each sampling unit is measured. n, the total number of sampling units measured. Functional Warming: functional simulation plus maintenance of selected microarchitecture state (such as cache hierarchy state and branch predictor state). (From Wunderlich et al., 30th ISCA 2003)
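
The sampling arithmetic follows directly from these definitions. The C sketch below is not the SMARTS simulator; it only illustrates systematic sampling with unit size U, one measured unit every k units, W instructions of detailed warming before each measured unit, and functional warming everywhere else. The values chosen for k and W, and the stand-in detailed_cpi_of_unit() function, are invented for illustration.

/* Sketch of SMARTS-style systematic sampling (after Wunderlich et al., ISCA 2003).
 * Only the sampling bookkeeping is real; detailed_cpi_of_unit() stands in
 * for a cycle-accurate simulator and simply fabricates a number. */
#include <stdio.h>

#define U 1000ULL          /* sampling unit size (instructions)          */
#define K 3000ULL          /* every k-th unit is measured in detail      */
#define W 2000ULL          /* detailed warming before each measured unit */

static double detailed_cpi_of_unit(unsigned long long first_insn) {
    /* placeholder: a real implementation would run the detailed simulator */
    return 1.0 + (first_insn % 7) * 0.05;
}

int main(void) {
    unsigned long long n_insns = 300000000000ULL;   /* ~300 billion, as in the deck */
    unsigned long long n_units = n_insns / U;       /* N: benchmark length in units */
    unsigned long long detailed = 0, sampled_units = 0;
    double cpi_sum = 0.0;

    for (unsigned long long u = 0; u < n_units; u++) {
        if (u % K == 0) {                           /* systematic sample            */
            detailed += W + U;                      /* warming + measured unit      */
            cpi_sum  += detailed_cpi_of_unit(u * U);
            sampled_units++;
        }
        /* all other instructions run under functional warming only */
    }
    printf("n = %llu sampled units, estimated CPI = %.3f\n",
           sampled_units, cpi_sum / sampled_units);
    printf("fraction of execution simulated in detail: %.4f%%\n",
           100.0 * detailed / (double)n_insns);
    return 0;
}

With these (assumed) parameters only about 0.1% of the 300 billion instructions are simulated in detail, which is where the large speedups come from.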

© MJT, IISc 50 Choice of Sample Size U [Figure from Wunderlich et al., 30th ISCA 2003]

© MJT, IISc 51 Levels off after U = 1000. (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 52 Levels off after U = 1000. Previous approaches: samples of 100M to 1B instructions. (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 53 Large Samples Are Not Necessary Levels off after U = 1000. Previous approaches: samples of 100M to 1B instructions. (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 54 How Effective is SMARTS? (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 55 How Effective is SMARTS? Much faster simulation: 30x SimpleScalar. (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 56 How Effective is SMARTS? Much faster simulation: 30x SimpleScalar. Much lower average error, but 1.8 times slower than SimPoint. (From Wunderlich et al., 30th ISCA 2003)

© MJT, IISc 57 Analytical Modeling The Problem: A lot of computer architecture research is done through simulation. Microarchitecture simulation is extremely time consuming, and doesn't provide insight into what is happening in the processor. Another solution: Analytical modeling. Example: Karkhanis & Smith, A First-order Model of Superscalar Processors, 31st ISCA 2004.

© MJT, IISc 58 Approach Objective: Analytical model for estimating superscalar processor program CPI • Inputs to the model: program characteristics. Basic idea: [Figure from Karkhanis & Smith, 31st ISCA 2004: steady-state IPC]

© MJT, IISc 61 Approach.. Objective: Analytical model for estimating superscalar processor program CPI • Inputs to the model: program characteristics. Basic idea: model the IPC loss, relative to the steady-state IPC, due to the three major miss events (branch mispredictions, instruction cache misses, and long data cache misses); they can be considered to be independent. (From Karkhanis & Smith, 31st ISCA 2004)

© MJT, IISc 63 Important Input: IW Characteristic Relationship between the number of instructions in the instruction window and the number of instructions that issue; a power-law relationship, used to calculate the steady-state IPC. "Starting with dependence statistics taken from instruction traces, the points on the IW curve … can be characterized by a set of relatively complex simultaneous non-linear equations." (From Karkhanis & Smith, 31st ISCA 2004)

© MJT, IISc 64 Important Input: IW Characteristic. Relationship between the number of instructions in the instruction window and the number of instructions that issue. The inputs are dependence statistics: for i = 1, …, N (the size of the instruction window), the probability that instruction j+i is dependent on instruction j. (From Karkhanis & Smith, 31st ISCA 2004)
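
The paper derives the IW characteristic from these dependence probabilities through the simultaneous non-linear equations quoted above. The sketch below does something much simpler, purely to illustrate the power-law shape mentioned on the previous slide: it fits i(w) = a * w^b to a handful of hypothetical (window size, issue rate) measurements by least squares in log-log space. The data points and the fitting shortcut are assumptions, not the paper's method.

/* Sketch: fitting a power law i(w) = a * w^b to an IW characteristic.
 * The sample data below is made up for illustration. */
#include <stdio.h>
#include <math.h>

int main(void) {
    /* hypothetical measurements: issue rate vs. instruction window occupancy */
    double w[]     = { 4,   8,   16,  32,  64,  128 };
    double issue[] = { 1.3, 1.9, 2.7, 3.8, 5.3, 7.6 };
    int n = 6;

    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int k = 0; k < n; k++) {
        double x = log(w[k]), y = log(issue[k]);   /* work in log-log space */
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);   /* exponent */
    double a = exp((sy - b * sx) / n);                      /* scale    */

    printf("IW characteristic fit: i(w) = %.2f * w^%.2f\n", a, b);
    return 0;
}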

© MJT, IISc 65 Branch Misprediction Penalty [Figure from Karkhanis & Smith, 31st ISCA 2004: IPC versus time (cycles) around a mispredicted branch, relative to the steady-state IPC]

© MJT, IISc 76 Branch Misprediction Penalty.. Isolated Branch Misprediction Penalty = Front-end pipeline depth + Ramp up + Window drain. (From Karkhanis & Smith, 31st ISCA 2004)

© MJT, IISc 77 ICache Miss Penalty From Karkhanis, Smith 31st ISCA 2004

© MJT, IISc 78 ICache Miss Penalty Isolated ICache Miss Penalty = Miss delay – Window drain + Ramp up. (From Karkhanis & Smith, 31st ISCA 2004)

© MJT, IISc 79 Long DCache Miss Penalty Isolated DCache Miss Penalty = Miss delay – ROB fill – Window drain + Ramp up From Karkhanis, Smith 31st ISCA 2004
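
Putting the pieces together: since the slides treat the three miss events as independent, a first-order CPI estimate can be formed as the steady-state CPI plus the sum of (miss events per instruction x isolated penalty). The C sketch below follows the penalty formulas on slides 76, 78 and 79, but every numeric input (miss rates, latencies, drain and ramp-up times) is invented for illustration and is not from Karkhanis & Smith.

/* Sketch of a first-order CPI estimate in the spirit of Karkhanis & Smith:
 * CPI = 1/IPC_steady + sum over miss events of (events/instruction * penalty),
 * with the three event types treated as independent.
 * All numeric inputs below are assumed values, not measured data. */
#include <stdio.h>

int main(void) {
    double ipc_steady  = 2.6;      /* steady-state IPC from the IW characteristic */

    /* miss events per instruction (assumed program characteristics) */
    double br_misp_pi  = 0.004;    /* branch mispredictions */
    double il1_miss_pi = 0.002;    /* instruction cache misses */
    double dl2_miss_pi = 0.001;    /* long (L2) data cache misses */

    /* component latencies in cycles (assumed machine parameters) */
    double drain = 6, rampup = 8, frontend = 12;
    double il1_delay = 12, dl2_delay = 200, rob_fill = 20;

    /* isolated penalties, following the slides' formulas */
    double pen_branch = frontend + rampup + drain;
    double pen_icache = il1_delay - drain + rampup;
    double pen_dcache = dl2_delay - rob_fill - drain + rampup;

    double cpi = 1.0 / ipc_steady
               + br_misp_pi  * pen_branch
               + il1_miss_pi * pen_icache
               + dl2_miss_pi * pen_dcache;

    printf("estimated CPI = %.3f (IPC = %.3f)\n", cpi, 1.0 / cpi);
    return 0;
}

Because the model is just a handful of arithmetic operations per design point, it can sweep large design spaces in seconds, which is the trade-off against accuracy discussed on the next slide.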

© MJT, IISc 80 Accuracy of Model Average error 5.8%, maximum error 13%: higher than SMARTS, but the model is much faster. (From Karkhanis & Smith, 31st ISCA 2004)

© MJT, IISc 81 Lecture Summary Architecture evaluation studies make heavy use of simulation Simulation speedup through techniques like sampling is widely used Analytical modeling has been attempted too; it is much faster but less accurate Simulation speedup and model building are still areas of research activity