Advanced Computer Architecture Lab, University of Michigan
MASE: Micro Architectural Simulation Environment

MASE: Micro Architectural Simulation Environment
Eric Larson, Saugata Chatterjee, and Todd Austin
University of Michigan
November 5, 2001

MASE overview
MASE is a new performance simulation infrastructure for SimpleScalar.
Features and goals of MASE:
  – Checker improves validation support.
  – Oracle allows for “perfect” studies.
  – Micro-functional performance model increases accuracy.
  – Speculative state management facilities simplify aggressive speculation.
  – Callback interface permits sophisticated memory system simulation.
A test release of MASE will be available in December at

SimpleScalar 3.0 software architecture
[Diagram: IF, ID, and CT stages; functional units; reorder buffer (ROB)]

MASE software architecture
[Diagram: IF, ID, and CT stages; oracle; checker; instruction state queue (ISQ); functional units; reorder buffer (ROB); memory simulator attached through the callback interface]

Checker and oracle
Permit “perfect” studies and improved validation.
Oracle executes in fetch and places values into the ISQ.
Checker uses ISQ values to validate core computation.
Checker will fix any core bug, reducing the burden of correctness on the core.
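
The oracle/checker split can be illustrated with a toy model. This is only a sketch under stated assumptions, not MASE code: the class names, the `IsqEntry` layout, and the deliberately buggy adder are all hypothetical. The oracle computes the correct value at fetch and queues it in the ISQ; at commit the checker compares the core's result against it and substitutes the oracle value on a mismatch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IsqEntry:
    """Oracle result for one in-flight instruction (hypothetical layout)."""
    seq: int            # program-order sequence number
    oracle_value: int   # correct result, computed by the oracle at fetch


class Oracle:
    """Executes each instruction functionally at fetch and records the result."""

    def __init__(self):
        self.isq = deque()
        self.next_seq = 0

    def fetch(self, op, a, b):
        entry = IsqEntry(self.next_seq, op(a, b))
        self.isq.append(entry)
        self.next_seq += 1
        return entry


class Checker:
    """Validates the core's result at commit and repairs any mismatch."""

    def __init__(self, isq):
        self.isq = isq
        self.errors = 0

    def commit(self, core_value):
        entry = self.isq.popleft()           # oldest instruction commits first
        if core_value != entry.oracle_value:
            self.errors += 1                 # core bug detected
            return entry.oracle_value        # commit the oracle's value instead
        return core_value


def buggy_core_add(a, b):
    """Stand-in for a timing core with a bug: wrong for one operand pair."""
    return 0 if (a, b) == (2, 2) else a + b


oracle = Oracle()
checker = Checker(oracle.isq)
committed = []
for a, b in [(1, 2), (2, 2), (3, 4)]:
    oracle.fetch(lambda x, y: x + y, a, b)
    committed.append(checker.commit(buggy_core_add(a, b)))

print(committed, checker.errors)
```

In this toy run the committed values are all correct even though the core miscomputed one of them, and the error counter records how often the checker had to step in.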

Micro-functional performance model
Trace-driven techniques cannot accurately model timing-dependent computation.
  – For example, misspeculation and shared memory race conditions.
Instructions are now executed in the core with proper timing.
Further improves validation, intertwining timing and correctness.

Support for aggressive speculation
SimpleScalar lacks arbitrary instruction restart. Only branches can restart.
MASE allows any instruction to misspeculate and restart the core.
Several data structures (such as the ROB and ISQ) were modified to support arbitrary rollback.
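
A minimal sketch of what rollback from an arbitrary instruction might look like in a reorder buffer. The `ReorderBuffer` class and its methods are hypothetical, and the sketch assumes the misspeculating instruction itself stays in the buffer to be re-executed while everything younger is squashed; the slides do not specify this policy.

```python
class ReorderBuffer:
    """Toy ROB that can roll back from any entry, not only from branches."""

    def __init__(self):
        self.entries = []    # (seq, name) pairs in program order, oldest first
        self.next_seq = 0

    def dispatch(self, name):
        seq = self.next_seq
        self.entries.append((seq, name))
        self.next_seq += 1
        return seq

    def rollback(self, bad_seq):
        """Squash every instruction younger than bad_seq; return their names."""
        squashed = [name for seq, name in self.entries if seq > bad_seq]
        self.entries = [(seq, name) for seq, name in self.entries if seq <= bad_seq]
        return squashed


rob = ReorderBuffer()
for name in ["add", "load", "mul", "branch"]:
    rob.dispatch(name)

# Assume the load (seq 1) turns out to have misspeculated.
squashed = rob.rollback(1)
print(squashed, rob.entries)
```

The point of the sketch is that `rollback` takes any sequence number, so a load, a store, or a value-predicted instruction can trigger recovery in the same way a mispredicted branch does.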

Memory system with callback interface
SimpleScalar’s memory system requires that instruction latency be known at issue.
  – Not representative of modern memory systems.
  – For example, DRAM accesses can be reordered to increase page hit rates.
Instructions use the callback interface to asynchronously declare their (remaining) latency.

Memory system with callback interface
Interaction between the performance simulator and the memory system:
1. Issue load.
2. Call cache_access with: callback = cb_fn, rid = 5.
3. Return mem_unknown.
4. Determine latency.
5. Call cb_fn with: rid = 5 and lat set to the latency determined in step 4.
6. Schedule completion for load.
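
The numbered exchange can be sketched as follows. Only the names `cache_access`, `cb_fn`, `rid`, `lat`, and `mem_unknown` come from the slide; the argument order, the two classes, the `tick` method, and the address-to-latency table are assumptions made for illustration and do not reflect the real MASE signatures.

```python
MEM_UNKNOWN = -1   # sentinel meaning "latency not known at issue time"


class MemorySystem:
    """Toy memory model that decides a request's latency after it is issued."""

    def __init__(self, latency_of):
        self.latency_of = latency_of   # hypothetical table: address -> latency
        self.requests = []             # accesses whose latency is still undecided

    def cache_access(self, addr, callback, rid):
        self.requests.append((addr, callback, rid))
        return MEM_UNKNOWN                             # step 3

    def tick(self):
        requests, self.requests = self.requests, []
        for addr, callback, rid in requests:
            callback(rid, self.latency_of[addr])       # steps 4 and 5


class PerformanceSimulator:
    def __init__(self, memory):
        self.memory = memory
        self.now = 0
        self.completion = {}   # rid -> cycle at which the load completes

    def issue_load(self, rid, addr):                   # steps 1 and 2
        return self.memory.cache_access(addr, self.cb_fn, rid)

    def cb_fn(self, rid, lat):                         # step 6
        self.completion[rid] = self.now + lat


mem = MemorySystem({0x100: 12})
sim = PerformanceSimulator(mem)
status = sim.issue_load(rid=5, addr=0x100)   # latency unknown at issue
sim.now = 3                                  # a few cycles pass
mem.tick()                                   # memory system reports the latency
print(status, sim.completion)
```

Because the latency arrives through the callback rather than as a return value, the memory model is free to delay or reorder requests before committing to a number.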

Other improvements
Algorithm for detecting when store data can be forwarded to loads has been improved (more aggressive).
Register update unit (RUU) has been split into a reorder buffer (ROB) and reservation stations (RS).
Added a scheduler queue.
  – Scheduler predicts the latency of each instruction.
  – Instructions are replayed if the prediction is too small.
Added a front-end queue.
  – Improves misprediction delay accuracy.
  – Can simulate additional stages in the front-end pipeline.
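
The replay behavior of the scheduler queue can be illustrated with a small model of one dependent instruction. This is a sketch only: the function name and the retry-one-cycle-later policy are assumptions, since the slides state only that instructions are replayed when the predicted latency is too small.

```python
def issue_consumer(producer_issue, predicted_latency, actual_latency):
    """Return (cycle the consumer finally issues, number of replays).

    The scheduler first tries the consumer when the producer's value is
    predicted to be ready. If the value is not there yet, the consumer is
    replayed; this sketch assumes it retries one cycle later each time.
    """
    ready = producer_issue + actual_latency        # when the value really arrives
    attempt = producer_issue + predicted_latency   # scheduler's first try
    replays = 0
    while attempt < ready:
        replays += 1
        attempt += 1
    return attempt, replays


print(issue_consumer(10, 3, 3))   # prediction exact
print(issue_consumer(10, 3, 6))   # prediction too small, e.g. a cache miss
print(issue_consumer(10, 5, 3))   # prediction too large: delay but no replay
```

Under these assumptions an underestimate costs replays, while an overestimate only delays the consumer, which is why the slide singles out predictions that are too small.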

Early results and analyses
Validated MASE against SimpleScalar 3.0 sim-outorder.
  – Less than 1% difference for SPEC95 integer benchmarks.
MASE is half as fast as sim-outorder, but MASE is unoptimized (future work).
Arbitrary speculation mechanism tested with blind load speculation study.
  – Implementation was straightforward in MASE.
Checker simplified implementation of store forwarding.
  – Partial store forwarding logic was not implemented.
  – Relied on checker to detect and correct these cases.
  – Minor inaccuracy, at most 195 errors (vortex).
Checker proved to be a valuable debugging aid when implementing other features of MASE.

Conclusion
Checker supports validation by reducing the burden of correctness on the core.
Micro-functional core allows for more accurate modeling.
Speculative state management facilities simplify implementations of aggressive speculation techniques.
Memory system callback interface supports modern memory systems.
A test release of MASE will be available in December at