SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Back-end Timing Models Core Models.

Slides:

Advertisements

Similar presentations

1 Lecture 11: Modern Superscalar Processor Models Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design.

Advertisements

CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.

Federation: Repurposing Scalar Cores for Out- of-Order Instruction Issue David Tarjan*, Michael Boyer, and Kevin Skadron* University of Virginia Department.

AMD OPTERON ARCHITECTURE Omar Aragon Abdel Salam Sayyad This presentation is missing the references used.

1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.

ECE 2162 Tomasulo’s Algorithm. Implementing Dynamic Scheduling Tomasulo’s Algorithm –Used in IBM 360/91 (in the 60s) –Tracks when operands are available.

Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)

CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.

February 28, 2012CS152, Spring 2012 CS 152 Computer Architecture and Engineering Lecture 11 - Out-of-Order Issue, Register Renaming, & Branch Prediction.

Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

1 Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)

CS 152 Computer Architecture and Engineering Lecture 15 - Advanced Superscalars Krste Asanovic Electrical Engineering and Computer Sciences University.

March 9, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Krste Asanovic Electrical.

EECS 470 Superscalar Architectures and the Pentium 4 Lecture 12.

Reducing the Complexity of the Register File in Dynamic Superscalar Processors Rajeev Balasubramonian, Sandhya Dwarkadas, and David H. Albonesi In Proceedings.

The PowerPC Architecture  IBM, Motorola, and Apple Alliance  Based on the IBM POWER Architecture Facilitate parallel execution Scale well with advancing.

1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )

EECS 470 Dynamic Scheduling – Part II Lecture 10 Coverage: Chapter 3.

Ryota Shioya, Masahiro Goshimay and Hideki Ando Micro 47 Presented by Kihyuk Sung.

Computer Architecture 2010 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

Lec17.1 °For in-order pipeline, 2 options: Freeze pipeline in Mem stage (popular early on: Sparc, R4000) IF ID EX Mem stall stall stall … stall Mem Wr.

OOO execution © Avi Mendelson, 4/ MAMAS – Computer Architecture Lecture 7 – Out Of Order (OOO) Avi Mendelson Some of the slides were taken.

COMP381 by M. Hamdi 1 Commercial Superscalar and VLIW Processors.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Back-end Timing Models Core Models.

CS 152 Computer Architecture and Engineering Lecture 23: Putting it all together: Intel Nehalem Krste Asanovic Electrical Engineering and Computer Sciences.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Modeling and Parallel Simulation.

Intel Architecture. Changes in architecture Software architecture: –Front end (Feature changes such as adding more graphics, changing the background colors,

Basic Microcomputer Design. Inside the CPU Registers – storage locations Control Unit (CU) – coordinates the sequencing of steps involved in executing.

CPE 631: Multithreading: Thread-Level Parallelism Within a Processor Electrical and Computer Engineering University of Alabama in Huntsville Aleksandar.

Understanding The Nehalem Core Note: The examples herein are mostly illustrative. They have shortcommings compared to the real implementation in favour.

Nicolas Tjioe CSE 520 Wednesday 11/12/2008 Hyper-Threading in NetBurst Microarchitecture David Koufaty Deborah T. Marr Intel Published by the IEEE Computer.

Complexity-Effective Superscalar Processors S. Palacharla, N. P. Jouppi, and J. E. Smith Presented by: Jason Zebchuk.

Dynamic Pipelines. Interstage Buffers Superscalar Pipeline Stages In Program Order In Program Order Out of Order.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD QSim 1 : Overview  Thread safe multicore.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Manifold Execution Model and System.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Building and Running Parallel Simulations.

RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors.

Anshul Kumar, CSE IITD CSL718 : Superscalar Processors Speculative Execution 2nd Feb, 2006.

Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

Computer Structure 2015 – Intel ® Core TM μArch 1 Computer Structure Multi-Threading Lihu Rappoport and Adi Yoaz.

CSL718 : Superscalar Processors

CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue

Computer Structure Multi-Threading

Timing Model of a Superscalar O-o-O processor in HAsim Framework

Out of Order Processors

CSE 502: Computer Architecture

Lecture: Out-of-order Processors

Microprocessor Microarchitecture Dynamic Pipeline

Lecture 12 Reorder Buffers

Lecture 16: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

Lecture 10: Out-of-order Processors

Lecture 11: Out-of-order Processors

Lecture: Out-of-order Processors

Lecture 18: Core Design Today: basics of implementing a correct ooo core: register renaming, commit, LSQ, issue queue.

Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2

Lecture 11: Memory Data Flow Techniques

Lecture 17: Core Design Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed.

CPE 631: Multithreading: Thread-Level Parallelism Within a Processor

Lecture 8: Dynamic ILP Topics: out-of-order processors

Krste Asanovic Electrical Engineering and Computer Sciences

Mattan Erez The University of Texas at Austin

CC423: Advanced Computer Architecture ILP: Part V – Multiple Issue

Lecture 19: Core Design Today: implementing core structures – rename, issue queue, bypass networks; innovations for high ILP and clock speed.

Overview Prof. Eric Rotenberg

Mattan Erez The University of Texas at Austin

Lecture 10: ILP Innovations

Lecture 9: ILP Innovations

Lecture 9: Dynamic ILP Topics: out-of-order processors

Sizing Structures Fixed relations Empirical (simulation-based)

Presentation transcript:

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Back-end Timing Models Core Models Interconnection Network Memory System Coherence Cache Hierarchy DRAM Controller 1

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Core Models Zesto – cycle-level x86 processor model SPX – a light pipe-lined model SimpleProc – a 1-IPC model 2

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Zesto* Overview Consists of an oracle and a detailed pipeline Oracle is an “execution-at-fetch” functional simulator; fetched instructions are first passed to oracle Pipeline has 5 stages Very detailed; slow (10’s of KIPS) Ported to Manifold as a component *G. Loh, etc., “Zesto: a cycle-level simulator for highly detailed microarchitecture exploration,” International Symposium on Performance Analysis of Software and Systems (ISPASS), pp53-64, 2009 Oracle (functional) Detailed pipeline model (superscalar / OOO pipeline) cache x86 instructions6 recovery info memory requests and replies 3

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Zesto X86 based timing model derived from Zesto cycle-level simulator IA 32 support 4

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Zesto Interface Front-end interfaces Cache request Event handler (for cache response) Clocked function Multiple instances with different front-end interfaces 5

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Zesto Interface ZestoCacheReq event handler for cache response class core_t { public:... void cache_response_handler(int, ZestoCacheReq*);... }; 6

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Zesto Interface Clocked function must be registered with a clock Zesto does not register this function, so it must be registered in the simulator program. void core_t :: tick() { //get the pipeline going commit->step(); exec->LDST_exec();... alloc->step(); decode->step(); fetch->step();... } 7

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Simplified Pipeline eXecution (SPX) 8

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD SPX 9 SPX is an abbreviated core model that support both out-of-order and in-order executions. Purpose: Detail vs. Speed For less core-sensitive simulations (i.e., memory/network traffic analysis), detailed core implementation may be unnecessary. Physics simulations need longer observation time (in real seconds) than typical architectural simulations e.g., temperature doesn’t change much for several hundreds million clock cycles (or several hundred milliseconds) of simulation. The SPX model provides enough details to model out-of-order and in-order executions, and other optional behaviors are all abbreviated (e.g., predictions) The SPX requires Qsim Library for detailed instruction information. Queue model is not currently supported. SPX uses direct Qsim callback functions.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Modeled Components in SPX 10 InstQ: A queue that fetches the instructions from Qsim. RF: Register file that tracks only dependency; behaves more like RAT. Dependency for general register files and flags are maintained. RS: Dependency status tracker. It does not have an actual table like scoreboard. When an instruction is ready (cleared from dependency), it is put in the ReadyQ. In-order execution is modeled with 1-entry RS. EX (FU): Multi-ported execution units. Each FU is defined with latency and issue rate. ROB: Re-order buffer for out-of-order execution An instruction can be broken into multiple sub-instructions (e.g., u-ops), and ROB commits the instruction when all sub-instructions are completed. LDQ/STQ: Load and store queues These queues check memory dependency between ST/LD instructions.

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Out-of-order Pipeline 11 InstQ Qsim Lib Callbacks Translation to SPX instruction ROB RSLDQSTQ Allocate if Available RF Allocate if Available Allocate if LD And Available Allocate if ST And Available Resolve Dependency Reg. Dep. Status EX FU FU Port Binding Mem. Dep. CheckWhen Allocate Schedule Schedule When Commit Cache Req. After EX Cache Req. After Commit Update Store Fwd Update Cache Resp. Update Writeback

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD In-order Pipeline 12 InstQ Qsim Lib Callbacks Translation to SPX instruction threadQLDQSTQRF Allocate if Available Allocate if LD And Available Allocate if ST And Available Resolve Dependency Reg. Dep. Status EX FU FU Port Binding Mem. Dep. CheckWhen Allocate Schedule Cache Req. After EX Writeback Store Fwd Update Cache Resp. Update Schedule

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD SPX Interface Front-end Cache request Event handler (for cache response) Clocked function 13

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD SPX Interface cache_reqest_t event handler for cache response class spx_core_t { public:... void handle_cache_response(int, cache_reqest_t*);... }; clocked function must be registered with a clock SPX does not register this function, so it must be registered in the simulator program. void spx_core_t :: tick() {... } 14

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD SimpleProc 1-IPC: in general one instruction is executed every cycle MSHR limits number of outstanding cache requests 15

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD SimpleProc Interface Front-end Cache request Event handler (for cache response) Clocked function 16

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD SimpleProc Interface CacheReq event handler for cache response class SimpleProcessor { public:... void handle_cache_response(int, CacheReq*);... }; clocked function must be registered with a clock SimpleProc does not register this function, so it must be registered in the simulator program. void SimpleProcessor :: tick() {... } 17

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | SCHOOL OF COMPUTER SCIENCE | GEORGIA INSTITUTE OF TECHNOLOGY MANIFOLD Outline Introduction Execution Model and System Architecture Multicore Emulator Front-End Component Models Cores Network Memory System Building and Running Manifold Simulations Physical Modeling: Energy Introspector Some Example Simulators 18