Published by Landon Green
Modified about 1 year ago
1 Dynamic scheduling
Kosarev Nikolay, MIPT, Apr 2010
2 Agenda
In-order execution
Out-of-order execution. Tomasulo's algorithm
Implementation in hardware
Demo
Hardware speculation
Demo
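3 In-order execution
Data hazards: RAW, WAW. No WAR.
[Pipeline diagram]
RAW example (ADD reads R1, which DIV writes):
  DIV R1 = R2, R3
  ADD R9 = R1, R4
  SUB R8 = R4, R5
WAW example (DIV and ADD both write R1; the code itself makes no sense):
  DIV R1 = R2, R3
  ADD R1 = R2, R4
  SUB R6 = R1, R5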
4 Out-of-order execution
Split ID into 2 stages:
  Issue (IS): decode, check for structural hazards
  Read operands (RO): wait until no data hazards, read operands
[Pipeline diagram]
Out-of-order execution implies out-of-order completion (WB)
Hazards: RAW, WAW, WAR
  DIV R0 = R2, R4
  ADD R6 = R0, R8
  SUB R8 = R10, R14
  MUL R6 = R10, R8
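The three hazard kinds on this slide can be checked mechanically. The following is a minimal sketch, assuming a naive pairwise comparison of register names (it ignores intervening writes and says nothing about pipeline timing); `Instr` and `find_hazards` are hypothetical names introduced here, not part of the presentation.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """One three-operand instruction: op dest = src1, src2 (illustrative type)."""
    op: str
    dest: str
    srcs: Tuple[str, ...]


def find_hazards(program: List[Instr]) -> List[Tuple[str, int, int, str]]:
    """Return (kind, earlier index, later index, register) for every pair.

    Naive pairwise check: every earlier/later pair is compared directly,
    without considering writes that happen in between.
    """
    hazards = []
    for i, early in enumerate(program):
        for j in range(i + 1, len(program)):
            late = program[j]
            if early.dest in late.srcs:      # later reads what earlier writes
                hazards.append(("RAW", i, j, early.dest))
            if late.dest in early.srcs:      # later writes what earlier reads
                hazards.append(("WAR", i, j, late.dest))
            if late.dest == early.dest:      # both write the same register
                hazards.append(("WAW", i, j, late.dest))
    return hazards


# The four-instruction example from the slide.
program = [
    Instr("DIV", "R0", ("R2", "R4")),
    Instr("ADD", "R6", ("R0", "R8")),
    Instr("SUB", "R8", ("R10", "R14")),
    Instr("MUL", "R6", ("R10", "R8")),
]

for kind, i, j, reg in find_hazards(program):
    print(f"{kind} on {reg}: {program[i].op} (#{i}) -> {program[j].op} (#{j})")
```

On the slide's example this reports a RAW on R0 (DIV to ADD), a WAR on R8 (ADD to SUB), a WAW on R6 (ADD to MUL), and a RAW on R8 (SUB to MUL).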
5 Tomasulo's algorithm
How are data hazards avoided?
  RAW: wait for availability of operands
  WAR, WAW: register renaming
Before renaming:
  DIV R0 = R2, R4
  ADD R6 = R0, R8
  ADD R9 = R6, R1
  SUB R8 = R10, R14
  MUL R6 = R10, R8
After renaming (A and B are new names):
  DIV R0 = R2, R4
  ADD A = R0, R8
  ADD R9 = A, R1
  SUB B = R10, R14
  MUL R6 = R10, B
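Register renaming can be sketched in a few lines. This is a simplified illustration, not the slide's exact scheme: the slide renames only the two conflicting destinations (A and B), whereas the sketch below gives every destination a fresh name. The names `rename` and `P0`, `P1`, ... are assumptions made for the example, and the fresh names are assumed not to clash with architectural register names.

```python
from typing import Dict, List, Tuple

# (opcode, destination, sources); an illustrative encoding, not from the slides.
Instr = Tuple[str, str, Tuple[str, ...]]


def rename(program: List[Instr]) -> List[Instr]:
    """Give every destination a fresh name and redirect later readers to it."""
    latest: Dict[str, str] = {}   # architectural register -> newest name
    renamed: List[Instr] = []
    for n, (op, dest, srcs) in enumerate(program):
        # Sources read the newest name of each register, if one exists.
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        # The destination always gets a fresh name, so no two instructions
        # write the same name (no WAW) and no write can clobber a value
        # that an earlier instruction still has to read (no WAR).
        new_dest = f"P{n}"
        latest[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


# The five-instruction example from the slide.
program: List[Instr] = [
    ("DIV", "R0", ("R2", "R4")),
    ("ADD", "R6", ("R0", "R8")),
    ("ADD", "R9", ("R6", "R1")),
    ("SUB", "R8", ("R10", "R14")),
    ("MUL", "R6", ("R10", "R8")),
]

renamed = rename(program)
for op, dest, srcs in renamed:
    print(f"{op} {dest} = {', '.join(srcs)}")
```

After renaming, only the true (RAW) dependences remain: the second ADD still reads the result of the first ADD, and MUL still reads the result of SUB.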
6 Implementation in HW
7 Demo
Tomasulo's algorithm for dynamic scheduling
  LD F6 = R2, 2
  LD F2 = R3, 4
  MUL F0 = F2, F4
  SUB F8 = F2, F6
  DIV F10 = F0, F6
  ADD F6 = F8, F2
8 Hardware speculation
Based on 3 key ideas:
  Dynamic branch prediction
  Speculative execution
  Dynamic scheduling
Extra stage: instruction commit
New buffer: ROB (reorder buffer)
[Pipeline diagram]
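The role of the commit stage and the reorder buffer can be illustrated with a small model. This is a minimal sketch under simplifying assumptions: entries carry only a name and a done flag (no results, no register file), and the class and method names (`ReorderBuffer`, `issue`, `complete`, `commit`, `squash_after`) are hypothetical, chosen for the example.

```python
from collections import deque
from typing import Deque, List


class ReorderBuffer:
    """Toy ROB: allocate in program order, finish in any order, commit in order."""

    def __init__(self) -> None:
        self._entries: Deque[list] = deque()  # each entry is [tag, name, done]
        self._next_tag = 0

    def issue(self, name: str) -> int:
        """Allocate an entry at the tail and return its tag."""
        tag = self._next_tag
        self._next_tag += 1
        self._entries.append([tag, name, False])
        return tag

    def complete(self, tag: int) -> None:
        """Mark an entry as finished executing (may happen out of order)."""
        for entry in self._entries:
            if entry[0] == tag:
                entry[2] = True
                return
        raise KeyError(tag)

    def commit(self) -> List[str]:
        """Retire finished entries from the head only, so commit stays in order."""
        committed = []
        while self._entries and self._entries[0][2]:
            committed.append(self._entries.popleft()[1])
        return committed

    def squash_after(self, tag: int) -> None:
        """Drop all entries younger than `tag`, e.g. after a mispredicted branch."""
        while self._entries and self._entries[-1][0] > tag:
            self._entries.pop()


rob = ReorderBuffer()
div = rob.issue("DIV")
add = rob.issue("ADD")
sub = rob.issue("SUB")

rob.complete(sub)         # SUB finishes first, out of order
first = rob.commit()      # nothing retires: DIV at the head is not done
rob.complete(div)
second = rob.commit()     # DIV retires; ADD now blocks the head
rob.complete(add)
third = rob.commit()      # ADD and SUB retire together, in program order
print(first, second, third)
```

Because nothing becomes architecturally visible before commit, squashing the entries behind a mispredicted branch is enough to undo speculative work in this model.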
9 Hardware speculation
10 Demo Reorder buffer