We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byLily Cubit
Modified about 1 year ago
Lecture 8: More ILP stuff Professor Alvin R. Lebeck Computer Science 220 Fall 2001
2 © Alvin R. Lebeck 2001 Admin Thursday office hours to Wednesday 10am Homework #2 Due Today –Submit simulator code, fareed will post instructions Homework #3 assigned Please me your projects, need my approval… Project Proposal (October 2) –Short document –Short presentation Papers to read on web page
3 © Alvin R. Lebeck 2001 CPS 220 Summary Branch Prediction –Branch History Table: 2 bits for loop accuracy –Correlation: Recently executed branches correlated with next branch –Branch Target Buffer: include branch address & prediction Superscalar and VLIW –CPI < 1 –Dynamic issue vs. Static issue –More instructions issue at same time, larger the penalty of hazards SW Pipelining –Symbolic Loop Unrolling to get most from pipeline with little code expansion, little overhead What about non-loop codes? –How do you get > 1 instruction
4 © Alvin R. Lebeck 2001 CPS 220 Trace Scheduling Parallelism across IF branches vs. LOOP branches Two steps: –Trace Selection »Find likely sequence of basic blocks (trace) of (statically predicted) long sequence of straight-line code –Trace Compaction »Squeeze trace into few VLIW instructions »Need bookkeeping code in case prediction is wrong
5 © Alvin R. Lebeck 2001 CPS 220 Trace Scheduling Reorder these instructions to improve ILP Fix-up instructions In case we were wrong
6 © Alvin R. Lebeck 2001 CPS 220 HW support for More ILP Avoid branch prediction by turning branches into conditionally executed instructions: if (x) then A = B op C else NOP –If false, then neither store result nor cause exception –Expanded ISA of Alpha, MIPS, PowerPC, SPARC have conditional move; PA-RISC can annul any following instr, IA-64 predicated execution. Drawbacks to conditional instructions –Still takes a clock even if “annulled” –Stall if condition evaluated late –Complex conditions reduce effectiveness; condition becomes known late in pipeline
7 © Alvin R. Lebeck 2001 CPS 220 HW support for More ILP Speculation: allow an instruction to issue that is dependent on branch predicted to be taken without any consequences (including exceptions) if branch is not actually taken (“HW undo” squash) Often try to combine with dynamic scheduling Tomasulo: separate speculative bypassing of results from real bypassing of results –When instruction no longer speculative, write results (instruction commit) –execute out-of-order but commit in order
8 © Alvin R. Lebeck 2001 CPS 220 Speculation ADD LD SUB R2 BEQ L1 ADD R1, R3 LD R2 SUB ST ADD R2, R1 LD SUB ST Predicted Path Correct Path B1 B2B3 4-way issue All of B2 issued speculatively Must be squashed Could have execute B1 in speculative mode
9 © Alvin R. Lebeck 2001 CPS 220 HW support for More ILP Need HW buffer for results of uncommitted instructions: reorder buffer –Reorder buffer can be operand source –Once operand commits, result is found in register –3 fields: instr. type, destination, value –Use reorder buffer number instead of reservation station –Instructions commit in order –As a result, its easy to undo speculated instructions on mispredicted branches or on exceptions Reorder Buffer FP Regs FP Op Queue FP Adder Res Stations Figure 4.34, page 311
10 © Alvin R. Lebeck 2001 CPS 220 Four Steps of Speculative Tomasulo Algorithm 1.Issue—get instruction from FP Op Queue If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination. 2.Execution—operate on operands (EX) When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute 3.Write result—finish execution (WB) Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available. 4.Commit—update register with reorder result When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer.
11 © Alvin R. Lebeck 2001 CPS 220 Limits to ILP Conflicting studies of amount of parallelism available in late 1980s and early 1990s. Different assumptions about: –Benchmarks (vectorized Fortran FP vs. integer C programs) –Hardware sophistication –Compiler sophistication
12 © Alvin R. Lebeck 2001 CPS 220 Limits to ILP Initial HW Model here; MIPS compilers 1. Register renaming–infinite virtual registers and all WAW & WAR hazards are avoided 2. Branch prediction–perfect; no mispredictions 3. Jump prediction–all jumps perfectly predicted => machine with perfect speculation & an unbounded buffer of instructions available 4. Memory-address disambiguation–addresses are known & a store can be moved before a load provided addresses not equal 1 cycle latency for all instructions
13 © Alvin R. Lebeck 2001 CPS 220 Upper Limit to ILP (Figure 4.38, page 319)
14 © Alvin R. Lebeck 2001 CPS 220 More Realistic HW: Branch Impact Figure 4.40, Page 323 Change from Infinite window to examine to 2000 and maximum issue of 64 instructions per clock cycle Profile BHT (512)Pick Cor. or BHTPerfect
15 © Alvin R. Lebeck 2001 CPS 220 Selective Branch Predictor 8096 x 2 bits 2048 x 4 x 2 bits Branch Addr Global History Taken/Not Taken 8K x 2 bit Selector Choose Non-correlator Choose Correlator Taken Not Taken 00
16 © Alvin R. Lebeck 2001 CPS 220 More Realistic HW: Register Impact Figure 4.44, Page 328 Change 2000 instr window, 64 instr issue, 8K 2 level Prediction 64None256Infinite32128
17 © Alvin R. Lebeck 2001 CPS 220 More Realistic HW: Alias Impact Figure 4.46, Page 330 Change 2000 instr window, 64 instr issue, 8K 2 level Prediction, 256 renaming registers NoneGlobal/Stack perf; heap conflicts PerfectInspec. Assem.
18 © Alvin R. Lebeck 2001 CPS 220 Realistic HW for ‘9X: Window Impact (Figure 4.48, Page 332) Perfect disambiguation (HW), 1K Selective Prediction, 16 entry return, 64 registers, issue as many as window Infinite
19 © Alvin R. Lebeck 2001 CPS scalar IBM 71.5 MHz (5 stage pipe) vs. 2-scalar Alpha 200 MHz (7 stages) Braniac vs. Speed Demon (Spec Ratio)
20 © Alvin R. Lebeck 2001 Discussion Technical discussion Style of presentation Relate topics to what we already know Next Time: Energy/Power Next Tuesday: Project proposal presentations
Speculative ExecutionCS510 Computer ArchitecturesLecture Lecture 11 Trace Scheduling, Conditional Execution, Speculation, Limits of ILP.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 9, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
Lecture 7: Dynamic Branch Prediction, Superscalar, VLIW, and Software Pipelining Professor Alvin R. Lebeck Computer Science 220 Fall 2001.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 14, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed, Oct 19, 2005 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
CS203 – Advanced Computer Architecture ILP and Speculation.
CPSC614 Lec 5.1 Instruction Level Parallelism and Dynamic Execution #4: Based on lectures by Prof. David A. Patterson E. J. Kim.
Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.
Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.
W04S1 COMP s1 Seminar 4: Branch Prediction Slides due to David A. Patterson, 2001.
Lec18.1 Step by step for Dynamic Scheduling by reorder buffer Copyright by John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Dynamic Branch PredictionCS510 Computer ArchitecturesLecture Lecture 10 Dynamic Branch Prediction, Superscalar, VLIW, and Software Pipelining.
CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
ENGS 116 Lecture 91 Dynamic Branch Prediction and Speculation Vincent H. Berk October 10, 2005 Reading for today: Chapter 3.2 – 3.6 Reading for Wednesday:
1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.
Dynamic ILP: Scoreboard Professor Alvin R. Lebeck Computer Science 220 / ECE 252 Fall 2008.
DAP Spr.‘98 ©UCB 1 Lecture 6: ILP Techniques Contd. Laxmi N. Bhuyan CS 162 Spring 2003.
1 Zvika Guz Slides modified from Prof. Dave Patterson, Prof. John Kubiatowicz, and Prof. Nancy Warter-Perez Out Of Order Execution.
Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques.
Lecture 6: ILP HW Case Study— CDC 6600 Scoreboard & Tomasulo’s Algorithm Professor Alvin R. Lebeck Computer Science 220 Fall 2001.
Review Professor Alvin R. Lebeck Compsci 220 / ECE 252 Fall 2006.
EE524/CptS561 Advanced Computer Architecture Dynamic Scheduling A scheme to overcome data hazards.
Hardware-Based Speculation. Exploiting More ILP Branch prediction reduces stalls but may not be sufficient to generate the desired amount of ILP One way.
CSCE 614 Fall Hardware-Based Speculation As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing.
Computer Architecture Lec 8 – Instruction Level Parallelism.
1 IBM System 360. Common architecture for a set of machines. Robert Tomasulo worked on a high-end machine, the Model 91 (1967), on which they implemented.
Review of CS 203A Laxmi Narayan Bhuyan Lecture2.
COMP25212 Advanced Pipelining Out of Order Processors.
Pipelining 5. Two Approaches for Multiple Issue Superscalar –Issue a variable number of instructions per clock –Instructions are scheduled either statically.
1 EE524 / CptS561 Computer Architecture Dynamic Branch Prediction.
ENGS 116 Lecture 101 Tomasulo’s Approach and Hardware Based Speculation Vincent H. Berk October 22nd Reading for Today: 3.1 – 3.7.
CS136, Advanced Architecture Limits to ILP Simultaneous Multithreading.
EECC551 - Shaaban #1 lec # 8 Winter Multiple Instruction Issue: CPI < 1 To improve a pipeline’s CPI to be better [less] than one, and to.
04/03/2016 slide 1 Dynamic instruction scheduling Key idea: allow subsequent independent instructions to proceed DIVDF0,F2,F4; takes long time ADDDF10,F0,F8;
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
Lecture 1: Introduction Instruction Level Parallelism & Processor Architectures.
EENG449b/Savvides Lec /17/04 February 17, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.
CIS 629 Fall 2002 Multiple Issue/Speculation Multiple Instruction Issue: CPI < 1 To improve a pipeline’s CPI to be better [less] than one, and to utilize.
1 Lecture 6 Tomasulo Algorithm CprE 581 Computer Systems Architecture, Fall 2009 Zhao Zhang Reading:Textbook 2.4, 2.5.
Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Based.
1 Chapter 2: ILP and Its Exploitation Review simple static pipeline ILP Overview Dynamic branch prediction Dynamic scheduling, out-of-order execution Hardware-based.
COMP4611 Tutorial 6 Instruction Level Parallelism October 22 nd /23 rd
2/24; 3/1,3/11 (quiz was 2/22, QuizAns 3/8) CSE502-S11, Lec ILP 1 Tomasulo Organization FP adders Add1 Add2 Add3 FP multipliers Mult1 Mult2 From.
1 COMP 740: Computer Architecture and Implementation Montek Singh Tue, Mar 17, 2009 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
CS 5513 Computer Architecture Lecture 6 – Instruction Level Parallelism continued.
1 Sixth Lecture: Chapter 3: CISC Processors (Tomasulo Scheduling and IBM System 360/91) Please recall: Multicycle instructions lead to the requirement.
Anshul Kumar, CSE IITD CSL718 : VLIW - Software Driven ILP Hardware Support for Exposing ILP at Compile Time 3rd Apr, 2006.
© 2017 SlidePlayer.com Inc. All rights reserved.