Published by Lily Cubit
Lecture 8: More ILP stuff Professor Alvin R. Lebeck Computer Science 220 Fall 2001
Slide 2: Admin (© Alvin R. Lebeck 2001)
Thursday office hours moved to Wednesday 10am
Homework #2 due today
–Submit simulator code; Fareed will post instructions
Homework #3 assigned
Please email me your projects; they need my approval
Project Proposal (October 2)
–Short document
–Short presentation
Papers to read are on the web page
Slide 3: Summary (CPS 220)
Branch Prediction
–Branch History Table: 2 bits for loop accuracy
–Correlation: recently executed branches correlated with next branch
–Branch Target Buffer: include branch address & prediction
Superscalar and VLIW
–CPI < 1
–Dynamic issue vs. static issue
–The more instructions issue at the same time, the larger the penalty of hazards
SW Pipelining
–Symbolic loop unrolling to get the most from the pipeline with little code expansion, little overhead
What about non-loop codes?
–How do you get > 1 instruction
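The "2 bits for loop accuracy" point in the summary can be illustrated with a minimal sketch of one branch history table entry. It assumes the usual 2-bit saturating counter, where states 2 and 3 predict taken; the class name, the helper `mispredictions`, and the loop trip count are illustrative and not from the slides.

```python
class TwoBitCounter:
    """One BHT entry: states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self, state=3):
        self.state = state  # assumed initial state: strongly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturate at 0 and 3 so a single odd outcome cannot flip a strong state.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def mispredictions(outcomes, counter):
    """Count wrong predictions over a sequence of branch outcomes."""
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# A loop branch: taken 9 times, then not taken at loop exit; the loop runs twice.
loop_outcomes = ([True] * 9 + [False]) * 2
loop_misses = mispredictions(loop_outcomes, TwoBitCounter())
```

Under these assumptions the only mispredictions are at the two loop exits; the counter drops from 3 to 2 at each exit and still predicts taken when the loop is re-entered.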
Slide 4: Trace Scheduling
Parallelism across IF branches vs. LOOP branches
Two steps:
–Trace Selection
»Find a likely sequence of basic blocks (trace) that forms a long, statically predicted sequence of straight-line code
–Trace Compaction
»Squeeze the trace into a few VLIW instructions
»Need bookkeeping code in case the prediction is wrong
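The trace selection step can be sketched as a greedy walk over a control-flow graph. This is a simplified illustration, not the slides' algorithm: it assumes each block maps to its successors with a statically predicted probability, and it only grows the trace forward. The function name, block names, and probabilities are hypothetical.

```python
def select_trace(cfg, start):
    """Follow the most probable successor from each block.

    cfg maps a block name to {successor: predicted probability}.
    The walk stops at a block with no successors or at a block
    already on the trace (so loops do not repeat).
    """
    trace, seen = [], set()
    block = start
    while block is not None and block not in seen:
        trace.append(block)
        seen.add(block)
        successors = cfg.get(block, {})
        block = max(successors, key=successors.get) if successors else None
    return trace


# Hypothetical if/else diamond: B1 branches to B2 (likely) or B3 (unlikely).
cfg = {
    "B1": {"B2": 0.9, "B3": 0.1},
    "B2": {"B4": 1.0},
    "B3": {"B4": 1.0},
    "B4": {},
}
trace = select_trace(cfg, "B1")
```

Here the selected trace is B1, B2, B4; B3 is off the trace, which is where the bookkeeping code mentioned under trace compaction would be needed if the prediction is wrong.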
Slide 5: Trace Scheduling
[Figure: instructions along the trace are reordered to improve ILP; fix-up instructions are added in case we were wrong]
Slide 6: HW support for More ILP
Avoid branch prediction by turning branches into conditionally executed instructions: if (x) then A = B op C else NOP
–If false, then neither store result nor cause exception
–Expanded ISAs of Alpha, MIPS, PowerPC, and SPARC have conditional move; PA-RISC can annul any following instr.; IA-64 has predicated execution
Drawbacks to conditional instructions
–Still takes a clock even if “annulled”
–Stall if condition evaluated late
–Complex conditions reduce effectiveness; condition becomes known late in pipeline
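The semantics of `if (x) then A = B op C else NOP` can be modeled in a few lines. This is a behavioral sketch only, assuming a register file held in a dictionary; the function name and register values are hypothetical. Real hardware may compute the result and then suppress the write and the exception, whereas this model simply skips the operation.

```python
import operator


def conditional_execute(regs, pred, dest, op, src1, src2):
    """Model one predicated instruction: if regs[pred] then dest = src1 op src2.

    Returns the cycles consumed, which is one either way: an annulled
    instruction still occupies its slot.
    """
    if regs[pred]:
        regs[dest] = op(regs[src1], regs[src2])
    # With a false predicate nothing is written and op is never evaluated,
    # so it cannot raise an exception.
    return 1


regs = {"x": 0, "A": 5, "B": 8, "C": 0}
# Predicate false: B // C would divide by zero, but the instruction is annulled.
cycles_annulled = conditional_execute(regs, "x", "A", operator.floordiv, "B", "C")
a_after_annul = regs["A"]

# Predicate true with a safe divisor: the result is written.
regs["x"], regs["C"] = 1, 2
cycles_executed = conditional_execute(regs, "x", "A", operator.floordiv, "B", "C")
```

The annulled call leaves `A` unchanged and raises nothing, yet still costs a cycle, which is the first drawback listed on the slide.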
Slide 7: HW support for More ILP
Speculation: allow an instruction that depends on a branch predicted to be taken to issue, with no consequences (including exceptions) if the branch is not actually taken (“HW undo” squash)
Often try to combine with dynamic scheduling
Tomasulo: separate speculative bypassing of results from real bypassing of results
–When instruction no longer speculative, write results (instruction commit)
–Execute out-of-order but commit in order
Slide 8: Speculation
[Figure: basic blocks B1, B2, and B3 containing ADD, LD, SUB, ST, and BEQ L1 instructions, with the predicted path and the correct path marked]
4-way issue
All of B2 issued speculatively
–Must be squashed
Could have executed B1 in speculative mode
Slide 9: HW support for More ILP
Need HW buffer for results of uncommitted instructions: reorder buffer
–Reorder buffer can be operand source
–Once operand commits, result is found in register
–3 fields: instr. type, destination, value
–Use reorder buffer number instead of reservation station
–Instructions commit in order
–As a result, it's easy to undo speculated instructions on mispredicted branches or on exceptions
[Figure 4.34, page 311: Reorder Buffer, FP Regs, FP Op Queue, FP Adder, Res Stations]
Slide 10: Four Steps of Speculative Tomasulo Algorithm
1. Issue: get instruction from FP Op Queue
If a reservation station and a reorder buffer slot are free, issue instr & send operands & reorder buffer no. for destination.
2. Execution: operate on operands (EX)
When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute.
3. Write result: finish execution (WB)
Write on the Common Data Bus to all awaiting FUs & the reorder buffer; mark reservation station available.
4. Commit: update register with reorder result
When instr. is at head of reorder buffer & result is present, update register with result (or store to memory) and remove instr from reorder buffer.
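The issue, write-result, and commit steps can be sketched around a small reorder buffer model. This is a simplified illustration under stated assumptions, not the textbook datapath: reservation stations, the CDB, and stores to memory are left out, a mispredicted branch squashes younger entries when it reaches the head, and the class and method names are hypothetical. Each entry carries the three fields from the slides (instruction type, destination, value) plus bookkeeping flags.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()  # oldest instruction (head) at the left
        self.next_tag = 0

    def issue(self, kind, dest=None):
        """Allocate a slot at the tail; return its tag, or None if full (stall)."""
        if len(self.entries) >= self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "kind": kind, "dest": dest,
                             "value": None, "done": False, "mispredicted": False})
        return tag

    def write_result(self, tag, value=None, mispredicted=False):
        """Record a finished result; results may arrive out of program order."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry.update(value=value, done=True, mispredicted=mispredicted)
                return

    def commit(self, regs):
        """Retire finished instructions from the head, in order."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            committed.append(entry["tag"])
            if entry["kind"] == "branch":
                if entry["mispredicted"]:
                    self.entries.clear()  # squash everything younger
                    break
            elif entry["dest"] is not None:
                regs[entry["dest"]] = entry["value"]
        return committed


regs = {"F0": 0.0, "F2": 0.0}
rob = ReorderBuffer(size=4)
t0 = rob.issue("alu", "F0")
t1 = rob.issue("branch")
t2 = rob.issue("alu", "F2")              # speculative: issued past the branch
rob.write_result(t2, 7.0)                # finishes first, out of order
first = rob.commit(regs)                 # head not finished, nothing retires
rob.write_result(t0, 1.5)
rob.write_result(t1, mispredicted=True)
second = rob.commit(regs)                # retires t0 and t1, squashes t2
```

In this run the speculative result 7.0 never reaches `F2`, because the register file is only updated at commit and the entry is discarded when the mispredicted branch retires.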
Slide 11: Limits to ILP
Conflicting studies of amount of parallelism available in late 1980s and early 1990s.
Different assumptions about:
–Benchmarks (vectorized Fortran FP vs. integer C programs)
–Hardware sophistication
–Compiler sophistication
Slide 12: Limits to ILP
Initial HW Model here; MIPS compilers
1. Register renaming: infinite virtual registers, so all WAW & WAR hazards are avoided
2. Branch prediction: perfect; no mispredictions
3. Jump prediction: all jumps perfectly predicted => machine with perfect speculation & an unbounded buffer of instructions available
4. Memory-address disambiguation: addresses are known & a store can be moved before a load provided addresses not equal
1 cycle latency for all instructions
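Under this ideal model, only true (RAW) data dependences limit when an instruction can execute, so the available ILP is the instruction count divided by the length of the longest dependence chain. The sketch below assumes register-only instructions given as `(dest, [sources])` pairs in program order and ignores memory dependences; the function name and the sample instruction stream are hypothetical.

```python
def ideal_ilp(instrs):
    """Schedule with infinite renaming, perfect prediction, 1-cycle latency.

    Each instruction runs one cycle after its latest-arriving source.
    Returns (cycles, instructions per cycle).
    """
    ready = {}  # register name -> cycle in which its newest value is produced
    last_cycle = 0
    for dest, sources in instrs:
        cycle = 1 + max((ready.get(src, 0) for src in sources), default=0)
        # Overwriting ready[dest] models renaming: later readers see the new
        # producer, and WAW/WAR on the old name impose no delay.
        ready[dest] = cycle
        last_cycle = max(last_cycle, cycle)
    return last_cycle, len(instrs) / last_cycle


instrs = [
    ("r1", ["r0"]),
    ("r2", ["r0"]),
    ("r3", ["r1", "r2"]),
    ("r1", ["r0"]),        # WAW with the first instruction; renamed away
    ("r4", ["r1", "r3"]),
]
cycles, ilp = ideal_ilp(instrs)
```

For this stream the longest chain is three instructions deep, so the five instructions finish in 3 cycles.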
Slide 13: Upper Limit to ILP
[Figure 4.38, page 319]
Slide 14: More Realistic HW: Branch Impact
[Figure 4.40, page 323]
Change from an infinite window to a window of 2000 instructions and a maximum issue of 64 instructions per clock cycle
Predictors compared: Profile, BHT (512), Pick Cor. or BHT, Perfect
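Slide 15: Selective Branch Predictor
[Figure: a non-correlating predictor (8096 x 2 bits as printed, presumably 8K = 8192 entries) and a correlating predictor (2048 x 4 x 2 bits, indexed by branch address and 2 bits of global history) each produce a Taken/Not Taken prediction; an 8K x 2-bit selector chooses between the correlator and the non-correlator. In the 2-bit prediction counters, states 11 and 10 mean Taken and states 01 and 00 mean Not Taken.]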
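A selective predictor of this shape can be sketched as three tables of 2-bit counters. The table sizes follow the figure, but several details are assumptions made for illustration: selector states 2 and 3 choose the correlator, all counters start at 1, the selector is trained only when the two components disagree, and tables are indexed by `pc` modulo their size. The names `SelectivePredictor`, `bump`, and `run` are hypothetical.

```python
def bump(counter, up):
    """Move a 2-bit saturating counter one step toward 3 (up) or 0 (down)."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class SelectivePredictor:
    def __init__(self, entries=8192, corr_entries=2048, history_bits=2):
        self.local = [1] * entries                      # non-correlating 2-bit BHT
        self.corr = [[1] * (1 << history_bits) for _ in range(corr_entries)]
        self.selector = [1] * entries                   # >= 2 picks the correlator
        self.history = 0                                # global outcome history
        self.mask = (1 << history_bits) - 1

    def components(self, pc):
        local = self.local[pc % len(self.local)] >= 2
        corr = self.corr[pc % len(self.corr)][self.history] >= 2
        return local, corr

    def predict(self, pc):
        local, corr = self.components(pc)
        return corr if self.selector[pc % len(self.selector)] >= 2 else local

    def update(self, pc, taken):
        local, corr = self.components(pc)
        s = pc % len(self.selector)
        if local != corr:  # reward whichever component was right
            self.selector[s] = bump(self.selector[s], corr == taken)
        i, j = pc % len(self.local), pc % len(self.corr)
        self.local[i] = bump(self.local[i], taken)
        self.corr[j][self.history] = bump(self.corr[j][self.history], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def run(predictor, outcomes, pc=0):
    """Return the number of mispredictions for one branch's outcome sequence."""
    misses = 0
    for taken in outcomes:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses


predictor = SelectivePredictor()
warmup_misses = run(predictor, [True, False] * 2)
steady_misses = run(predictor, [True, False] * 10)
```

An alternating branch defeats a lone 2-bit counter, but after a short warm-up the selector settles on the correlator and the pattern is predicted correctly under these assumptions.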
Slide 16: More Realistic HW: Register Impact
[Figure 4.44, page 328]
Change: 2000 instr window, 64 instr issue, 8K 2-level prediction
Renaming registers compared: Infinite, 256, 128, 64, 32, None
Slide 17: More Realistic HW: Alias Impact
[Figure 4.46, page 330]
Change: 2000 instr window, 64 instr issue, 8K 2-level prediction, 256 renaming registers
Alias analysis compared: Perfect; Global/Stack perf, heap conflicts; Inspec. Assem.; None
Slide 18: Realistic HW for ‘9X: Window Impact
[Figure 4.48, page 332]
Perfect disambiguation (HW), 1K selective prediction, 16-entry return, 64 registers, issue as many as window
Window sizes compared: Infinite, 256, 128, 64, 32, 16, 8, 4
Slide 19: Brainiac vs. Speed Demon (Spec Ratio)
8-scalar IBM Power-2 @ 71.5 MHz (5 stage pipe) vs. 2-scalar Alpha 21064 @ 200 MHz (7 stages)
Slide 20: Discussion
Technical discussion
Style of presentation
Relate topics to what we already know
Next Time: Energy/Power
Next Tuesday: Project proposal presentations