Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Compiler.

Slides:



Advertisements
Similar presentations
UPC Compiler Support for Trace-Level Speculative Multithreaded Architectures Antonio González λ,ф Carlos Molina ψ Jordi Tubella ф INTERACT-9, San Francisco.
Advertisements

© 2006 Edward F. Gehringer ECE 463/521 Lecture Notes, Spring 2006 Lecture 1 An Overview of High-Performance Computer Architecture ECE 463/521 Spring 2006.
Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.
CPE 731 Advanced Computer Architecture ILP: Part V – Multiple Issue Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
Hardware-based Devirtualization (VPC Prediction) Hyesoon Kim, Jose A. Joao, Onur Mutlu ++, Chang Joo Lee, Yale N. Patt, Robert Cohn* ++ *
Wish Branches Combining Conditional Branching and Predication for Adaptive Predicated Execution The University of Texas at Austin *Oregon Microarchitecture.
Combining Statistical and Symbolic Simulation Mark Oskin Fred Chong and Matthew Farrens Dept. of Computer Science University of California at Davis.
Dynamic Branch PredictionCS510 Computer ArchitecturesLecture Lecture 10 Dynamic Branch Prediction, Superscalar, VLIW, and Software Pipelining.
1 Advanced Computer Architecture Limits to ILP Lecture 3.
Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.
UPC Microarchitectural Techniques to Exploit Repetitive Computations and Values Carlos Molina Clemente LECTURA DE TESIS, (Barcelona,14 de Diciembre de.
Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.
Advanced Computer Architecture Lab University of Michigan MASE Eric Larson MASE: Micro Architectural Simulation Environment Eric Larson, Saugata Chatterjee,
Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.
A Scalable Front-End Architecture for Fast Instruction Delivery Paper by: Glenn Reinman, Todd Austin and Brad Calder Presenter: Alexander Choong.
Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.
Glenn Reinman, Brad Calder, Department of Computer Science and Engineering, University of California San Diego and Todd Austin Department of Electrical.
Perceptron-based Global Confidence Estimation for Value Prediction Master’s Thesis Michael Black June 26, 2003.
UPC Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures Carlos Molina ψ, ф Jordi Tubella ф Antonio González λ,ф ISHPC-VI,
1 Improving Branch Prediction by Dynamic Dataflow-based Identification of Correlation Branches from a Larger Global History CSE 340 Project Presentation.
2/15/2006"Software-Hardware Cooperative Memory Disambiguation", Alok Garg, HPCA Software-Hardware Cooperative Memory Disambiguation Ruke Huang, Alok.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
Address-Value Delta (AVD) Prediction Onur Mutlu Hyesoon Kim Yale N. Patt.
Multiscalar processors
Csci4203/ece43631 Review Quiz. 1)It is less expensive 2)It is usually faster 3)Its average CPI is smaller 4)It allows a faster clock rate 5)It has a simpler.
Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.
Prophet/Critic Hybrid Branch Prediction Falcon, Stark, Ramirez, Lai, Valero Presenter: Christian Wanamaker.
UPC Trace-Level Speculative Multithreaded Architecture Carlos Molina Universitat Rovira i Virgili – Tarragona, Spain Antonio González.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
CB E D F G Frequently executed path Not frequently executed path Hard to predict path A C E B H Insert select-µops (φ-nodes SSA) Diverge Branch CFM point.
Improving the Performance of Object-Oriented Languages with Dynamic Predication of Indirect Jumps José A. Joao *‡ Onur Mutlu ‡* Hyesoon Kim § Rishi Agarwal.
Revisiting Load Value Speculation:
1 Practical Selective Replay for Reduced-Tag Schedulers Dan Ernst and Todd Austin Advanced Computer Architecture Lab The University of Michigan June 8.
Korea Univ B-Fetch: Branch Prediction Directed Prefetching for In-Order Processors 컴퓨터 · 전파통신공학과 최병준 1 Computer Engineering and Systems Group.
Is Out-Of-Order Out Of Date ? IA-64’s parallel architecture will improve processor performance William S. Worley Jr., HP Labs Jerry Huck, IA-64 Architecture.
ACMSE’04, ALDepartment of Electrical and Computer Engineering - UAH Execution Characteristics of SPEC CPU2000 Benchmarks: Intel C++ vs. Microsoft VC++
CS 211: Computer Architecture Lecture 6 Module 2 Exploiting Instruction Level Parallelism with Software Approaches Instructor: Morris Lancaster.
Coherence Decoupling: Making Use of Incoherence J. Huh, J. Chang, D. Burger, G. Sohi ASPLOS 2004.
Feb 14 th 2005University of Utah1 Microarchitectural Wire Management for Performance and Power in Partitioned Architectures Rajeev Balasubramonian Naveen.
Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.
1 CPRE 585 Term Review Performance evaluation, ISA design, dynamically scheduled pipeline, and memory hierarchy.
Dan Ernst – ISCA-30 – 6/10/03 Advanced Computer Architecture Lab The University of Michigan Cyclone: A Low-Complexity Broadcast-Free Dynamic Instruction.
Hybrid Multi-Core Architecture for Boosting Single-Threaded Performance Presented by: Peyman Nov 2007.
The life of an instruction in EV6 pipeline Constantinos Kourouyiannis.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
CS 258 Spring The Expandable Split Window Paradigm for Exploiting Fine- Grain Parallelism Manoj Franklin and Gurindar S. Sohi Presented by Allen.
Lecture 1: Introduction Instruction Level Parallelism & Processor Architectures.
CML Path Selection based Branching for CGRAs ShriHari RajendranRadhika Thesis Committee : Prof. Aviral Shrivastava (Chair) Prof. Jennifer Blain Christen.
On the Importance of Optimizing the Configuration of Stream Prefetches Ilya Ganusov Martin Burtscher Computer Systems Laboratory Cornell University.
Application Domains for Fixed-Length Block Structured Architectures ACSAC-2001 Gold Coast, January 30, 2001 ACSAC-2001 Gold Coast, January 30, 2001.
CS717 1 Hardware Fault Tolerance Through Simultaneous Multithreading (part 2) Jonathan Winter.
IA-64 Architecture Muammer YÜZÜGÜLDÜ CMPE /12/2004.
Value Prediction Kyaw Kyaw, Min Pan Final Project.
Memory Protection through Dynamic Access Control Kun Zhang, Tao Zhang and Santosh Pande College of Computing Georgia Institute of Technology.
Chapter Six.
CS 352H: Computer Systems Architecture
Exploring Value Prediction with the EVES predictor
Douglas Lacy & Daniel LeCheminant CS 252 December 10, 2003
EE 382N Guest Lecture Wish Branches
Address-Value Delta (AVD) Prediction
Yingmin Li Ting Yan Qi Zhao
Phase Capture and Prediction with Applications
Chapter Six.
Hyesoon Kim Onur Mutlu Jared Stark* Yale N. Patt
Sampoorani, Sivakumar and Joshua
CC423: Advanced Computer Architecture ILP: Part V – Multiple Issue
rePLay: A Hardware Framework for Dynamic Optimization
Lois Orosa, Rodolfo Azevedo and Onur Mutlu
Phase based adaptive Branch predictor: Seeing the forest for the trees
Project Guidelines Prof. Eric Rotenberg.
Presentation transcript:

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson and Todd Austin Advanced Computer Architecture Lab University of Michigan December 13, 2000

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Talk Overview Motivation Optimizations –Adding confidence to compiler controlled value prediction –Selecting value prediction candidates based on the time spent in the middle of dynamic dependence chains –Ignoring branch mispredictions when program semantics aren’t violated Results Conclusion and Future Work

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Motivation Value prediction can break dependence chains, increasing instruction level parallelism (ILP). Hardware value prediction techniques are effective but expensive due to the large amount of predictor state that needs to be stored. Software value prediction is inefficient because there is no confidence. Only the most predictable candidates can be selected, reducing coverage.

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Solution Add confidence to compiler-controlled value prediction by using the existing branch predictor –Increases coverage while keeping accuracy high Profile-guided candidate selection based on criticality of instructions in dynamic instruction window –Exposes additional parallelism Implement a new branch that ignores mispredictions when executing a non-optimized, but correct, path –Reduces branch mispredictions

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Compiler Controlled Value Prediction Hardware support –Uses a hardware predictor accessed through PREDICT and UPDATE instructions [Fu98] –Value prediction table is smaller than hardware only value prediction –Sophisticated value predictors (such as a hybrid predictor) can be used Software only –Predictor state stored in memory or the register file –No hardware cost, can be added to existing microarchitectures –Only simple predictors make sense due to implementation overheads

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Optimization #1: Branch Based Confidence lda load add sub lda load predict sub add dependence predicted by beq instruction

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Coverage and Accuracy

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Optimization #2: Value Prediction Candidate Selection Split long dependence chains in half to maximize ILP –“ sub ” is the best instruction to predict in the example Dynamic profiler locates instructions in the longest chains –Compiler only optimizes sites with high “middle metric” –Largest number of optimization sites: 136 (approx. 1 KB table)

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Optimization #3: Misprediction Tolerant Branch “BEQIT” Unoptimized path is always correct regardless of confidence prediction Once the program has speculated down the unoptimized path, control will continue down this path, eliminating the branch misprediction penalty confidence predictor

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Experimental Framework Simulation and dynamic profiling using SimpleScalar –Alpha instruction set –4-wide dynamically scheduled baseline microarchitecture SPEC benchmarks analyzed –training inputs used for profiling –reference inputs used to generate performance results ALTO (A Link Time Optimizer) used to modify binaries [Muth98] –ALTO specific optimizations disabled

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Compiler Controlled Value Prediction with Branch Predictor Based Confidence

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Software-only Value Prediction

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Value Prediction Candidate Selection

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Misprediction Tolerant Branches

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Conclusion Branch predictor based confidence improves coverage and performance for compiler-controlled value prediction Software-only value prediction exhibits modest speedup, significant improvements possible with minimal hardware support to reduce overheads Selecting value prediction candidates based on criticality gives more performance, but only programs with long dynamic dependence chains A simple branch optimization eliminates penalties for innocuous confidence mispredictions

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Future Work Examine the “spacing” of optimizations. Candidates that are close together will cause the dependent chain to be split into several smaller chains, increasing the overhead. Look at using the branch predictor to estimate confidence for other forms of speculation. Apply the misprediction tolerant branch to other optimizations.

Advanced Computer Architecture Lab University of Michigan Compiler Controlled Value Prediction with Branch Predictor Based Confidence Eric Larson Questions and Answers