Hybrid-Scheduling: A Compile-Time Approach for Energy-Efficient Superscalar Processors
Madhavi Valluri and Lizy John, Laboratory for Computer Architecture


Laboratory for Computer Architecture, University of Texas at Austin

Motivating Example

Loop: {
    (1) add R1, R2, R5;
    (2) sub R5, R2, R1;
    (3) st  R1, 0(R15);
    (4) ld  R9, 8(R24);
    (5) add R4, R9, R10;
}

Assumed operation latencies: ALU, 1 cycle; LD/ST, 2 cycles.

CASE 1: R15 and R24 depend on input data, so the store (3) and the load (4) may access the same address. This potential aliasing cannot be resolved statically, so the compiler cannot move the load above the store; the dynamic scheduler can, because the actual addresses are known at run time.
CASE 2: R15 and R24 are the starting addresses of two arrays, a and b, so no aliasing problem exists.

Compiler Schedule (case 1): six cycles per iteration
Cycle 1: Instr (1)
Cycle 2: Instr (2)
Cycle 3: Instr (3)
Cycle 4: Instr (4)
Cycle 5: nop
Cycle 6: Instr (5)

Dynamic Schedule: three cycles per iteration
Cycle 1: Instr (1), Instr (4)
Cycle 2: Instr (2), nop
Cycle 3: Instr (3), Instr (5)

In case 1, the dynamic schedule is 2X faster than the compiler schedule.

Compiler Schedule (case 2): three cycles per iteration
Cycle 1: Instr (1), Instr (4)
Cycle 2: Instr (2), nop
Cycle 3: Instr (3), Instr (5)

Where does power go?
Out-of-order issue logic* (the logic used to identify parallel instructions to issue each cycle) consumes ~25-50% of processor power.
* a.k.a. dynamic issue logic

[Figure: power breakdown of the Alpha processor by component: fetch, decode, rename, OoO issue, reorder buffer, execution units (IALU, FPU), memory (caches, MMU), rest]

OoO Issue Logic
Main components: issue queue and reorder buffer
- Accessed multiple times each cycle
- Highly associative
- Multiple locations accessed
- Large number of ports
- Complex select/wakeup logic

The Hybrid-Scheduling Approach
Dynamic scheduling hardware is justified in irregular regions of the program. In regular regions (regions with well-structured control flow, regular memory access patterns, etc.), hardware scheduling is redundant: as in case 2 of the motivating example, there is NO difference between the dynamic schedule and the compiler schedule!
SOLUTION: the Hybrid-Scheduling approach
- Programs are divided into low-power static regions (S-Regions) and high-power dynamic regions.
- Code in S-Regions is scheduled by the compiler alone into packets of independent instructions.
- Instruction packets are issued to the functional units without any dynamic dependence checks.
- Two modes of execution: superscalar mode for dynamic regions and static mode for S-Regions; a special instruction switches modes.
- Large savings in static mode, since the OoO issue logic is bypassed.

[Figure: a program split into dynamic and static regions, and the pipeline (fetch, decode, rename, OoO issue, reorder buffer, functional units, caches) with a low-power reorder buffer]

Preliminary Results (ISLPED 2003)

Benchmarks & S-Region Selection
- Media and SpecFP benchmarks
- Potential S-Regions: loops without function calls
- Final S-Regions selected by profiling
- 30-99% of execution time spent in S-Regions

Experimental Results
- 30% average energy savings
- 3.6% average performance drop

Comparing with Dynamic Resource Adaptation Schemes
- A program's resource requirements vary with program phase.
- Dynamic resource adaptation schemes typically lower processor resources when program IPC is low and increase them when IPC is high.
- The Hybrid-Scheduling scheme eliminates power-hungry resource use for high-IPC (or high-ILP) regions.

Future Research Directions
- The real challenge: adapting Hybrid-Scheduling for irregular, SPECint-like applications
- Minimal time is spent in loops without function calls (the previously explored type of S-Region)
- Ongoing research: investigating alternate S-Regions (hyperblocks, superblocks and traces)
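The cycle counts in the motivating example follow from the loop's dependence graph plus one memory-ordering constraint. The Python below is a minimal sketch, not the authors' tool: it list-schedules the loop body with the slide's latencies and adds a store-to-load edge only when the addresses may alias. Assumptions: a 2-wide machine, the first operand is the destination register, a memory-ordering edge waits for the earlier access's full latency, and `Instr`, `dependences` and `list_schedule` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

# Operation latencies assumed on the slide: ALU 1 cycle, LD/ST 2 cycles.
LATENCY = {"add": 1, "sub": 1, "ld": 2, "st": 2}
MEMORY_OPS = ("ld", "st")
ISSUE_WIDTH = 2  # assumption: two issue slots per cycle, as in the slide's schedules


class Instr(NamedTuple):
    name: str
    op: str
    dest: Optional[str]    # register written (None for a store)
    srcs: Tuple[str, ...]  # registers read (a store reads its data and base registers)


# The loop body from the slide; the first operand is taken to be the destination.
LOOP = [
    Instr("1", "add", "R1", ("R2", "R5")),
    Instr("2", "sub", "R5", ("R2", "R1")),
    Instr("3", "st", None, ("R1", "R15")),
    Instr("4", "ld", "R9", ("R24",)),
    Instr("5", "add", "R4", ("R9", "R10")),
]


def dependences(code: List[Instr], may_alias: bool) -> List[Tuple[int, int, int]]:
    """Return (producer, consumer, minimum issue distance) edges."""
    edges = []
    for j, b in enumerate(code):
        for i in range(j):
            a = code[i]
            if a.dest is not None and a.dest in b.srcs:  # RAW: wait for the result
                edges.append((i, j, LATENCY[a.op]))
            if b.dest is not None and b.dest in a.srcs:  # WAR: same cycle is fine
                edges.append((i, j, 0))
            if a.dest is not None and a.dest == b.dest:  # WAW: issue in order
                edges.append((i, j, 1))
            # Memory order: if the addresses cannot be proven different, a later
            # access conservatively waits for the earlier one to complete.
            if (may_alias and a.op in MEMORY_OPS and b.op in MEMORY_OPS
                    and "st" in (a.op, b.op)):
                edges.append((i, j, LATENCY[a.op]))
    return edges


def list_schedule(code: List[Instr], may_alias: bool,
                  width: int = ISSUE_WIDTH) -> Dict[str, int]:
    """Greedy cycle-by-cycle list scheduling with program-order priority.

    Returns {instruction name: issue cycle}, with cycles numbered from 1.
    """
    preds: Dict[int, List[Tuple[int, int]]] = {j: [] for j in range(len(code))}
    for i, j, dist in dependences(code, may_alias):
        preds[j].append((i, dist))
    issue: Dict[int, int] = {}
    cycle = 1
    while len(issue) < len(code):
        slots = width
        for j in range(len(code)):
            if slots == 0:
                break
            if j in issue:
                continue
            # Ready when every predecessor has issued far enough in the past.
            if all(i in issue and issue[i] + d <= cycle for i, d in preds[j]):
                issue[j] = cycle
                slots -= 1
        cycle += 1
    return {code[j].name: c for j, c in issue.items()}


if __name__ == "__main__":
    for may_alias in (True, False):
        sched = list_schedule(LOOP, may_alias)
        print(f"may_alias={may_alias}: {max(sched.values())} cycles per iteration, {sched}")
```

With `may_alias=True` (case 1) the last instruction issues in cycle 6; with `may_alias=False` (case 2) it issues in cycle 3, matching the six and three cycles per iteration on the slide. The case 1 slot assignment differs slightly from the slide (the sketch pairs (2) and (3) in cycle 2) because the greedy scheduler fills both slots whenever it can.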
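In static mode the hardware issues each compiler-formed packet without dependence checks, so correctness rests entirely on the compiler. A minimal sketch of the invariant such a compiler would have to enforce, assuming a 2-wide machine, the motivating example's latencies, and register dependences only (memory aliasing is treated as already resolved, as in case 2). `check_packets` and the packet representation (one list per cycle, an empty list for a nop cycle) are hypothetical; the function reports anything that would otherwise need a hardware interlock.

```python
from typing import List, NamedTuple, Optional, Tuple

# Assumed latencies, taken from the motivating example: ALU 1 cycle, LD/ST 2 cycles.
LATENCY = {"add": 1, "sub": 1, "ld": 2, "st": 2}


class Instr(NamedTuple):
    name: str
    op: str
    dest: Optional[str]    # register written, or None for a store
    srcs: Tuple[str, ...]  # registers read


def check_packets(packets: List[List[Instr]], width: int = 2) -> List[str]:
    """Return the violations that would make static-mode issue unsafe.

    Each packet issues in one cycle. The compiler must guarantee that
    (a) a packet fits the issue width, (b) no instruction reads or rewrites a
    register written within its own packet, and (c) every operand's producer
    has finished. Memory aliasing is not checked here.
    """
    problems: List[str] = []
    ready_at = {}  # register -> first cycle a newly produced value can be read
    for cycle, packet in enumerate(packets, start=1):
        if len(packet) > width:
            problems.append(f"cycle {cycle}: {len(packet)} instructions exceed width {width}")
        for ins in packet:
            others = [o for o in packet if o is not ins]
            for reg in ins.srcs:
                if any(o.dest == reg for o in others):
                    problems.append(f"cycle {cycle}: ({ins.name}) reads {reg} written in the same packet")
                elif ready_at.get(reg, 0) > cycle:
                    problems.append(f"cycle {cycle}: ({ins.name}) reads {reg} before cycle {ready_at[reg]}")
            if ins.dest is not None and any(o.dest == ins.dest for o in others):
                problems.append(f"cycle {cycle}: ({ins.name}) has a WAW conflict on {ins.dest}")
        # Results become visible only after the whole packet has issued.
        for ins in packet:
            if ins.dest is not None:
                ready_at[ins.dest] = cycle + LATENCY[ins.op]
    return problems


I1 = Instr("1", "add", "R1", ("R2", "R5"))
I2 = Instr("2", "sub", "R5", ("R2", "R1"))
I3 = Instr("3", "st", None, ("R1", "R15"))
I4 = Instr("4", "ld", "R9", ("R24",))
I5 = Instr("5", "add", "R4", ("R9", "R10"))

# Case 2 compiler schedule from the slide; cycle 2's second slot is a nop.
CASE2_PACKETS = [[I1, I4], [I2], [I3, I5]]

if __name__ == "__main__":
    print(check_packets(CASE2_PACKETS))
    print(check_packets([[I1, I2], [I4], [I5]]))
```

The case 2 schedule passes with no violations. Packing (1) with (2) is rejected because (2) reads R1 in the same cycle (and, since the check is deliberately conservative, because (1) reads R5, which (2) writes), and issuing (5) only one cycle after the load (4) is rejected because the 2-cycle load result is not ready yet.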