
1 Operation Tables for Scheduling in the presence of Partial Bypassing
Aviral Shrivastava (1), Eugene Earlie (2), Nikil Dutt (1), Alex Nicolau (1)
(1) Center For Embedded Computer Systems, University of California, Irvine, CA, USA
(2) Strategic CAD Labs, Intel, Hudson, MA, USA

2 Bypassing Improves Performance
Copyright © 2004 UCI ACES Laboratory CODES-ISSS Sep 10, 2004
- Pipelining improves performance
  - Limited by pipeline hazards
- Bypasses eliminate certain data hazards
  - Further improve performance
[Figure: two pipeline diagrams (stages F, D, OR, X1, X2, WB and register file RF) executing R1 ← R2 + R3 followed by R4 ← R4 + R1, showing how R1 reaches the second operation]

3 Impact of Bypassing
- Cycle time
  - Bypasses may be part of the timing-critical path
- Area and power consumption
  - Wide multiplexers
  - Bypass control logic
  - Bypass wires
- Wiring congestion
- Overall chip complexity
  - Deeply pipelined out-of-order processors
[Figure: pipeline with stages F, D, X1, X2, WB, register file RF and multiplexers M1, M2]
References:
P. Ahuja et al., The Performance Impact of incomplete bypassing in processor pipelines, MICRO 1995.
A. Abnous and N. Bagherzadeh, Pipelining and bypassing in a VLIW processor, IEEE Trans... 1995.

4 Bypassing in Embedded Systems
- Bypassing increases performance
  - But may have significant impact on area, power consumption, wire congestion, etc.
- The Embedded Systems Dilemma
  - No bypassing: too low performance
  - Full bypassing: too much area, power, wire congestion
- How to customize bypassing?

5 Partial Bypassing – Solution and Problem
- Solution
  - Only the most beneficial bypasses are present
  - Implements a trade-off between performance, area, power consumption, etc. of the processor
- Problem
  - How to compile for a processor with partial bypassing?
[Figure: pipeline with stages F, D, OR, X1, X2, WB and register file RF]

6 Related Work
- Compilation for partial bypassing
  - P. Ahuja et al. [MICRO'95]: manual compilation
  - M. Buss et al. [CASES'01]: optimize inter-cluster copy operations
  - K. Fan et al. [ASAP'03]: FU-allocation strategy for VLIW processors
- No existing generic compilation technique
  - RISC, superscalar, superpipelined
- No instruction reordering
  - No accurate "pipeline hazard detection" technique
- We present: an accurate, generic, retargetable pipeline hazard detection technique

7 Pipeline Hazards
- Resource Hazards
- Data Hazards
- Resource Hazards – Structural Information
  - Reservation Tables
[Figure: pipeline with stages F, D, OR, X1, X2, WB; register file RF attached through connections C1, C2, C3]

Cycle  Busy Resources (MUL)
1      F
2      D
3      OR, C1, C2
4      X1
5      X1
6      X2
7      WB, C3
8

8 Resource Hazard Detection
[Figure: pipeline with stages F, D, OR, X1, X2, WB; register file RF attached through connections C1, C2, C3]

Busy resources per cycle (RH = Resource Hazard):

Cycle  MUL          ADD
1      F
2      D            F
3      OR, C1, C2   D
4      X1           OR, C1, C2
5      X1           RH
6      X2           X1
7      WB, C3       X2
8                   WB, C3
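The reservation-table check on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `schedule_in_order`, the table encoding, and the in-order stall rule are assumptions. MUL occupying X1 for two cycles is taken from the table above.

```python
# Reservation tables: the resources an operation occupies in each successive
# step. The encoding is an assumption; the contents follow the slide.
RESERVATION_TABLES = {
    "MUL": [{"F"}, {"D"}, {"OR", "C1", "C2"}, {"X1"}, {"X1"}, {"X2"}, {"WB", "C3"}],
    "ADD": [{"F"}, {"D"}, {"OR", "C1", "C2"}, {"X1"}, {"X2"}, {"WB", "C3"}],
}


def schedule_in_order(ops, tables=RESERVATION_TABLES):
    """Place ops one after another on an in-order pipeline.

    Returns (timelines, hazards): timelines[i] lists the cycle of every step
    of op i, and hazards lists (op position, cycle) for each stalled cycle.
    """
    busy = {}       # cycle -> set of resources already reserved
    hazards = []
    timelines = []
    issue = 1
    for pos, op in enumerate(ops):
        cycle = issue
        held = set()    # resources of the step the op currently sits in
        timeline = []
        for step in tables[op]:
            # Resource hazard: a needed resource is reserved by an earlier op.
            while busy.get(cycle, set()) & step:
                hazards.append((pos, cycle))
                # While stalled, the op keeps occupying its current resources.
                busy.setdefault(cycle, set()).update(held)
                cycle += 1
            busy.setdefault(cycle, set()).update(step)
            timeline.append(cycle)
            held = step
            cycle += 1
        timelines.append(timeline)
        issue = timeline[0] + 1   # the next op is fetched one cycle later
    return timelines, hazards


timelines, hazards = schedule_in_order(["MUL", "ADD"])
print(timelines)
print(hazards)
```

For MUL followed by ADD, the sketch stalls ADD for one cycle at cycle 5 because X1 is still held by MUL, which is the RH entry in the table.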

9 Data Hazard Detection
- Control Flow Graph – Register Information
- Operation Latency
  - Least delay (in cycles) by which dependent operations must be separated to avoid a data hazard
[Figure: Control Flow Graph with operation latencies (operations a, b, c, d, e, f; edge latencies 1, 2, 2, 2, 1) next to the scheduled operations laid out over time steps 1 to 4]

10 Traditional - Operation Latency
- Operation Latency of a non-bypassed or fully bypassed pipeline is a constant
  - No Bypassing: Operation Latency = 3
  - Full Bypassing: Operation Latency = 1
[Figure: two pipeline diagrams (stages F, D, OR, X1, X2, WB and register file RF) executing R1 ← R2 + R3 followed by R4 ← R4 + R1, one without bypassing and one with full bypassing]
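With a constant operation latency, the traditional data hazard check reduces to comparing issue cycles of dependent operations. A minimal sketch follows, using the two latencies quoted on this slide. The function name `data_hazards` and the labels `add1` and `add2` are hypothetical.

```python
def data_hazards(issue_cycle, dependences, latency):
    """Return the (producer, consumer) pairs that are issued too close together."""
    return [
        (producer, consumer)
        for producer, consumer in dependences
        if issue_cycle[consumer] - issue_cycle[producer] < latency
    ]


# R1 <- R2 + R3 (add1) followed by R4 <- R4 + R1 (add2), issued back to back.
issue_cycle = {"add1": 1, "add2": 2}
dependences = [("add1", "add2")]   # add2 reads R1, which add1 writes

print(data_hazards(issue_cycle, dependences, latency=3))   # no bypassing
print(data_hazards(issue_cycle, dependences, latency=1))   # full bypassing
```

The same pair of operations is a hazard with latency 3 and hazard-free with latency 1. The check needs only register dependences and one number per operation, which is exactly what stops working once bypassing is partial.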

11 Partial Bypasses - Operation Latency
- Operation Latency ill-defined
- Delay (in cycles) depends on the structure
  - Processor pipeline
  - Presence/absence of bypasses
- Need structural information to detect data hazards
[Figure: partially bypassed pipeline (stages F, D, OR, X1, X2, X3, WB and register file RF) executing R1 ← R2 + R3 followed by R4 ← R4 + R1. Partial Bypassing: Operation Latency = ??]

12 Partial Bypassing - Pipeline Hazards
- Traditionally (No or Full Bypassing)
  - Resource Hazards - Structural information
  - Data Hazards - Register information + Operation Latency
- Partial Bypassing
  - Resource Hazards - Structural information
  - Data Hazards - Register information + Structural information
- Structural information captured by Reservation Tables
- Augment Reservation Tables with register information
- Our Contribution - Operation Table

13 Reservation Table
- Reservation Table is a binding between
  - Operation and processor resources
- Does not support multiple datapaths
[Figure: pipeline with stages F, D, OR, X1, X2, WB; register file RF attached through connections C1, C2, C3]

Reservation Table for ADD:
1. F
2. D
3. OR
     C1 RF
     C2 RF
4. X1
5. X2
6. WB
     C3 RF

14 Enhanced Reservation Table
- Reservation Table is a binding between
  - Operation and processor resources
[Figure: pipeline with stages F, D, OR, X1, X2, WB; register file RF attached through C1, C2, C3; bypass register file BRF attached through C4, C5]

Reservation Table for ADD:
1. F
2. D
3. OR
     C1 RF
     C2 RF
     C5 BRF
4. X1
     C4 BRF
5. X2
6. WB
     C3 RF

15 Operation Table
- Operation Table is a binding between
  - Operation and Processor Resources and Registers
- Can be used to detect both data and resource hazards
[Figure: pipeline with stages F, D, OR, X1, X2, WB; register file RF attached through C1, C2, C3; bypass register file BRF attached through C4, C5]

Operation Table for ADD R1 R2 R3:
1. F
2. D
3. OR
     ReadOperands
       R2: C1 RF
       R3: C2 RF
           C5 BRF
     DestOperands
       R1: RF
4. X1
     WriteOperands
       R1: C4 BRF
5. X2
6. WB
     WriteOperands
       R1: C3 RF
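One possible in-memory encoding of this Operation Table is sketched below. The class `Step`, its field names, and the helper `add_operation_table` are hypothetical and do not come from the talk. The sketch also reads the two entries listed under R3 as alternative read paths (C2 from RF, or C5 from BRF), which is an interpretation of the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One row of an Operation Table: a pipeline unit plus register bindings."""
    unit: str
    # (register, [(connection, source), ...]) with alternative read paths
    reads: list = field(default_factory=list)
    # register -> register file that will eventually hold the result
    dests: dict = field(default_factory=dict)
    # register -> (connection, destination) written in this step
    writes: dict = field(default_factory=dict)


def add_operation_table(dst, src1, src2):
    """Build the table for 'ADD dst src1 src2' as laid out on the slide."""
    return [
        Step("F"),
        Step("D"),
        Step(
            "OR",
            reads=[
                (src1, [("C1", "RF")]),                 # first operand: RF only
                (src2, [("C2", "RF"), ("C5", "BRF")]),  # second: RF or bypass
            ],
            dests={dst: "RF"},
        ),
        Step("X1", writes={dst: ("C4", "BRF")}),
        Step("X2"),
        Step("WB", writes={dst: ("C3", "RF")}),
    ]


for number, step in enumerate(add_operation_table("R1", "R2", "R3"), start=1):
    print(number, step.unit, step.reads, step.dests, step.writes)
```

Keeping `reads` as a list of pairs rather than a dictionary lets the same register appear as both source operands.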

16 Pipeline Hazard Detection using OT
[Figure: pipeline with stages F, D, OR, X1, X2, WB; register file RF attached through C1, C2, C3; bypass register file BRF attached through C4, C5]

Processor state per cycle while MUL R1 R2 R3 executes:

Cycle  Busy Resources  !RF  BRF
1      F               -    -
2      D               -    -
3      OR, C1, C2      -    -
4      X1              R1   -
5      X1, C4          R1   R1
6      X2              R1   -
7      WB, C3          -    -
8                      -    -
9                      -    -
10                     -    -
11                     -    -

17 Resource Hazard Detection
[Figure: pipeline with stages F, D, OR, X1, X2, WB; register file RF attached through C1, C2, C3; bypass register file BRF attached through C4, C5]

Busy resources per cycle (RH = Resource Hazard):

Cycle  MUL R1 R2 R3  ADD R4 R2 R3  !RF     BRF
1      F                           -       -
2      D             F             -       -
3      OR, C1, C2    D             -       -
4      X1            OR, C1, C2    R1      -
5      X1, C4        RH            R1, R4  R1
6      X2            X1, C4        R1, R4  R4
7      WB, C3        X2            R4      -
8                    WB, C3        -       -
9                                  -       -
10                                 -       -
11                                 -       -

18 Data Hazard Detection
[Figure: pipeline with stages F, D, OR, X1, X2, WB; register file RF attached through C1, C2, C3; bypass register file BRF attached through C4, C5]

Busy resources per cycle (RH = Resource Hazard, DH = Data Hazard):

Cycle  MUL R1 R2 R3  ADD R4 R2 R3  SUB R5 R4 R2  !RF     BRF
1      F                                         -       -
2      D             F                           -       -
3      OR, C1, C2    D             F             -       -
4      X1            OR, C1, C2    D             R1      -
5      X1, C4        RH            DH            R1, R4  R1
6      X2            X1, C4        DH            R1, R4  R4
7      WB, C3        X2            DH            R4      -
8                    WB, C3        OR, C1, C2    -       -
9                                  X1, C4        R5      R5
10                                 X2            R5      -
11                                 WB, C3        -       -
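The data hazard on this slide can be reproduced with a small check over the !RF and BRF columns. This is a sketch under stated assumptions, not the authors' detector: the per-cycle register state is copied from the table rather than simulated, resource hazards are ignored, the names `can_read` and `first_hazard_free_cycle` are hypothetical, and the read paths (first source operand from RF only, second from RF or BRF) follow the ADD Operation Table shown earlier and are assumed to apply to SUB as well.

```python
# Register state per cycle, copied from the !RF and BRF columns of the table.
NOT_IN_RF = {4: {"R1"}, 5: {"R1", "R4"}, 6: {"R1", "R4"}, 7: {"R4"}, 9: {"R5"}, 10: {"R5"}}
IN_BRF = {5: {"R1"}, 6: {"R4"}, 9: {"R5"}}

# Read paths in the OR stage, indexed by source-operand position (assumption:
# same paths as in the ADD Operation Table).
READ_SOURCES = {0: ["RF"], 1: ["RF", "BRF"]}


def can_read(reg, sources, cycle):
    """True if reg is readable in this cycle through at least one source."""
    for source in sources:
        if source == "RF" and reg not in NOT_IN_RF.get(cycle, set()):
            return True
        if source == "BRF" and reg in IN_BRF.get(cycle, set()):
            return True
    return False


def first_hazard_free_cycle(src_regs, earliest):
    """Earliest cycle >= earliest in which all source operands can be read."""
    cycle = earliest
    while not all(
        can_read(reg, READ_SOURCES[pos], cycle) for pos, reg in enumerate(src_regs)
    ):
        cycle += 1   # data hazard: stay in the previous stage one more cycle
    return cycle


# SUB R5 R4 R2 leaves D after cycle 4, so it could read operands in cycle 5.
print(first_hazard_free_cycle(["R4", "R2"], earliest=5))
# Hypothetical variant with the operands swapped, so R4 may use the bypass.
print(first_hazard_free_cycle(["R2", "R4"], earliest=5))
```

With R4 as the first operand the read has to wait until R4 is back in the register file, which gives the three DH cycles in the table. With the operands swapped, R4 could be picked up from BRF in cycle 6. The delay therefore depends on which bypasses exist, not on a single latency number.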

19 Scheduling using Operation Tables
- Operation Tables provide a way to accurately detect pipeline hazards
  - Detect data and resource hazards
- Most scheduling algorithms have two main components
  - Generate possible reorderings
  - Evaluate each to find the best one
- Most scheduling algorithms should be able to benefit from a better evaluation mechanism
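The generate-and-evaluate structure can be sketched as an exhaustive search over the legal orderings of a basic block with a pluggable cost function. Everything here is illustrative: `best_order`, `respects`, and the toy cost function are hypothetical, and a real scheduler would replace the cost function with a cycle count obtained from Operation Table based hazard detection and would not enumerate all permutations.

```python
from itertools import permutations


def respects(order, dependences):
    """True if every producer appears before its consumer in this order."""
    position = {op: index for index, op in enumerate(order)}
    return all(position[a] < position[b] for a, b in dependences)


def best_order(ops, dependences, cost):
    """Generate all legal reorderings and keep the cheapest one."""
    legal = (o for o in permutations(ops) if respects(o, dependences))
    return min(legal, key=cost)


# Toy stand-in for the evaluator: one penalty per dependent pair that is
# scheduled back to back.
dependences = [("a", "b")]


def toy_cost(order):
    return sum(1 for pair in zip(order, order[1:]) if pair in dependences)


print(best_order(["a", "b", "c"], dependences, toy_cost))
```

Because the evaluator is just a function argument, a more accurate hazard model can be swapped in without touching the search.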

20 Experimental Setup
- Platform – Intel XScale
  - 7-stage super-pipelined RISC
- Benchmarks – MiBench
- Scheduler
  - Instruction reordering within a basic block
  - Currently a post-pass in the compiler
[Figure: experimental flow. Labels: Application, gcc -O3, Executable, OT-based Scheduler, Cycle Accurate Simulator, GCC Cycles, OT Cycles]
Performance Improvement = (GCC Cycles – OT Cycles)/GCC Cycles
References:
Intel XScale Microarchitecture Programmers Reference Manual, http://www.developer.intel.com
M. R. Guthaus et al. MiBench: A free commercially representative…, IEEE Workshop… 2001

21 Up to 20% Performance Improvement
Performance Improvement = (GCC Cycles – OT Cycles)/GCC Cycles
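The metric is simple enough to state as a one-line function. The function name and the cycle counts used below are made up for illustration and are not measurements from the talk.

```python
def performance_improvement(gcc_cycles, ot_cycles):
    """Fraction of cycles saved by the OT-based schedule relative to GCC's."""
    return (gcc_cycles - ot_cycles) / gcc_cycles


# Hypothetical cycle counts, chosen only to show the arithmetic.
print(performance_improvement(gcc_cycles=1000, ot_cycles=900))
```

A positive value means the OT-scheduled executable needed fewer cycles than the gcc -O3 baseline.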

22 Summary
- Bypassing improves performance but is costly in terms of area, power, etc.
- Partial bypassing presents valuable trade-offs but poses challenges in compilation
  - Operation latencies in a partially bypassed pipeline are ill-defined
- We define the Operation Table (OT) as a binding between an operation and the processor's resources and registers
- OTs can be used to accurately detect hazards even in the presence of partial bypassing in processors
- Simple OT-based scheduling at the basic-block level results in up to 20% performance improvement

23 Thank You!
Questions/Comments? aviral@ics.uci.edu

