Increasing Hardware Efficiency with Multifunction Loop Accelerators


1 Increasing Hardware Efficiency with Multifunction Loop Accelerators
Kevin Fan, Manjunath Kudlur, Hyunchul Park, Scott Mahlke
Advanced Computer Architecture Laboratory
University of Michigan Electrical Engineering and Computer Science
October 25, 2006

2 Introduction
Emerging applications have high performance, cost, and energy demands
– H.264, wireless, software radio, signal processing
– 10-100 Gops required
– 200 mW power budget
Applications dominated by tight loops processing large amounts of streaming data
[Figure labels: CPU, Accelerators]

3 Loop Accelerators
Order-of-magnitude performance and efficiency wins
– Viterbi: 100x speedup vs. ARM9
Automated C → gates solution
Correct by construction
Close designer productivity gap
Achieve short time-to-market
[Figure label: .c]

4 Prescribed Throughput Accelerators
Traditional behavioral synthesis
– Directly translate C operators into gates
Our approach: application-centric architectures
– Achieve fixed throughput
– Maximize hardware sharing
[Figure labels: Operation graph, Datapath; Application, Architecture]

5 Outline
Loop accelerator schema and design flow
Cost sensitive scheduling
Designing multifunction accelerators
– Naïve
– Joint scheduling
– Datapath union
Synthesis results

6 Loop Accelerator Template
Parameterized execution resources, storage, connectivity
Hardware realization of modulo scheduled loop
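The phrase "modulo scheduled loop" can be illustrated with a minimal sketch. In a modulo schedule a new iteration starts every II cycles (the initiation interval), so an operation issued at time t occupies its FU in slot t mod II of every iteration. The function name `modulo_slots`, the FU names, and the example schedule are hypothetical and do not come from the slides.

```python
def modulo_slots(schedule, ii):
    """Build a modulo reservation table for a candidate schedule.

    schedule maps an op name to (fu_name, issue_time); ii is the
    initiation interval. Two ops conflict if they use the same FU
    in the same slot modulo ii.
    """
    if ii < 1:
        raise ValueError("II must be at least 1")
    table = {}
    for op, (fu, time) in schedule.items():
        slot = (fu, time % ii)
        if slot in table:
            raise ValueError(f"{op} conflicts with {table[slot]} on {fu}")
        table[slot] = op
    return table


# Hypothetical three-op loop body scheduled with II = 2.
example = {"load": ("FU1", 0), "add": ("FU2", 3), "store": ("FU1", 5)}
print(modulo_slots(example, ii=2))
```

Here `load` and `store` can share FU1 because they fall into different slots (0 and 1), which is the kind of reuse a fixed-throughput datapath relies on.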

7 Loop Accelerator Design Flow
[Figure: design flow from C code (.c) and a performance (throughput) target to a loop accelerator. Stage labels: FU Alloc (Abstract Arch), Modulo Schedule (Scheduled Ops), Build Datapath (Concrete Arch), Instantiate Arch (Verilog, Control Signals, .v), Synthesize (Loop Accelerator)]
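The FU allocation step takes a throughput target as input. A rough sketch of how that could work, assuming each FU accepts one operation per cycle and the throughput target is expressed as an initiation interval II: a class with n operations then needs at least ceil(n / II) FUs. The function `allocate_fus` and the operation mix are hypothetical, and the slides do not state that the tool uses exactly this rule.

```python
import math
from collections import Counter


def allocate_fus(ops, ii):
    """Return the minimum FU count per op class for a target II.

    Assumes fully pipelined FUs (one op issued per FU per cycle),
    so each FU offers ii issue slots per iteration.
    """
    if ii < 1:
        raise ValueError("II must be at least 1")
    counts = Counter(ops)
    return {kind: math.ceil(n / ii) for kind, n in counts.items()}


# Hypothetical loop body: 3 adds, 1 multiply, 2 loads, 1 store.
loop_ops = ["add", "add", "add", "mul", "load", "load", "store"]
print(allocate_fus(loop_ops, ii=2))
```

Relaxing the throughput target (a larger II) lowers the FU counts, which is the trade between performance and hardware cost that the flow starts from.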

8 Datapath Derived from Schedule
Schedule to abstract architecture (FUs)
Determine register and interconnect requirements from schedule
[Figure: source code r1 = Mem[r2]; r3 = r1 + 12. Schedule: LOAD on FU1 at time 1, ADD on FU2 at time 4. Datapath: MEM unit and adder (+) with constant 12]
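As a sketch of how register and interconnect requirements might be read off a schedule, the code below walks the slide's two-operation example. Each producer-to-consumer dependence yields a wire between the two FUs, and the gap between the cycle a value becomes ready and the cycle it is consumed gives the number of cycles it must be held. The latencies (2 for LOAD, 1 for ADD), the class `ScheduledOp`, and the function `derive_datapath` are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledOp:
    name: str
    fu: str
    time: int      # issue cycle from the schedule
    latency: int   # assumed latency, not given on the slide
    dest: str
    srcs: tuple


def derive_datapath(ops):
    """Return (wires, hold_cycles) implied by a schedule."""
    producers = {op.dest: op for op in ops}
    wires = set()
    hold_cycles = {}
    for op in ops:
        for src in op.srcs:
            prod = producers.get(src)
            if prod is None:
                continue  # live-in value, produced outside the loop body
            wires.add((prod.fu, op.fu))
            hold = op.time - (prod.time + prod.latency)
            if hold < 0:
                raise ValueError(f"{op.name} reads {src} before it is ready")
            hold_cycles[src] = max(hold_cycles.get(src, 0), hold)
    return wires, hold_cycles


schedule = [
    ScheduledOp("LOAD", "FU1", 1, 2, "r1", ("r2",)),
    ScheduledOp("ADD", "FU2", 4, 1, "r3", ("r1",)),
]
wires, hold = derive_datapath(schedule)
print(wires, hold)
```

Under these assumed latencies, r1 is ready at cycle 3 and read at cycle 4, so FU1 needs a connection to FU2 and one cycle of storage for r1.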

9 Cost Sensitive Scheduling
27% cost reduction with same performance [MICRO ’05]
Traditional scheduling is hardware unaware
Intelligent scheduling needed to reduce hardware cost
[Figure: two schedules of operations +1, +2, LD1, LD2 on FU1-FU3 over times 0-2]
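One way to see why a hardware-unaware schedule costs more is a toy cost model in which each FU must contain hardware for every distinct operation kind assigned to it. The sketch below compares two operation-to-FU assignments of the same four operations. The gate costs, the cost model, and both assignments are hypothetical. They are not the figure's actual schedules or the model from the MICRO '05 work.

```python
# Hypothetical gate cost per operation kind.
GATE_COST = {"add": 100, "load": 300}


def datapath_cost(assignment):
    """Sum, over FUs, the cost of each distinct op kind the FU supports.

    assignment maps an FU name to the list of op kinds scheduled on it.
    """
    return sum(
        sum(GATE_COST[kind] for kind in set(kinds))
        for kinds in assignment.values()
    )


# Both assignments run two adds and two loads with two slots per FU.
unaware = {"FU1": ["add", "load"], "FU2": ["add", "load"]}
aware = {"FU1": ["add", "add"], "FU2": ["load", "load"]}

print(datapath_cost(unaware), datapath_cost(aware))
```

Grouping like operations on the same FU halves the cost in this toy case while using the same number of issue slots, which is the intuition behind scheduling with hardware cost in mind.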

10 Multifunction Accelerator
Map multiple loops to single accelerator
Improve hardware efficiency via reuse
Opportunities for sharing
– Disjoint stages (loops 2, 3)
– Pipeline slack (loops 4, 5)
[Figure: application (Loop 1, Frame Type? branch to Loop 2 or Loop 3, Loop 4, ..., Block 5) mapped to an accelerator pipeline; LA1-LA5 built from loop accelerators versus LA1-LA3 built with multifunction loop accelerators]

11 Design Strategies
Naïve method: design single-function accelerators, place side by side
– Misses potential hardware sharing of FUs, storage, interconnect
[Figure: Loop 1 and Loop 2 each pass through the Cost Sensitive Modulo Scheduler; the resulting FUs are placed together in the multifunction datapath]

12 Joint Scheduling
Loops are independent: number of possible schedules is exponential in the number of loops!
Infeasible for modest problems
[Figure: Loop 1 and Loop 2 feed a Joint Cost Sensitive Modulo Scheduler, which produces a schedule (Op1, Op2, Op3, ... over time and FUs) for each loop on shared FUs]
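The exponential growth follows from simple counting: if the loops are scheduled independently of one another, every combination of per-loop schedules is a candidate joint schedule, so the candidate count is the product of the per-loop counts. The sketch below shows this with a hypothetical figure of 1,000 candidate schedules per loop. The function name is also an assumption.

```python
import math


def joint_schedule_count(schedules_per_loop):
    """Number of joint schedules when loops are scheduled independently."""
    return math.prod(schedules_per_loop)


# Hypothetical: each loop has 1,000 candidate schedules.
for num_loops in (1, 2, 4):
    print(num_loops, joint_schedule_count([1000] * num_loops))
```

With k candidates per loop and n loops the count is k to the power n, so adding loops multiplies the search space rather than adding to it.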

13 Multifunction Gate Costs
43% average savings over sum of accelerators
[Figure: gate cost chart, x-axis categories A through J]
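"Savings over sum of accelerators" can be read as one minus the ratio of the multifunction accelerator's gate cost to the summed gate costs of the single-function accelerators it replaces. The sketch below computes that ratio. The gate counts are made-up placeholders rather than results from the talk, and `gate_savings` is a hypothetical helper name.

```python
def gate_savings(single_costs, multifunction_cost):
    """Fractional savings of one multifunction accelerator versus
    the sum of the single-function accelerators it replaces."""
    baseline = sum(single_costs)
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return 1 - multifunction_cost / baseline


# Placeholder gate counts, not measured data.
print(gate_savings([50_000, 30_000], 60_000))
```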

14 Datapath Union
[Figure: Loop 1 and Loop 2 each pass through the Cost Sensitive Modulo Scheduler; Datapath Union merges the resulting FUs]

15 Datapath Union
Combine similar components → better hardware sharing → lower cost
Trade off FU and register cost
– Combining dissimilar FUs can enable register cost savings
ILP formulation minimizes FU and register cost
[Figure: FU operation sets (+, -, *, /, M) of Accel 1 and Accel 2 merged into the FUs of a multifunction accelerator]
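The idea of combining similar components can be sketched with a small search. The slides describe an ILP formulation that minimizes both FU and register cost. The sketch below swaps in a brute-force search over FU pairings and considers FU cost only, so it illustrates the pairing problem rather than the authors' method. The per-operation gate costs, the function names, and the two example accelerators are hypothetical.

```python
from itertools import permutations

# Hypothetical gate cost per supported operation.
OP_COST = {"+": 100, "-": 100, "*": 800, "/": 1200, "M": 300}


def fu_cost(ops):
    """Cost of one FU that supports every op in ops."""
    return sum(OP_COST[op] for op in ops)


def best_union(accel_a, accel_b):
    """Pair the FUs of two accelerators to minimize total FU cost.

    Each accelerator is a list of frozensets of supported ops.
    Unpaired FUs are merged with an empty FU (copied unchanged).
    """
    n = max(len(accel_a), len(accel_b))
    a = list(accel_a) + [frozenset()] * (n - len(accel_a))
    b = list(accel_b) + [frozenset()] * (n - len(accel_b))
    best = None
    for perm in permutations(b):
        merged = [x | y for x, y in zip(a, perm)]
        cost = sum(fu_cost(m) for m in merged)
        if best is None or cost < best[0]:
            best = (cost, merged)
    return best


accel_1 = [frozenset({"+"}), frozenset({"*"}), frozenset({"M"})]
accel_2 = [frozenset({"+"}), frozenset({"M"})]
cost, merged = best_union(accel_1, accel_2)
print(cost, merged)
```

In this toy case the best pairing matches adder with adder and memory unit with memory unit, so the merged datapath costs no more than the larger accelerator alone. Brute force grows factorially with FU count, which is one reason a formulation such as ILP is the more practical choice at scale.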

16 Multifunction Gate Costs
Smart union within 3% of joint scheduling solution
[Figure: gate cost chart, x-axis categories A through J]

17 Conclusion
Multifunction accelerators highly effective in exploiting coarse-grained hardware sharing
Joint scheduling achieves 43% average cost savings, but is impractical
Smart union of independent accelerators achieves 40% average savings
Compile times of 5 minutes to 1 hour

18 Questions?

