Ph.D. in Computer Science

Presentation on theme: "Ph.D. in Computer Science"— Presentation transcript:

1 Ph.D. in Computer Science
School of Computing, Informatics, and Decision Systems Engineering
Compiler and Architecture Design for Coarse-Grained Programmable Accelerators
Mahdi Hamzeh
June 26, 2015

2 Trends in Silicon Computing
[Figure: stacked layers added in successive eras: Technology, μ-architecture, Multi-threading, Multi-cores, Heterogeneity]
6/26/15 Compiler and Architecture Design for CGRAs

3 Why Heterogeneous Computing?
Efficient resource allocation based on run-time information.
Each kind of compute unit exhibits an attractive feature for one class of computation.
Applications execute in phases; each phase is a different class of computation.
A significant fraction of the silicon area will be dark.
[Figure: power vs. performance for LP Core, HP Core, DSP, GPU, FPGA, and HW ACC]
LP Core: low-power in-order general-purpose core. HP Core: high-performance out-of-order general-purpose core. HW ACC: hardware accelerator.

4 HW Accelerators are Expensive!
High design, test, and verification cost: HW ACC and FPGA.
Engineering cost and time to market: HW ACC.
Building a specialized HW ACC is expensive and time-consuming.
[Figure: system design cost vs. performance for LP Core, HP Core, DSP, GPU, FPGA, and HW ACC]

5 HW Accelerators: Low Utilization, Limited Programmability
Specialized for one application: HW ACC.
Specialized for a class of computation: DSP, GPU.
Run-time configuration overhead: FPGA.
A HW ACC does well in only one application; it cannot be reused for another application, even one whose phases belong to a similar class of computation.
[Figure: flexibility vs. performance for LP Core, HP Core, DSP, GPU, FPGA, and HW ACC]

6 Software Programmable Accelerators: Opportunities and Challenges
Programmability: compiler support drives down costs.
Software-programmable accelerators (SW ACC) aim to close the cost gap.
[Figures: flexibility vs. performance and system design cost vs. performance for LP Core, HP Core, DSP, GPU, FPGA, HW ACC, and SW ACC]

7 Coarse-Grained Reconfigurable Architectures

8 CGRA Designs in Literature
ADRES: 60 GOPS/W

9 CGRA Designs in Literature
TilePro64 192

10 Problems Addressed in this Dissertation
Contribution (what I did in this dissertation):
CGRA Compiler Problems: Problem Definition, Complexity Analysis
CGRA Design
CGRA System Integration

11 CGRA accelerates loops using modulo scheduling
[Figure: execution trace of a target application specified in C, with the labels Serial region, Prolog, Repetitive region, Loop, Serial region, Epilog]

12 II is the performance metric
Modulo Scheduling
[Figure: operations a to g of the loop body scheduled over time, with iterations 1 to 4 overlapping]
II (initiation interval) is the performance metric.
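As a rough illustration of why II governs throughput, the sketch below assumes a loop whose single-iteration schedule takes `schedule_length` cycles and that a new iteration starts every `ii` cycles. The function and parameter names are hypothetical and not taken from the slides.

```python
def total_cycles(num_iterations: int, ii: int, schedule_length: int) -> int:
    """Cycles needed to run a modulo-scheduled loop.

    Iteration i starts at cycle i * ii, so the last iteration starts at
    (num_iterations - 1) * ii and finishes schedule_length cycles later.
    """
    if num_iterations <= 0:
        return 0
    return (num_iterations - 1) * ii + schedule_length


def modulo_slot(start_cycle: int, ii: int) -> int:
    """Row of the repeating kernel in which an operation issues."""
    return start_cycle % ii
```

For long-running loops the `schedule_length` term becomes negligible, so the cycle count is dominated by `num_iterations * ii`; halving II roughly halves execution time.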

13 CGRA Modulo Scheduling: Problem Definition
Define what a valid mapping is: map operations to a subset of the resources so that every data dependency is mapped to a path under certain conditions, while II is minimized.
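The slide does not spell out the "certain conditions", so the following is only a simplified sketch under loud assumptions: single-cycle operations, a 2-D mesh, intra-iteration dependencies only, and no routing or re-computation. Under those assumptions a mapping is accepted when no two operations share a PE in the same modulo slot and every consumer runs exactly one cycle after its producer on the same or a neighboring PE. All names here are hypothetical.

```python
from typing import Dict, List, Tuple

# op name -> (row, col, cycle); an assumed, simplified placement record
Placement = Dict[str, Tuple[int, int, int]]


def adjacent_or_same(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two mesh coordinates are identical or direct neighbors."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 1


def is_valid_mapping(placement: Placement,
                     deps: List[Tuple[str, str]],
                     ii: int) -> bool:
    # Resource condition: one operation per PE per modulo slot.
    used = set()
    for row, col, cycle in placement.values():
        slot = (row, col, cycle % ii)
        if slot in used:
            return False
        used.add(slot)
    # Dependency condition (assumed): consumer follows its producer by one
    # cycle on the same or an adjacent PE.
    for producer, consumer in deps:
        pr, pc, pt = placement[producer]
        cr, cc, ct = placement[consumer]
        if ct != pt + 1 or not adjacent_or_same((pr, pc), (cr, cc)):
            return False
    return True
```

A compiler would search for the smallest `ii` at which some placement passes a check of this kind; the real problem additionally allows a dependency to be carried over a longer path.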

15 Problem Definition
Important characteristics: routing, re-computing, or both.
Epimorphism between the computation graph and the resource graph.
Identified the necessary conditions that a scheduled computation graph must satisfy.
Mapping is NP-complete (reduction from the 3-partition problem).

16 Problems Addressed in This Dissertation
Contribution (what I did in this dissertation):
CGRA Compiler Problems: Problem Definition, Complexity Analysis, Mapping Algorithm
CGRA Design
CGRA System Integration

17 CGRA Modulo Scheduling Policies
The existing literature addresses this problem using the following modulo scheduling policies.
Integrated methods: brute force, edge centric, node centric, nature inspired.
Decomposition methods: partitioning, nature inspired.

18 Assumptions and Limitations
On a memory miss, execution stops.
A ld/st queue resolves memory dependencies.
Only single-assignment instructions are supported.
No system calls.
No function calls.
Single exit condition.

19 EPIMap
Decomposition: scheduling and placement.
Constructive.
Evolve the computation graph based on the resource graph.
Adjust the resource graph (MII).
Efficient placement.
How do we address it? Why do we do it better?

20 EPIMap notable features and policies

21 Re-Scheduling

22 Resource Allocation Problem

23 Resource Allocation: Supporting Multi-cycle Operation

24 Resource Allocation: Supporting Pipelined Resources

25 Register Allocation

26 Register Allocation

27 Rotating and Non-Rotating Register Files

28 Problems Addressed in This Dissertation
Contribution (what I did in this dissertation):
CGRA Compiler Problems: Problem Definition, Complexity Analysis, Mapping Algorithm, Control Flow Acceleration
CGRA Design
CGRA System Integration

29 Control Flow Acceleration

30 Partial Predication
[Figure: data-flow graph with nodes a, b, c, e, f, h and branch results et and ef, shown mapped step by step onto the CGRA; numeric label 3]

31 Full Predication
[Figure: data-flow graph with nodes a, b, c, e, f, h, shown mapped step by step onto the CGRA; numeric label 4]
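The figures for partial and full predication did not survive extraction, so here is a generic software model of the two techniques, not the dissertation's hardware or mapping. In partial predication both paths execute unconditionally and a select operation picks the result; in full predication each operation carries a predicate and leaves its destination untouched when the predicate is false. The helper names `select` and `guarded` are hypothetical.

```python
def branchy(a: int, b: int) -> int:
    """Original control flow: an if-then-else inside the loop body."""
    if a > b:
        x = a - b
    else:
        x = b - a
    return x


def select(pred: bool, if_true: int, if_false: int) -> int:
    """Select operation used by partial predication."""
    return if_true if pred else if_false


def partial_predication(a: int, b: int) -> int:
    p = a > b
    x_true = a - b       # true path, always executed
    x_false = b - a      # false path, always executed
    return select(p, x_true, x_false)


def guarded(pred: bool, op, old: int) -> int:
    """Predicated operation: writes op() only when pred holds."""
    return op() if pred else old


def full_predication(a: int, b: int) -> int:
    p = a > b
    x = 0
    x = guarded(p, lambda: a - b, x)        # commits only if p
    x = guarded(not p, lambda: b - a, x)    # commits only if not p
    return x
```

Both converted forms remove the branch, which is what lets a loop with conditionals be modulo scheduled as straight-line code; they differ in whether the extra cost is a select operation or predicate-aware functional units.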

32 Dual-Issue
[Figure: data-flow graph with nodes a, b, c, e, f, h and branch results et and ef, alongside its dual-issue form]

33 Mapping with Dual-Issue
[Figure: nodes a, b, c, e, f, h shown mapped step by step onto the CGRA with dual-issue; numeric label 2]

34 Hardware Support

35 CGRA Compiler Flow

36 State-of-the-art before EPIMap/REGIMap
DRESC: a simulated-annealing-based mapping algorithm.
Integrated mapping policy.
Supports multi-cycle operations.
Supports pipelined PEs.
Extended with register allocation.
Has been shown to generate better mappings than other mapping algorithms.
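For readers unfamiliar with simulated annealing, the sketch below shows only the textbook acceptance rule (the Metropolis criterion). It is a generic illustration; DRESC's actual cost function, move set, and cooling schedule are not described on the slide and are not reproduced here. The uniform random draw `u` is passed in explicitly so the function stays deterministic.

```python
import math


def accept_move(delta_cost: float, temperature: float, u: float) -> bool:
    """Metropolis acceptance rule.

    delta_cost:  cost(new mapping) - cost(current mapping)
    temperature: current annealing temperature, must be > 0
    u:           a uniform random draw from [0, 1)
    """
    if delta_cost <= 0:
        return True  # never reject an improving or neutral move
    # Worse moves are accepted with probability exp(-delta_cost / temperature).
    return u < math.exp(-delta_cost / temperature)
```

In a mapper, a move would be something like relocating one operation to a different PE or time slot; because worse moves are sometimes accepted, the search can escape local minima, at the price of many trial moves and therefore long compilation times.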

37 Compiler and Architecture Design for CGRAs
EPIMap.
DRESC: simulated-annealing based.
MII = max(ResMII, RecMII).
4 x 4 CGRA, mesh interconnect, 1-cycle latency.
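The minimum initiation interval is conventionally the larger of the resource-constrained and recurrence-constrained bounds, since a schedule has to satisfy both. A minimal sketch of that calculation follows, assuming every operation occupies one PE for one cycle and that each dependence cycle is summarized as a hypothetical `(total_latency, total_distance)` pair.

```python
import math
from typing import List, Tuple


def res_mii(num_ops: int, num_pes: int) -> int:
    """Resource bound: each PE can issue one operation per cycle."""
    return math.ceil(num_ops / num_pes)


def rec_mii(cycles: List[Tuple[int, int]]) -> int:
    """Recurrence bound over all dependence cycles.

    Each entry is (total latency around the cycle, total iteration distance).
    Returns 0 when the loop has no recurrences.
    """
    return max((math.ceil(lat / dist) for lat, dist in cycles), default=0)


def mii(num_ops: int, num_pes: int, cycles: List[Tuple[int, int]]) -> int:
    """Lower bound on II: both constraints must hold, so take the maximum."""
    return max(res_mii(num_ops, num_pes), rec_mii(cycles))
```

For example, a 20-operation loop on the 4 x 4 array has a resource bound of 2, and a recurrence of latency 3 carried across one iteration raises the overall bound to 3.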

38 Mapping and Register Allocation-Single Cycle

39 Mapping and Register Allocation-Single Cycle

40 Mapping and Register Allocation-Single Cycle

41 Mapping and Register Allocation-Pipelined PEs

42 Mapping and Register Allocation-Pipelined PEs

43 Summary of EPIMap/REGIMap vs. DRESC
Configuration            Performance Ratio   Compilation Time Ratio
Single cycle (NO-RA)     1.31X               138X
Single cycle, 2 Regs     1.73X               240X
Single cycle, 4 Regs     1.6X                209X
Single cycle, 8 Regs     1.5X                163X
Pipelined (NO-RA)        1.45X               192X
Pipelined, 2 Regs        1.83X               317X
Pipelined, 4 Regs        1.81X               289X
Pipelined, 8 Regs        1.68X               227X

44 Mapping Loops With Conditional Instructions

45 CGRA Research Framework

46 Compiler and Architecture Design for CGRAs

47 Summary
Problem definition: supports routing and re-computation.
Complexity analysis: reduction from the 3-partition problem.
Counterintuitive discovery: re-computation can improve performance.
Computation graph and necessary conditions.
EPIMap: approximates II progressively; effective iterative scheduling algorithm.

48 Summary
Placement problem formulation: support of multi-cycle operations, support of pipelined resources, constructive method.
REGIMap: integrated placement and register allocation.
Support of conditionals: full predication, partial predication, dual-issue.
Integration with the LLVM compiler framework.

49 Summary
CGRA design: ISA, rotating and non-rotating register files, dual-issue support, RTL implementation and synthesis.
CGRA simulation framework: CGRA model in gem5.

50 Future Directions
Support of system calls.
Mapping with memory optimization.
Software prefetching in mapping.
Just-in-time compilation of kernels.
Offload decision at run-time.
Speculative execution support for CGRAs.

51 Backup

52 Backup-Scheduling Success

53 Clique-Resource Allocation Attempts

54 Step by Step Example

55 Step by Step Example

