Hy-C A Compiler Retargetable for Single-Chip Heterogeneous Multiprocessors Philip Sweany 8/27/2010.


1 Hy-C A Compiler Retargetable for Single-Chip Heterogeneous Multiprocessors Philip Sweany 8/27/2010

2 No single architecture solves all power problems [Figure labels: Hard-wired proxy; General Purpose Processor; Software Programmable DSP; 100 X; 10 X] Industry has debated the merits of each architecture for decades… A combination of all approaches optimizes power and performance

3 Retargetable Compilation Why? Rocket – C compiler, written in C++ – Retargetable for ILP computers – Single machine description file – Development 1989-2000 GNU

4 Hybrid Computing Heterogeneous processors on single chip – “CPU” – FPGA – ASIC – N “CPU”s, M FPGAs, K ASICs Tradeoffs of performance, power, flexibility

5 CPU 1 CPU 2 CPU m Multi-CPU FPGA 1 FPGA 2 FPGA n Multi-FPGA Shared Memory Generic Hybrid Architecture

6 System Specification Partitioning CPU Compiler FPGA Synthesis CPU Power-Performance Model FPGA Power-Performance Model Source Code Generic Hy-C Tools Optimization Control Objectives/Constraints

7 Intermediate Representations 3-address form Control flow graph SSA --- static single assignment

8 Control Flow Graph Nodes are Basic Blocks – Single entry, single exit – No branch except (possibly) at bottom Edges represent one possible flow of execution between two basic blocks Whole CFG represents a function
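
The basic-block rules on this slide can be sketched in code. The following is a minimal illustration, not Hy-C's implementation: the tuple-based instruction format (`label`, `op`, `br`, `cbr`, `ret`) and the function name `build_cfg` are assumptions made up for this example. It splits a linear 3-address listing into basic blocks using the usual "leader" rule and then adds one edge per possible flow of execution.

```python
def build_cfg(instrs):
    """Split a linear instruction list into basic blocks and CFG edges.

    Hypothetical instruction format (an assumption for this sketch):
      ("label", name), ("op", text), ("br", target),
      ("cbr", cond, target)  # falls through when cond is false
      ("ret",)
    """
    # A leader starts a block: the first instruction, any label,
    # and any instruction that follows a branch or return.
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins[0] == "label":
            leaders.add(i)
        if ins[0] in ("br", "cbr", "ret") and i + 1 < len(instrs):
            leaders.add(i + 1)

    starts = sorted(leaders)
    blocks = []
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(instrs)
        blocks.append(instrs[start:end])

    # Labels are always leaders, so each label is the first entry of its block.
    label_to_block = {
        b[0][1]: i for i, b in enumerate(blocks) if b[0][0] == "label"
    }

    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1]
        if last[0] == "br":
            edges.add((i, label_to_block[last[1]]))
        elif last[0] == "cbr":
            edges.add((i, label_to_block[last[2]]))  # taken branch
            if i + 1 < len(blocks):
                edges.add((i, i + 1))                # fall-through
        elif last[0] != "ret" and i + 1 < len(blocks):
            edges.add((i, i + 1))                    # plain fall-through
    return blocks, sorted(edges)


# A small counting loop: i = 0; while (i < n) i = i + 1; return
program = [
    ("op", "i = 0"),
    ("label", "loop"),
    ("op", "t = i < n"),
    ("cbr", "t", "body"),
    ("br", "exit"),
    ("label", "body"),
    ("op", "i = i + 1"),
    ("br", "loop"),
    ("label", "exit"),
    ("ret",),
]

blocks, edges = build_cfg(program)
for index, block in enumerate(blocks):
    print(f"B{index}: {block}")
print("edges:", edges)
```

In the resulting blocks a branch can only appear as the last instruction, which matches the "no branch except (possibly) at bottom" rule.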

9 Static Single Assignment SSA: A program is in SSA form iff – Each variable is statically defined exactly once, and – Each use of a variable is dominated by that variable’s definition.

10 Example In general, how to transform an arbitrary program into SSA form? Does the definition of X2 dominate its use in the example? [Figure: control flow graph with definitions of X1, X2, and X4, and a merge X3 = (X1, X2)]
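
As a partial answer to the question on this slide, the sketch below renames variables in straight-line code so that each name is defined exactly once. It is a simplified illustration only: the statement format `(dest, op, args)` and the function name `to_ssa_straight_line` are assumptions for this example, and it deliberately ignores control flow. For arbitrary programs, merge points in the CFG additionally need φ-functions, which this sketch does not insert.

```python
def to_ssa_straight_line(stmts):
    """Rename a straight-line list of (dest, op, args) statements into SSA.

    Assumptions for this sketch: no branches (so no phi-functions are
    needed), operands are either identifiers or literal strings, and
    source names do not already end in "_<number>".
    """
    version = {}  # current version number of each variable
    renamed = []

    def current_name(var):
        # Variables read before any definition get version 0 (live-in).
        return f"{var}_{version.get(var, 0)}"

    for dest, op, args in stmts:
        # Rename uses first, so "x = x * 2" reads the old x.
        new_args = [current_name(a) if a.isidentifier() else a for a in args]
        # Every definition creates a fresh version of the destination.
        version[dest] = version.get(dest, 0) + 1
        renamed.append((f"{dest}_{version[dest]}", op, new_args))
    return renamed


code = [
    ("x", "+", ["a", "b"]),  # x = a + b
    ("x", "*", ["x", "2"]),  # x = x * 2
    ("y", "+", ["x", "1"]),  # y = x + 1
]

for dest, op, args in to_ssa_straight_line(code):
    print(f"{dest} = {args[0]} {op} {args[1]}")
```

Within a single block every use is trivially dominated by its definition, because the definition appears earlier in the same block.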

11 SSA: Motivation Provide a uniform basis of an IR to solve a wide range of classical dataflow problems Encode both dataflow and control flow information An SSA form can be constructed and maintained efficiently It's popular: GCC uses SSA

12 Software Pipelining Schedule operations from multiple iterations of a loop in parallel Hides latency Compiler “reorders” loop code to include: – Prelude – Kernel – Postlude
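
The prelude / kernel / postlude structure can be shown on a toy loop. This is only a sketch of the code shape, written in Python for readability: the two-stage loop body and the function names are assumptions for this example, and Python itself executes the statements sequentially. On an ILP machine, the two statements in the kernel are independent of each other and could be issued in the same cycle, which is how the latency of stage 1 gets hidden.

```python
def original(a):
    """Each iteration runs stage 1, then stage 2, which depends on it."""
    out = []
    for x in a:
        t = x * 2          # stage 1 (think: a long-latency multiply)
        out.append(t + 1)  # stage 2 needs the result of stage 1
    return out


def pipelined(a):
    """Same computation, reordered into prelude, kernel, and postlude."""
    out = []
    if not a:
        return out

    # Prelude: start iteration 0 (stage 1 only).
    t = a[0] * 2

    # Kernel: stage 1 of iteration i overlaps stage 2 of iteration i - 1.
    for i in range(1, len(a)):
        prev = t
        t = a[i] * 2          # stage 1 of iteration i
        out.append(prev + 1)  # stage 2 of iteration i - 1

    # Postlude: finish the last iteration (stage 2 only).
    out.append(t + 1)
    return out


data = [1, 2, 3, 4]
print(original(data))
print(pipelined(data))
```

Both functions return the same list; only the order in which the stages are scheduled differs.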

13 Software Pipeline Benefit for “Typical” Architecture and MMult “Typical” Architecture – 8-wide Instruction-Level Parallel (ILP) Assuming 3000 x 3000 matrices – Original requires 45 million cycles – Pipelined version requires 3 million + 15
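
The two figures on this slide imply a speedup of roughly 15x. The snippet below just does that division; it assumes both numbers are cycle counts, since the slide gives no unit for the pipelined figure.

```python
# Figures taken from the slide; the unit of the second one is assumed to be cycles.
original_cycles = 45_000_000
pipelined_cycles = 3_000_000 + 15

speedup = original_cycles / pipelined_cycles
print(f"speedup: about {speedup:.2f}x")
```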

14 Current Compiler Projects Hy-C – Build tools – Partition algorithms – Retargetability and constraint specification – OMAP project Thread-level parallelism in imperative code – Limit study – Improved identification of threads Fast compiler-controlled memory

15 Application Imaging Video Audio OMAP4 Sub-System Encapsulation

16 Chiron Tesla Ducati Multi-CPU Shared Memory OMAP Resources

17 OMAP Processor Resources Chiron – 2 x 600 MHz (2 symmetric processors each at 600 MHz with shared L2) – Power 600uW / MHz Tesla – DSP Sub-System (C64x derivative); 400 MHz, 8-wide ILP – Power 200uW / MHz Ducati – 200 MHz (targeted for control, low latency code) – Power 100uW / MHz
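
To compare the three resources, the per-MHz power figures can be multiplied by the clock rates. The sketch below does this arithmetic with the numbers from the slide. Two assumptions are made here: power scales linearly with clock frequency, and the Chiron figure is per core (the slide does not say whether 600uW / MHz covers one core or both). Note also that 1 uW/MHz equals 1 pJ per cycle, so the per-MHz figure can be read directly as energy per cycle.

```python
# (power in uW per MHz, clock in MHz), as listed on the slide.
RESOURCES = {
    "Chiron": (600, 600),  # assumed to be per core; the chip has 2 cores
    "Tesla": (200, 400),
    "Ducati": (100, 200),
}


def power_at_full_clock_mw(uw_per_mhz, clock_mhz):
    """Power in milliwatts at the listed clock, assuming linear scaling."""
    return uw_per_mhz * clock_mhz / 1000.0


for name, (uw_per_mhz, clock_mhz) in RESOURCES.items():
    mw = power_at_full_clock_mw(uw_per_mhz, clock_mhz)
    # uW/MHz is numerically the same as pJ per cycle.
    print(f"{name}: {mw:.0f} mW at {clock_mhz} MHz, {uw_per_mhz} pJ/cycle")
```

Under these assumptions a cycle on Chiron costs six times the energy of a cycle on Ducati, which is the kind of tradeoff a partitioner has to weigh against performance.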

18 System Specification Partitioning Veyron Ducati Source Code Hy-C for OMAP Optimization Control Objectives/Constraints Tesla

19 OMAP Project, Current State Use gcc to generate “readable” SSA graphs for C programs Developing translator to convert SSA graphs to Hy-C internal Control, Data Dependence Graphs (CDDGs). Translator to Hy-C CDDGs successfully tested on small C programs

21 Partition Algorithm Examine Control Flow Graph (CFG) for a function – Identify software pipelining possibility – Build Dependence Graph (combining data and control dependence) Choose one of three resources for the function

22 Partition Algorithm (cont.) If software pipelining profitable, place function on C64 DSP resource Else examine Dependence Graph – if ( number of nodes / critical path length ) > 1.5, place on double-issue ARM – else place on single-issue ARM
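
The decision rule on these two slides can be sketched as follows. This is an illustration under stated assumptions, not Hy-C's code: the dependence graph is given as a dictionary of successor lists, the critical path is counted in nodes (unit latency per operation), whether software pipelining is profitable is passed in as a precomputed flag, and the handling of an empty graph is a choice made for this example. All function names are hypothetical.

```python
from functools import lru_cache


def all_nodes(succs):
    """Every node mentioned in the graph, as a key or as a successor."""
    nodes = set(succs)
    for targets in succs.values():
        nodes.update(targets)
    return nodes


def critical_path_length(succs):
    """Longest path through a dependence DAG, counted in nodes."""

    @lru_cache(maxsize=None)
    def longest_from(node):
        return 1 + max((longest_from(s) for s in succs.get(node, ())), default=0)

    return max((longest_from(n) for n in all_nodes(succs)), default=0)


def choose_resource(pipelining_profitable, succs, threshold=1.5):
    """Pick one of the three resources named on the slide for a function."""
    if pipelining_profitable:
        return "C64 DSP"
    path = critical_path_length(succs)
    if path == 0:
        # Empty graph: nothing to run in parallel (assumption for this sketch).
        return "single-issue ARM"
    # nodes / critical path length estimates the available parallelism.
    if len(all_nodes(succs)) / path > threshold:
        return "double-issue ARM"
    return "single-issue ARM"


# Wide graph: 6 nodes, critical path a -> c -> f has 3 nodes, ratio 2.0.
wide = {"a": ["c"], "b": ["c"], "c": ["f"], "d": ["f"], "e": ["f"]}
# Pure chain: 3 nodes, critical path of 3 nodes, ratio 1.0.
chain = {"a": ["b"], "b": ["c"]}

print(choose_resource(False, wide))
print(choose_resource(False, chain))
print(choose_resource(True, chain))
```

The ratio of node count to critical path length is a rough measure of how many operations could run side by side, so a value above the 1.5 threshold suggests a second issue slot would be kept busy.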

24 Long-Term Future Automatic Code Generation (I don’t believe in software) Visual Programming of Components

