What Is a Compiler When The Architecture Is Not Hard(ware)? Krishna V Palem, Georgia Tech. UCB, November 8, 2001.


1 What Is a Compiler When The Architecture Is Not Hard(ware)?
Krishna V Palem, Georgia Tech
UCB, November 8, 2001
This work was supported in part by awards from Hewlett-Packard Corporation, IBM, Panasonic AVC, and by DARPA under Contract No. DABT63-96-C-0049 and Grant No. 25-74100-F0944. Portions of this presentation were given by the speaker as a keynote at ACM LCTES 2001 and as an invited talk at EMSOFT 01.

2 Embedded Computing
Why? What? How?

3 The Nature of Embedded Systems
Visible Computing: view is of end application (hidden computing element)

4 Favorable Trends
• Supported by Moore’s (second) law
  – Computing power doubles every eighteen months
  – Corollary: cost per unit of computing halves every eighteen months
• From hundreds of millions to billions of units
• Projected by market research firms (VDC) to be a 50 billion+ space over the next five years
• High volume, relatively low per unit $ margin
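
The corollary on this slide can be read as a simple exponential decay. The sketch below only illustrates that arithmetic; the function name and the starting cost of 100 are hypothetical and do not come from the talk.

```python
def cost_after(initial_cost, months, halving_period=18.0):
    """Cost per unit of computing after `months`, assuming it halves every
    `halving_period` months (the corollary stated on the slide)."""
    return initial_cost * 0.5 ** (months / halving_period)


# Hypothetical starting cost of 100 (arbitrary units).
for months in (0, 18, 36, 54):
    print(months, "months:", cost_after(100.0, months))
```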

5 Embedded Systems Desiderata
• Low power, high battery life
• Small size or footprint
• Real-time constraints
• Performance comparable to or surpassing leading edge COTS technology
• Rapid time-to-market

6 Timing Example
[Figure: Video-On-Demand, contrasting predictable timing behavior with unpredictable timing behavior.]

7 Current Art
• Meet desiderata while overcoming NRE cost hurdles through volume
• High migration inertia across applications
• Long time to market
[Figure labels: Vertical application domains (Industrial Automation, Telecom); Select computational kernels; To ASIC.]

8 Subtle but Sure Hurdles
• For Moore’s corollary to be true
  – Non-recurring engineering (NRE) cost must be amortized over high volume
  – Else prohibitively high per unit costs
• Implies “uniform designs” over large workload classes
  – E.g., numerical, integer, signal processing
• Demands of embedded systems
  – “Non uniform” or application specific designs
  – Per application volume might not be high
  – High NRE costs → infeasible cost/unit
  – Time to market pressure
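
The amortization argument on this slide comes down to one division. The following sketch assumes a simple cost model (NRE spread evenly over the volume, plus a fixed manufacturing cost per chip). The dollar figures are hypothetical and serve only to show the trend.

```python
def cost_per_unit(nre_cost, marginal_cost, volume):
    """Per-unit cost when a one-time NRE cost is spread over `volume` units."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre_cost / volume + marginal_cost


# Hypothetical figures, not taken from the talk.
NRE = 10_000_000.0   # assumed one-time design cost, in dollars
MARGINAL = 5.0       # assumed manufacturing cost per chip, in dollars

for volume in (10_000, 1_000_000, 100_000_000):
    unit = cost_per_unit(NRE, MARGINAL, volume)
    print(f"{volume:>11,} units: ${unit:,.2f} per unit")
```

At low volume the NRE term dominates the unit cost, which is the situation the slide describes for application specific embedded designs.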

9 The Embedded Systems Challenge
• Sustain Moore’s corollary
  – Keep NRE costs down
[Figure labels: Multiple application domains (3D Graphics, Industrial Automation, Medtronics, E-Textiles, Telecom); Rapidly changing application kernels in moderate volume; Custom computing solution meeting constraints.]

10 Responding Via Automation
[Figure labels: Multiple application domains (3D Graphics, Industrial Automation, Medtronics, E-Textiles, Telecom); Rapidly changing application kernels in low volume; Automatic tools (power, timing, size); Application specific design.]

11 Three Active Approaches
• Custom microprocessors
• Architecture exploration and synthesis
• Architecture assembly for reconfigurable computing

12 Custom Processor Implementation
• High performance implementation
• Customized in silicon for particular application domain
• O(months) of design time
• Once designed, programmable like standard processors
[Figure: Tensilica, HP-ST Microelectronics approach. Labels: Application analysis; Proprietary ISA, architecture specification; Proprietary tools; Fabricate processor (time intensive); Custom processor implementation; Application in a language with custom extensions; Compiler; Binary for the proprietary ISA.]

13 Architecture Exploration and Synthesis: The PICO Vision
Program in, chip out: automatic synthesis of application specific parallel / VLIW ULSI microprocessors and their compilers for embedded computing.
“Computer design for the masses”
“A custom system architecture in 1 week, tape-out in 4 weeks”
B. Ramakrishna Rau, “The Era of Embedded Computing”, invited talk, CASES 2000.

14 Custom Microprocessors
[Figure labels: Application(s) define workload; Analyze; Define ISA extension (e.g., IA 64+); Define compiler optimizations; Optimizing compiler; Design implementation; Microprocessor (e.g., Itanium+).]

15 Application Specific Design
[Figure labels: Single application; Program analysis; Analyze; Library of possible implementations (bypass ISA); Explore and synthesize implementations; VLIW core + non-programmable extension; Applications; Extended EPIC compiler technology. Caption: application specific processor runs a single application.]

16 The Compiler Optimization Trajectory
[Figure: division of work between compiler and hardware. Stages: Frontend and optimizer; Determine dependences; Determine independences; Bind operations to function units; Bind transports to busses; Execute. Architecture classes: Superscalar; Dataflow; Indep. Arch.; VLIW; TTA.]
B. Ramakrishna Rau and Joseph A. Fisher. Instruction-level parallel processing: History, overview, and perspective. The Journal of Supercomputing, 7(1-2):9-50, May 1993.

17 What Is the Compiler’s Target “ISA”?
• Target is a range of architectures and their building blocks
• Compiler reaches into a constrained space of silicon
• Explores architectural implementations
• O(days – weeks) of design time
• Exploration sensitive to application specific hardware modules
• Fixed function silicon is the result
• Verification NRE costs still there
• One approach to overcoming time to market
[Figure: division of work between compiler and hardware. Stages: Frontend and optimizer; Determine dependences; Determine independences; Bind operations to function units; Bind transports to busses; Execute. Architecture classes: Superscalar; Dataflow; Indep. Arch.; VLIW; TTA.]
B. Ramakrishna Rau and Joseph A. Fisher. Instruction-level parallel processing: History, overview, and perspective. The Journal of Supercomputing, 7(1-2):9-50, May 1993.

18 Choices of Silicon
[Figure labels: High level design/synthesis; (EDIF) netlist; Fixed silicon implementation (standard cell design etc.); Emulated, i.e. reconfigurable target.]

19 Reconfigurable Computing

20 FPGAs As an Alternative Choice for Customization
• Frequent (re)configuration and hence frequent recustomization
• Fabrication process is steadily improving
• Gate densities are going up
• Performance levels are acceptable
• Amortize large NRE investments by using COTS platform

21 Major Motivations
• Poor compilation times
  – Lack of correspondence between standard IR and final configurations
  – Place and route inherently complex
• Would like to have compile times of the order of seconds to minutes
• Provide customization of data-path by automatic analysis and optimization in software

22 Adaptive EPIC
• Surendranath Talla. Adaptive Explicitly Parallel Instruction Computing. PhD thesis, Department of Computer Science, New York University, May 2001. Janet Fabri award for outstanding dissertation.
• Krishna V. Palem and Surendranath Talla (Courant Institute of Mathematical Sciences); Patrick W. Devaney (Panasonic AVC American Laboratories Inc.). Adaptive Explicitly Parallel Instruction Computing. Proceedings of the 4th Australasian Computer Architecture Conference, Auckland, NZ, January 1999.

23 Compiler-Processor Interface
[Figure labels: Source program; Compiler; Executable; ISA: ADD (format, semantics), LD (format, semantics), registers, exceptions, ...]

24 Redefining Processor-Compiler Interface
Let compiler determine the instruction sets (and their realization on chip)
[Figure labels: Source program; Compiler; Executable; ISA: mysub (format, semantics), myxor (format, semantics).]

25 EPIC execution model
[Figure labels: Record of execution; ILP; Functional units.]

26 Adaptive EPIC execution model
[Figure labels: Record of execution; ILP-1; ILP-2; Reconfigure datapath; Configured functional units.]

27 What can be efficiently compiled for today?
[Figure: design-space chart. Axes: Parallelism; Approximate instruction packet size. Tick values: 0, 4, 16, 32, 64, 128-512, 1K-10K, 100K-1M, >1M. Points: Adaptive EPIC; Simple ASIC; Complex ASIC; RaPiD; FPGA; GARP; DPGA; SuperSpeculative; RAW; TRACE (Multiscalar); SMT; VECTOR; MultiChip; CVH; SuperScalar; Simple Pipelined/Embedded; EPIC/VLIW; TTA; Early x86; Dataflow.]

28 Adaptive EPIC
• AEPIC = EPIC processing + datapath reconfiguration
• Reconfiguration through three basic operations:
  – add a functional unit to the datapath
  – remove a functional unit from the datapath
  – execute an operation on resident functional unit
[Figure labels: RLA; F1; F2.]
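
The three basic operations can be pictured with a toy software model. This is a minimal sketch, not the AEPIC instruction set: the class name `Datapath`, the slot-count capacity, and the use of Python callables to stand in for functional units are all assumptions made for illustration. The unit names `myxor` and `mysub` are borrowed from the interface slide earlier in the talk.

```python
class Datapath:
    """Toy model of a reconfigurable datapath with a fixed number of slots."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed: number of units that fit at once
        self.resident = {}        # unit name -> callable standing in for the unit

    def add_unit(self, name, func):
        """Add a functional unit to the datapath if a slot is free."""
        if name in self.resident:
            return
        if len(self.resident) >= self.capacity:
            raise RuntimeError("no free slot; remove a unit first")
        self.resident[name] = func

    def remove_unit(self, name):
        """Remove a functional unit from the datapath (no-op if absent)."""
        self.resident.pop(name, None)

    def execute(self, name, *operands):
        """Execute an operation on a resident functional unit."""
        if name not in self.resident:
            raise KeyError(f"{name} is not resident")
        return self.resident[name](*operands)


dp = Datapath(capacity=2)
dp.add_unit("myxor", lambda a, b: a ^ b)
dp.add_unit("mysub", lambda a, b: a - b)
print(dp.execute("myxor", 6, 3))  # 6 XOR 3
dp.remove_unit("myxor")           # frees a slot for another unit
```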

29 AEPIC Machine Organization
[Figure labels: Multi-context Reconfigurable Logic Array (MRLA); Array Register File; Configuration cache; Level-1 configuration cache; L1; L2; CRF; GPR/FPR/…; I-Cache; F/D.]

30 AEPIC Compilation

31 Key Tasks For AEPIC Compiler
• Partitioning
  – Identifying code sections that convert to custom instructions
• Operation synthesis
  – Efficient CFU implementations for identified partitions
• Instruction selection
  – Multiple CFU choices for code partitions
• Resource allocation
  – Allocation of C-cache and MRLA resources for CFUs
• Scheduling
  – When and where to schedule adaptive component instructions

32 Compiler Modules
[Figure labels. Inputs: Source program; Machine description; Configuration library. Modules: Front-end processing; Partitioning; Mapping (Adaptive); High-level optimizations (EPIC core); Pre-pass scheduling; Resource allocation; Post-pass scheduling; Code generation; Simulation. Output: Performance statistics.]

33 Resource allocation for configurations
Reconfigurable logic (RL) can accommodate only two configurations simultaneously.
[Figure labels: Processor; RL; Record of execution; configurations B, G, R; Overlapping live-range region.]

34 Managing Reconfigurable Resource (contd.)
• Simpler version related to register allocation problem
• New parameters
  – No need to save to memory on spill
  – Configurations immutable
  – Load costs different for different configurations
  – C-cache, MRLA = multi-level register file
• Adapted register allocation techniques to configuration allocation
  – Non-uniform sizes of configurations: graph multi-coloring
  – Adapted Chaitin’s coloring based techniques
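
The link to register allocation can be illustrated with a small greedy multi-coloring. This is a simplified stand-in and not the adapted Chaitin-style allocator the slide refers to: it colors configurations in order of live-range start, gives each one as many slots as its size requires, and marks a configuration as not resident when too few slots remain. The interval representation, the function names, and the example sizes are assumptions.

```python
def overlaps(a, b):
    """Two half-open live ranges (start, end) interfere if they intersect."""
    return a[0] < b[1] and b[0] < a[1]


def allocate(configs, num_slots):
    """Greedy multi-coloring of configuration live ranges.

    configs maps a name to (start, end, size), where size is the number of
    slots the configuration occupies. Returns name -> tuple of slot indices,
    or None when the configuration cannot be kept resident.
    """
    assignment = {}
    for name in sorted(configs, key=lambda n: configs[n][0]):
        start, end, size = configs[name]
        taken = set()
        for other, slots in assignment.items():
            if slots is not None and overlaps((start, end), configs[other][:2]):
                taken.update(slots)
        free = [s for s in range(num_slots) if s not in taken]
        assignment[name] = tuple(free[:size]) if len(free) >= size else None
    return assignment


# Three configurations with overlapping live ranges and room for only two.
example = {"B": (0, 10, 1), "G": (5, 15, 1), "R": (8, 12, 1)}
print(allocate(example, num_slots=2))
```

Because configurations are immutable, a configuration that does not fit needs no store back to memory; it only has to be loaded again later, which is where the differing load costs on the slide would enter a fuller model.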

35 The Problem With Long Reconfiguration Times
[Figure labels: Configuration load time; Configuration execution time.]

36 Speculating Configuration Loads
• Mask reconfiguration times!
• Need to know when and where to speculate
• If f1 >> f2, do not speculate to “red” empty load slot
[Figure labels: Configuration execution time; Speculated configuration loads; branch frequencies f1, f2.]
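
One way to read the f1 >> f2 rule is as an expected-benefit comparison: preload the configuration whose load latency is most likely to be hidden. The sketch below assumes that benefit is path frequency times load time and that a single empty load slot is available. The function name, the color labels, and the numbers are hypothetical, and the talk does not specify this exact heuristic.

```python
def choose_speculative_load(candidates):
    """Pick the configuration to preload into one empty load slot.

    candidates maps a configuration name to (frequency, load_cycles), where
    frequency is the profiled probability that its path is taken.
    Returns the name with the largest expected hidden latency, or None.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda n: candidates[n][0] * candidates[n][1])


# Assumed profile: the path needing "blue" is taken far more often (f1 >> f2).
profile = {"blue": (0.9, 1000), "red": (0.1, 1000)}
print(choose_speculative_load(profile))
```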

37 Sample Compiler Topics
• Configuration cache management
• Power/clock rate vs. performance tradeoff
• Bit-width optimizations

38 More Generally: Architecture Assembly
• An ISA view
• Synthesis and other hardware design off-line
• Much closer to compiler optimizations, which implies faster compile time
• Also applicable to yield fixed implementations in silicon
[Figure labels: Applications; Program; “Compiler” selects, assembles and optimizes program; Dynamically variable ISA; Architecture implementation; Prebuilt implementations (data path, storage, interconnect), built off-line (synthesis, place and route).]

