Design Methodology for Customizable Programmable Processors Berkeley – Finland Day, Oct. 18, 2002 Prof. Jarmo Takala Institute of Digital and Computer.


1 Design Methodology for Customizable Programmable Processors
Berkeley – Finland Day, Oct. 18, 2002
Prof. Jarmo Takala
Institute of Digital and Computer Systems
Tampere University of Technology
Tampere, Finland
Tel: +358 – 33115 3879; Email: jarmo.takala@tut.fi

2 Outline (J. Takala/TUT, Berkeley – Finland Day, Oct. 18, 2002)
• Motivation
• Transport Triggered Architecture (TTA)
• Design Methodology for TTAs
• Research at TUT
• Conclusions

3 Motivation
• Programmable processors are often used in products that rely on digital signal processing (DSP)
  - Flexibility
  - Ease of verification
• Traditionally, DSP processor architectures have been developed based on average performance over several benchmark tasks (~100)
• User applications often contain only a subset of these benchmarks
• Efficiency can be improved by customizing the architecture to the given tasks

4 Motivation
• DSP applications often have hard real-time constraints
  - Execution should be deterministic
  - Dynamic run-time behaviour should be avoided
  - Static scheduling lends itself to DSP
• Current design complexity calls for an increase in designer productivity
  - High-level languages should be used
• DSP algorithms contain inherent parallelism
  - Instruction-level parallelism (ILP) should be maximized

5 What is needed?
• Application-driven design process with easy design space exploration
• Replace hardware complexity with software complexity
  - Compiler-driven process
• Use a templated architecture
  - Flexible, heterogeneous function units
  - Modular scalability
  - Orthogonal, compiler-friendly

6 Choices for Architecture Template
[Figure: ILP architectures. The steps Frontend, Determine Dependencies, Determine Independencies, Bind Function Units, and Bind Datapaths & Execute are divided between compilation time (software) and run time (hardware) for sequential (superscalar), dependence (dataflow), independence (EPIC), and independence (VLIW) architectures.]

7 VLIW Gained Popularity in DSP
[Figure: VLIW CPU block diagram with instruction memory, instruction fetch, instruction decode, function units FU-1 to FU-5, bypassing network, register file, and data memory.]

8 Transport Triggered Architecture
• VLIW drawbacks
  - Bypass complexity
  - Register file complexity
  - Register file design restricts FU flexibility
  - Operation encoding format restricts FU flexibility
• Reverse the programming paradigm [H. Corporaal, 94]
  - data transport → operation
• The instruction set contains only a single instruction: move

9 From VLIW to TTA
[Figure: two block diagrams. VLIW: instruction memory, instruction fetch, instruction decode, FU-1 to FU-5, bypassing network, register file, data memory. TTA: instruction fetch, instruction decode, FU-1 to FU-5, bypassing network, register file.]

10 TTA Datapath
[Figure: TTA datapath with integer ALU, float ALU, Boolean RF, float RF, integer RF, load/store unit, immediate unit, and instruction unit attached through sockets, plus instruction memory and data memory.]

11 Function Units
• Operands are written to operand registers (O)
• The operation is performed when the last operand is written to the trigger register (T)
• The pipeline is synchronized with control bits (C)
• Standard interface
  - FU_ready
  - Result_ready
  - Global_lock
[Figure: function unit pipeline with blocks labeled O, T, logic, R, and C, and an optional shadow register.]
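The operand/trigger behaviour described on this slide can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, the unit has a single-step latency, and the control bits and the FU_ready, Result_ready, and Global_lock signals are not modeled.

```python
class FunctionUnit:
    """Toy TTA function unit with operand (O), trigger (T) and result (R) registers.

    Hypothetical model for illustration; pipelining and control bits are ignored.
    """

    def __init__(self, operation):
        self.operation = operation  # e.g. lambda o, t: o + t
        self.o = 0                  # operand register (O)
        self.r = None               # result register (R), empty until triggered

    def write_operand(self, value):
        # Writing O only stores the value; no operation starts.
        self.o = value

    def write_trigger(self, value):
        # Writing T starts the operation on the stored operand and the new value.
        self.r = self.operation(self.o, value)

    def read_result(self):
        return self.r


add = FunctionUnit(lambda o, t: o + t)
add.write_operand(2)
result_before_trigger = add.read_result()  # nothing has been computed yet
add.write_trigger(3)
result_after_trigger = add.read_result()   # the trigger write produced 2 + 3
```

In this sketch the addition happens as a side effect of the second data transport, which is the sense in which the architecture is transport triggered.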

12 ILP Architectures
[Figure: division of the steps Frontend, Determine Dependencies, Determine Independencies, Bind Function Units, Bind Datapaths, and Execute between compilation time and run time for sequential (superscalar), dependence (dataflow), independence (EPIC), independence (VLIW), and independence (TTA) architectures.]

13 TTA Characteristics: HW
• Modular
  - Can be constructed with standard building blocks
• Very flexible and scalable
  - FU functionality can be arbitrary
  - Supports user-defined Special Function Units (SFU)
• Lower complexity
  - Reduction in the number of register ports
  - Reduced bypass complexity
  - Reduction in bypass connectivity
  - Reduced register pressure
  - Trivial decoding (implies long instructions)

14 TTA Characteristics: SW
• Traditional operation-triggered instruction:
    mul r1,r2,r3;
• Transport-triggered instruction:
    r1 → mul.o; r2 → mul.t; mul.r → r3;
  or
    r1 → mul.o, r2 → mul.t; mul.r → r3;
• Resembles dataflow and time-stationary coding
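The mapping from one operation-triggered instruction to its data transports can be written down mechanically. The sketch below is illustrative only: the helper names `to_moves` and `format_moves` are hypothetical, and the port suffixes `.o`, `.t`, and `.r` follow the slide's notation.

```python
def to_moves(op, src1, src2, dst):
    """Expand 'op src1,src2,dst' into three (source, destination) transports."""
    return [
        (src1, f"{op}.o"),   # first operand goes to the operand register
        (src2, f"{op}.t"),   # second operand goes to the trigger register
        (f"{op}.r", dst),    # result register is copied to the destination
    ]


def format_moves(moves):
    """Render the moves one per instruction, using '->' for the transport arrow."""
    return "; ".join(f"{src} -> {dst}" for src, dst in moves) + ";"


moves = to_moves("mul", "r1", "r2", "r3")
text = format_moves(moves)
```

Here each move is placed in its own instruction; a scheduler for a machine with several buses could pack the two operand moves into one instruction, as in the second form on the slide.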

15 TTA Design Tools
• Design tools based on the TTA architecture template have been developed at Delft University of Technology (DUT), Delft, the Netherlands
  - MOVE project led by Prof. Henk Corporaal
• Fully parametric C/C++ compiler
  - Buses, connections, function units, register files, etc.
• Design space explorer
• Processor generator

16 Code Generation Trajectory (MOVE Project at DUT)
[Figure: code generation flow with the blocks Application (C/C++), Compiler Frontend (GCC or SUIF), Sequential Code, Sequential Simulator (with I/O), Profiling Data, Architecture Description, Compiler Backend, Parallel Code, and Parallel Simulator (with I/O).]

17 TTA Specific Optimizations
• TTA allows extra scheduling optimizations
  - E.g., software bypassing
• Bypassing can eliminate the need for RF access
• However, it is more difficult to schedule!
Example:
    r1 → add.o, r2 → add.t;
    add.r → r3;
    r3 → sub.o, r4 → sub.t;
    sub.r → r5;
Translates to:
    r1 → add.o, r2 → add.t;
    add.r → sub.o, r4 → sub.t;
    sub.r → r5;
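The transformation in the example above can be sketched as a small pass over a flat list of moves. This is a simplified illustration, not the MOVE compiler's algorithm: the function names are hypothetical, timing and bus allocation are ignored, any name without a dot is assumed to be a register, and a result port is assumed to keep its value until the same unit is triggered again.

```python
def is_register(name):
    # Assumption: FU ports look like 'add.o'; anything without a dot is a register.
    return "." not in name


def software_bypass(moves, live_out=()):
    """Forward FU results directly to consumers, then drop dead register writes."""
    forwarded = {}  # register -> result port that currently holds the same value
    rewritten = []
    for src, dst in moves:
        if is_register(src) and src in forwarded:
            src = forwarded[src]            # read the result port instead of the RF
        if is_register(dst):
            if src.endswith(".r"):
                forwarded[dst] = src        # register now mirrors this result port
            else:
                forwarded.pop(dst, None)    # register overwritten with something else
        elif dst.endswith(".t"):
            port = dst[:-2] + ".r"          # triggering a unit overwrites its result
            forwarded = {r: p for r, p in forwarded.items() if p != port}
        rewritten.append((src, dst))

    # Backward pass: remove register writes whose value is never read again.
    live = set(live_out)
    kept = []
    for src, dst in reversed(rewritten):
        if is_register(dst):
            if dst not in live:
                continue                    # dead write, the RF access is eliminated
            live.discard(dst)
        if is_register(src):
            live.add(src)
        kept.append((src, dst))
    return kept[::-1]


original = [("r1", "add.o"), ("r2", "add.t"), ("add.r", "r3"),
            ("r3", "sub.o"), ("r4", "sub.t"), ("sub.r", "r5")]
bypassed = software_bypass(original, live_out={"r5"})
```

With `r5` as the only live-out register, the write to `r3` disappears and the addition result is moved straight to `sub.o`. If `r3` were also live afterwards, the pass would keep the write and only add the direct transport.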

18 Design Space Exploration (MOVE Project at DUT)
[Figure: exploration flow in two phases. Resource optimization: Application (C/C++), Frontend, Select Resources, Resources (Mach), Map&Schedule, Simulator, Design Points, with FU models and Cost Functions as inputs. Connectivity optimization: Reduce Connections, Map&Schedule, Simulator, Design Point.]

19 Exploration: Resource Optimization (MOVE Project at DUT)
• The Pareto curve represents the lower bound of the architecture configurations found
• One architecture is selected for further optimization
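A Pareto curve over explored design points can be extracted with a short filter. The sketch below assumes each design point is a hypothetical (cost, cycles) pair where lower is better on both axes; the numbers are made up for illustration and are not results from the MOVE project.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (lower is better)."""
    front = []
    best_cycles = float("inf")
    # Sorting by cost first means a point survives only if it is
    # strictly faster than every cheaper (or equally cheap) point seen so far.
    for cost, cycles in sorted(points):
        if cycles < best_cycles:
            front.append((cost, cycles))
            best_cycles = cycles
    return front


# Hypothetical design points: (area cost, execution cycles).
design_points = [(10, 900), (12, 700), (15, 720), (20, 400), (25, 400), (8, 1500)]
front = pareto_front(design_points)
```

In this example `(15, 720)` is dropped because `(12, 700)` is both cheaper and faster, and `(25, 400)` is dropped because `(20, 400)` reaches the same cycle count at lower cost.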

20 Exploration: Connectivity Optimization (MOVE Project at DUT)
• Reduced connections decrease bus delay
• Critical connections have been removed
[Figure: interconnect diagrams with units labeled IRU, ALU, IU, and LSU.]

21 Topics to be Investigated
• Poor code density
  - Good target for code compression techniques: a priori information about the application is available, so instruction probabilities are known
• Estimations
  - Power estimation
  - Fast estimations with sufficient accuracy
• Flexibility, reuse
  - Applications may change, so additional resources need to be assigned although they are not needed by the original application
• Tool-assisted special function unit generation
  - Analysis support
  - Model creation support
  - Characterization support
• Parameterized processor generator
  - Interconnections, control, etc. may be realized in several ways depending on the target
• Low-power optimizations
• Clustered TTAs
• Interprocessor communication schemes
• These topics are considered in the FlexDSP Project at TUT
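The code-compression topic above relies on instruction probabilities being known in advance. One standard way to exploit that is Huffman coding, shown here as a minimal sketch; the slides do not say which compression scheme FlexDSP uses, and the instruction patterns and probabilities below are invented for illustration.

```python
import heapq


def huffman_code_lengths(probabilities):
    """Return {symbol: code length in bits} for a Huffman code."""
    if len(probabilities) == 1:
        return {symbol: 1 for symbol in probabilities}
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probabilities}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # every merge adds one bit to the merged symbols
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths


# Hypothetical instruction patterns and their profiled probabilities.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
lengths = huffman_code_lengths(probs)
average_bits = sum(probs[s] * lengths[s] for s in probs)
```

For these four patterns a fixed-width encoding needs 2 bits per pattern, while the variable-length code averages fewer bits because the most frequent pattern gets the shortest code.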

22 New Design Environment (Target of FlexDSP Project at TUT)
[Figure: design flow with the blocks Functionality (C/C++), Frontend, Operation Analysis, SFU Generation, FU models (C, HDL), Cost Functions (area, power, speed), Resource Constraints, Design Space Exploration, Parametric Compiler, Code Compression, Parallel Object Code, Parametric Processor Generator, HDL Code, and TTA Processor.]

23 Conclusions
• Design methodologies that allow processor customization will improve efficiency in certain application areas, e.g., multimedia and telecom
• TTA is a promising candidate as an architectural template for customized processors
  - In particular, support for custom function units allows powerful tailoring
  - Results of the MOVE project at DUT have already proven the concept
  - A parameterized compiler allows tool-assisted design space exploration
• More research is still needed on
  - Hardware implementations
  - Enhanced compiler strategies

