Presentation is loading. Please wait.

Presentation is loading. Please wait.

Hot Chips 16August 24, 2004 OptimoDE: Programmable Accelerator Engines Through Retargetable Customization Nathan Clark, Hongtao Zhong, Kevin Fan, Scott.

Similar presentations


Presentation on theme: "Hot Chips 16August 24, 2004 OptimoDE: Programmable Accelerator Engines Through Retargetable Customization Nathan Clark, Hongtao Zhong, Kevin Fan, Scott."— Presentation transcript:

1 Hot Chips 16August 24, 2004 OptimoDE: Programmable Accelerator Engines Through Retargetable Customization Nathan Clark, Hongtao Zhong, Kevin Fan, Scott Mahlke CCCP Research Group University of Michigan http://cccp.eecs.umich.edu Kriszti á n Flautner, Koen Van Nieuwenhove ARM Limited

2 Hot Chips 16August 24, 2004 OptimoDE Overview OptimoDE –A configurable VLIW-styled Data Engine architecture –Targeted at intensive data processing Characteristics –Very wide performance envelope Power / area / speed tradeoff Exploiting parallelism in applications –Unlimited data path configuration options –User extensible through ISA customization Semi-automatic design system –User-in-the-loop design, retargetable compiler toolchain

3 Hot Chips 16August 24, 2004 OptimoDE in a System On Chip SDRAM SoC AHB Bus Matrix ARM CPU DMA Controller Interrupt Controller SRAM I/O SDRAM Controller Memory Control Data Engine DATA 1 MEM CTRL DATA 2 FIFO switch Memory Data M1 M2 M S S S M S SM

4 Hot Chips 16August 24, 2004 OptimoDE Architecture Model Functional Units –ALU, ACU, Multipliers –Custom Memory –RAM ( asynch / synch ) –ROM I/O ports –addressable –handshake protocol Registers –Register files Interconnect –Direct connection –Shared bus Controller All layers required Intra-layer configuration interconnect Interconnect Controller CC interconnect … … Function units Memories regs … I/O ports Registers

5 Hot Chips 16August 24, 2004 Design Toolchain A.tmp DesignDEDEvelop Librarian save ISS set target 001010 110011 010110 110111 load run / profile A.inc A.inc.c 1 User Library instantiate #include … main() { … = dct(); } create_resource xxxxx xx dct xxxxx A.inc A.inc.c 2 A.inc A.inc.c 3  A Evaluation OptimoDE Library OptimoDE Library User Library  A Definition LIFETIME LOAD

6 Hot Chips 16August 24, 2004 Compiler Toolchain * 1 + 2 * 4 * 5 + 6 + 7 * 8 INPUT OUTPUT + 3 * 1 + 2 + 3 012345012345 + 2 * 1 * 4 * 3 + 6 * 8 + 7 * 5 * 9 + 10 C Source Description DEvelop Micro- code check map compile Analysis feedback Syntax checks Dataflow analysis Match architecture and dataflow graph Optimize code and register use

7 Hot Chips 16August 24, 2004 32-point DCT Microarchitecture 2 Custom FUs, 2 RAM, 1 ROM, 3 ACU, 2 I/O ports Designer responsible for creating custom units manually

8 Hot Chips 16August 24, 2004 Retargetable Customization Prototype 2 technologies in OptimoDE –Automated ISA customization –Retargetable customization to an “application-area” Customizing for 1 application –Programmability  Nominally programmable –Critical problem – Cannot sustain performance across similar applications –How well does a custom ISA generalize 5 encryption algorithms, create custom design for each Average loss >80% verses native [MICRO, 2003] –Proactive generalization creates a retargetable design

9 Hot Chips 16August 24, 2004 Creating Custom Instructions Candidate discovery –Identify customization opportunities Examine program DFG Partition DFG at: –Memory operations –Unprofitable edges Enumerate candidate subgraphs within each partition

10 Hot Chips 16August 24, 2004 Grouping and Selection Group candidate subgraphs with same structure Group 4 Group 2 Group 1 Group 3 Group 4 Group 3 Group 1Group 2 Estimate performance and cost for each group Cost: 0.5 Adders Gain: 1,000 Cycles Cost: 1 Adder Gain: 2,500 Cycles Cost: 2 Adders Gain: 10,000 Cycles Cost: 1 Adder Gain: 1,500 Cycles Greedily select groups to implement in hardware subject to budget

11 Hot Chips 16August 24, 2004 Wildcard – multiple functionality at nodes Input 2 Input 1 0xFF 0x4 0x8, 0x4 >> |,& +,- Output Proactively Generalize Groups Cost-effectively extend group functionality to enable reuse Input 2 Input 1 0xFF 0x8 >> | + Output Input 2 Input 1 0xFF 0x4 0x8, 0x4 >> |,& +,- Output Subsumed – configurable interconnect to bypass nodes

12 Hot Chips 16August 24, 2004 Native Speedups

13 Hot Chips 16August 24, 2004 Importance of Generalization Key: application run – application designed for

14 Hot Chips 16August 24, 2004 Designing for a Domain

15 Hot Chips 16August 24, 2004 Case Study - Md5

16 Hot Chips 16August 24, 2004 OptimoDE Design for this Point Input 4 Input 1 Input 3 Input 2 + + + << 0x5 >> 0x1B | Output ^ & ^ Input 1Input 2 Input 3 Output ALU 1ALU 2CFUSRAMACU RF … Control Memory

17 Hot Chips 16August 24, 2004 Die Area Breakdown ALU 1ALU 2CFUSRAMACU RF … Control Memory Die impact is artificially large because of naïve implementation OptimoDE = 5.5 mm 2 in 0.13  ARM 926EJ = 5.0 mm 2 in 0.13 

18 Hot Chips 16August 24, 2004 Conclusions OptimoDE –Configurable VLIW-style data engine architecture –Automated tools for implementing embedded signal and data processing solutions Automatic retargetable customization –Customized design combined with cost-effective generalization –Performance programmability - Performance stability across a family of similar applications


Download ppt "Hot Chips 16August 24, 2004 OptimoDE: Programmable Accelerator Engines Through Retargetable Customization Nathan Clark, Hongtao Zhong, Kevin Fan, Scott."

Similar presentations


Ads by Google