A Graph Based Algorithm for Data Path Optimization in Custom Processors J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski Center for Embedded Computer Systems.


1 A Graph Based Algorithm for Data Path Optimization in Custom Processors
  J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski
  Center for Embedded Computer Systems
  University of California, Irvine

2 Outline
  Copyright © 2006, CECS
  Introduction
  Design Methodology
  Initial Allocation
  Architecture Wizard
  Critical Path Extraction
  Spill Algorithm
  Results
  Conclusion

3 Introduction (1 of 2)
  Complexity of SoC rising
  Short time to market
  Need for processors specialized for different application domains
    General purpose processors
      Often slow and power hungry
    Full HW design
      Expensive and rigid for debugging and feature extension
    Custom processor
      Adapt the data path to a given application
  → Need for automatic generation of application-specific architectures

4 Introduction (2 of 2)
  Previous work in High Level Synthesis
    Integer linear programming [Landwehr et al.]
    Force driven scheduling [Paulin and Knight]
    Finding minimal cliques [Tseng and Siewiorek]
    Branch-and-bound [Marwedel]
  Proposed methodology separates the allocation from scheduling and binding

5 Design Methodology
  Define application’s maximum requirements
    ALAP (as late as possible) schedule
  Initial Allocation chooses from Component DB (CDB)
    Select as many units as needed for ALAP
  Architecture Wizard (AW) analyzes component utilization
    Based on the schedule and profiling data
  Optimized Architecture
    Using the design constraints
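One way to read "select as many units as needed for ALAP" is to count, for each operation type, the largest number of operations that share a cycle in the schedule. The sketch below does only that. The schedule format (a list of (cycle, op_type) pairs) and the function name `initial_unit_counts` are assumptions made for this illustration, not details taken from the presented tool.

```python
from collections import Counter


def initial_unit_counts(schedule):
    """Return {op_type: units} where units is the peak per-cycle demand.

    schedule: iterable of (cycle, op_type) pairs (hypothetical format).
    """
    per_cycle = Counter(schedule)  # (cycle, op_type) -> number of ops
    units = {}
    for (_cycle, op_type), count in per_cycle.items():
        units[op_type] = max(units.get(op_type, 0), count)
    return units


# Made-up schedule: two adds and one mul in cycle 0, three muls in cycle 2.
example_schedule = [
    (0, "add"), (0, "add"), (0, "mul"),
    (1, "add"),
    (2, "mul"), (2, "mul"), (2, "mul"),
]
example_units = initial_unit_counts(example_schedule)
```

Here the peak demand is two adders (cycle 0) and three multipliers (cycle 2), so those counts would form the starting point before any reduction.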

6 Initial Allocation and Component Selection
  Define max requirement
    Based on the statistics for operators and data transfer
  Finding “the best fit” in CDB for given requirements
    Storage (RF and Memory)
      Min difference in number of ports
    Functional units
      The most general unit executing given operation
    Buses
      Source buses:
        – N, if N is even
        – (N+1), if N is odd
        – Where N = # RF output ports
      Destination buses = # RF input ports
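The bus rule on this slide amounts to rounding the number of register file output ports up to the next even number for source buses, and using the number of input ports directly for destination buses. A minimal sketch, with function names chosen here for illustration only:

```python
def source_buses(rf_output_ports: int) -> int:
    """N source buses if N is even, N + 1 if N is odd (N = # RF output ports)."""
    n = rf_output_ports
    return n if n % 2 == 0 else n + 1


def destination_buses(rf_input_ports: int) -> int:
    """One destination bus per RF input port."""
    return rf_input_ports
```

For example, a register file with 3 output ports and 2 input ports would get 4 source buses and 2 destination buses under this rule.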

7 Architecture Wizard - Overview
  Goal of Phase II
    Reducing number of used resources
    Under performance and utilization constraints
  Inputs:
    Schedule for the Max Configuration
    Execution frequencies (Profiler)
    Utilization and performance constraints (Designer)
    Component Data Base (CDB)
  Outputs:
    Architecture Net-List
    Report

8 Architecture Wizard: Tool Flow
  Histograms for
    A functional unit type
    Group of in/out ports of a storage unit
  For the basic blocks (BB) in the critical path, for each histogram
    Vary number of units
    Estimate execution and utilization
  Allocate data path when constraints satisfied
    Use the same heuristics as for the initial allocation
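The "vary, estimate, check constraints" loop can be sketched as a simple search over candidate unit counts. The slides do not say in which order candidates are tried or what happens when no candidate qualifies, so the bottom-up order and the fallback to the maximum configuration below are assumptions, as are the name `choose_unit_count` and the `estimate` callback interface. The numbers in the lookup table are made up.

```python
def choose_unit_count(max_units, estimate, max_delta_pct, min_utilization):
    """Return the smallest unit count that meets both constraints.

    estimate(k) -> (delta_pct, utilization) for k units (hypothetical
    interface). Falls back to max_units if no candidate qualifies.
    """
    for k in range(1, max_units + 1):
        delta_pct, utilization = estimate(k)
        if delta_pct <= max_delta_pct and utilization >= min_utilization:
            return k
    return max_units


# Made-up estimates: unit count -> (extra cycles in %, utilization).
made_up_estimates = {
    1: (120.0, 1.00),
    2: (19.2, 0.84),
    3: (4.0, 0.61),
    4: (0.0, 0.50),
}
chosen = choose_unit_count(4, made_up_estimates.__getitem__, 20.0, 0.75)
```

With these illustrative numbers, one unit is too slow and three or four units are underutilized, so the search settles on two units.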

9 Critical Path Extraction
  Critical Path: a sequence of BBs from start to end that contributes the most to the execution time
  1. Start with the graph of the application
  2. Create directed acyclic graph
  3. Create dual graph
       ∀ edge e_x, create a node E_x
       ∀ node B_y, create (# inputs × # outputs) edges
  4. Transform to the shortest path problem
       Compute weights as 1/w_i or W_max - w_i
  5. Find the shortest path
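Steps 4 and 5 can be illustrated with Dijkstra's algorithm on the transformed weights W_max - w_i. This sketch makes two simplifications that are not in the slides: it skips the explicit dual graph and instead charges a block's transformed weight when the block is entered, and it assumes each block weight w_i is already a single number (for instance, block latency times execution frequency). The graph, weights, and function name are hypothetical.

```python
import heapq


def critical_path(weights, successors, start, end):
    """Shortest start-to-end path under the transformed cost W_max - w_i.

    weights: {block: w_i}; successors: {block: [next blocks]} for a DAG.
    Returns the list of blocks on the path, or [] if end is unreachable.
    """
    w_max = max(weights.values())
    cost = {b: w_max - w for b, w in weights.items()}  # always >= 0
    dist = {start: cost[start]}
    prev = {}
    heap = [(cost[start], start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == end:
            break
        for v in successors.get(u, []):
            nd = d + cost[v]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if end not in dist:
        return []
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Made-up diamond-shaped graph: B1 is the heavy branch.
block_weights = {"B0": 10, "B1": 50, "B2": 5, "B3": 20}
block_successors = {"B0": ["B1", "B2"], "B1": ["B3"], "B2": ["B3"]}
path = critical_path(block_weights, block_successors, "B0", "B3")
```

In this example the transformed costs are B0 = 40, B1 = 0, B2 = 45, B3 = 30, so the path through B1 (total 70) beats the path through B2 (total 115) and is reported as critical.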

10 “Spill” - Flattening Algorithm
  Utilization profile for each FU type and in/out port of storage unit
    Type and number of instances of other components is unchanged
  For chosen number of FUs
    Estimate extra cycles (Δ) by postponing operations into empty slots
    Maximize component utilization
      Utilization = Σ Used FUs / (chosen # × Exec. Time)
  Compute global Δ and utilization
    Per block estimation
    Execution frequencies
  [Figure legend: FU in use in current cycle; Estimated use of FU; Available FU not in use]
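A minimal sketch of the spill estimate, under stated assumptions: the utilization profile of a block is a list giving how many FUs of one type are busy in each cycle of the maximum-configuration schedule, operations that do not fit into the chosen number of FUs are pushed to the next cycle, and data dependencies between operations are ignored. The function names and the sample profiles are made up for illustration.

```python
import math


def spill(profile, k):
    """Return (extra_cycles, utilization) for one block with k FUs."""
    carry = 0  # operations postponed from earlier cycles
    for demand in profile:
        pending = carry + demand
        carry = pending - min(k, pending)
    extra = math.ceil(carry / k)  # cycles appended to drain the leftovers
    exec_time = len(profile) + extra
    utilization = sum(profile) / (k * exec_time)
    return extra, utilization


def global_estimate(blocks, k):
    """Combine per-block estimates, weighted by execution frequency.

    blocks: list of (profile, frequency). Returns (delta_pct, utilization).
    """
    base = sum(len(p) * f for p, f in blocks)
    extra = sum(spill(p, k)[0] * f for p, f in blocks)
    used = sum(sum(p) * f for p, f in blocks)
    return 100.0 * extra / base, used / (k * (base + extra))


# Made-up block: 5 cycles, peak demand of 4 FUs, reduced to 2 FUs.
extra_cycles, block_utilization = spill([3, 1, 0, 4, 2], 2)
delta_pct, total_utilization = global_estimate(
    [([3, 1, 0, 4, 2], 100), ([2, 2], 10)], 2
)
```

For the sample block, the 10 operations fit into 6 cycles on 2 FUs, which gives one extra cycle and a utilization of 10 / (2 × 6). The global figures weight each block by its execution frequency before forming Δ and utilization.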

11 Results
  Applications: bdist2 (MPEG2 encoder), OnesCounter, Sort (bubble sort), dct32 (MP3)
  Δ = 20%, Utilization = 75%

  Bench          FUs       Buses     Tri-State   Δ [%]   Avg. Iter.   T [s]
                 MC   R    MC   R    MC    R
  bdist2          6   4     6   5    40   19     32.1    2.8          0.05
  Ones Counter    3   2     6   5    34   17     11.9    1.4          0.05
  Sort            4   3     6   5    36   18      0.6    2.9          0.06
  dct32           6   4     6   5    40   19      1.4                 0.48

12 Conclusion
  Automatic generation of data path
    Separate allocation from scheduling and binding
    Initial Allocation – creates dense architecture
    Architecture Wizard – refines architecture for given constraints
  Future work and issues
    Reduce area
      – Reduce complexity of FU
      – Further reduce interconnect
    Features
      – Pipelining, chaining, forwarding, special function units

13 Thank You!

