# Selective Flexibility: Breaking the Rigidity of Datapath Merging

## Presentation transcript

Selective Flexibility: Breaking the Rigidity of Datapath Merging. Mirjana Stojilović, Institute Mihailo Pupin, University of Belgrade; David Novo, École Polytechnique Fédérale de Lausanne (EPFL); Lazar Saranovac, School of Electrical Engineering, University of Belgrade; Philip Brisk, University of California Riverside; Paolo Ienne, École Polytechnique Fédérale de Lausanne (EPFL)

The Rigidity of Datapath Merging: datapath merging is a technique for generating a single reconfigurable datapath from a set of input data-flow graphs (DFGs); it focuses on reusing resources across DFGs to save area (Brisk, Kaplan, and Sarrafzadeh, DAC 2004; Zuluaga and Topham, TCAD 2009). 2/34

The Rigidity of Datapath Merging 3/34

Motivation: improve efficiency through specialization; merging datapaths saves area. But what about flexibility? We want to fill this gap. 4/34

Selective Flexibility: flexibility is the ability to capture and implement computational structures that are characteristic of a specific application domain. Selective means that these computational structures are characterized, and thus restricted, by (1) the type of operations, (2) their number, and (3) their interconnections. 5/34

Path Fusion: create a SUPERPATH, the (minimum-area) supersequence of all operator sequences found in the input DFGs. 6/34

Path Fusion STEP 1: Enumerate all paths from the inputs to the outputs of each DFG. A path in a graph is a sequence of vertices such that each vertex has an edge to the next vertex in the sequence. 7/34
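A minimal Python sketch of STEP 1, assuming a DFG is given as two hypothetical dictionaries, `ops` (node to operator name) and `succs` (node to successor nodes), and that the relevant paths run from nodes without predecessors to nodes without successors. Each path is returned as its operator sequence, which is the form the later fusion steps work on. The number of paths can grow exponentially with DFG size.

```python
from typing import Dict, List

def enumerate_paths(ops: Dict[str, str],
                    succs: Dict[str, List[str]]) -> List[List[str]]:
    """Return the operator sequence of every source-to-sink path in a DAG."""
    # Sources: nodes that never appear as anyone's successor.
    has_pred = {dst for dsts in succs.values() for dst in dsts}
    sources = [n for n in ops if n not in has_pred]
    paths: List[List[str]] = []

    def dfs(node: str, prefix: List[str]) -> None:
        prefix = prefix + [ops[node]]
        nexts = succs.get(node, [])
        if not nexts:  # sink reached: record the complete path
            paths.append(prefix)
        for nxt in nexts:
            dfs(nxt, prefix)

    for src in sources:
        dfs(src, [])
    return paths

# (a*b + c) - (d*e): two MULs feeding an ADD/SUB chain
ops = {"m1": "MUL", "a": "ADD", "m2": "MUL", "s": "SUB"}
succs = {"m1": ["a"], "a": ["s"], "m2": ["s"]}
print(enumerate_paths(ops, succs))  # [['MUL', 'ADD', 'SUB'], ['MUL', 'SUB']]
```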

Path Fusion STEP 2: Group the paths into sets based on their length 8/34
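STEP 2 then amounts to bucketing the operator sequences by length. A sketch, assuming the paths of all input DFGs are pooled into one list; the slide does not say whether identical paths are deduplicated, so they are kept here.

```python
from collections import defaultdict
from typing import Dict, List

def group_by_length(paths: List[List[str]]) -> Dict[int, List[List[str]]]:
    """Map each path length to the list of paths having that length."""
    groups: Dict[int, List[List[str]]] = defaultdict(list)
    for path in paths:
        groups[len(path)].append(path)
    return dict(groups)

groups = group_by_length([["MUL", "ADD", "SUB"], ["MUL", "SUB"], ["ADD", "SUB"]])
# {3: [['MUL', 'ADD', 'SUB']], 2: [['MUL', 'SUB'], ['ADD', 'SUB']]}
```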

Path Fusion STEP 3: Perform a greedy search for the maximum-area common subsequence (MACSeq) (Brisk, Kaplan, and Sarrafzadeh, DAC 2004). A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. STEP 4: Fuse the pair of paths (sequence alignment using the Needleman-Wunsch algorithm). REPEAT steps 3-4 UNTIL a single path is left in the set. Assumed area ordering: MUL > SUB > ADD. 9/34
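Steps 3 and 4 can be sketched in Python as a weighted longest-common-subsequence dynamic program (a Needleman-Wunsch-style alignment in which a match scores the operator's area and gaps cost nothing), followed by a traceback that keeps shared operators once. Because the fused path's area equals area(a) + area(b) minus the shared area, maximizing the shared area minimizes the fused area. The values in `AREA` are made-up placeholders; only the ordering MUL > SUB > ADD comes from the slide. Picking the pair with the largest shared area is one plausible reading of "greedy search", not necessarily the authors' exact procedure.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Placeholder operator areas: only the ordering MUL > SUB > ADD is from the slide.
AREA: Dict[str, int] = {"MUL": 10, "SUB": 2, "ADD": 1}

def fuse(a: List[str], b: List[str],
         area: Dict[str, int] = AREA) -> Tuple[int, List[str]]:
    """Return (shared area, fused path) for two operator sequences.

    dp[i][j] is the maximum total area of a common subsequence of a[i:]
    and b[j:]; a match scores the operator's area, skipping costs nothing.
    """
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(dp[i + 1][j], dp[i][j + 1])
            if a[i] == b[j]:
                best = max(best, area[a[i]] + dp[i + 1][j + 1])
            dp[i][j] = best
    # Traceback: shared operators appear once, all others keep their order,
    # so the result is a common supersequence of a and b.
    fused: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j] and dp[i][j] == area[a[i]] + dp[i + 1][j + 1]:
            fused.append(a[i])
            i, j = i + 1, j + 1
        elif dp[i][j] == dp[i + 1][j]:
            fused.append(a[i])
            i += 1
        else:
            fused.append(b[j])
            j += 1
    fused.extend(a[i:])
    fused.extend(b[j:])
    return dp[0][0], fused

def fuse_set(paths: List[List[str]], area: Dict[str, int] = AREA) -> List[str]:
    """Greedily fuse the pair with the largest shared area until one path is left."""
    paths = [list(p) for p in paths]
    while len(paths) > 1:
        i, j = max(combinations(range(len(paths)), 2),
                   key=lambda ij: fuse(paths[ij[0]], paths[ij[1]], area)[0])
        _, merged = fuse(paths[i], paths[j], area)
        paths = [p for k, p in enumerate(paths) if k not in (i, j)] + [merged]
    return paths[0]

print(fuse_set([["MUL", "ADD"], ["MUL", "SUB"], ["ADD", "SUB"]]))  # ['MUL', 'ADD', 'SUB']
```

Note how the area weighting matters: for `["ADD", "MUL"]` and `["MUL", "ADD"]`, a plain count-based LCS could share either operator, while this version shares the MUL.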

Path Fusion STEP 5: Move the fused path into the set of next-shorter paths. REPEAT steps 3-5 UNTIL a single path is left: THE SUPERPATH. 10/34
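The outer loop of STEP 5 can be sketched as follows, assuming the length groups from STEP 2 and some `fuse_set` function that fuses a set of paths into one common supersequence (steps 3-4); both names are hypothetical. Starting from the longest paths, the fused result is carried into the next-shorter set until only the superpath remains.

```python
from typing import Callable, Dict, List

Path = List[str]

def build_superpath(groups: Dict[int, List[Path]],
                    fuse_set: Callable[[List[Path]], Path]) -> Path:
    """Fuse length groups from longest to shortest, carrying the result down."""
    carried: List[Path] = []
    for length in sorted(groups, reverse=True):
        # The fused path from the longer set joins the current (shorter) set.
        carried = [fuse_set(groups[length] + carried)]
    if not carried:
        raise ValueError("no paths to fuse")
    return carried[0]

def concat_fuse(paths: List[Path]) -> Path:
    """Stand-in fusion for demonstration only: concatenation is a valid,
    but far from minimum-area, common supersequence."""
    return [op for p in paths for op in p]

print(build_superpath({2: [["MUL", "ADD"]], 1: [["SUB"]]}, concat_fuse))
# ['SUB', 'MUL', 'ADD']
```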

Array Generation: replicate the superpath to create a regular array of operators. How many columns? 11/34
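Array generation itself is a simple replication; the open question on the slide, the column count, is left as a parameter in this sketch. The function name and the list-of-rows representation are assumptions.

```python
from typing import List

def generate_array(superpath: List[str], num_columns: int) -> List[List[str]]:
    """Row r of the array holds num_columns copies of operator superpath[r]."""
    if num_columns < 1:
        raise ValueError("num_columns must be at least 1")
    return [[op] * num_columns for op in superpath]

array = generate_array(["MUL", "ADD", "SUB"], 3)
# [['MUL', 'MUL', 'MUL'], ['ADD', 'ADD', 'ADD'], ['SUB', 'SUB', 'SUB']]
```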

Interconnect Dimensioning: add FPGA-like interconnect, with two I/O ports per column and horizontal and vertical routing channels. 12/34

Interconnect Dimensioning: to decide on the number of word-size tracks, we perform placement and routing (P&R). Placement: assign nodes to rows (top-down). When assigning nodes to columns: keep distances between nodes short, emphasize graph regularity, and emphasize symmetry. Tool: dot, for laying out hierarchical drawings of directed graphs (Yoon, Shrivastava, Park, Ahn, Jeyapaul, and Paek, ASP-DAC 2008; Cong and Jiang, FPGA 2008). 13/34

Interconnect Dimensioning: the DFG to be placed (visualized with dotty). 14/34

Interconnect Dimensioning: top-down greedy placement onto the SUPERPATH. Place each node in the first row below its predecessor nodes whose operator matches the node's operation. 15/34
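A minimal Python sketch of this row assignment, assuming the superpath is a list of operator names indexed top-down and the DFG is given by hypothetical `ops` and `preds` dictionaries that cover operator nodes only (primary inputs are assumed to enter through the I/O ports). It ignores the binary-tree exception and column assignment, and returns `None` when some node finds no matching row.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Optional

def place_rows(superpath: List[str], ops: Dict[str, str],
               preds: Dict[str, List[str]]) -> Optional[Dict[str, int]]:
    """Greedy top-down row assignment of DFG nodes onto the superpath."""
    graph = {node: preds.get(node, []) for node in ops}
    row: Dict[str, int] = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        # The first candidate row lies strictly below every predecessor.
        start = max((row[p] + 1 for p in graph[node]), default=0)
        for r in range(start, len(superpath)):
            if superpath[r] == ops[node]:
                row[node] = r
                break
        else:
            return None  # no matching row left: the DFG does not fit
    return row

superpath = ["MUL", "ADD", "SUB", "ADD"]
print(place_rows(superpath, {"a1": "ADD", "a2": "ADD"}, {"a2": ["a1"]}))
# {'a1': 1, 'a2': 3}
```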

Interconnect Dimensioning: placement exception. If a node is part of a binary tree, first minimize the tree height and then place the node as early as possible. Rows that are never used may be removed after placement to conserve area. 16/34
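Removing rows that no DFG uses can be sketched as below, assuming each DFG's placement is a hypothetical node-to-row dictionary such as the one a row-assignment step would produce. Whether a row is actually removed is a design decision (the slide says only that rows are potentially removed), so the sketch just computes the candidate array; the row indices in the placements still refer to the unpruned array.

```python
from typing import Dict, Iterable, List

def prune_unused_rows(superpath: List[str],
                      placements: Iterable[Dict[str, int]]) -> List[str]:
    """Keep only the superpath rows that at least one placed DFG uses."""
    used = set()
    for rows in placements:
        used.update(rows.values())
    return [op for r, op in enumerate(superpath) if r in used]

pruned = prune_unused_rows(["MUL", "ADD", "SUB", "ADD"],
                           [{"m": 0, "a": 1}, {"x": 3}])
# ['MUL', 'ADD', 'ADD']  (row 2, the SUB, is never used)
```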

Interconnect Dimensioning: dot is forced to place nodes of the same rank in the same row. dot outputs the vertical and horizontal coordinates of the nodes. 17/34
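One common way to force dot to keep nodes in given rows is a `{ rank=same; ... }` subgraph per row, plus a chain of invisible anchor nodes so that dot's ranks line up with the array rows even when some rows are empty. The sketch below emits such a DOT description from a node-to-row dictionary; the anchor-node trick and the `rowN` names are assumptions, not necessarily what the authors did.

```python
from typing import Dict, List, Tuple

def to_dot(rows: Dict[str, int], edges: List[Tuple[str, str]],
           num_rows: int) -> str:
    """Emit a DOT digraph whose ranks match the given row assignment."""
    out = ["digraph dfg {"]
    anchors = [f"row{r}" for r in range(num_rows)]
    for anchor in anchors:
        out.append(f"  {anchor} [style=invis];")
    if len(anchors) > 1:
        # Invisible chain row0 -> row1 -> ... fixes one rank per array row.
        out.append("  " + " -> ".join(anchors) + " [style=invis];")
    for r, anchor in enumerate(anchors):
        members = [anchor] + [n for n, nr in rows.items() if nr == r]
        out.append("  { rank=same; " + "; ".join(members) + "; }")
    for src, dst in edges:
        out.append(f"  {src} -> {dst};")
    out.append("}")
    return "\n".join(out)

dot_text = to_dot({"m": 0, "a": 1}, [("m", "a")], num_rows=2)
print(dot_text)
```

Running `dot -Tplain` on the result reports the coordinates of every node. The invisible anchors still occupy horizontal space, which shifts the x coordinates slightly.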

Interconnect Dimensioning: horizontal coordinate adjustment (rounding, scaling). 18/34
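The slide names only rounding and scaling. A minimal sketch, assuming dot's x coordinates are mapped linearly onto column indices 0 to num_columns-1 and then rounded. The slide does not say how two nodes of the same row that round to the same column are resolved, so the sketch only reports such collisions.

```python
from typing import Dict, List, Tuple

def to_columns(xs: Dict[str, float], num_columns: int) -> Dict[str, int]:
    """Linearly rescale dot x coordinates to column indices and round."""
    lo, hi = min(xs.values()), max(xs.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all x are equal
    # Python's round() rounds exact halves to the nearest even integer.
    return {n: round((x - lo) / span * (num_columns - 1)) for n, x in xs.items()}

def collisions(cols: Dict[str, int], rows: Dict[str, int]) -> List[Tuple[str, str]]:
    """Pairs of nodes that ended up in the same (row, column) cell."""
    seen: Dict[Tuple[int, int], str] = {}
    clashes: List[Tuple[str, str]] = []
    for n in cols:
        cell = (rows[n], cols[n])
        if cell in seen:
            clashes.append((seen[cell], n))
        else:
            seen[cell] = n
    return clashes

cols = to_columns({"a": 27.0, "b": 99.0, "c": 171.0}, 3)
# {'a': 0, 'b': 1, 'c': 2}
```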

Interconnect Dimensioning: to decide on the number of word-size tracks, we perform P&R. Placement is defined by dot; P&R is done with VPR, an FPGA architectural simulator and P&R tool. FPGA-like routing: horizontal and vertical routing channels, two-input one-output operators, two I/O ports per column, and word-size tracks (constant bitwidth) (Betz and Rose, FPGA 2000; Ye and Rose, Transactions on VLSI Systems, 2006). 19/34

Interconnect Dimensioning: the inputs to VPR are the DFG netlist, the DFG placement (from dot), and an architectural description. VPR performs the routing and reports the minimum channel width needed for a legal routing. 20/34
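VPR finds the minimum channel width itself by routing repeatedly at different widths. The sketch below only illustrates that kind of search, assuming a hypothetical `routes_at(width)` predicate that reports whether routing succeeds and that routability is monotone in the channel width.

```python
from typing import Callable

def min_channel_width(routes_at: Callable[[int], bool],
                      max_width: int = 1024) -> int:
    """Smallest width w with routes_at(w) True, assuming monotonicity."""
    # Double the width until routing succeeds (or give up at max_width).
    upper = 1
    while not routes_at(upper):
        if upper >= max_width:
            raise ValueError("unroutable up to max_width")
        upper = min(2 * upper, max_width)
    # Binary search in [1, upper]; hi is always a routable width.
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if routes_at(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi

print(min_channel_width(lambda w: w >= 6))  # 6
```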

Recap: path fusion and array generation. 21/34

Recap: placement and routing; MIN channel width = 4. 22/29

Recap: placement and routing; MIN channel width = 4. 23/29

Recap: placement and routing; MIN channel width = 6. 24/29

Recap: CW1 = 4, CW2 = 4, CW3 = 6. FINAL number of rows = 12. FINAL CW = max{CW1, CW2, CW3} = 6. 25/34

Experimental Results: (1) measuring area/delay with respect to ASIC and FPGA; (2) measuring generality. Where do our domain-specific datapaths fit? 26/34

Experimental Results: 19 DFGs covering various classical signal- and image-processing computations (FFT, FFTr4, DCT, IDCT, FIR, IIR, autocorrelation, Sobel, complex dot product, …). The DFGs were extracted from applications available in EEMBC, the TMS320C64x DSP library, the TMS320C64x Image/Video Processing library, and ExpressDFG, with loops unrolled by different factors. Groupings: GP1 contains all DFGs, while GP2x, GP3x, and GP4x regroup the DFGs into different and increasingly smaller clusters. 27/34

Generality: GENERALITY is the ratio of the number of successfully mapped excluded DFGs to the total number of DFGs in the group. 28/34
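Read as a leave-one-out protocol (an interpretation of "excluded DFGs" that the slide does not spell out), the metric can be sketched as follows. `build_datapath` and `can_map` are hypothetical stand-ins for generating a datapath from the remaining DFGs (the learning set) and for mapping the excluded DFG onto it.

```python
from typing import Callable, List, TypeVar

G = TypeVar("G")  # a DFG representation
D = TypeVar("D")  # a datapath representation

def generality(dfgs: List[G],
               build_datapath: Callable[[List[G]], D],
               can_map: Callable[[G, D], bool]) -> float:
    """Percentage of DFGs that map onto a datapath built without them."""
    mapped = 0
    for k, excluded in enumerate(dfgs):
        learning_set = dfgs[:k] + dfgs[k + 1:]
        if can_map(excluded, build_datapath(learning_set)):
            mapped += 1
    return 100.0 * mapped / len(dfgs)
```

With toy stand-ins where a "DFG" is an integer size, the datapath is the largest size in the learning set, and mapping succeeds if the excluded DFG is no larger, four DFGs of sizes 1 to 4 give 75% generality: only the largest one fails.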

Generality

| Group | # of DFGs | Generality [%] |
|---|---|---|
| 1 | 19 | 87.5 |
| 2A | 8 | 75.0 |
| 2B | 11 | 72.7 |
| 3A | 10 | 90.0 |
| 3B | 8 | 87.5 |
| 4A | 6 | 83.3 |
| 4B | 4 | 50.0 |
| 4C | 4 | 75.0 |
| 4D | 5 | 60.0 |

Generality ranges from 50% to 90% and is 75% or higher in most cases. Generality is lower when the learning set is small (4B, 4D). The array has no extra columns to accommodate potentially bigger DFGs. 29/34

Area/Delay compared to ASIC and FPGA: we individually synthesized, placed, and routed every operation found in the DFGs, using the gate implementations of a commercial 65nm library. Conservatively, we ignored routing area and delay in the ASIC implementation. VPR estimates the routing area of the datapath and the routing delay of a DFG when it is placed and routed on the datapath. Conservatively, VPR treats all wires as individual wires, not buses. 30/34

Area/Delay compared to ASIC and FPGA (Kuon and Rose, “Measuring the gap between FPGAs and ASICs”, FPGA 2006): conservatively, the ASIC area/delay refers to a single DFG rather than to all DFGs merged. In most cases, the delay cost is below 2 and the area cost below 10-12; a few outliers are marked in the plot. 31/34

Area/Delay compared to ASIC and FPGA 32/34

Conclusions: a novel way to merge DFGs into an application-domain-specific CGRA, offering a new trade-off between generality and efficiency. Future directions: specialize the bitwidth of the operators; customize the shape of the datapath to better fit the domain. 33/34

Thank you. Mirjana Stojilović, mirjana.stojilovic@pupin.rs David Novo, david.novobruna@epfl.ch Lazar Saranovac, laza@el.etf.rs Philip Brisk, philip@cs.ucr.edu Paolo Ienne, paolo.ienne@epfl.ch

