CS294-6 Reconfigurable Computing Day 13 October 6, 1998 Interconnect Partitioning
Previously Established
–cannot afford fully general interconnect
–must exploit locality in network
Quantified
–locality (geometric growth / Rent's Rule)
Today
Automation to extract/exploit locality
–Partitioning heuristics
–FM
–Spectral
Partitioning
Hierarchical placement
–often used to initialize placement
–NP-hard in general
–fast heuristics exist
–don't address critical-path delay
Partitioning Problem
Given: netlist of interconnected cells
Partition into two (roughly) equal halves (A,B)
Minimize the number of nets shared by the halves
"Roughly equal"
–balance condition: (0.5-δ)N ≤ |A| ≤ (0.5+δ)N
Fiduccia-Mattheyses (Kernighan-Lin refinement)
Randomly partition into two halves
Repeat until no updates
–Start with all cells free
–Repeat until no cells free
  Move free cell with largest gain
  Update costs of neighbors
  Lock cell in place (record current cost)
–Pick least-cost point in previous sequence and use as next starting position
Repeat for different random starting points
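The loop above can be sketched directly in code. A minimal single-pass FM in Python (the flat net-list representation, the `fm_pass` name, and the balance handling are illustrative; gains are recomputed per candidate for clarity instead of being maintained incrementally):

```python
def fm_pass(nets, side, eps=0.1):
    """One Fiduccia-Mattheyses pass: move every cell at most once
    (highest gain first, subject to balance), then roll back to the
    best prefix of the move sequence.
    nets: list of nets, each a list of cell indices.
    side: list of 0/1 partition assignments (modified in place).
    Returns the best cut size seen."""
    n = len(side)
    cell_nets = [[] for _ in range(n)]
    for net in nets:
        for c in net:
            cell_nets[c].append(net)

    def cut():
        return sum(1 for net in nets if len({side[c] for c in net}) == 2)

    def gain(c):
        # reduction in cut size if cell c moves to the other side
        g = 0
        for net in cell_nets[c]:
            here = sum(1 for x in net if side[x] == side[c])
            there = len(net) - here
            if there == 0:
                g -= 1           # moving c newly cuts this net
            if here == 1:
                g += 1           # moving c uncuts this net
        return g

    free = set(range(n))
    best_cut = cur = cut()
    moves = []                   # moves made since the best point
    while free:
        counts = [side.count(0), side.count(1)]
        lo = int((0.5 - eps) * n)
        # best free cell whose move keeps both halves >= lo cells
        cands = [c for c in free if counts[side[c]] - 1 >= lo]
        if not cands:
            break
        c = max(cands, key=gain)
        cur -= gain(c)
        side[c] ^= 1
        free.discard(c)
        moves.append(c)
        if cur < best_cut:
            best_cut, moves = cur, []
    for c in moves:              # undo moves past the best point
        side[c] ^= 1
    return best_cut
```

A usage sketch: two triangles joined by one net, started from a deliberately bad assignment, converge to the natural bisection with cut size 1.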
FM Cell Gains
Gain = decrease in the number of nets crossing between partitions if the cell is moved
FM Recompute Cell Gain
For each net, keep track of the number of cells in each partition
Move update (for each net on the moved cell, moving F → T):
–if T(net)==0, increment gain on all free cells of net (think -1 => 0)
–if T(net)==1, decrement gain on the lone T cell of net (think 1 => 0)
–decrement F(net), increment T(net)
–if F(net)==1, increment gain on the lone F cell of net
–if F(net)==0, decrement gain on all free cells of net (all now on T)
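These update rules translate almost line-for-line into code; a sketch (the name `fm_apply_move` and the `count`/`nets_of` array layout are assumptions, not from the slides):

```python
def fm_apply_move(cell, side, gains, nets, nets_of, count, locked):
    """Incremental FM gain update when `cell` moves from side F to side T.
    count[i] = [# cells of net i on side 0, # on side 1]; gains[c] is the
    cut reduction if free cell c were moved. Work is constant per pin."""
    F = side[cell]
    T = 1 - F
    locked[cell] = True              # lock first: its own gain is now dead
    for i in nets_of[cell]:
        # before updating counts
        if count[i][T] == 0:         # net was entirely on the F side
            for c in nets[i]:        # (think -1 => 0)
                if not locked[c]:
                    gains[c] += 1
        elif count[i][T] == 1:       # one lone cell on the T side
            for c in nets[i]:        # (think 1 => 0)
                if side[c] == T and not locked[c]:
                    gains[c] -= 1
        count[i][F] -= 1
        count[i][T] += 1
        # after updating counts
        if count[i][F] == 0:         # net is now entirely on the T side
            for c in nets[i]:
                if not locked[c]:
                    gains[c] -= 1
        elif count[i][F] == 1:       # one lone cell left on the F side
            for c in nets[i]:
                if side[c] == F and not locked[c]:
                    gains[c] += 1
    side[cell] = T
```

A quick check of the sketch: after a move, the incrementally updated gains of the free cells match gains recomputed from scratch.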
FM Recompute (example)
FM Recompute (example)
FM Data Structures
Partition counts A, B
Two gain arrays
–key: constant-time cell update
Cells
–successors (consumers)
–inputs
–locked status
FM Optimization Sequence (ex)
FM Running Time?
Randomly partition into two halves
Repeat until no updates
–Start with all cells free
–Repeat until no cells free
  Move free cell with largest gain
  Update costs of neighbors
  Lock cell in place (record current cost)
–Pick least-cost point in previous sequence and use as next starting position
Repeat for different random starting points
FM Running Time
Claim: small number of passes (constant?) to converge
Small (constant?) number of random starts
N cell moves per pass
Each move: K + fanout work (avg. fanout K)
–assume K-LUTs
Maintain ordered gain list: O(1) per update
–every update moves a cell up or down by 1 bucket
Running time: O(K·N)
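The O(1)-per-update ordered list is conventionally a bucket array indexed by gain; a minimal sketch (class name and interface are hypothetical):

```python
class GainBuckets:
    """Bucket array keyed by gain, as used in FM: max-gain lookup and
    +/-1 gain updates in O(1) amortized time. Gains are bounded by the
    maximum number of pins on a cell, so the array is small."""

    def __init__(self, max_gain):
        self.off = max_gain                       # shift gains to >= 0
        self.buckets = [set() for _ in range(2 * max_gain + 1)]
        self.where = {}                           # cell -> current gain
        self.top = -1                             # highest nonempty bucket

    def insert(self, cell, gain):
        i = gain + self.off
        self.buckets[i].add(cell)
        self.where[cell] = gain
        self.top = max(self.top, i)

    def update(self, cell, delta):
        # move a free cell between adjacent buckets (delta is +/-1 in FM)
        g = self.where[cell]
        self.buckets[g + self.off].discard(cell)
        self.insert(cell, g + delta)

    def pop_max(self):
        # walk top down past emptied buckets; amortized O(1) per move
        while self.top >= 0 and not self.buckets[self.top]:
            self.top -= 1
        if self.top < 0:
            return None
        cell = self.buckets[self.top].pop()
        del self.where[cell]
        return cell, self.top - self.off
```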
FM Starts? 21K random starts, 3K network -- Alpert/Kahng
Tool Admin “breather” ppw autoretime limitdepth gbufplace ar (sliding) my_adder Makefile ex
Spectral Partitioning
Minimize squared wire length -- 1-D layout
Start with connection matrix C (c_ij)
"Placement" vector X, x_i = placement of cell i
cost = 0.5 · Σ_{i,j} (x_i - x_j)² c_ij
cost sum is X'BX
–B = D - C
–D = diagonal matrix, d_ii = Σ_j c_ij
Spectral Partition
Constraint: X'X = 1
–prevents trivial solution (all x_i the same)
Minimize cost = X'BX subject to constraint
–minimize L = X'BX - λ(X'X - 1)
–∂L/∂X = 2BX - 2λX = 0
–(B - λI)X = 0
–X => eigenvector of B
–cost is the eigenvalue λ
Spectral Partitioning
X (the x_i's) is continuous
–use it to order the nodes
–cut the partition from the order
Spectral Ordering
Midpoint bisect isn't necessarily the best place to cut; consider a network built from cliques K_{n/4} and K_{n/2}
Spectral Partitioning Options
Can bisect by choosing the midpoint
Can relax the cut criteria
–min cut within some δ of balance
Ratio cut
–minimize cut/(|A|·|B|)
–idea: trade imbalance for a smaller cut
–more imbalance => smaller |A|·|B|
–so the cut must be much smaller to be accepted
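Relaxing the midpoint cut can be sketched as a sweep over the 1-D spectral ordering, scoring every split point by the ratio-cut objective (function name and inputs hypothetical):

```python
def best_ratio_cut(order, C):
    """Sweep every split point of a 1-D cell ordering and pick the one
    minimizing cut / (|A| * |B|) (ratio cut) instead of forcing bisection.
    order: cell indices in spectral order; C: symmetric connection matrix.
    Returns the number of cells on the A side of the best split."""
    n = len(order)
    best = (float('inf'), None)
    for k in range(1, n):
        A = set(order[:k])
        # each cut edge counted once: from a cell in A to a cell outside
        cut = sum(C[i][j] for i in A for j in range(n) if j not in A)
        score = cut / (k * (n - k))
        best = min(best, (score, k))
    return best[1]
```

On the two-triangle example, the sweep picks the 3/3 split (cut 1, score 1/9) over any imbalanced split with a larger cut.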
Spectral vs. FM From Hauck/Boriello ‘96
Improving Spectral
More eigenvectors
–look at clusters in n-dimensional space
–5--70% improvement over EIG1
Spectral Theory
There are conditions under which spectral is optimal
–Boppana
Provides lower/upper bounds on cut size
Improving FM
–clustering
–technology mapping
–initial partitions
–runs
–partition size freedom
–replication
Following comparisons from Hauck and Boriello '96
Clustering
Group several leaf cells together into a cluster
Run partitioning on the clusters
Uncluster (keep partitions)
–iteratively
Run partitioning again
–using the prior result as the starting point
Clustering Benefits
Catches local connectivity which FM might miss
–moving one element at a time, it is hard to move whole connected groups across the partition
Faster (smaller N)
–METIS -- fastest research partitioners exploit this heavily
–…works for spectral, too
FM works better w/ larger nodes (???)
How to Cluster?
Random
–cheap, some benefit for speed
Greedy "connectivity"
–examine cells in random order
–cluster each with its most highly connected neighbor
–30% better cut, 16% faster than random
Spectral
–look for clusters in the placement
–(ratio-cut like)
Brute-force connectivity (can be O(N²))
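The greedy "connectivity" option above can be sketched as: visit cells in random order and merge each unclustered cell with its most strongly connected unclustered neighbor (the function name and the pairwise-count representation are assumptions):

```python
import random

def greedy_cluster(nets, n):
    """Greedy connectivity clustering sketch: pair each cell with its
    most strongly connected free neighbor, visiting in random order.
    nets: list of nets (lists of cell indices); n: number of cells.
    Returns a cluster label per cell."""
    # pairwise connectivity = number of shared nets between two cells
    conn = {}
    for net in nets:
        for a in net:
            for b in net:
                if a != b:
                    conn[(a, b)] = conn.get((a, b), 0) + 1
    cluster = list(range(n))         # each cell starts in its own cluster
    merged = [False] * n
    for c in random.sample(range(n), n):
        if merged[c]:
            continue
        nbrs = [(w, b) for (a, b), w in conn.items()
                if a == c and not merged[b]]
        if nbrs:
            w, b = max(nbrs)         # most strongly connected free neighbor
            cluster[b] = cluster[c] = min(cluster[c], cluster[b])
            merged[c] = merged[b] = True
    return cluster
```

With two strongly connected pairs joined by a single weak net, every visit order produces the same two clusters.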
LUT Mapped? Better to partition before LUT mapping.
Initial Partitions?
Random
Pick a random node for one side
–start imbalanced
–run FM from there
Pick a random node and breadth-first search to fill one half
Pick a random node and depth-first search to fill one half
Start with a spectral partition
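The breadth-first-fill option, for instance, can be sketched as follows (adjacency-list input and function name are assumptions):

```python
from collections import deque

def bfs_half(adj, start):
    """Seed one side with a node and grow it breadth-first until it
    holds half the cells -- one way to build an initial partition.
    adj: adjacency lists; start: seed node. Returns the half as a set."""
    n = len(adj)
    half = {start}
    q = deque([start])
    while q and len(half) < n // 2:
        u = q.popleft()
        for v in adj[u]:
            if v not in half and len(half) < n // 2:
                half.add(v)
                q.append(v)
    # top up from leftover cells if the connected component ran out
    for v in range(n):
        if len(half) >= n // 2:
            break
        half.add(v)
    return half
```

On a 6-cell path graph seeded at one end, the grown half is the three contiguous cells nearest the seed.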
Initial Partitions
If run several times
–pure random tends to win out
–more freedom / variety of starts
–more variation from run to run
–the others get trapped in local minima
Number of Runs
–2 runs: ~10% improvement
–20 runs: <20% improvement (2% better than 10)
–50 runs: 4% better than 10
…but?
FM Starts? 21K random starts, 3K network -- Alpert/Kahng
Replication Trade some additional logic area for smaller cut size Replication data from: Enos, Hauck, Sarrafzadeh ‘97
Replication
–5% replication => 38% cut-size reduction
–50% replication => 50+% cut-size reduction
Partitioning Summary
Two effective heuristics
–Spectral
–FM
Many ways to tweak
–Hauck/Boriello: half the cut size of vanilla FM
–even better with replication
Only addresses cut size, not critical-path delay
Next Lecture LIVE (Taping): Wednesday 4pm (here) Playback: Thursday, classtime/place Programmable Computing Elements –Computing w/ Memories