
1 CALTECH CS137 Spring2002 -- DeHon CS137: Electronic Design Automation Day 13: May 20, 2002 Page Generation (Area and IO Constraints) [working problem with Eylon Caspi]

2 Today
Cover/clustering
– Minimize Weight
– W/ area and IO constraints
Motivation: SCORE Page generation
– Also energy minimization
Techniques
Current Results
FPGA/hardware implementation?

3 Abstract Problem
Given: Graph $(V,E)$ with a single weight (area) on each node and two weights (IO, cost) on the edges.
Cluster nodes into subsets $V_i$, such that
– $\sum_i \mathrm{Cost}(V_i)$ minimized
– $\mathrm{IO}(V_i) < \mathrm{IO}_{\mathrm{limit}}$
– $A(V_i) < \mathrm{Area}_{\mathrm{limit}}$
– $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$
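A minimal sketch of how one might score a candidate clustering against this formulation. The edge tuple layout, the function names, and the choice to count both incoming and outgoing boundary edges toward IO are assumptions made for illustration; the slide only fixes the three quantities and the two limits.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge layout: (source, sink, io_weight, cost_weight).
Edge = Tuple[str, str, float, float]


def evaluate_cluster(cluster: Set[str], area: Dict[str, float],
                     edges: List[Edge]) -> Tuple[float, float, float]:
    """Return (A, IO, Cost) for one subset V_i."""
    a = sum(area[v] for v in cluster)
    io = 0.0
    cost = 0.0
    for src, dst, io_w, cost_w in edges:
        src_in, dst_in = src in cluster, dst in cluster
        if src_in != dst_in:
            # Assumption: any edge crossing the boundary uses cluster IO.
            io += io_w
        if src_in and not dst_in:
            # Cost counts edges with e_1 inside V_i and e_2 outside it.
            cost += cost_w
    return a, io, cost


def evaluate_cover(clusters: List[Set[str]], area: Dict[str, float],
                   edges: List[Edge], io_limit: float,
                   area_limit: float) -> Tuple[float, bool]:
    """Return (sum of Cost(V_i), whether every V_i meets both limits)."""
    total = 0.0
    feasible = True
    for cluster in clusters:
        a, io, cost = evaluate_cluster(cluster, area, edges)
        # Strict inequalities, as written on the slide.
        feasible = feasible and a < area_limit and io < io_limit
        total += cost
    return total, feasible
```

For example, with three unit-area nodes in a cycle and clusters {a, b} and {c}, only the edges b→c and c→a contribute to the total cost.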

4 SCORE Compilation
Programming Model: Graph of TDF FSMD operators
– unlimited size, # IOs
– no timing constraints
Execution Model: Graph of page configs
– fixed size, # IOs
– timed, single-cycle firing
[Figure: Compile, from programming model (memory segment, TDF operator, stream) to execution model (memory segment, compute page, stream)]

5 How Big is an Operator?
[Figure: operator sizes for Wavelet Decode, Wavelet Encode, JPEG Encode, MPEG Encode, JPEG Decode, MPEG (I), MPEG (P), IIR]

6 Clustering is Critical
Inter-page comm. latency may be long
Inter-page feedback loops are slow
Cluster to:
– Fit feedback loops within page
– Fit feedback loops on device

7 Pipeline Extraction
Hoist uncontrolled feed-forward (FF) data-flow out of FSMD
Benefits:
– Shrink FSM cyclic core
– Extracted pipeline has more freedom for scheduling and partitioning
Extract: state foo(i): acc=acc+2*i becomes state foo(two_i): acc=acc+two_i
[Figure: input i enters a *2 pipeline that produces two_i for the state; labels DF, CF]
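The foo example on this slide can be mimicked in software to show why the extraction preserves behavior. This is only an illustrative sketch: the Python generators stand in for streams, and the function names are hypothetical, not part of SCORE or TDF.

```python
from typing import Iterable, Iterator


def fsm_original(i_stream: Iterable[int]) -> int:
    """State foo(i): acc = acc + 2*i, with the multiply inside the state."""
    acc = 0
    for i in i_stream:
        acc = acc + 2 * i
    return acc


def times_two_pipeline(i_stream: Iterable[int]) -> Iterator[int]:
    """Extracted feed-forward pipeline: consumes i, produces two_i."""
    for i in i_stream:
        yield 2 * i


def fsm_extracted(two_i_stream: Iterable[int]) -> int:
    """State foo(two_i): acc = acc + two_i, only the cyclic core remains."""
    acc = 0
    for two_i in two_i_stream:
        acc = acc + two_i
    return acc
```

Only the accumulation depends on earlier iterations, so the multiply can sit upstream in its own pipeline and be scheduled or partitioned separately.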

8 Pipeline Extraction – Extractable Area
[Figure: extractable area for JPEG Encode, JPEG Decode, MPEG (I), MPEG (P), Wavelet Encode, IIR]

9 Page Generation
Pipeline extraction
– removes dataflow that can be freely extracted from FSMD control
Still have to partition potentially large FSMs
– approach: turn into a clustering problem

10 State Clustering
Start: consider each state to be a unit
Cluster states into page-size sub-FSMDs
– Inter-page transitions become streams
Possible clustering goals:
– Minimize delay (inter-page latency)
– Minimize IO (inter-page BW)
– Minimize area (fragmentation)
[Figure labels: I_A, I_B, O_A, O_B]

11 State Clustering to Minimize Inter-Page State Transfer
Inter-page state transfer is slow
Cluster to:
– Contain feedback loops
– Minimize frequency of inter-page state transfer
Previously used in:
– VLIW trace scheduling [Fisher ’81]
– FSM decomposition for low power [Benini/DeMicheli ISCAS ’98]
– VM/cache code placement
– GarpCC code selection [Callahan ’00]

12 Clustering Problem
SCORE Page
– Fixed area (# of LUTs)
– Fixed IO
Cost on edges is the probability of taking the state transition
Clustering goal is to minimize page-to-page transitions
– Maximize expected transitions within same page
– Find page-count/page-transition tradeoff curve
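One way to make the edge cost concrete is to estimate it from a recorded state trace and then measure how often a given page assignment forces a page-to-page transition. The slides do not say how the probabilities are obtained, so the trace-based estimate, the normalization over all observed transitions, and the names below are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def transition_probabilities(trace: List[str]) -> Dict[Tuple[str, str], float]:
    """Fraction of all observed transitions that follow each (src, dst) edge."""
    counts = Counter(zip(trace, trace[1:]))
    total = sum(counts.values())
    return {edge: n / total for edge, n in counts.items()}


def inter_page_rate(probs: Dict[Tuple[str, str], float],
                    page_of: Dict[str, int]) -> float:
    """Expected fraction of transitions that cross from one page to another."""
    return sum(p for (src, dst), p in probs.items()
               if page_of[src] != page_of[dst])
```

Sweeping the number of pages and recording this rate for each assignment would give points on the page-count/page-transition tradeoff curve.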

13 Abstract Problem
Given: Graph $(V,E)$ with a single weight (area) on each node and two weights (IO, cost) on the edges.
Cluster nodes into subsets $V_i$, such that
– $\sum_i \mathrm{Cost}(V_i)$ minimized
– $\mathrm{IO}(V_i) < \mathrm{IO}_{\mathrm{limit}}$
– $A(V_i) < \mathrm{Area}_{\mathrm{limit}}$
– $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$
Here: subsets are Pages; cost is Inter-Page Communication Frequency

14 DSM
Possibly relevant for minimizing delay in DSM
Previously discussed:
– Larger area → longer wires, slower
– Want to cluster logic locally
Maybe:
– Cluster common computations together
– Make distant computation transfer uncommon

15 Island Packing for Energy
Note: Modern FPGAs pack a cluster of LUTs into an endpoint
– e.g. Altera LAB

16 Island Packing for Energy
Modern FPGAs pack a cluster of LUTs into an endpoint
– e.g. Altera LAB
Local wiring has less energy cost than long wiring
Covering for energy:
– minimize exposed activity factor
– same covering problem

17 Abstract Problem
Given: Graph $(V,E)$ with a single weight (area) on each node and two weights (IO, cost) on the edges.
Cluster nodes into subsets $V_i$, such that
– $\sum_i \mathrm{Cost}(V_i)$ minimized
– $\mathrm{IO}(V_i) < \mathrm{IO}_{\mathrm{limit}}$
– $A(V_i) < \mathrm{Area}_{\mathrm{limit}}$
– $\mathrm{Cost}(V_i) = \sum \{\, \mathrm{cost}(e) \mid e \in E \text{ s.t. } e_1 \in V_i \text{ and } e_2 \notin V_i \,\}$
Here: subsets are Clusters/Islands; cost is Switching Activity

18 First Try
Use FBB (flow cut) [Wong/cs137a:day7]
Pick seed element
Compute mincut
– On mix of IO, cost edge weights?
If too small,
– Cluster in node and repeat
Else
– Cluster out node and repeat

19 Mincut Lessons
Couldn’t consistently control IO
– Non-monotonic results when adjusting weight
Not clear what to cluster in

20 Idea #2
If we had an ordering of nodes
– (wishful thinking)
Then it is easy to know how to include more
– Just pick the next node
Order: 1D list of nodes
Cluster: a contiguous sequence of nodes in list
– Specify start, finish

21 From Sequence to Clusters
Easy to know if a contiguous subsequence
– Meets area constraints
– Meets IO constraints
Cover
– Set of (non-overlapping) subsequences
– Includes all nodes
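A small sketch of the feasibility check for one contiguous subsequence of the linear order. The half-open `[start, end)` indexing, the edge layout, and counting every boundary-crossing edge toward IO are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical edge layout: (source, sink, io_weight).
Edge = Tuple[str, str, float]


def span_area_io(order: List[str], start: int, end: int,
                 area: Dict[str, float],
                 edges: List[Edge]) -> Tuple[float, float]:
    """Area and IO of the contiguous cluster order[start:end]."""
    members = set(order[start:end])
    a = sum(area[v] for v in members)
    # An edge uses cluster IO when exactly one endpoint is inside the span.
    io = sum(w for src, dst, w in edges
             if (src in members) != (dst in members))
    return a, io


def is_feasible(order: List[str], start: int, end: int,
                area: Dict[str, float], edges: List[Edge],
                area_limit: float, io_limit: float) -> bool:
    """True when the span meets both limits (strict, as in the problem statement)."""
    a, io = span_area_io(order, start, end, area, edges)
    return a < area_limit and io < io_limit
```

Because a cluster is fully described by its start and finish, there are only O(N^2) candidate clusters to check for an order of N nodes.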

22 Feasible Clusters (mult16a)

23 Covering
Not clear when to put more or less stuff in a cluster…versus leave it for the next cluster
– Can’t build clusters greedily
Like the associative/parenthesization problem seen earlier [day 5]

24 Parenthesis Matching
Similar
But compute from all breaks across a diagonal
– Not just nearest neighbor
Hence extra O(N)
(see Day 5)

25 Dynamic Programming
For each subsequence start, end
– Either the area and IO constraints are met
– OR want to find a breakpoint between cluster sets
  Cluster sets start → midpoint, midpoint → end may each be either single or multiple clusters
Different splits may
– Minimize number of clusters
– Minimize cost
– Keep dominator set [day11]
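The recurrence on this slide can be sketched as a memoized interval dynamic program. This is a simplified illustration, not the course implementation: instead of keeping a dominator set of (cluster count, cost) tradeoffs, it keeps a single best answer per subsequence, minimizing cost first and cluster count second. The `feasible` and `cost` callbacks and the half-open `[s, e)` spans are assumed interfaces.

```python
from functools import lru_cache
from typing import Callable, Optional, Tuple

Span = Tuple[int, int]
Result = Tuple[float, int, Tuple[Span, ...]]  # (total cost, cluster count, spans)


def cover(n: int,
          feasible: Callable[[int, int], bool],
          cost: Callable[[int, int], float]) -> Optional[Result]:
    """Cover positions 0..n-1 of the linear order with contiguous clusters."""

    @lru_cache(maxsize=None)
    def best(s: int, e: int) -> Optional[Result]:
        # If the whole span fits in one cluster, take it whole, as on the slide.
        if feasible(s, e):
            return (cost(s, e), 1, ((s, e),))
        result: Optional[Result] = None
        # Otherwise try every breakpoint between s and e.
        for mid in range(s + 1, e):
            left, right = best(s, mid), best(mid, e)
            if left is None or right is None:
                continue
            cand = (left[0] + right[0], left[1] + right[1], left[2] + right[2])
            if result is None or cand[:2] < result[:2]:
                result = cand
        return result  # None when even a single node is infeasible

    return best(0, n)
```

There are O(N^2) subsequences and each tries O(N) breakpoints, which is where the extra O(N) factor of the parenthesization-style formulation comes from.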

26 Algorithm
Compute Linear Order
Compute IO, Area on each subsequence
– Think NxN table (but sparse)
Use Dynamic Programming to cover

27 Compute Order?
Could experiment with various techniques
Considering: Spectral Ordering
– [Hall/cs137a:day7]
How to weight edges?
– IO, cost, mix?
– Try linear mix…vary mix weighting
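A rough sketch of spectral ordering with a linear weight mix, under several assumptions: nodes are numbered 0..n-1, the mix is `alpha * io + (1 - alpha) * cost` with a hypothetical parameter `alpha`, edges are treated as undirected, and the second-smallest Laplacian eigenvector is approximated by shifted power iteration in place of a proper eigensolver. Nodes are then sorted by their entry in that vector.

```python
import math
from typing import List, Tuple

# Hypothetical edge layout: (u, v, io_weight, cost_weight), nodes are 0..n-1.
Edge = Tuple[int, int, float, float]


def spectral_order(n: int, edges: List[Edge], alpha: float,
                   iters: int = 500) -> List[int]:
    """Order nodes by an approximate second-smallest Laplacian eigenvector."""
    if n == 0:
        return []
    # Weighted graph Laplacian L = D - W using the linear weight mix.
    lap = [[0.0] * n for _ in range(n)]
    for u, v, io_w, cost_w in edges:
        w = alpha * io_w + (1.0 - alpha) * cost_w
        lap[u][v] -= w
        lap[v][u] -= w
        lap[u][u] += w
        lap[v][v] += w
    # Gershgorin bound: every eigenvalue of L is at most twice the max degree.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        # Remove the constant component (the trivial eigenvector), then normalize.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm == 0.0:
            break
        x = [xi / norm for xi in x]
        # Multiply by (shift*I - L); its dominant remaining eigenvector is the one wanted.
        x = [shift * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return sorted(range(n), key=lambda i: x[i])
```

The sign of an eigenvector is arbitrary, so the returned order may come out reversed; for contiguous clustering either direction is equivalent. Varying `alpha` between 0 and 1 is one way to vary the mix weighting.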

28 Weight Mix
Why unclear?
– IO weight → good for clustering connectivity
  If IOs are limited, allows using fewer clusters
  Pack more stuff into page → fewer cases need to transition
– Cost weight → what we’re minimizing
  Cluster high-cost edges together
  Hide in page
– But, cost ordering may get less stuff in page if poorly IO clustered…

29 spp results [see HTML]

30 Versus Weighting (w by 0.01)

31 Discussion
Promising Results
– New capability, so not clear what to compare to
  Maybe LUT clustering to validate algorithm
– Absolutes look promising
Weighting
– Not clear how to search for best
– Maybe should try other ways of weighting?
  [Michael suggests trying log(trans)]

32 Spatial/Hdw Implementation?
Compute Linear Order
– Use 1D FDSA?
Compute IO, Area on each subsequence
– Parallel prefix sum scan
  One for each start point?
Use Dynamic Programming to cover
– Like parenthesis
– Maybe 1D and combine with area/IO scan?
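To illustrate the prefix-sum idea for the area computation, here is a software model of a log-step (Hillis-Steele style) inclusive scan. It is only a sketch: the list comprehension stands in for updates that hardware could perform simultaneously, it covers area only (IO is not additive over nodes in the same way), and the function names are hypothetical.

```python
from typing import List


def parallel_prefix_sum(values: List[float]) -> List[float]:
    """Inclusive scan in about log2(n) steps; each step's updates are independent."""
    result = list(values)
    stride = 1
    while stride < len(result):
        # Every position reads only the previous step's values,
        # so all positions could be updated at the same time.
        result = [result[i] + (result[i - stride] if i >= stride else 0)
                  for i in range(len(result))]
        stride *= 2
    return result


def span_area(prefix: List[float], start: int, end: int) -> float:
    """Area of the non-empty span order[start:end] from an inclusive prefix sum."""
    return prefix[end - 1] - (prefix[start - 1] if start > 0 else 0)
```

With one scan over the node areas in linear order, the area of any subsequence is a single subtraction, which fills the area entries of the start/end table without a separate sum per entry.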

33 Promising Ideas
Compute good ordering
– Easy to vary inclusion when you know what’s next to include/exclude
Mix weights
Cluster to minimize exposed (cut) costs

