COE 561 Digital System Design & Synthesis Resource Sharing and Binding Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.

Slides:



Advertisements
Similar presentations
NP-Hard Nattee Niparnan.
Advertisements

ECE 667 Synthesis and Verification of Digital Circuits
Chapter 4 Retiming.
COE 561 Digital System Design & Synthesis Scheduling Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
The Theory of NP-Completeness
Winter 2005ICS 252-Intro to Computer Design ICS 252 Introduction to Computer Design Lecture 5-Scheudling Algorithms Winter 2005 Eli Bozorgzadeh Computer.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 10: RC Principles: Software (3/4) Prof. Sherief Reda.
Train DEPOT PROBLEM USING PERMUTATION GRAPHS
FUNDAMENTAL PROBLEMS AND ALGORITHMS Graph Theory and Combinational © Giovanni De Micheli Stanford University.
Rajat K. Pal. Chapter 3 Emran Chowdhury # P Presented by.
CSC5160 Topics in Algorithms Tutorial 2 Introduction to NP-Complete Problems Feb Jerry Le
ECE Synthesis & Verification - Lecture 2 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.
ECE Synthesis & Verification - Lecture 3 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
NP-Complete Problems Reading Material: Chapter 10 Sections 1, 2, 3, and 4 only.
2-Layer Crossing Minimisation Johan van Rooij. Overview Problem definitions NP-Hardness proof Heuristics & Performance Practical Computation One layer:
ICS 252 Introduction to Computer Design Fall 2006 Eli Bozorgzadeh Computer Science Department-UCI.
Processing Rate Optimization by Sequential System Floorplanning Jia Wang 1, Ping-Chih Wu 2, and Hai Zhou 1 1 Electrical Engineering & Computer Science.
ECE Synthesis & Verification - Lecture 4 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Allocation:
ICS 252 Introduction to Computer Design
ECE Synthesis & Verification - LP Scheduling 1 ECE 667 ECE 667 Synthesis and Verification of Digital Circuits Scheduling Algorithms Analytical approach.
Logic and Computer Design Dr. Sanjay P. Ahuja, Ph.D. FIS Distinguished Professor of CIS ( ) School of Computing, UNF.
Graph Theory Ch.5. Coloring of Graphs 1 Chapter 5 Coloring of Graphs.
Fall 2006EE VLSI Design Automation I VII-1 EE 5301 – VLSI Design Automation I Kia Bazargan University of Minnesota Part VII: High Level Synthesis.
1 IOE/MFG 543 Chapter 7: Job shops Sections 7.1 and 7.2 (skip section 7.3)
Combinational Logic Design
Register-Transfer (RT) Synthesis Greg Stitt ECE Department University of Florida.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
Circle Graph and Circular Arc Graph Recognition. 2/41 Outlines Circle Graph Recognition Circular-Arc Graph Recognition.
1 Introduction to Approximation Algorithms. 2 NP-completeness Do your best then.
Chapter 6-2 Multiplier Multiplier Next Lecture Divider
1 The Theory of NP-Completeness 2012/11/6 P: the class of problems which can be solved by a deterministic polynomial algorithm. NP : the class of decision.
Nattee Niparnan. Easy & Hard Problem What is “difficulty” of problem? Difficult for computer scientist to derive algorithm for the problem? Difficult.
TECH Computer Science NP-Complete Problems Problems  Abstract Problems  Decision Problem, Optimal value, Optimal solution  Encodings  //Data Structure.
Week 10Complexity of Algorithms1 Hard Computational Problems Some computational problems are hard Despite a numerous attempts we do not know any efficient.
L11: Lower Power High Level Synthesis(2) 성균관대학교 조 준 동 교수
EMIS 8373: Integer Programming NP-Complete Problems updated 21 April 2009.
Chap 7. Register Transfers and Datapaths. 7.1 Datapaths and Operations Two types of modules of digital systems –Datapath perform data-processing operations.
Memory Allocation of Multi programming using Permutation Graph By Bhavani Duggineni.
Resource Mapping and Scheduling for Heterogeneous Network Processor Systems Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea.
Data Structures & Algorithms Graphs
Graph Colouring L09: Oct 10. This Lecture Graph coloring is another important problem in graph theory. It also has many applications, including the famous.
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
NP-Complete Problems. Running Time v.s. Input Size Concern with problems whose complexity may be described by exponential functions. Tractable problems.
High-Level Synthesis-II Virendra Singh Indian Institute of Science Bangalore IEP on Digital System IIT Kanpur.
Complete Graphs A complete graph is one where there is an edge between every two nodes A C B G.
L13 :Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
Graph Coloring. Vertex Coloring problem in VLSI routing channels Standard cells Share a track Minimize channel width- assign horizontal Metal wires to.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Muhammad Elrabaa Computer Engineering Department King Fahd University of Petroleum.
ELEC692 VLSI Signal Processing Architecture Lecture 3
CS 3343: Analysis of Algorithms Lecture 25: P and NP Some slides courtesy of Carola Wenk.
L12 : Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
CSE 589 Part V One of the symptoms of an approaching nervous breakdown is the belief that one’s work is terribly important. Bertrand Russell.
1 The Theory of NP-Completeness 2 Review: Finding lower bound by problem transformation Problem X reduces to problem Y (X  Y ) iff X can be solved by.
Conceptual Foundations © 2008 Pearson Education Australia Lecture slides for this course are based on teaching materials provided/referred by: (1) Statistics.
Resource Sharing in LegUp. Resource Sharing in High Level Synthesis Resource Sharing is a well-known technique in HLS to reduce circuit area by sharing.
ICS 353: Design and Analysis of Algorithms NP-Complete Problems King Fahd University of Petroleum & Minerals Information & Computer Science Department.
Chapter 10 NP-Complete Problems.
Scheduling Determines the precise start time of each task.
Chap 7. Register Transfers and Datapaths
ICS 353: Design and Analysis of Algorithms
Architectural-Level Synthesis
Architecture Synthesis
ICS 252 Introduction to Computer Design
Resource Sharing and Binding
NP-Complete Problems.
Integrated Systems Centre © Giovanni De Micheli – All rights reserved
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

COE 561 Digital System Design & Synthesis Resource Sharing and Binding Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals [Adapted from slides of Prof. G. De Micheli: Synthesis & Optimization of Digital Circuits] Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals [Adapted from slides of Prof. G. De Micheli: Synthesis & Optimization of Digital Circuits]

2 OutlineOutline n Sharing and Binding n Resource-dominated circuits. Flat and hierarchical graphs. n Register sharing n Multi-port memory binding n Bus sharing and binding n Non resource-dominated circuits. n Module selection. n Sharing and Binding n Resource-dominated circuits. Flat and hierarchical graphs. n Register sharing n Multi-port memory binding n Bus sharing and binding n Non resource-dominated circuits. n Module selection.

3 Allocation and Binding n Allocation Number of resources available. n Binding Mapping between operations and resources. n Sharing Assignment of a resource to more than one operation. n Optimum binding/sharing Minimize the resource usage. n Allocation Number of resources available. n Binding Mapping between operations and resources. n Sharing Assignment of a resource to more than one operation. n Optimum binding/sharing Minimize the resource usage.

4 Optimum Sharing Problem n Scheduled sequencing graphs. Operation concurrency well defined. n Consider operation types independently. Problem decomposition. Perform analysis for each resource type. n Minimize resource usage. n Scheduled sequencing graphs. Operation concurrency well defined. n Consider operation types independently. Problem decomposition. Perform analysis for each resource type. n Minimize resource usage.

5 Compatibility and Conflicts n Operation compatibility Same resource type. Non concurrent. n Compatibility graph Vertices: operations. Edges: compatibility relation. n Conflict graph Complement of compatibility graph. n Operation compatibility Same resource type. Non concurrent. n Compatibility graph Vertices: operations. Edges: compatibility relation. n Conflict graph Complement of compatibility graph. Multiplier ALU

6 Algorithmic Solution to the Optimum Binding Problem n Compatibility graph. Partition the graph into a minimum number of cliques. Find clique cover number. n Conflict graph. Color the vertices by a minimum number of colors. Find chromatic number. n NP-complete problems - Heuristic algorithms. n Compatibility graph. Partition the graph into a minimum number of cliques. Find clique cover number. n Conflict graph. Color the vertices by a minimum number of colors. Find chromatic number. n NP-complete problems - Heuristic algorithms.

7 ExampleExample ALU1: 1, 3, 5 ALU2: 2,

8 Perfect Graphs n Comparability graph Graph G(V, E) has an orientation (i.e. directed edges) G(V, F) with the transitive property. (v i, v j )  F  (v j, v k )  F  (v i, v k )  F. n Interval graph Vertices correspond to intervals. Edges correspond to interval intersection. Subset of chordal graphs Every loop with more than three edges has a chord (i.e. an edge joining two non-consecutive vertices in the cycle). n Efficient algorithms exist for coloring and clique partitioning of interval, chordal, and comparability graphs. n Comparability graph Graph G(V, E) has an orientation (i.e. directed edges) G(V, F) with the transitive property. (v i, v j )  F  (v j, v k )  F  (v i, v k )  F. n Interval graph Vertices correspond to intervals. Edges correspond to interval intersection. Subset of chordal graphs Every loop with more than three edges has a chord (i.e. an edge joining two non-consecutive vertices in the cycle). n Efficient algorithms exist for coloring and clique partitioning of interval, chordal, and comparability graphs.

9 Non-Hierarchical Sequencing Graphs n The compatibility/conflict graphs have special properties Compatibility: Comparability graph. Conflict: Interval graph. n Polynomial time solutions Golumbic's algorithm. Left-edge algorithm. n The compatibility/conflict graphs have special properties Compatibility: Comparability graph. Conflict: Interval graph. n Polynomial time solutions Golumbic's algorithm. Left-edge algorithm. Comparability Graph

10 ExampleExample Intervals Corresponding to Conflict Graph

11 Left-Edge Algorithm n Input Set of intervals with left and right edge. n Rationale Sort intervals by left edge. Assign non-overlapping intervals to first color using the sorted list. When possible intervals are exhausted increase color counter and repeat. n Input Set of intervals with left and right edge. n Rationale Sort intervals by left edge. Assign non-overlapping intervals to first color using the sorted list. When possible intervals are exhausted increase color counter and repeat.

12 ExampleExample

13 ILP Formulation of Binding n Boolean variables b ir Operation i bound to resource r. n Boolean variables x il Operation i scheduled to start at step l. n Each operation v i should be assigned to one resource n At most, one operation can be executing, among those assigned to resource r, at any time step n Boolean variables b ir Operation i bound to resource r. n Boolean variables x il Operation i scheduled to start at step l. n Each operation v i should be assigned to one resource n At most, one operation can be executing, among those assigned to resource r, at any time step

14 Example…Example… n Operation types: Multiplier, ALU n Unit execution delay n A feasible binding satisfies constraints n Operation types: Multiplier, ALU n Unit execution delay n A feasible binding satisfies constraints

15 … Example n Constants in X are 0 except x 1,1, x 2,1, x 3,2, x 4,3, x 5,4, x 6,2, x 7,3, x 8,3, x 9,4, x 10,1, x 11,2. n An implementation with a 1 =2 multipliers: n Solutions b 1,1 =1, b 2,2 =1, b 3,1 =1, b 6,2 =1, b 7,1 =1, b 8,2 =1. n Constants in X are 0 except x 1,1, x 2,1, x 3,2, x 4,3, x 5,4, x 6,2, x 7,3, x 8,3, x 9,4, x 10,1, x 11,2. n An implementation with a 1 =2 multipliers: n Solutions b 1,1 =1, b 2,2 =1, b 3,1 =1, b 6,2 =1, b 7,1 =1, b 8,2 =1.

16 Hierarchical Sequencing Graphs … n Hierarchical conflict/compatibility graphs. Easy to compute. Prevent sharing across hierarchy. n Flatten hierarchy. Bigger graphs. Destroy nice properties. Graphs may no longer have special properties i.e., comparability graph, interval graph. Clique partitioning and vertex coloring intractable problems. n Hierarchical conflict/compatibility graphs. Easy to compute. Prevent sharing across hierarchy. n Flatten hierarchy. Bigger graphs. Destroy nice properties. Graphs may no longer have special properties i.e., comparability graph, interval graph. Clique partitioning and vertex coloring intractable problems.

17 … Hierarchical Sequencing Graphs n Model calls When two link vertices corresponding to different called models are not concurrent Any operation pair of same resource type in the different called models is compatible. Concurrency of called models does not necessarily imply conflicts of operation pairs in the models. n Model calls When two link vertices corresponding to different called models are not concurrent Any operation pair of same resource type in the different called models is compatible. Concurrency of called models does not necessarily imply conflicts of operation pairs in the models.

18 Example: Model Calls n Model a consists of two operations: addition, followed by multiplication n Addition delay is 1, multiplication delay is 2 n Model a consists of two operations: addition, followed by multiplication n Addition delay is 1, multiplication delay is 2

19 Example: Branching Constructs n All operations take 2 time units n Start times: t a =1, t b =3, t c =t d =2 n All operations take 2 time units n Start times: t a =1, t b =3, t c =t d =2

20 Register Binding Problem n Given a schedule Lifetime intervals for variables. Lifetime overlaps. n Conflict graph (interval graph). Vertices  variables. Edges  overlaps. Interval graph. Left-edge algorithm. (Polynomial-time). n Find minimum number of registers storing all the variables. n Compatibility graph (comparability graph). n Given a schedule Lifetime intervals for variables. Lifetime overlaps. n Conflict graph (interval graph). Vertices  variables. Edges  overlaps. Interval graph. Left-edge algorithm. (Polynomial-time). n Find minimum number of registers storing all the variables. n Compatibility graph (comparability graph).

21 ExampleExample n Six intermediate variables that need to be stored in registers {z1, z2, z3, z4, z5, z6} n Six variables can be stored in two registers n Six intermediate variables that need to be stored in registers {z1, z2, z3, z4, z5, z6} n Six variables can be stored in two registers

22 Register Sharing: General Case n Iterative constructs Preserve values across iterations. Circular-arc conflict graph. Coloring is intractable. n Hierarchical graphs General conflict graphs. Coloring is intractable. n Heuristic algorithms. n Iterative constructs Preserve values across iterations. Circular-arc conflict graph. Coloring is intractable. n Hierarchical graphs General conflict graphs. Coloring is intractable. n Heuristic algorithms.

23 ExampleExample n 7 intermediate variables, 3 loop variables, 3 loop invariants n 5 registers suffice to store 10 intermediate loop variables n 7 intermediate variables, 3 loop variables, 3 loop invariants n 5 registers suffice to store 10 intermediate loop variables

24 Example: Variable-Lifetimes and Circular- Arc Conflict Graph

25 Multiport-Memory Binding … n Multi-port memory arrays used to store variables. n Find minimum number of ports to access the required number of variables. n Assuming variables access memory always through the same port Problem reduces to binding variables to ports. Port compatibility/conflict. Similar to resource binding. n Assuming variables can use any port Decision variable x il is TRUE when variable i is accessed at step l. Minimum number of ports n Multi-port memory arrays used to store variables. n Find minimum number of ports to access the required number of variables. n Assuming variables access memory always through the same port Problem reduces to binding variables to ports. Port compatibility/conflict. Similar to resource binding. n Assuming variables can use any port Decision variable x il is TRUE when variable i is accessed at step l. Minimum number of ports

26 … Multiport-Memory Binding n Find maximum number of variables to be stored through a fixed number of ports a. Boolean variables {b i, i = 1, 2, …, n var }: Variable i is stored in array. n The maximum number of variables that can be stored in a multiport-memory with a ports is obtained by: n Find maximum number of variables to be stored through a fixed number of ports a. Boolean variables {b i, i = 1, 2, …, n var }: Variable i is stored in array. n The maximum number of variables that can be stored in a multiport-memory with a ports is obtained by:

27 ExampleExample n One port a = 1 {b2, b4, b8} non-zero. 3 variables stored: {v2, v4, v8}. n Two ports a = 2 6 variables stored: {v2, v4, v5, v10, v12, v14} n Three ports a = 3 9 variables stored: {v1, v2, v4, v6, v8, v10, v12, v13} n One port a = 1 {b2, b4, b8} non-zero. 3 variables stored: {v2, v4, v8}. n Two ports a = 2 6 variables stored: {v2, v4, v5, v10, v12, v14} n Three ports a = 3 9 variables stored: {v1, v2, v4, v6, v8, v10, v12, v13}

28 Bus Sharing and Binding n Busses act as transfer resources that feed data to functional resources. n Find the minimum number of busses to accommodate all data transfers. n Find the maximum number of data transfers for a fixed number of busses. n Similar to memory binding problem. n ILP formulation or heuristic algorithms. n Busses act as transfer resources that feed data to functional resources. n Find the minimum number of busses to accommodate all data transfers. n Find the maximum number of data transfers for a fixed number of busses. n Similar to memory binding problem. n ILP formulation or heuristic algorithms.

29 ExampleExample n One bus 3 variables can be transferred. n Two busses All variables can be transferred. n One bus 3 variables can be transferred. n Two busses All variables can be transferred.

30 Sharing and Binding for General Circuits n Area and delay influenced by Steering logic, wiring, registers and control circuit. E.g. multiplexers area and propagation delays depend on number of inputs. Wire lengths can be derived from statistical models. n Binding affects the cycle-time It may invalidate a schedule. n Control unit is affected marginally by resource binding. n Area and delay influenced by Steering logic, wiring, registers and control circuit. E.g. multiplexers area and propagation delays depend on number of inputs. Wire lengths can be derived from statistical models. n Binding affects the cycle-time It may invalidate a schedule. n Control unit is affected marginally by resource binding.

31 Unconstrained Minimum Area Binding n Area cost function depends on several factors resource count, steering logic and wiring. n In limiting cases, resource sharing may affect adversely circuit area. n Example Circuit with n 1-bit add operations Area of 1-bit adder is area add Area of a MUX is a function of number of inputs area mux = area mux . (i-1), where area mux  is a constant Total area of a binding with a resources is a (area add + area mux )  a (area add - area mux  ) + n. area mux  Area is increasing or decreasing function of a according to relation area add > area mux . n Area cost function depends on several factors resource count, steering logic and wiring. n In limiting cases, resource sharing may affect adversely circuit area. n Example Circuit with n 1-bit add operations Area of 1-bit adder is area add Area of a MUX is a function of number of inputs area mux = area mux . (i-1), where area mux  is a constant Total area of a binding with a resources is a (area add + area mux )  a (area add - area mux  ) + n. area mux  Area is increasing or decreasing function of a according to relation area add > area mux .

32 Unconstrained Minimum Area Binding n Edge-weighted compatibility graph Edge weights represent level of desirability of sharing Clique covering n Edge-weighted compatibility graph Edge weights represent level of desirability of sharing Clique covering

33 Unconstrained Minimum Area Binding n Tseng’s algorithm considers repeatedly subgraphs induced by vertices with same weight edges. n Graphs with decreasing values of weights considered. n Unweighted clique partitioning of subgraphs. n Example Assume following edges have weight of 2 {v1, v3}, {v1, v6}, {v1, v7}, {v3, v7}, {v6, v7} Other edges have weight 1 Clique {v1, v3, v7} is first identified Clique {v2, v6, v8} is then identified n Tseng’s algorithm considers repeatedly subgraphs induced by vertices with same weight edges. n Graphs with decreasing values of weights considered. n Unweighted clique partitioning of subgraphs. n Example Assume following edges have weight of 2 {v1, v3}, {v1, v6}, {v1, v7}, {v3, v7}, {v6, v7} Other edges have weight 1 Clique {v1, v3, v7} is first identified Clique {v2, v6, v8} is then identified

34 Module Selection Problem … n Library of resources More than one resource per type. n Example Adder Ripple-carry adder. Carry look-ahead adder. Multiplier Fully parallel Serial-Parallel Fully serial n Resource modeling Resource subtypes with (area, delay) parameters. n Library of resources More than one resource per type. n Example Adder Ripple-carry adder. Carry look-ahead adder. Multiplier Fully parallel Serial-Parallel Fully serial n Resource modeling Resource subtypes with (area, delay) parameters.

35 … Module Selection Problem n ILP formulation Decision variables b jr Select resource sub-type. Determine (area, delay). n Heuristic algorithms Determine minimum latency with fastest resource subtypes. Recover area by using slower resources on non- critical paths. n ILP formulation Decision variables b jr Select resource sub-type. Determine (area, delay). n Heuristic algorithms Determine minimum latency with fastest resource subtypes. Recover area by using slower resources on non- critical paths.

36 ExampleExample n Multipliers with (Area, delay) = (5,1) and (2,2) n ALU with (Area, delay) = (1,1) n Latency bound of 5. n Area cost is 7+2=9 n Multipliers with (Area, delay) = (5,1) and (2,2) n ALU with (Area, delay) = (1,1) n Latency bound of 5. n Area cost is 7+2=9

37 ExampleExample n Latency bound of 4. Fast multipliers for {v1, v2, v3}. Slower multipliers can be used elsewhere. Less sharing. Assume v8 uses a slow multiplier: Area=12+2=14 n Minimum-area design uses fast multipliers only. Area=10+2=12 n Latency bound of 4. Fast multipliers for {v1, v2, v3}. Slower multipliers can be used elsewhere. Less sharing. Assume v8 uses a slow multiplier: Area=12+2=14 n Minimum-area design uses fast multipliers only. Area=10+2=12