COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Muhammad Elrabaa Computer Engineering Department King Fahd University of Petroleum.

Slides:



Advertisements
Similar presentations
© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.
Advertisements

Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
ECE 667 Synthesis and Verification of Digital Circuits
COE 405 VHDL Basics Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Dr. Aiman H. El-Maleh Computer Engineering.
Lecture 11: Code Optimization CS 540 George Mason University.
ECE Synthesis & Verification - Lecture 2 1 ECE 667 Spring 2011 ECE 667 Spring 2011 Synthesis and Verification of Digital Circuits High-Level (Architectural)
Computer Architecture Lecture 7 Compiler Considerations and Optimizations.
ECE 551 Digital System Design & Synthesis Lecture 08 The Synthesis Process Constraints and Design Rules High-Level Synthesis Options.
Logic Synthesis – 3 Optimization Ahmed Hemani Sources: Synopsys Documentation.
COE 561 Digital System Design & Synthesis Scheduling Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 10: RC Principles: Software (3/4) Prof. Sherief Reda.
FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Register-transfer Design n Basics of register-transfer design: –data paths and controllers.
Behavioral Synthesis Outline –Synthesis Procedure –Example –Domain-Specific Synthesis –Silicon Compilers –Example Tools Goal –Understand behavioral synthesis.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
Courseware High-Level Synthesis an introduction Prof. Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 09: RC Principles: Software (2/4) Prof. Sherief Reda.
Architectural-Level Synthesis Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long.
Simulated-Annealing-Based Solution By Gonzalo Zea s Shih-Fu Liu s
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
1 EECS Components and Design Techniques for Digital Systems Lec 21 – RTL Design Optimization 11/16/2004 David Culler Electrical Engineering and Computer.
Architectural and System Synthesis SOURCES- DeMicheli Mark Manwaring Kia Bazargan Giovanni De Micheli Gupta Youn-Long Lin Camposano, J. Hofstede, Knapp,
King Fahd University of Petroleum and Minerals Computer Engineering Department COE 561 Digital Systems Design and Synthesis (Course Activity) Synthesis.
COE 561 Digital System Design & Synthesis Resource Sharing and Binding Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
ECE Synthesis & Verification - Lecture 4 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Allocation:
Center for Embedded Computer Systems University of California, Irvine and San Diego Loop Shifting and Compaction for the.
ICS 252 Introduction to Computer Design
George Mason University ECE 448 – FPGA and ASIC Design with VHDL Finite State Machines State Diagrams, State Tables, Algorithmic State Machine (ASM) Charts,
Machine-Independent Optimizations Ⅰ CS308 Compiler Theory1.
Fall 2006EE VLSI Design Automation I VII-1 EE 5301 – VLSI Design Automation I Kia Bazargan University of Minnesota Part VII: High Level Synthesis.
High-level Synthesis and System Synthesis SOURCES- Mark Manwaring Kia Bazargan Giovanni De Micheli Gupta Youn-Long Lin Camposano, J. Hofstede, Knapp, MacMillen.
COE 405 Introduction to Digital Design Methodology
Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology A Synthesis Algorithm for Modular Design of.
Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology High-level Specification and Efficient Implementation.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
Optimization software for apeNEXT Max Lukyanov,  apeNEXT : a VLIW architecture  Optimization basics  Software optimizer for apeNEXT  Current.
Computer Architecture and Organization Introduction.
Automated Design of Custom Architecture Tulika Mitra
Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns.
1 H ardware D escription L anguages Modeling Digital Systems.
CSc 453 Final Code Generation Saumya Debray The University of Arizona Tucson.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals.
UNIT 1 Introduction. 1-2 OutlineOutline n Course Topics n Microelectronics n Design Styles n Design Domains and Levels of Abstractions n Digital System.
COE 202 Introduction to Verilog Computer Engineering Department College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals.
1 SYNTHESIS of PIPELINED SYSTEMS for the CONTEMPORANEOUS EXECUTION of PERIODIC and APERIODIC TASKS with HARD REAL-TIME CONSTRAINTS Paolo Palazzari Luca.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
High-Level Synthesis-II Virendra Singh Indian Institute of Science Bangalore IEP on Digital System IIT Kanpur.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
ECE-C662 Lecture 2 Prawat Nagvajara
L13 :Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.
L12 : Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
A Programmable Single Chip Digital Signal Processing Engine MAPLD 2005 Paul Chiang, MathStar Inc. Pius Ng, Apache Design Solutions.
Mapping of Regular Nested Loop Programs to Coarse-grained Reconfigurable Arrays – Constraints and Methodology Presented by: Luis Ortiz Department of Computer.
VADA Lab.SungKyunKwan Univ. 1 L5:Lower Power Architecture Design 성균관대학교 조 준 동 교수
L9 : Low Power DSP Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab.
Computer Organization and Architecture Lecture 1 : Introduction
Optimization Code Optimization ©SoftMoore Consulting.
ECE 434 Advanced Digital System L08
COE 561 Digital System Design & Synthesis Introduction
Lesson 4 Synchronous Design Architectures: Data Path and High-level Synthesis (part two) Sept EE37E Adv. Digital Electronics.
Modeling Languages and Abstract Models
A SoC Design Automation Seoul National University
Architectural-Level Synthesis
Architecture Synthesis
HIGH LEVEL SYNTHESIS.
Resource Sharing and Binding
ICS 252 Introduction to Computer Design
ECE 448 Lecture 6 Finite State Machines State Diagrams, State Tables, Algorithmic State Machine (ASM) Charts, and VHDL code ECE 448 – FPGA and ASIC Design.
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Muhammad Elrabaa Computer Engineering Department King Fahd University of Petroleum & Minerals Dr. Muhammad Elrabaa Computer Engineering Department King Fahd University of Petroleum & Minerals

2 OutlineOutline n Motivation n Dataflow graphs n Sequencing graphs n Compilation and behavioral optimization n Resources n Constraints n Synthesis in temporal domain: Scheduling n Synthesis in spatial domain: Binding n Motivation n Dataflow graphs n Sequencing graphs n Compilation and behavioral optimization n Resources n Constraints n Synthesis in temporal domain: Scheduling n Synthesis in spatial domain: Binding

3 SynthesisSynthesis n Transform behavioral into structural view. n Architectural-level synthesis Architectural abstraction level. Determine macroscopic structure. Example: major building blocks like adder, register, mux. n Logic-level synthesis Logic abstraction level. Determine microscopic structure. Example: logic gate interconnection. n Transform behavioral into structural view. n Architectural-level synthesis Architectural abstraction level. Determine macroscopic structure. Example: major building blocks like adder, register, mux. n Logic-level synthesis Logic abstraction level. Determine microscopic structure. Example: logic gate interconnection.

4 Synthesis and Optimization

5 Architectural Design Space Example …

6 Different Design Solutions 1 Multiplier, 1 ALU 2 Multipliers, 2 ALUs

7 Example of Structures

8 Area vs. Latency Tradeoffs Multiplier Area: 5 Adder Area: 1 Other logic Area: 1

9 Architectural-Level Synthesis Motivation n Raise input abstraction level. Reduce specification of details. Extend designer base. Self-documenting design specifications. Ease modifications and extensions. n Reduce design time. n Explore and optimize macroscopic structure Series/parallel execution of operations. n Raise input abstraction level. Reduce specification of details. Extend designer base. Self-documenting design specifications. Ease modifications and extensions. n Reduce design time. n Explore and optimize macroscopic structure Series/parallel execution of operations.

10 Architectural-Level Synthesis n Translate HDL models into sequencing graphs. n Behavioral-level optimization Optimize abstract models independently from the implementation parameters. n Architectural synthesis and optimization Create macroscopic structure data-path and control-unit. Consider area and delay information of the implementation. n Translate HDL models into sequencing graphs. n Behavioral-level optimization Optimize abstract models independently from the implementation parameters. n Architectural synthesis and optimization Create macroscopic structure data-path and control-unit. Consider area and delay information of the implementation.

11 Dataflow Graphs … n Behavioral views of architectural models. n Useful to represent data- paths. n Graph Vertices = operations. Edges = dependencies. n Dependencies arise due Input to an operation is result of another operation. Serialization constraints in specification. Two tasks share the same resource. n Behavioral views of architectural models. n Useful to represent data- paths. n Graph Vertices = operations. Edges = dependencies. n Dependencies arise due Input to an operation is result of another operation. Serialization constraints in specification. Two tasks share the same resource.

12 … Dataflow Graphs n Assumes the existence of variables who store information required and generated by operations. n Each variable has a lifetime which is the interval from birth to death. n Variable birth is the time at which the value is generated. n Variable death is the latest time at which the value is referenced as input to operation. n Values must be preserved during life-time. n Assumes the existence of variables who store information required and generated by operations. n Each variable has a lifetime which is the interval from birth to death. n Variable birth is the time at which the value is generated. n Variable death is the latest time at which the value is referenced as input to operation. n Values must be preserved during life-time.

13 Sequencing Graphs n Useful to represent data-path and control. n Extended dataflow graphs Control Data Flow Graphs (CDFGs). Operation serialization. Hierarchy. Control-flow commands branching and iteration. Polar: source and sink. n Paths in the graph represent concurrent streams of operations. n Useful to represent data-path and control. n Extended dataflow graphs Control Data Flow Graphs (CDFGs). Operation serialization. Hierarchy. Control-flow commands branching and iteration. Polar: source and sink. n Paths in the graph represent concurrent streams of operations.

14 Example of Hierarchy n Two kinds of vertices Operations Links: linking sequencing graph entities in the hierarchy Model call Branching Iteration n Vertex v i is a predecessor of vertex v j if there is a path with tail v i and head v j n Vertex v i is a successor of vertex v j if there is a path with head v i and tail v j n Two kinds of vertices Operations Links: linking sequencing graph entities in the hierarchy Model call Branching Iteration n Vertex v i is a predecessor of vertex v j if there is a path with tail v i and head v j n Vertex v i is a successor of vertex v j if there is a path with head v i and tail v j

15 Example of Branching … n Branching modeled by Branching clause Branching body Set of tasks selected according to value of branching clause. n Several branching bodies Mutual exclusive execution. n A sequencing graph entity associated with each branch body. n Link vertex models Branching clause. Operation of evaluating clause and taking branch decision. n Branching modeled by Branching clause Branching body Set of tasks selected according to value of branching clause. n Several branching bodies Mutual exclusive execution. n A sequencing graph entity associated with each branch body. n Link vertex models Branching clause. Operation of evaluating clause and taking branch decision.

16 … Example of Branching n x= a*b n y=x*c n z=a+b n If (z  0) {p=m+n; q=m*n} n x= a*b n y=x*c n z=a+b n If (z  0) {p=m+n; q=m*n}

17 Iterative Constructs n Iterative constructs modeled by Iteration clause Iteration body n Iteration body is a set of tasks repeated as long as iteration clause is true. n Iteration modeled through use of hierarchy. n Iteration represented as repeated model call to sequencing graph entity modeling iteration body. n Link vertex models the operation of evaluating the iteration cause. n Iterative constructs modeled by Iteration clause Iteration body n Iteration body is a set of tasks repeated as long as iteration clause is true. n Iteration modeled through use of hierarchy. n Iteration represented as repeated model call to sequencing graph entity modeling iteration body. n Link vertex models the operation of evaluating the iteration cause.

18 Example of Iteration …

19 … Example of Iteration Loop Body

20 Semantics of Sequencing Graphs n Marking of vertices Waiting for execution. Executing. Have completed execution. n Firing an operation means starting its execution. n Execution semantics An operation can be fired as soon as all its immediate predecessors have completed execution. n Model can be reset by making all operations waiting for execution. n Model can be fired (executed) by firing the source vertex. n Model completes execution when sink completes execution. n Marking of vertices Waiting for execution. Executing. Have completed execution. n Firing an operation means starting its execution. n Execution semantics An operation can be fired as soon as all its immediate predecessors have completed execution. n Model can be reset by making all operations waiting for execution. n Model can be fired (executed) by firing the source vertex. n Model completes execution when sink completes execution.

21 Vertex Attributes n Area cost. n Delay cost Propagation delay. Execution delay. n Data-dependent execution delays Bounded (e.g. branching). Maximum and minimum delays can be computed E.g. floating-point data normalization requiring conditional data alignment. Unbounded (e.g. iteration, synchronization). n Area cost. n Delay cost Propagation delay. Execution delay. n Data-dependent execution delays Bounded (e.g. branching). Maximum and minimum delays can be computed E.g. floating-point data normalization requiring conditional data alignment. Unbounded (e.g. iteration, synchronization).

22 Properties of Sequencing Graphs n Computed by visiting hierarchy bottom-up. n Area estimate Sum of the area attributes of all vertices. Worst-case -- no sharing. n Delay estimate (latency) Bounded-latency graphs. Length of longest path. n Computed by visiting hierarchy bottom-up. n Area estimate Sum of the area attributes of all vertices. Worst-case -- no sharing. n Delay estimate (latency) Bounded-latency graphs. Length of longest path.

23 Compilation and Behavioral Optimization n Software compilation Compile program into intermediate form. Optimize intermediate form. Generate target code for an architecture. n Hardware compilation Compile HDL model into sequencing graph. Optimize sequencing graph. Generate gate-level interconnection for a cell library. n Software compilation Compile program into intermediate form. Optimize intermediate form. Generate target code for an architecture. n Hardware compilation Compile HDL model into sequencing graph. Optimize sequencing graph. Generate gate-level interconnection for a cell library.

24 Hardware and Software Compilation

25 CompilationCompilation n Front-end Lexical and syntax analysis. Parse-tree generation. Macro-expansion. Expansion of meta-variables. n Semantic analysis Data-flow and control-flow analysis. Type checking. Resolve arithmetic and relational operators. n Front-end Lexical and syntax analysis. Parse-tree generation. Macro-expansion. Expansion of meta-variables. n Semantic analysis Data-flow and control-flow analysis. Type checking. Resolve arithmetic and relational operators.

26 Parse Tree Example n a = p + q * r

27 Behavioral-Level Optimization n Semantic-preserving transformations aiming at simplifying the model. n Applied to parse-trees or during their generation. n Taxonomy Data-flow based transformations. Control-flow based transformations. n Semantic-preserving transformations aiming at simplifying the model. n Applied to parse-trees or during their generation. n Taxonomy Data-flow based transformations. Control-flow based transformations.

28 Data-Flow Based Transformations n Tree-height reduction. n Constant and variable propagation. n Common subexpression elimination. n Dead-code elimination. n Operator-strength reduction. n Code motion. n Tree-height reduction. n Constant and variable propagation. n Common subexpression elimination. n Dead-code elimination. n Operator-strength reduction. n Code motion.

29 Tree-Height Reduction n Applied to arithmetic expressions. n Goal Split into two-operand expressions to exploit hardware parallelism at best. n Techniques Balance the expression tree. Exploit commutativity, associativity and distributivity. n Applied to arithmetic expressions. n Goal Split into two-operand expressions to exploit hardware parallelism at best. n Techniques Balance the expression tree. Exploit commutativity, associativity and distributivity.

30 Example of Tree-Height Reduction using Commutativity and Associativity n x = a + b * c + d => x = (a + d) + b * c

31 Example of Tree-Height Reduction using Distributivity n x = a * (b * c * d + e) => x = a * b * c * d + a * e

32 Examples of Propagation & Subexpression Elimination n Constant propagation a = 0; b = a+1; c = 2 * b; a = 0; b = 1; c = 2; n Variable propagation a = x; b = a+1; c = 2 * a; a = x; b = x+1; c = 2 * x; n Subexpression elimination Search isomorphic patterns in the parse trees. Example a = x+y; b = a+1; c = x+y; a = x+y; b = a+1; c = a; n Constant propagation a = 0; b = a+1; c = 2 * b; a = 0; b = 1; c = 2; n Variable propagation a = x; b = a+1; c = 2 * a; a = x; b = x+1; c = 2 * x; n Subexpression elimination Search isomorphic patterns in the parse trees. Example a = x+y; b = a+1; c = x+y; a = x+y; b = a+1; c = a;

33 Examples of Other Transformations n Dead-code elimination a = x; b = x+1; c = 2 * x; a = x; can be removed if not referenced. n Operator-strength reduction a = x 2 ; b = 3 * x; a = x * x; t = x << 1; b = x+t. n Code motion for (i = 1; i < a * b) { } ; t = a * b; for (i = 1; i < t) { }. n Dead-code elimination a = x; b = x+1; c = 2 * x; a = x; can be removed if not referenced. n Operator-strength reduction a = x 2 ; b = 3 * x; a = x * x; t = x << 1; b = x+t. n Code motion for (i = 1; i < a * b) { } ; t = a * b; for (i = 1; i < t) { }.

34 Control-Flow Based Transformations n Model expansion. n Conditional expansion. n Loop expansion. n Block-level transformations. n Model Expansion Expand subroutine -- flatten hierarchy. Useful to expand scope of other optimization techniques. Problematic when routine is called more than once. Example x = a+b; y = a * b; z = foo(x; y); foo(p; q){ t = q - p; return(t); } By expanding foo x = a+b; y = a * b; z = y - x n Model expansion. n Conditional expansion. n Loop expansion. n Block-level transformations. n Model Expansion Expand subroutine -- flatten hierarchy. Useful to expand scope of other optimization techniques. Problematic when routine is called more than once. Example x = a+b; y = a * b; z = foo(x; y); foo(p; q){ t = q - p; return(t); } By expanding foo x = a+b; y = a * b; z = y - x

35 Conditional Expansion n Transform conditional into parallel execution with test at the end. n Useful when test depends on late signals. n May preclude hardware sharing. n Always useful for logic expressions. n Example If (A>B) { Y= A-B} Else {Y=B-A}. n Example y = ab; if (a) {x = b + d; } else {x = bd; } can be expanded to: x = a (b+d) + a’bd and simplified as: y = ab; x = y + d (a+b) n Transform conditional into parallel execution with test at the end. n Useful when test depends on late signals. n May preclude hardware sharing. n Always useful for logic expressions. n Example If (A>B) { Y= A-B} Else {Y=B-A}. n Example y = ab; if (a) {x = b + d; } else {x = bd; } can be expanded to: x = a (b+d) + a’bd and simplified as: y = ab; x = y + d (a+b)

36 Loop Expansion n Applicable to loops with data-independent exit conditions. n Useful to expand scope of other optimization techniques. n Problematic when loop has many iterations. n Example x = 0; for (i = 1; i  3; i++) {x = x+1; } Expanded to: x = 0; x = x+1; x = x+2; x = x+3 n Applicable to loops with data-independent exit conditions. n Useful to expand scope of other optimization techniques. n Problematic when loop has many iterations. n Example x = 0; for (i = 1; i  3; i++) {x = x+1; } Expanded to: x = 0; x = x+1; x = x+2; x = x+3

37 Architectural Synthesis and Optimization n Synthesize macroscopic structure in terms of building- blocks. n Explore area/performance trade-offs maximum performance implementations subject to area constraints. minimum area implementations subject to performance constraints. n Determine an optimal implementation. n Create logic model for data-path and control. n Synthesize macroscopic structure in terms of building- blocks. n Explore area/performance trade-offs maximum performance implementations subject to area constraints. minimum area implementations subject to performance constraints. n Determine an optimal implementation. n Create logic model for data-path and control.

38 Design Space and Objectives n Design space Set of all feasible implementations. n Implementation parameters Area. Performance Cycle-time, Latency, Throughput (for pipelined implementations). Power consumption. n Design space Set of all feasible implementations. n Implementation parameters Area. Performance Cycle-time, Latency, Throughput (for pipelined implementations). Power consumption.

39 Design Evaluation Space

40 Circuit Specification for Architectural Synthesis n Circuit behavior Sequencing graphs. n Building blocks Resources. Functional resources: process data (e.g. ALU). Memory resources: store data (e.g. Register). Interface resources: support data transfer (e.g. MUX and Buses). n Constraints Interface constraints Format and timing of I/O data transfers. Implementation constraints Timing and resource usage. Area Cycle-time and latency n Circuit behavior Sequencing graphs. n Building blocks Resources. Functional resources: process data (e.g. ALU). Memory resources: store data (e.g. Register). Interface resources: support data transfer (e.g. MUX and Buses). n Constraints Interface constraints Format and timing of I/O data transfers. Implementation constraints Timing and resource usage. Area Cycle-time and latency

41 ResourcesResources n Functional resources: perform operations on data. Example: arithmetic and logic blocks. Standard resources Existing macro-cells. Well characterized (area/delay). Example: adders, multipliers, ALUs, Shifters,... Application-specific resources Circuits for specific tasks. Yet to be synthesized. Example: instruction decoder. n Memory resources: store data. Example: memory and registers. n Interface resources Example: busses and ports. n Functional resources: perform operations on data. Example: arithmetic and logic blocks. Standard resources Existing macro-cells. Well characterized (area/delay). Example: adders, multipliers, ALUs, Shifters,... Application-specific resources Circuits for specific tasks. Yet to be synthesized. Example: instruction decoder. n Memory resources: store data. Example: memory and registers. n Interface resources Example: busses and ports.

42 Resources and Circuit Families n Resource-dominated circuits. Area and performance depend on few, well-characterized blocks. Example: DSP circuits. n Non resource-dominated circuits. Area and performance are strongly influenced by sparse logic, control and wiring. Example: some ASIC circuits. n Resource-dominated circuits. Area and performance depend on few, well-characterized blocks. Example: DSP circuits. n Non resource-dominated circuits. Area and performance are strongly influenced by sparse logic, control and wiring. Example: some ASIC circuits.

43 Synthesis in the Temporal Domain: Scheduling n Scheduling Associate a start-time with each operation. Satisfying all the sequencing (timing and resource) constraint. n Goal Determine area/latency trade-off. Determine latency and parallelism of the implementation. n Scheduled sequencing graph Sequencing graph with start-time annotation. n Unconstrained scheduling. n Scheduling with timing constraints Latency. Detailed timing constraints. n Scheduling with resource constraints. n Scheduling Associate a start-time with each operation. Satisfying all the sequencing (timing and resource) constraint. n Goal Determine area/latency trade-off. Determine latency and parallelism of the implementation. n Scheduled sequencing graph Sequencing graph with start-time annotation. n Unconstrained scheduling. n Scheduling with timing constraints Latency. Detailed timing constraints. n Scheduling with resource constraints.

44 Scheduling … 4 Multipliers, 2 ALUs 1 Multiplier, 1 ALU

45 … Scheduling … Scheduling 2 Multipliers, 3 ALUs 2 Multipliers, 2 ALUs

46 Synthesis in the Spatial Domain: Binding n Binding Associate a resource with each operation with the same type. Determine area of the implementation. n Sharing Bind a resource to more than one operation. Operations must not execute concurrently. n Bound sequencing graph Sequencing graph with resource annotation. n Binding Associate a resource with each operation with the same type. Determine area of the implementation. n Sharing Bind a resource to more than one operation. Operations must not execute concurrently. n Bound sequencing graph Sequencing graph with resource annotation.

47 Example: Bound Sequencing Graph

48 Binding Specification n Mapping from the vertex set to the set of resource instances, for each given type. n Partial binding Partial mapping, given as design constraint. n Compatible binding Binding satisfying the constraints of the partial binding. n Mapping from the vertex set to the set of resource instances, for each given type. n Partial binding Partial mapping, given as design constraint. n Compatible binding Binding satisfying the constraints of the partial binding.

49 Performance and Area Estimation n Resource-dominated circuits Area = sum of the area of the resources bound to the operations. Determined by binding. Latency = start time of the sink operation (minus start time of the source operation). Determined by scheduling n Non resource-dominated circuits Area also affected by registers, steering logic, wiring and control. Cycle-time also affected by steering logic, wiring and (possibly) control. n Resource-dominated circuits Area = sum of the area of the resources bound to the operations. Determined by binding. Latency = start time of the sink operation (minus start time of the source operation). Determined by scheduling n Non resource-dominated circuits Area also affected by registers, steering logic, wiring and control. Cycle-time also affected by steering logic, wiring and (possibly) control.

50 Approaches to Architectural Optimization n Multiple-criteria optimization problem area, latency, cycle-time. n Determine Pareto optimal points Implementations such that no other has all parameters with inferior values. n Draw trade-off curves discontinuous and highly nonlinear. n Area/latency trade-off for some values of the cycle-time. n Cycle-time/latency trade-off for some binding (area). n Area/cycle-time trade-off for some schedules (latency). n Multiple-criteria optimization problem area, latency, cycle-time. n Determine Pareto optimal points Implementations such that no other has all parameters with inferior values. n Draw trade-off curves discontinuous and highly nonlinear. n Area/latency trade-off for some values of the cycle-time. n Cycle-time/latency trade-off for some binding (area). n Area/cycle-time trade-off for some schedules (latency).

51 Area/Latency Trade-off … n Rationale Cycle-time dictated by system constraints. n Resource-dominated circuits Area is determined by resource usage. n General circuits Area and delay affected by registers, steering logic, wiring and control logic. Complex dependency of area and delay on circuit structure. n Scheduling and binding are deeply interrelated. Most approaches perform scheduling before binding (fits well for CPU and DSP designs). Performing binding before scheduling fits control dominated designs. n Approaches Schedule for minimum latency under resource constraints. Schedule for minimum resource usage under latency constraints. n Rationale Cycle-time dictated by system constraints. n Resource-dominated circuits Area is determined by resource usage. n General circuits Area and delay affected by registers, steering logic, wiring and control logic. Complex dependency of area and delay on circuit structure. n Scheduling and binding are deeply interrelated. Most approaches perform scheduling before binding (fits well for CPU and DSP designs). Performing binding before scheduling fits control dominated designs. n Approaches Schedule for minimum latency under resource constraints. Schedule for minimum resource usage under latency constraints.

52 … Area/Latency Trade-off n Areas smaller than 20 units. n Latency less than 8 cycles. n ALU area = 1 unit. n MUL area = 5 units. n Overhead area = 1 unit. n ALU propagation delay 25ns. n MUL propagation delay 35 ns. n Cycle time = 40 ns Resources have unit execution delay. n Cycle time = 30ns MUL has 2 unit execution delay. n Areas smaller than 20 units. n Latency less than 8 cycles. n ALU area = 1 unit. n MUL area = 5 units. n Overhead area = 1 unit. n ALU propagation delay 25ns. n MUL propagation delay 35 ns. n Cycle time = 40 ns Resources have unit execution delay. n Cycle time = 30ns MUL has 2 unit execution delay.