Architectural-Level Synthesis

Slides:



Advertisements
Similar presentations
1 General-Purpose Languages, High-Level Synthesis John Sanguinetti High-Level Modeling.
Advertisements

© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.
ECE 667 Synthesis and Verification of Digital Circuits
ECE-777 System Level Design and Automation Hardware/Software Co-design
ECOE 560 Design Methodologies and Tools for Software/Hardware Systems Spring 2004 Serdar Taşıran.
ECE Synthesis & Verification - Lecture 2 1 ECE 667 Spring 2011 ECE 667 Spring 2011 Synthesis and Verification of Digital Circuits High-Level (Architectural)
Computer Architecture Lecture 7 Compiler Considerations and Optimizations.
Chapter 10 Code Optimization. A main goal is to achieve a better performance Front End Code Gen Intermediate Code source Code target Code user Machine-
Logic Synthesis – 3 Optimization Ahmed Hemani Sources: Synopsys Documentation.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 10: RC Principles: Software (3/4) Prof. Sherief Reda.
Kazi Spring 2008CSCI 6601 CSCI-660 Introduction to VLSI Design Khurram Kazi.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
Cpeg421-08S/final-review1 Course Review Tom St. John.
Courseware High-Level Synthesis an introduction Prof. Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 09: RC Principles: Software (2/4) Prof. Sherief Reda.
Architectural-Level Synthesis Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long.
Simulated-Annealing-Based Solution By Gonzalo Zea s Shih-Fu Liu s
COE 561 Digital System Design & Synthesis Resource Sharing and Binding Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
Center for Embedded Computer Systems University of California, Irvine and San Diego Loop Shifting and Compaction for the.
ICS 252 Introduction to Computer Design
Center for Embedded Computer Systems University of California, Irvine and San Diego SPARK: A Parallelizing High-Level Synthesis.
High-level Synthesis and System Synthesis SOURCES- Mark Manwaring Kia Bazargan Giovanni De Micheli Gupta Youn-Long Lin Camposano, J. Hofstede, Knapp, MacMillen.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Actual design flows and tools.
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum.
ECE 2372 Modern Digital System Design
A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT Nikolaos Vassiliadis N. Kavvadias, G. Theodoridis, S. Nikolaidis Section.
Automated Design of Custom Architecture Tulika Mitra
Sub-expression elimination Logic expressions: –Performed by logic optimization. –Kernel-based methods. Arithmetic expressions: –Search isomorphic patterns.
Section 10: Advanced Topics 1 M. Balakrishnan Dept. of Comp. Sci. & Engg. I.I.T. Delhi.
UNIT 1 Introduction. 1-2 OutlineOutline n Course Topics n Microelectronics n Design Styles n Design Domains and Levels of Abstractions n Digital System.
MILAN: Technical Overview October 2, 2002 Akos Ledeczi MILAN Workshop Institute for Software Integrated.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
High-Level Synthesis: Creating Custom Circuits from High-Level Code Greg Stitt ECE Department University of Florida.
Evaluating and Improving an OpenMP-based Circuit Design Tool Tim Beatty, Dr. Ken Kent, Dr. Eric Aubanel Faculty of Computer Science University of New Brunswick.
High-Level Synthesis-II Virendra Singh Indian Institute of Science Bangalore IEP on Digital System IIT Kanpur.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
ECE-C662 Lecture 2 Prawat Nagvajara
COE 561 Digital System Design & Synthesis Architectural Synthesis Dr. Muhammad Elrabaa Computer Engineering Department King Fahd University of Petroleum.
L12 : Lower Power High Level Synthesis(3) 성균관대학교 조 준 동 교수
VHDL and Hardware Tools CS 184, Spring 4/6/5. Hardware Design for Architecture What goes into the hardware level of architecture design? Evaluate design.
R-Verify: Deep Checking of Embedded Code James Ezick † Donald Nguyen † Richard Lethin † Rick Pancoast* (†) Reservoir Labs (*) Lockheed Martin The Eleventh.
L9 : Low Power DSP Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab.
EECE 320 L8: Combinational Logic design Principles 1Chehab, AUB, 2003 EECE 320 Digital Systems Design Lecture 8: Combinational Logic Design Principles.
Code Optimization.
Andreas Hoffmann Andreas Ropers Tim Kogel Stefan Pees Prof
ASIC Design Methodology
A Simple Syntax-Directed Translator
EEE2135 Digital Logic Design Chapter 1. Introduction
Topics Modeling with hardware description languages (HDLs).
课程名 编译原理 Compiling Techniques
Optimization Code Optimization ©SoftMoore Consulting.
High-Level Synthesis Creating Custom Circuits from High-Level Code
Topics Modeling with hardware description languages (HDLs).
Introduction to cosynthesis Rabi Mahapatra CSCE617
ECE 551: Digital System Design & Synthesis
COE 561 Digital System Design & Synthesis Introduction
Lesson 4 Synchronous Design Architectures: Data Path and High-level Synthesis (part two) Sept EE37E Adv. Digital Electronics.
ECE-C662 Introduction to Behavioral Synthesis Knapp Text Ch
Modeling Languages and Abstract Models
High-Level Synthesis: Creating Custom Circuits from High-Level Code
A SoC Design Automation Seoul National University
Design Technologies for Integrated Systems
Architecture Synthesis
HIGH LEVEL SYNTHESIS.
Resource Sharing and Binding
Integrated Systems Centre © Giovanni De Micheli – All rights reserved
ICS 252 Introduction to Computer Design
COE 561 Digital System Design & Synthesis Introduction
*Internal Synthesizer Flow *Details of Synthesis Steps
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

Architectural-Level Synthesis Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long as this note and the copyright footers are not removed © Giovanni De Micheli – All rights reserved

(c) Giovanni De Micheli Synthesis Transform behavioral into structural view Architectural-level synthesis: Architectural abstraction level Determine macroscopic structure Example: major building blocks Logic-level synthesis: Logic abstraction level Determine microscopic structure Example: logic gate interconnection (c) Giovanni De Micheli

Models and flows Verilog VHDL SystemC Esterel Statecharts Schematics LANGUAGE MODELS ABSTRACT MODELS Verilog VHDL SystemC compilation Operations and dependencies (Data-flow & sequencing graphs) HDL ARCHITECTURAL LEVEL BEHAVIORAL VIEW Esterel Statecharts compilation HDL FSMs – Logic functions (State-diagrams & logic networks) LOGIC LEVEL Schematics STRUCTURAL VIEW translation Interconnected logic blocks (Logic networks) HDL GDS2 Physical design (mask layout) (c) Giovanni De Micheli

(c) Giovanni De Micheli Example diffeq { read (x; y; u; dx; a); repeat { xl = x+dx; ul = u –(3 . x . u . dx) – (3 . y . dx) yl = y + u . dx ; c = x < a; X = xl; u = ul; y = yl; } until ( c ) write (y) (c) Giovanni De Micheli

(c) Giovanni De Micheli Example of structures * ALU STEERING & MEMORY CONTROL UNIT STEERING & MEMORY CONTROL UNIT * ALU * ALU (c) Giovanni De Micheli

(c) Giovanni De Micheli Example Area 15 (2,2) 13 (2,1) 12 10 X (1,2) 8 7 (1,1) 5 Latency 1 2 3 4 5 6 7 8 (c) Giovanni De Micheli

Architectural-level synthesis motivation Raise input abstraction level Reduce specification of details Extend designer base Self-documenting design specifications Ease modifications and extensions Reduce design time Explore and optimize macroscopic structure: Series/parallel execution of operations (c) Giovanni De Micheli

Architectural-level synthesis Translate HDL models into sequencing graphs Behavioral-level optimization: Optimize abstract models independently from the implementation parameters Architectural synthesis and optimization: Create macroscopic structure: Data-path and control-unit Consider area and delay information of the implementation (c) Giovanni De Micheli

Compilation and behavioral optimization Software compilation: Compile program into intermediate form Optimize intermediate form Generate target code for an architecture Hardware compilation: Compile HDL model into sequencing graph Optimize sequencing graph Generate gate-level interconnection for a cell library (c) Giovanni De Micheli

Hardware and software compilation lex parse optimization codegen front-end Intermediate form back-end lex parse behavioral optimization front-end Intermediate form back-end a-synthesis l-synthesis l-binding (c) Giovanni De Micheli

(c) Giovanni De Micheli Compilation Front-end: Lexical and syntax analysis Parse-tree generation Macro-expansion Expansion of meta-variables Semantic analysis: Data-flow and control-flow analysis Type checking Resolve arithmetic and relational operators (c) Giovanni De Micheli

(c) Giovanni De Micheli Parse tree example a = p + q * r assignment = expression identifier + a expression identifier * p identifier identifier q r (c) Giovanni De Micheli

Behavioral-level optimization Semantic-preserving transformations aiming at simplifying the model Applied to parse-trees or during their generation Taxonomy: Data-flow based transformations Control-flow based transformations (c) Giovanni De Micheli

Data-flow based transformations Tree-height reduction Constant and variable propagation Common sub-expression elimination Dead-code elimination Operator-strength reduction Code motion (c) Giovanni De Micheli

Tree-height reduction Applied to arithmetic expressions Goal: Split into two-operand expressions to exploit hardware parallelism at best Techniques: Balance the expression tree Exploit commutativity, associativity and distributivity (c) Giovanni De Micheli

Example of tree-height reduction using commutativity and associativity + + + + * * a b c d a d b c x = a + b * c + d → x = (a + d) + b * c (c) Giovanni De Micheli

Example of tree-height reduction using distributivity * + + * * * * * * a b c d e a b c d a e x = a * (b * c * d + e) → x = a * b * c * d + a * e; (c) Giovanni De Micheli

Examples of propagation Constant propagation a = 0; b = a + 1; c = 2 * b; a = 0; b = 1; c = 2; Variable propagation: a = x; b = a + 1; c = 2 * x; a = x; b = x + 1; c = 2 * x; (c) Giovanni De Micheli

Sub-expression elimination Logic expressions: Performed by logic optimization Kernel-based methods Arithmetic expressions: Search isomorphic patterns in the parse trees Example: a = x + y; b = a +1; c = x + y a = x + y; b = a + 1; c = a; (c) Giovanni De Micheli

Examples of other transformations Dead-code elimination: a = x; b = x + 1; c = 2 * x; a = x; can be removed if not referenced Operator-strength reduction: a = x2, b = 3 * x; a = x * x; t = x << 1; b = x + t; Code motion: for ( i = 1; i < a * b) { } t = a * b; for ( i = 1; i < t) { } (c) Giovanni De Micheli

Control-flow based transformations Model expansion Conditional expansion Loop expansion Block-level transformations (c) Giovanni De Micheli

(c) Giovanni De Micheli Model expansion Expand subroutine Flatten hierarchy Expand scope of other optimization techniques Problematic when model is called more than once Example: x = a + b; y = a * b; z = foo (x , y); foo(p,q) { t=q - p; return (t); } By expanding foo: x = a + b; y = a*b; z = y – x; (c) Giovanni De Micheli

Conditional expansion Transform conditional into parallel execution with test at the end Useful when test depends on late signals May preclude hardware sharing Always useful for logic expressions Example: y = ab; if (a) {x = b + d; } else { x = bd; } Can be expanded to: x = a (b + d) + a’bd And simplified as: y = ab; x = y + d ( a + b ) (c) Giovanni De Micheli

(c) Giovanni De Micheli Loop expansion Applicable to loops with data-independent exit conditions Useful to expand scope of other optimization techniques Problematic when loop has many iterations Example: x = 0; for ( I = 1; I < 3; I ++ ) { x = x + 1; } Expanded to: x = 0; x = x + 1; x = x + 2; x = x + 3 (c) Giovanni De Micheli

Architectural synthesis and optimization Synthesize macroscopic structure in terms of building-blocks Explore area/performance trade-off: maximize performance implementations subject to area constraints minimize area implementations subject to performance constraints Determine an optimal implementation Create logic model for data-path and control (c) Giovanni De Micheli

Design space and objectives Set of all feasible implementations Implementation parameters: Area Performance: Cycle-time Latency Throughput (for pipelined implementations) Power consumption (c) Giovanni De Micheli

Design evaluation space Area Area Area Area Max Cycle-time Latency Latency Latency Latency Max (c) Giovanni De Micheli

(c) Giovanni De Micheli Hardware modeling Circuit behavior: Sequencing graphs Building blocks: Resources Constraints: Timing and resource usage (c) Giovanni De Micheli

(c) Giovanni De Micheli Resources Functional resources: Perform operations on data Example: arithmetic and logic blocks Storage resources: Store data Example: memory and registers Interface resources: Example: busses and ports (c) Giovanni De Micheli

Resources and circuit families Resource-dominated circuits. Area and performance depend on few, well-characterized blocks Example: DSP circuits Non resource-dominated circuits Area and performance are strongly influenced by sparse logic, control and wiring Example: some ASIC circuits (c) Giovanni De Micheli

Implementation constraints Timing constraints: Cycle-time Latency of a set of operations Time spacing between operation pairs Resource constraints: Resource usage (or allocation) Partial binding (c) Giovanni De Micheli

Synthesis in the temporal domain Scheduling: Associate a start-time with each operation Determine latency and parallelism of the implementation Scheduled sequencing graph: Sequencing graph with start-time annotation (c) Giovanni De Micheli

(c) Giovanni De Micheli Example NOP 1 2 6 8 10 * * * * + TIME 1 TIME 2 TIME 3 TIME 4 3 7 9 11 * * + < 4 - 5 - NOP n (c) Giovanni De Micheli

Synthesis in the spatial domain Binding: Associate a resource with each operation with the same type Determine the area of the implementation Sharing: Bind a resource to more than one operation Operations must not execute concurrently Bound sequencing graph: Sequencing graph with resource annotation (c) Giovanni De Micheli

(c) Giovanni De Micheli Example NOP (1,1) (1,2) (1,3) (1,4) (2,2) 1 2 6 8 * * * * + 10 TIME 1 3 7 9 11 * * + < TIME 2 4 (2,1) - TIME 3 5 - TIME 4 NOP n (c) Giovanni De Micheli

(c) Giovanni De Micheli Estimation Resource-dominated circuits. Area = sum of the area of the resources bound to the operations Determined by binding Latency = start time of the sink operation (minus start time of the source operation) Determined by scheduling Non resource-dominated circuits Area also affected by: Registers, steering logic, wiring and control Cycle-time also affected by: Steering logic, wiring and (possibly) control (c) Giovanni De Micheli

Approaches to architectural optimization Multiple-criteria optimization problem: Area, latency, cycle-time Determine Pareto optimal points: Implementations such that no other has all parameters with inferior values Draw trade-off curves: Discontinuous and highly nonlinear (c) Giovanni De Micheli

Approaches to architectural optimization Area/latency trade-off for some values of the cycle-time. Cycle-time/latency trade-off for some binding (area) Area/cycle-time trade-off for some schedules (latency) (c) Giovanni De Micheli

Area-latency trade-off Rationale: Cycle-time dictated by system constraints Resource-dominated circuits: Area is determined by resource usage Approaches: Schedule for minimum latency under resource usage constraints Schedule for minimum resource usage under latency constraints for varying cycle-time constraints (c) Giovanni De Micheli

Area/latency trade-off (2,2) (2,1) Area 20 (1,2) X (3,2) 18 (3,1) (1,1) 17 15 Cycle-time 13 (2,1) 12 10 40 8 7 5 Latency 30 1 2 3 4 5 6 7 8 (c) Giovanni De Micheli