Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 10: RC Principles: Software (3/4) Prof. Sherief Reda.

Presentation transcript:

Reconfigurable Computing (EN2911X, Fall07), Lecture 10: RC Principles: Software (3/4). Prof. Sherief Reda, Division of Engineering, Brown University. [Some examples are based on the G. De Micheli textbook and lectures.]

Behavioral synthesis
Given:
– a sequencing graph (data/control flow graph) constructed from the behavioral circuit specification after code optimizations
– a set of functional resources (multipliers, adders, etc.), each characterized in terms of area, delay, and power
– a set of constraints (on circuit delay, area, and power)
Synthesizing the output circuit consists of two stages:
(1) Place operations in time (scheduling) and in space (binding them to resources)
(2) Determine the detailed connections of the data path and the control unit

Scheduling (temporal assignment)
Scheduling is the task of determining the start times of all operations, subject to the precedence constraints specified by the sequencing graph.
The latency of the sequencing graph is the difference between the start time of the sink and the start time of the source.

Scheduling to minimize the latency
Consider the following differential equation integrator:
    read(x, y, u, dx, a);
    do {
        xl = x + dx;
        ul = u - (3*x*u*dx) - (3*y*dx);
        yl = y + u*dx;
        c = xl < a;
        x = xl; u = ul; y = yl;
    } while (c);
    write(y);
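As a quick sanity check of the loop above, here is a minimal Python transcription, reading the state updates as x = xl, u = ul, y = yl. The function name `integrate` and the sample inputs are illustrative assumptions, not part of the lecture.

```python
def integrate(x, y, u, dx, a):
    """Run the integrator loop until xl is no longer below a; return the final y."""
    while True:                      # mirrors do { ... } while (c)
        xl = x + dx
        ul = u - (3 * x * u * dx) - (3 * y * dx)
        yl = y + u * dx
        c = xl < a
        x, u, y = xl, ul, yl         # commit the new state for the next pass
        if not c:
            return y


# Hypothetical inputs chosen so that the loop runs exactly twice.
result = integrate(x=0.0, y=1.0, u=0.0, dx=0.5, a=1.0)
```

Each pass through the loop body performs six multiplications, two subtractions, two additions, and one comparison; these are the operations a scheduler has to place in time.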

ASAP scheduling for minimum latency
Assuming all operations have a delay of 1 unit, what is the latency here?

ASAP scheduling algorithm
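The slide's algorithm figure is not part of the transcript, so the following is only a minimal sketch of the usual ASAP idea: every operation starts as soon as all of its predecessors have finished. The graph `PREDS` is an assumed decomposition of the integrator loop body into unit-delay operations, and all operation names are hypothetical.

```python
# Assumed dataflow graph of the integrator loop body: op -> predecessors.
# m* = multiply, s* = subtract, a* = add, c1 = compare (hypothetical names).
PREDS = {
    "m1": [],            # 3 * x
    "m2": [],            # u * dx
    "m3": ["m1", "m2"],  # (3*x) * (u*dx)
    "s1": ["m3"],        # u - m3
    "m4": [],            # 3 * y
    "m5": ["m4"],        # (3*y) * dx
    "s2": ["s1", "m5"],  # ul = s1 - m5
    "m6": [],            # u * dx (used for yl)
    "a1": ["m6"],        # yl = y + m6
    "a2": [],            # xl = x + dx
    "c1": ["a2"],        # c = xl < a
}


def asap(preds, delay=1):
    """Return the earliest start step of every operation (steps start at 1)."""
    start = {}

    def visit(op):
        if op not in start:
            # An operation starts once its latest predecessor has finished.
            start[op] = max((visit(p) + delay for p in preds[op]), default=1)
        return start[op]

    for op in preds:
        visit(op)
    return start


asap_start = asap(PREDS)
# The sink starts when the last operation finishes; the source starts at step 1.
latency = max(t + 1 for t in asap_start.values()) - 1
```

Under these assumptions the critical path is m1/m2, m3, s1, s2, which gives a latency of 4 steps.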

ALAP scheduling to meet latency constraint

ALAP scheduling algorithm
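A minimal sketch of ALAP scheduling, assuming unit delays and a latency bound of 4 steps: every operation starts as late as its successors allow, and operations without successors start in the last step. The graph `PREDS` and its operation names are an assumed decomposition of the integrator loop body, not taken from the slide.

```python
# Assumed dataflow graph of the integrator loop body: op -> predecessors.
PREDS = {
    "m1": [], "m2": [], "m3": ["m1", "m2"], "s1": ["m3"],
    "m4": [], "m5": ["m4"], "s2": ["s1", "m5"],
    "m6": [], "a1": ["m6"], "a2": [], "c1": ["a2"],
}


def alap(preds, latency, delay=1):
    """Return the latest start step of every operation for a latency bound."""
    # Invert the predecessor lists to get successors.
    succs = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)

    start = {}

    def visit(op):
        if op not in start:
            # Finish just in time for the earliest successor; operations
            # without successors start in the last step.
            start[op] = min((visit(s) - delay for s in succs[op]),
                            default=latency - delay + 1)
        return start[op]

    for op in preds:
        visit(op)
    return start


alap_start = alap(PREDS, latency=4)
```

With this graph, the operations on the longest chain (m1, m2, m3, s1, s2) get the same start times as in an ASAP schedule, while the others are pushed toward the end.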

Operation mobility
The mobility of an operation is the difference between its start times computed by the ALAP and ASAP algorithms.
Mobility measures the freedom we have in scheduling an operation while still meeting the timing schedule.
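A small illustration of the definition. The two start-time tables are hand-derived for an assumed unit-delay decomposition of the integrator loop body with a latency bound of 4; the operation names (m* multiply, s* subtract, a* add, c1 compare) are hypothetical.

```python
# Hand-derived start steps for the assumed graph (not taken from the slide).
ASAP = {"m1": 1, "m2": 1, "m3": 2, "s1": 3, "s2": 4, "m4": 1,
        "m5": 2, "m6": 1, "a1": 2, "a2": 1, "c1": 2}
ALAP = {"m1": 1, "m2": 1, "m3": 2, "s1": 3, "s2": 4, "m4": 2,
        "m5": 3, "m6": 3, "a1": 4, "a2": 3, "c1": 4}

# Mobility = ALAP start - ASAP start.
mobility = {op: ALAP[op] - ASAP[op] for op in ASAP}

# Operations with zero mobility have no scheduling freedom at this latency.
critical = sorted(op for op, m in mobility.items() if m == 0)
```

Operations with zero mobility lie on the critical path; the rest can slide by up to their mobility without lengthening the schedule.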

Resource binding (spatial assignment)
Binding determines the resource type and instance assigned to each operation.
How many multipliers do we need here? How many ALUs (+, -, <)?

Resource sharing and binding
Bind a resource to two operations as long as they do not execute concurrently.
How many instances of the multiplier and the ALU do we need now?
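The sharing rule can be sketched as follows, assuming unit-delay operations so that two operations execute concurrently only if they start in the same step. The schedule used here is a hand-derived ASAP schedule for an assumed decomposition of the integrator loop body; the operation names and the resource-type rule in `rtype` are hypothetical.

```python
from collections import defaultdict

# Hand-derived ASAP start steps for the assumed graph (m* = multiply, rest = ALU).
SCHEDULE = {"m1": 1, "m2": 1, "m3": 2, "s1": 3, "s2": 4, "m4": 1,
            "m5": 2, "m6": 1, "a1": 2, "a2": 1, "c1": 2}


def rtype(op):
    """Hypothetical naming rule: multiplications go to 'mul', all else to 'alu'."""
    return "mul" if op.startswith("m") else "alu"


def bind(schedule):
    """Give each op an instance index that is unique within its type and step."""
    used = defaultdict(int)              # (type, step) -> instances taken so far
    binding = {}
    for op in sorted(schedule):
        key = (rtype(op), schedule[op])
        binding[op] = (rtype(op), used[key])
        used[key] += 1
    # The instance count per type is the peak concurrency over all steps.
    instances = defaultdict(int)
    for (rt, _step), n in used.items():
        instances[rt] = max(instances[rt], n)
    return binding, dict(instances)


binding, instances = bind(SCHEDULE)
```

With this particular schedule, four multiplications start in step 1 and two ALU operations start in step 2, so sharing brings the requirement down to 4 multipliers and 2 ALUs instead of one resource per operation.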

Can we do better?
Can we get the same latency with fewer resources?
Operations sharing the same instance are colored with the same color. How many instances are now needed?
How can we find the solution?

Finding the minimal number of resources for a given latency (T) using list scheduling
Initialize all resource instances to 1.
for t = 1 to T:
  For each resource type:
  – Calculate the slack (ALAP time - t) of each candidate operation
  – Schedule candidate operations with zero slack and update the number of resource instances used if needed
  – Schedule any candidate operations requiring no additional resource instances
What is the intuition behind this heuristic?
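The heuristic above can be sketched as follows, assuming unit-delay operations. The graph `PREDS`, the hand-derived `ALAP` table for T = 4, the operation names, and the `rtype` rule are assumptions made for illustration; they are not taken from the slide.

```python
# Assumed dataflow graph of the integrator loop body: op -> predecessors.
PREDS = {
    "m1": [], "m2": [], "m3": ["m1", "m2"], "s1": ["m3"],
    "m4": [], "m5": ["m4"], "s2": ["s1", "m5"],
    "m6": [], "a1": ["m6"], "a2": [], "c1": ["a2"],
}
# Hand-derived ALAP start steps for a latency bound of T = 4.
ALAP = {"m1": 1, "m2": 1, "m3": 2, "s1": 3, "s2": 4, "m4": 2,
        "m5": 3, "m6": 3, "a1": 4, "a2": 3, "c1": 4}


def rtype(op):
    """Hypothetical naming rule: multiplications go to 'mul', all else to 'alu'."""
    return "mul" if op.startswith("m") else "alu"


def min_resource_list_schedule(preds, alap, T):
    instances = {"mul": 1, "alu": 1}      # start with one instance per type
    start = {}
    for t in range(1, T + 1):
        for r in instances:
            # Candidates: unscheduled ops of this type whose predecessors
            # all started in an earlier step (unit delay).
            cands = [op for op in preds
                     if op not in start and rtype(op) == r
                     and all(start.get(p, t) < t for p in preds[op])]
            cands.sort(key=lambda op: alap[op] - t)   # least slack first
            urgent = [op for op in cands if alap[op] - t == 0]
            # Zero-slack ops must run now; add instances if there are too many.
            instances[r] = max(instances[r], len(urgent))
            # Urgent ops come first; leftover instances take the next candidates.
            for op in cands[:instances[r]]:
                start[op] = t
    return start, instances


start, instances = min_resource_list_schedule(PREDS, ALAP, T=4)
```

Under these assumptions the schedule still finishes in 4 steps but needs only 2 multipliers and 2 ALUs: instances are added only when a zero-slack operation would otherwise miss its ALAP time.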

Scheduling and sharing necessitate a control unit that orchestrates the sequencing of operations.

Scheduling under resource constraint
Assume we have just one instance of a multiplier and one instance of an ALU (+, -, and <). How can we schedule all operations? What is the latency?

Finding the minimal latency for a given resource constraint (C) using list scheduling
Label all operations by the length of their longest path to the sink and rank them in decreasing order.
Repeat:
– For each resource type:
    Determine the candidate operations U that can be scheduled
    Select a subset of U by priority such that the resource constraint (C) is not exceeded
– Increment time
What is the intuition behind this heuristic?
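A minimal sketch of this heuristic, assuming unit-delay operations and a constraint of one multiplier and one ALU. The graph `PREDS`, the operation names, and the `rtype` rule are an assumed decomposition of the integrator loop body, not taken from the slide.

```python
# Assumed dataflow graph of the integrator loop body: op -> predecessors.
PREDS = {
    "m1": [], "m2": [], "m3": ["m1", "m2"], "s1": ["m3"],
    "m4": [], "m5": ["m4"], "s2": ["s1", "m5"],
    "m6": [], "a1": ["m6"], "a2": [], "c1": ["a2"],
}


def rtype(op):
    """Hypothetical naming rule: multiplications go to 'mul', all else to 'alu'."""
    return "mul" if op.startswith("m") else "alu"


def priorities(preds):
    """Label each op with the number of ops on its longest path to the sink."""
    succs = {op: [] for op in preds}
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)
    label = {}

    def visit(op):
        if op not in label:
            label[op] = 1 + max((visit(s) for s in succs[op]), default=0)
        return label[op]

    for op in preds:
        visit(op)
    return label


def list_schedule(preds, capacity):
    prio = priorities(preds)
    start, t = {}, 1
    while len(start) < len(preds):
        for r, cap in capacity.items():
            # Candidate set U: unscheduled ops of type r whose predecessors
            # all started in an earlier step (unit delay).
            ready = [op for op in preds
                     if op not in start and rtype(op) == r
                     and all(start.get(p, t) < t for p in preds[op])]
            ready.sort(key=lambda op: -prio[op])   # longest path first
            for op in ready[:cap]:                 # never exceed the constraint
                start[op] = t
        t += 1
    return start


start = list_schedule(PREDS, {"mul": 1, "alu": 1})
latency = max(start.values())   # unit delays, source at step 1
```

Under these assumptions the schedule takes 7 steps. Prioritizing by longest path to the sink keeps the single multiplier busy with the operations that the rest of the graph is waiting on.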

There is an inherent tradeoff between area and latency.
[Plot: design points as (latency, area): (4, 6), (4, 4), and (7, 2). The point (4, 6) is dominated by (4, 4), which has the same latency with less area.]
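Dominance between design points can be checked mechanically. The sketch below treats each point as a (latency, area) pair, as on the slide, and keeps only the points that no other point matches or beats in both dimensions; the function name `pareto_front` is an illustrative choice.

```python
def pareto_front(points):
    """Keep the points that are not dominated by any other point."""
    def dominates(q, p):
        # q dominates p if it is no worse in both dimensions and differs from p.
        return q != p and q[0] <= p[0] and q[1] <= p[1]

    return [p for p in points if not any(dominates(q, p) for q in points)]


designs = [(4, 6), (7, 2), (4, 4)]      # (latency, area) pairs from the slide
front = pareto_front(designs)
```

Here (4, 6) drops out, and the remaining points (7, 2) and (4, 4) are the real area/latency tradeoff: neither is better than the other in both dimensions.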

Control unit example
    for (i = 0; i < 10; i = i + 1)
    begin
        x = a[i] + b[i];
        z = z + x;
    end
[Figure: datapath and control unit; labels: a, b, x, z, i, Control unit, CMP, 10, Enable, +, MUX, 1, counter, control bits]
How many control signals are produced from the control unit? How can we design the control unit?
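One way to think about the design question is a behavioral model in which the control unit is a counter plus a comparator that asserts an enable signal while the loop should keep running. The sketch below is only such a model, not the circuit from the slide; the function names and the single `enable` signal are assumptions.

```python
def control_unit(counter, limit=10):
    """Comparator on the loop counter: assert enable while counter < limit."""
    return {"enable": counter < limit}


def run(a, b, limit=10):
    """Step the datapath once per cycle while the control unit enables it."""
    counter, z, cycles = 0, 0, 0
    while control_unit(counter, limit)["enable"]:
        x = a[counter] + b[counter]   # datapath: adder computes x
        z = z + x                     # datapath: accumulate into z
        counter += 1                  # counter supplies the index i
        cycles += 1
    return z, cycles


# Hypothetical input arrays of length 10.
total, cycles = run(list(range(10)), [1] * 10)
```

A real controller would also need select signals for any shared adder or MUX, which this model leaves out.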

Summary
So far, we covered:
– SW/HW partitioning
– Behavioral code optimization
– Behavioral synthesis techniques
Next time, I will give an overview of:
– Technology mapping
– Placement and routing