Simulated-Annealing-Based Solution By Gonzalo Zea s031418 Shih-Fu Liu s031003.

Presentation transcript:

Agenda
Hardware Allocation Problem
–Input Description
–Basic Allocation Problem
–Subproblem
–Formulation of the entire data path synthesis problem
–Cost Table
–Conditional Resource Sharing
Simulated-Annealing-Based Solution
–Generating New States
–Stopping criteria
–Cost function
–Constraints
–Delays
Loops
Result of example
Synthesizing Pipelined Data Paths
Conclusions

Hardware Allocation Problem
Execution speed of the data path (T)
Total hardware cost of the data path (C)
→ f(T,C) should be minimized

Input Description
Code sequence in which parallelism, sequentiality, and disjointness (mutually exclusive operations) are explicitly stated
Compiler-like optimization techniques (e.g. dead code elimination, constant folding)
Disjointness is a result of the conditional clauses in the input description

Basic Allocation Problem
Given a description, allocate it into a minimum number of registers
Arithmetic unit allocation
–Entails scheduling operations
–Minimum number of ALUs
–Meeting cost or timing constraints

Subproblem
Abolishing the fixed code sequence → gains an extra degree of freedom
Disjoint variables can share the same register
Precedence constraints must be met

Formulation of the entire data path synthesis problem
C = p1 * (#alu) + p2 * (exec_time) + p3 * (#register) + p4 * (#bus)
–p1, p3, p4 … area parameters
–p2 … execution time parameter
C is optimal when it is minimal and all constraints are met
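The weighted cost on this slide can be sketched in a few lines of Python. The `DataPath` record, its field names, and the weight values used in the example are hypothetical; the slides give only the formula and no numeric parameters.

```python
from dataclasses import dataclass


@dataclass
class DataPath:
    """Hypothetical summary of one candidate data path."""
    n_alu: int       # number of ALUs
    exec_time: int   # execution time, e.g. in clock cycles
    n_register: int  # number of registers
    n_bus: int       # number of buses


def datapath_cost(dp, p1, p2, p3, p4):
    """C = p1*#alu + p2*exec_time + p3*#register + p4*#bus."""
    return (p1 * dp.n_alu + p2 * dp.exec_time
            + p3 * dp.n_register + p4 * dp.n_bus)


# Illustrative candidates and weights only (assumed values).
fast = DataPath(n_alu=3, exec_time=4, n_register=6, n_bus=4)
small = DataPath(n_alu=1, exec_time=9, n_register=5, n_bus=2)
for name, dp in (("fast", fast), ("small", small)):
    print(name, datapath_cost(dp, p1=10, p2=1, p3=2, p4=1))
```

Raising p2 relative to the area weights p1, p3, and p4 shifts the minimum toward faster, larger data paths, which is how the trade-off between speed and hardware cost is expressed.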

Cost Table
Register cost
–Equal to the area of the library register
Cost of ALU operations
–Nonlinear function
Estimating interconnect area
–Complex function of the number of registers and ALUs in the data path

Conditional Resource Sharing
Disjoint statements can be placed on top of each other in the same time-space slot → resource sharing

Simulated-Annealing-Based Solution
Basic algorithm
–Random generation of new states
–Acceptance of the generated states depends on the temperature T
–The number of states generated influences quality and can be defined by the user
Most important points
–Generation of new states
–Optimization of the cost function
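The slides say only that acceptance depends on the temperature T. A minimal sketch of one common choice, the Metropolis criterion, is shown here; that the presented algorithm uses exactly this rule is an assumption, and the function name `accept` is hypothetical.

```python
import math
import random


def accept(delta_cost, temperature, rng=random):
    """Metropolis-style acceptance (assumed rule, not confirmed by the slides).

    Improvements are always accepted; a worse state is accepted with
    probability exp(-delta_cost / temperature), which shrinks as T falls.
    """
    if delta_cost <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta_cost / temperature)
```

At high T almost every proposed state is accepted, so the search moves freely; at low T the rule degenerates into accepting improvements only.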

Generating New States
Interchanging two code operations
Displacing a code operation from one location to another
Interchanging variables in a symmetric operation

Generating New States (cont'd)
High temperature
–Two numbers (a, b) are randomly generated
–If b < number of operations: two operations are interchanged
–If the interchange violates constraints → variables are interchanged
–If b > number of operations: a new random location is generated, provided constraints are not violated

Generating New States (cont'd)
Low temperature
–Two numbers (a, b) are randomly generated
–If b < number of operations: neighboring operations are interchanged
–If the interchange violates constraints → variables are interchanged
–If b > number of operations: displacement to neighboring time or space slots, in random order
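The high-temperature move selection described on these slides can be sketched as follows. Several details are assumptions, not taken from the slides: a placement is modeled as a list where `placement[i]` is the slot index of operation `i`, b is drawn from `n_ops + n_slots` values, and the `violates` callback stands in for the constraint check. The fallback of interchanging variables in a symmetric operation is not modeled; a violating move is simply rejected.

```python
import random


def propose_move(n_ops, n_slots, rng):
    """Draw (a, b) and pick the move type from b (range of b is assumed)."""
    a = rng.randrange(n_ops)
    b = rng.randrange(n_ops + n_slots)
    if b < n_ops:
        return ("interchange", a, b)   # swap operations a and b
    return ("displace", a, b - n_ops)  # move operation a to a target slot


def apply_move(placement, move, violates):
    """Return the new placement, or the old one if constraints are violated."""
    kind, a, target = move
    candidate = list(placement)
    if kind == "interchange":
        candidate[a], candidate[target] = candidate[target], candidate[a]
    else:
        candidate[a] = target
    return placement if violates(candidate) else candidate


rng = random.Random(0)
placement = [0, 1, 2, 3]  # four operations in slots 0..3 (toy data)
move = propose_move(n_ops=4, n_slots=6, rng=rng)
print(move, apply_move(placement, move, violates=lambda p: len(set(p)) < len(p)))
```

A low-temperature variant would restrict both move types to neighboring operations or neighboring time and space slots, so that only small perturbations are proposed late in the anneal.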

Stopping Criteria
The cost function stays the same for three temperature points.
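A minimal annealing loop with this stopping criterion might look like the sketch below. The geometric cooling schedule, the default parameter values, the Metropolis acceptance test, and all names (`anneal`, `moves_per_temp`, `patience`) are assumptions; the slides specify only that the search stops once the cost is unchanged for three temperature points and that the number of generated states is user-defined.

```python
import math
import random


def anneal(state, cost, neighbor, t_start=10.0, alpha=0.9,
           moves_per_temp=100, patience=3, seed=0):
    """Anneal until the cost is unchanged for `patience` temperature points."""
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    temperature = t_start
    unchanged = 0
    while unchanged < patience:
        cost_before = current_cost
        for _ in range(moves_per_temp):  # user-defined number of states
            candidate = neighbor(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Assumed Metropolis test: worse states pass with prob. exp(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
        # Count consecutive temperature points with no change in cost.
        unchanged = unchanged + 1 if current_cost == cost_before else 0
        # Geometric cooling (assumed); the floor avoids division by zero.
        temperature = max(temperature * alpha, 1e-12)
    return current, current_cost
```

For a toy check, `anneal(5, lambda x: x, lambda x, rng: max(x - 1, 0))` walks the state down to 0 during the first temperature point and then stops after three further points with no change.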

Cost function
Depends on
–Number of registers
–Interconnection costs (links, buses)

Constraints
Hardware resources
–Number of ALUs and registers
Execution time

Delays
The highest common factor of all the different operation delays equals one time frame
Interchanges or displacements of operations affect their time position
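The time-frame rule on this slide amounts to taking the greatest common divisor of the operation delays. A short sketch follows; the delay values and operation names are hypothetical.

```python
from functools import reduce
from math import gcd


def time_frame(delays):
    """One time frame = highest common factor of all operation delays."""
    return reduce(gcd, delays)


def frames_per_operation(delays_by_op):
    """Express each operation's delay as a whole number of time frames."""
    frame = time_frame(delays_by_op.values())
    return {op: delay // frame for op, delay in delays_by_op.items()}


# Hypothetical delays in nanoseconds.
delays = {"add": 40, "mul": 100, "cmp": 20}
print(time_frame(delays.values()))   # 20
print(frames_per_operation(delays))  # {'add': 2, 'mul': 5, 'cmp': 1}
```

Because every operation then occupies an integer number of frames, moving an operation on the time axis always lands it on a frame boundary.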

Loops
Unwinding depends on disjointness
Improves execution time
[Figure: axes labeled space and time]

Results of Examples
HAL
–Clock cycles: 17
–Multipliers: 3
–Adders: 3
SAB solution
–Clock cycles: 17
–Multipliers: 2
–Adders: 3
–Calculation time increases quadratically

Synthesizing Pipelined Data Paths
Pipelining
–Inserting registers between logic modules
–Increasing latency
–Improving throughput
Pipeline synthesis
–Partitioning the input data flow description into pipeline stages
–Finding a placement of micro-operations within each stage that meets the constraints

Synthesizing Pipelined Data Paths
Algorithm: serial pipeline schedule
–Does not violate delay constraints
–If the maximum delay is exceeded → operations are separated into a new stage
–Each stage placement problem is treated separately and the results are summed afterwards
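The serial partitioning step can be illustrated with a greedy sketch. It assumes the operations form a single serial chain, that delays within a stage add up, and that a stage is closed as soon as the next operation would exceed the maximum stage delay; the function name and the sample data are hypothetical.

```python
def partition_into_stages(op_delays, max_stage_delay):
    """Split a serial list of (name, delay) pairs into pipeline stages."""
    stages, current, current_delay = [], [], 0
    for name, delay in op_delays:
        if delay > max_stage_delay:
            raise ValueError(f"{name} alone exceeds the maximum stage delay")
        # Max. delay exceeded: close this stage and start a new one.
        if current and current_delay + delay > max_stage_delay:
            stages.append(current)
            current, current_delay = [], 0
        current.append(name)
        current_delay += delay
    if current:
        stages.append(current)
    return stages


ops = [("a", 3), ("b", 4), ("c", 2), ("d", 5), ("e", 1)]
print(partition_into_stages(ops, max_stage_delay=7))
# [['a', 'b'], ['c', 'd'], ['e']]
```

Each resulting stage can then be handed to the placement step on its own, matching the slide's remark that stage placement problems are treated separately.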

Synthesizing Pipelined Data Paths
Algorithm: interchanging and displacement
–Moving operations between adjacent stages
–Constraint violations are allowed but penalized, so they do not appear in the final result
–Displacing last-phase operations to the empty stages

Synthesizing Pipelined Data Paths
Algorithm: throughput
[Equation not preserved in the transcript]
–k … number of stages
–d_i … delay of each stage
–ρ … expected resynchronization rate

Conclusion
Entire allocation process → two-dimensional placement problem
Simultaneous cost-constrained allocation of hardware resources and execution time
Trade-off of hardware cost against execution speed