CAD for VLSI Ramakrishna Lecture#2.


© Ramakrishna, Lecture #2
Outline
Hardware Modeling
Reference: G. De Micheli, “Synthesis and Optimization of Digital Circuits”, Ch. 3
General literature

Typical Synthesis Output
int main( void ){
    …
    for( i = 0; i < 8; ++i ) {
        for( j = 0; j < 8; ++j ) {
            /* Computation */
        }
    }
    …
} /* main() */
Figure labels: Controller, Datapath

Design Flow and CAD

Hardware Modeling
Abstraction: shows the relevant features without the associated details, so the model is easy to handle and its properties are easy to reason about.
Modeling differs depending on the abstraction level, e.g.:
– Architectural, logic, geometric
– Behavioral, structural, physical
– Language, diagram, mathematical model

Gajski’s Y-chart

Abstraction for Modeling

Structural Modeling
The system structure is a collection of interconnected components that are recursively divided into subcomponents.
These terms all mean the same thing: cells, entities, blocks, modules, macros, elements, …
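The recursive division into subcomponents can be sketched as a tree in C. This is only an illustration: the `Component` type and `count_leaves` function are hypothetical names, not taken from the lecture, and the sketch models the hierarchy only, not the interconnections between components.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one node of a structural hierarchy. */
typedef struct Component {
    const char *name;                        /* e.g. a cell or module name */
    const struct Component *const *children; /* subcomponents, or NULL */
    size_t child_count;                      /* 0 for a leaf component */
} Component;

/* Count the leaf components under c by recursive descent. */
size_t count_leaves(const Component *c) {
    assert(c != NULL);
    if (c->child_count == 0) {
        return 1; /* a component with no subcomponents is a leaf */
    }
    size_t total = 0;
    for (size_t k = 0; k < c->child_count; ++k) {
        total += count_leaves(c->children[k]);
    }
    return total;
}
```

A top-level block with two leaf cells as children would report two leaves. Deeper hierarchies are handled by the same recursion.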

Control Intensive Designs
Figure labels: “<”, “X-”, “>”, “X+”, J, J, C1, C2, “Clock On”
C1 = 0, C2 = 0: reduction
C1 = 0, C2 = 1: no reduction
C1 = 1, C2 = 0: no reduction
C1 = 1, C2 = 1: reduction
Legend: Cx = conditional statements, J = join

for( v = 0; v < N; ++v ) {
    for( i = 0; i < N; ++i ) {
        temp = 0.0;
        for( j = 0; j < M; ++j ) {
            temp += (Coeff[i][j] * Input[v][j]);
        }
        DCT1D_Result[v][i] = temp;
    }
}

The Algorithm
1. Run Depth First Search (DFS) from Head. Set 1: all vertices reachable from Head (black).
2. Reverse the graph (copy).
3. Run DFS from Tail. Set 2: all vertices reachable from Tail (tan).
4. Intersect the two sets.
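A minimal C sketch of these steps follows. It assumes that the DFS from Tail runs on the reversed copy, so that Set 2 holds the vertices that can reach Tail in the original graph; the slides do not say so explicitly. The adjacency-matrix `Graph` type, the bound `MAX_V`, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_V 16 /* assumed upper bound on the number of vertices */

/* Directed graph; adj[u][v] is true when there is an edge u -> v. */
typedef struct {
    int n;
    bool adj[MAX_V][MAX_V];
} Graph;

/* Depth-first search: mark every vertex reachable from u. */
static void dfs(const Graph *g, int u, bool seen[MAX_V]) {
    seen[u] = true;
    for (int v = 0; v < g->n; ++v) {
        if (g->adj[u][v] && !seen[v]) {
            dfs(g, v, seen);
        }
    }
}

/* Build a reversed copy of g in r; the original is left untouched. */
static void reverse_graph(const Graph *g, Graph *r) {
    memset(r, 0, sizeof *r);
    r->n = g->n;
    for (int u = 0; u < g->n; ++u) {
        for (int v = 0; v < g->n; ++v) {
            r->adj[v][u] = g->adj[u][v];
        }
    }
}

/* Mark region[v] for vertices in both sets; return how many there are. */
int region_between(const Graph *g, int head, int tail, bool region[MAX_V]) {
    assert(g->n <= MAX_V && head >= 0 && head < g->n && tail >= 0 && tail < g->n);
    bool set1[MAX_V] = {false}; /* reachable from head */
    bool set2[MAX_V] = {false}; /* reachable from tail in the reversed copy */
    Graph rev;
    dfs(g, head, set1);
    reverse_graph(g, &rev);
    dfs(&rev, tail, set2);
    int count = 0;
    for (int v = 0; v < MAX_V; ++v) {
        region[v] = (v < g->n) && set1[v] && set2[v];
        count += region[v] ? 1 : 0;
    }
    return count;
}
```

For a graph with edges 0→1, 1→2, 2→1, 2→3, and 0→4, calling `region_between` with head 1 and tail 2 marks only vertices 1 and 2. Vertex 3 is reachable from the head but cannot reach the tail, and vertex 0 can reach the tail but is not reachable from the head.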

Figure labels, from outermost entry to outermost exit:
Outermost nesting entry point
2nd-level nesting entry point
3rd-level nesting entry point
4th-level nesting entry point
Exit point 4
Exit point 3
Exit point 2
Outermost exit point

Algorithmic Details
1. Identify loops and their sizes (Partition 1).
2. Find the registers shared among them.
3. Use profiling data to group “things” (operators, registers) along the most frequent paths.
4. Mutual exclusiveness for resource sharing (actually the scheduler’s headache).
5. Memory accesses.

Cost Function
A multi-valued cost function (CF) is being considered for partitioning. Regression-based analysis would give a better CF, but iterations are traded against granularity for the CF.
Metrics:
– Latency: $\sum_{i=1}^{n} \#\mathrm{instances}_i \cdot \mathrm{Latency}_i$. Latency is expressed in control steps rather than in actual time or clock units. The minimum latency is calculated before partitioning from the profiling information [ASAP/ALAP/Left_Edge algorithms].
– Area: $\sum_{i=1}^{n} \#\mathrm{instances}_i \cdot \mathrm{Area}_i$
– Power: $\sum_{i=1}^{n} \#\mathrm{instances}_i \cdot \mathrm{Power}_i$
RC delays are not considered; they are assumed to be minimal, with little impact on the overall CF.
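The three sums share one shape, so they can be evaluated in a single pass. The C sketch below keeps the result multi-valued, as the slide describes. The `ResourceCost` and `Cost` types and the `total_cost` function are hypothetical, and the slide does not say what the index i ranges over, so each array entry simply stands for one term of the sum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one term i of the sums. */
typedef struct {
    unsigned instances; /* #instances_i */
    double latency;     /* Latency_i, in control steps */
    double area;        /* Area_i */
    double power;       /* Power_i */
} ResourceCost;

/* Multi-valued cost: the three metrics are kept separate, not merged. */
typedef struct {
    double latency;
    double area;
    double power;
} Cost;

/* Accumulate #instances_i times each metric over all n terms. */
Cost total_cost(const ResourceCost *r, size_t n) {
    assert(r != NULL || n == 0);
    Cost c = {0.0, 0.0, 0.0};
    for (size_t i = 0; i < n; ++i) {
        c.latency += r[i].instances * r[i].latency;
        c.area    += r[i].instances * r[i].area;
        c.power   += r[i].instances * r[i].power;
    }
    return c;
}
```

For two entries, {2, 3.0, 10.0, 0.5} and {1, 1.0, 4.0, 0.25}, the function returns a latency of 7 control steps, an area of 24, and a power of 1.25. How the three values are then weighed against each other is left open here, as it is on the slide.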

Conclusions
Optimization strategies have to be user-driven, through constraints.
Power poses a unique challenge for design realization.
Design partitioning can improve the overall cost in terms of latency, area, and power.