L13: Lower Power High Level Synthesis (3), August 1999. Prof. Jun-Dong Cho, Sungkyunkwan University



DFG Restructuring [Figure: DFG2, and DFG2 after redundant operation insertion]

Minimizing the bit transitions for constants during Scheduling

Control Synthesis: synthesize a circuit that executes the scheduled operations and provides synchronization. It must support iteration, branching, hierarchy, and interfaces.

Allocation: bind a resource to more than one operation.

Optimum binding

Example

RESOURCE SHARING: Parallel vs. time-shared buses (or execution units). Resource sharing can destroy signal correlations and increase switching activity, so it should be applied only between operations that are strongly connected. Map operations with correlated input signals to the same unit. Regularity, i.e. repeated patterns of computation such as (+, *), (*, *), (+, >), simplifies the interconnect (buses, multiplexers, buffers).
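The effect of sharing on signal correlation can be illustrated with a small sketch. It counts bit transitions at a unit's input ports, first with each operand stream on its own unit and then with both streams interleaved on one shared unit. The operand values and the names `input_toggles`, `stream_a`, and `stream_b` are assumptions made for illustration only; they do not come from the slides.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def input_toggles(operand_stream):
    """Total bit transitions seen at one unit's two input ports.

    operand_stream is the ordered list of (left, right) operand pairs
    that the unit processes, one pair per use.
    """
    total = 0
    for (l0, r0), (l1, r1) in zip(operand_stream, operand_stream[1:]):
        total += hamming(l0, l1) + hamming(r0, r1)
    return total


# Hypothetical 8-bit operands: values inside each stream are close to each
# other (correlated), but the two streams are unrelated to one another.
stream_a = [(0x10, 0x11), (0x11, 0x13), (0x13, 0x12)]
stream_b = [(0xE0, 0xE1), (0xE1, 0xE3), (0xE3, 0xE2)]

# Dedicated units: each unit sees only its own correlated stream.
dedicated = input_toggles(stream_a) + input_toggles(stream_b)

# One time-shared unit: the two streams alternate on the same input ports.
interleaved = [pair for ab in zip(stream_a, stream_b) for pair in ab]
shared = input_toggles(interleaved)

print("dedicated units:", dedicated, "bit transitions")
print("shared unit:    ", shared, "bit transitions")
```

In this toy case the shared unit sees far more input transitions than the two dedicated units combined, which is the loss of correlation the slide warns about.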

Datapath interconnections: multiplexer-oriented datapath; bus-oriented datapath

Sequential Execution: example of three micro-operations in the same clock period

Insertion of Latch (out): insertion of latches at the output ports of the functional units

Insertion of Latch (in/out): insertion of latches at both the input and output ports of the functional units

Overlapping Data Transfer (in): overlapping read and write data transfers

Overlapping Data Transfer (in/out): overlapping data transfer with functional-unit execution

Register Allocation Using Clique Partitioning [Figure panels: scheduled DFG; lifetime intervals of variables; graph model; clique-partitioning solution]
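As a rough illustration of the idea, the sketch below builds a compatibility graph in which two variables are connected when their lifetimes do not overlap, then partitions it into cliques with a simple greedy heuristic. Each clique becomes one register. The lifetimes and the function names are hypothetical, and the greedy pass is only one possible heuristic, not necessarily the procedure shown on the slide.

```python
from itertools import combinations


def compatible(a, b):
    """Lifetimes are (start, end) pairs; they are compatible if disjoint."""
    return a[1] <= b[0] or b[1] <= a[0]


def build_compatibility_graph(lifetimes):
    """Return the set of edges between variables that can share a register."""
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(lifetimes), 2)
        if compatible(lifetimes[u], lifetimes[v])
    }


def greedy_clique_partition(lifetimes):
    """Put each variable into the first clique whose members are all compatible with it."""
    edges = build_compatibility_graph(lifetimes)
    cliques = []
    for v in sorted(lifetimes, key=lambda name: lifetimes[name]):
        for clique in cliques:
            if all(frozenset((v, u)) in edges for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])  # no existing clique fits: open a new register
    return cliques


# Hypothetical variable lifetimes in control steps (start inclusive, end exclusive).
lifetimes = {"v1": (0, 2), "v2": (0, 1), "v3": (1, 3), "v4": (2, 4), "v5": (3, 4)}
registers = greedy_clique_partition(lifetimes)
for index, clique in enumerate(registers, start=1):
    print(f"R{index}: {clique}")
```

For this input the heuristic needs two registers, which matches the maximum number of simultaneously live variables.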

Left-Edge Algorithm: register allocation using the left-edge algorithm

Register Allocation: Left-Edge Algorithm [Figure panels: sorted variable lifetime intervals; five-register allocation result]
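A minimal sketch of the left-edge algorithm follows, assuming lifetimes are given as (start, end) control-step pairs. Variables are sorted by their left edge (start time), and each register is filled in one left-to-right pass with variables whose lifetimes do not overlap. The lifetimes below are made up for illustration and are not the ones in the slide's figure.

```python
def left_edge(lifetimes):
    """Assign variables to registers; returns one list of names per register."""
    pending = sorted(lifetimes, key=lambda name: lifetimes[name])
    registers = []
    while pending:
        current, leftover = [], []
        last_end = float("-inf")
        for name in pending:
            start, end = lifetimes[name]
            if start >= last_end:
                # Lifetime begins after the register is free again: reuse it.
                current.append(name)
                last_end = end
            else:
                leftover.append(name)
        registers.append(current)
        pending = leftover
    return registers


# Hypothetical lifetimes (start inclusive, end exclusive).
lifetimes = {
    "a": (1, 4), "b": (1, 2), "c": (2, 5),
    "d": (4, 6), "e": (5, 7), "f": (3, 5),
}
allocation = left_edge(lifetimes)
for index, variables in enumerate(allocation, start=1):
    print(f"R{index}: {variables}")
```

Three variables (a, c, f) are live at the same time in step 3, so three registers is the minimum for this input, and the sketch reaches it.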

Register Allocation. Allocation: bind registers and functional modules to the variables and operations in the CDFG, and specify the interconnection among modules and registers in terms of MUXes or buses. Capacitance can be reduced during allocation by minimizing the number of functional modules, registers, and multiplexers. Register allocation and variable assignment can have a profound impact on spurious switching activity in a circuit. Judicious variable assignment is a key factor in eliminating spurious operations.

Effect of Register Sharing

Two Variable Assignments

Spurious Computing

Minimizing Spurious Computing

Effect of Register Sharing on FU: For a small increase in the number of registers, it is possible to significantly reduce or eliminate spurious operations. For Design 1, synthesized from Assignment 1, the power consumption was 30.71 mW. For Design 2, synthesized from Assignment 2, the power consumption was [value missing] ([...] m standard-cell library).
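The trade-off can be sketched with a very simplified model, which is an assumption of this example and not the slide's experiment: a functional unit reads its input registers directly, and it performs a spurious operation in any cycle where one of those registers receives a new value although the unit's result is not needed. The register names, cycles, and both assignments are hypothetical.

```python
def spurious_operations(register_writes, fu_inputs, useful_cycles, n_cycles):
    """Count cycles in which the FU inputs change while the FU result is unused.

    register_writes maps a cycle to the set of registers holding a new value
    in that cycle; fu_inputs is the set of registers wired to the FU.
    """
    count = 0
    for cycle in range(n_cycles):
        changed = any(reg in fu_inputs for reg in register_writes.get(cycle, ()))
        if changed and cycle not in useful_cycles:
            count += 1
    return count


adder_inputs = {"R1", "R2"}
adder_useful = {0}  # the adder result is only needed in cycle 0

# Assignment 1 (hypothetical): unrelated variables reuse R1 and R2 later on.
assignment_1 = {0: {"R1", "R2"}, 1: {"R1"}, 2: {"R2"}}
# Assignment 2 (hypothetical): the unrelated variables get an extra register R3.
assignment_2 = {0: {"R1", "R2"}, 1: {"R3"}, 2: {"R3"}}

spurious_1 = spurious_operations(assignment_1, adder_inputs, adder_useful, 3)
spurious_2 = spurious_operations(assignment_2, adder_inputs, adder_useful, 3)
print("Assignment 1 spurious operations:", spurious_1)
print("Assignment 2 spurious operations:", spurious_2)
```

In this toy model one additional register removes both spurious additions, in line with the slide's qualitative claim.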

Operand sharing: schedule and bind operations to functional units in such a way that the activity of the input operands is reduced. Operations that share an operand are bound to the same functional unit and scheduled so that the functional unit can reuse that operand. Operand reutilization (OPR) denotes the case in which an operand is reused by two operations executed consecutively on the same functional unit.
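The OPR count of a given schedule and binding can be computed with a short sketch. The operations, operand names, and the two bindings below are hypothetical, and the check ignores which input port carries the shared operand, which a real implementation would have to respect.

```python
def count_opr(binding):
    """Count operand reutilizations in a schedule and binding.

    binding maps a functional-unit name to a list of
    (control_step, operation_name, operands) tuples.
    """
    reuse = 0
    for operations in binding.values():
        ordered = sorted(operations, key=lambda op: op[0])
        for (_, _, previous), (_, _, current) in zip(ordered, ordered[1:]):
            if set(previous) & set(current):
                reuse += 1  # consecutive operations share an operand
    return reuse


# Four hypothetical multiplications: m1=a*b, m2=a*c, m3=d*e, m4=d*f.
binding_without_sharing = {
    "MUL1": [(1, "m1", ("a", "b")), (2, "m4", ("d", "f"))],
    "MUL2": [(1, "m3", ("d", "e")), (2, "m2", ("a", "c"))],
}
binding_with_sharing = {
    "MUL1": [(1, "m1", ("a", "b")), (2, "m2", ("a", "c"))],
    "MUL2": [(1, "m3", ("d", "e")), (2, "m4", ("d", "f"))],
}

opr_without = count_opr(binding_without_sharing)
opr_with = count_opr(binding_with_sharing)
print("OPR without operand sharing:", opr_without)
print("OPR with operand sharing:   ", opr_with)
```

Both bindings use two multipliers and two control steps; only the pairing of operations to units differs.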

Two scheduling and bindings

Power Saving by OPR

Results obtained by applying the operand-sharing technique.

Operand Retaining: The input operands of an idle functional unit may still change when the selection signals of the multiplexers vary. The operand-retaining technique attempts to minimize this useless power consumption of idle functional units.

Minimizing spurious operations

Minimize the useless power consumption of idle units:
(a) with a proper register binding that minimizes the activity of the functional units;
(b) by carefully defining the control signals of the multiplexers during idle cycles so that changes at the inputs of the functional units are minimized (this may require fixing some of the don't-care values of the control signals) [RJ94];
(c) by latching the operands of those units that are often idle.

(c) Latching the operands: latches are inserted at the inputs of the functional units so that operands are stored only when the unit requires them. Thus, in cycles in which the unit is idle, no power is consumed. The control unit has to be redesigned accordingly, so that the input latches become transparent during those cycles in which the corresponding functional unit must execute an operation. This is similar to putting functional units to sleep when they are not needed through gated-clocking strategies [CSB94, BSdM94, AMD94].
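The benefit of input latches can be estimated with a small sketch that counts bit transitions at one functional-unit input, with and without a latch that is transparent only in active cycles. The multiplexer output values and the activity pattern are assumptions chosen for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def fu_input_transitions(mux_output, active, latched):
    """Count bit transitions seen at one functional-unit input.

    mux_output[i] is the multiplexer output in cycle i, active[i] tells
    whether the unit executes an operation in cycle i, and latched selects
    whether an input latch (transparent only in active cycles) is present.
    """
    seen = mux_output[0]
    total = 0
    for value, is_active in zip(mux_output[1:], active[1:]):
        if latched and not is_active:
            continue  # latch is opaque: the unit still sees the old operand
        total += hamming(seen, value)
        seen = value
    return total


# Hypothetical 8-bit multiplexer outputs; the unit is idle in cycles 1 and 2.
mux_output = [0x0F, 0xF0, 0x0F, 0x3C]
active = [True, False, False, True]

unlatched = fu_input_transitions(mux_output, active, latched=False)
with_latch = fu_input_transitions(mux_output, active, latched=True)
print("transitions without latch:", unlatched)
print("transitions with latch:   ", with_latch)
```

The sketch ignores the power drawn by the latches themselves and by the extra control logic, both of which a real estimate would need to include.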

LMS filter example: The power consumption generated by the idle units (useless consumption) is: [value missing]. The power consumption due to the useful calculations (useful consumption) is: [value missing]. The estimated reduction in power consumption is 34%.