LOPASS: A Low Power Architectural Synthesis for FPGAs with Interconnect Estimation and Optimization
Harikrishnan K.C.
University of Massachusetts Amherst

Presentation transcript:


Overview
- Motivation
- Introduction
- FPGA Architecture
- LOPASS Synthesis Flow
- High-Level Power Estimation
- Power Optimization Engine
- Multiplexer Optimization for Interconnect Reduction
- Experimental Results
- Conclusion

Motivation
- Power consumption
  - Critical constraining factor in the IC design flow
- Field Programmable Gate Arrays (FPGAs)
  - Power inefficient because programmability requires a large number of transistors
  - Fixed logic and routing resources
  - Difficult to optimize during the physical design stage

Introduction
- Behavioral-level optimization
  - Scheduling, allocation, binding
- Techniques for power reduction
  - High-level power estimation
  - Simultaneous scheduling, allocation, and binding for power optimization
  - Interconnect optimization

Previous Work
- Most previous high-level synthesis techniques for FPGAs optimized objectives other than power reduction
  - Dynamic reconfiguration at run time to save area [M. Vasilko, Int. Workshop Logic Architecture Synthesis, 1995]
- Trade-off between power and circuit speed by selecting different implementations of components; power consumption in steering logic and interconnects was not considered [F. G. Wolff, Proc. IEEE Nat. Aerospace Conf., 2000]
- Newer studies have looked into simultaneous resource allocation and binding algorithms for power reduction [D. Chen, Proc. Asia South Pacific Des. Autom. Conf., Jan]

Techniques for Power Reduction
- High-level power estimation
  - Needed for effective power optimization
  - Wire capacitance, wire length, FPGA characteristics
- Power optimization engine
  - Combined solution space
  - Simulated-annealing-based algorithm
- Interconnect optimization
  - Reduce the multiplexer (MUX) requirement

FPGA Architecture
- SRAM-based technology
- Configurable Logic Block (CLB)
- Basic Logic Element (BLE)
- Look-Up Table (LUT)
- Routing architecture parameters
  - Channel width ($W$)
  - Switch box flexibility ($F_s$)
  - Connection box flexibility ($F_c$)

LOPASS Synthesis Flow
- The design in HDL is converted to a CDFG
- The power estimator supplies estimated power values
- The low-power optimization engine performs power optimization
- RTL synthesis uses Design Compiler
- The FPGA evaluation tool fpgaEva_LP2 reports delay, power, and area

High-Level Power Estimation
- Wire length estimation
  - Rent's rule: $T = kN^p$
  - Interconnect density function $i(l)$
  - $p$ is Rent's exponent, $\alpha$ is the fraction of sink terminals, f.o. is the average fan-out, and $k$ is the average number of inputs/outputs per CLB
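As a small illustration of Rent's rule from the slide above, the sketch below evaluates $T = kN^p$ for a few block counts. It assumes the standard reading in which $N$ is the number of logic blocks in a region and $T$ is the number of terminals crossing the region boundary; the function name and the numeric values of `k` and `p` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def rent_terminals(n_blocks: int, k: float, p: float) -> float:
    """Evaluate Rent's rule T = k * N**p.

    n_blocks: number of logic blocks N in the region (assumed meaning of N)
    k:        average number of inputs/outputs per block
    p:        Rent's exponent
    """
    if n_blocks < 1:
        raise ValueError("n_blocks must be at least 1")
    return k * n_blocks ** p


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the presentation.
    for n in (1, 16, 256):
        print(n, rent_terminals(n, k=4.0, p=0.5))
```

With a single block the rule reduces to $T = k$, which is a quick sanity check on any fitted parameters.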

High-Level Power Estimation (cont.)
- Switching activity estimation
  - CDFG simulation
  - $C_{in}(O, O')$: input transitions when the FU switches from operation $O$ to $O'$
  - Switching activity $S_{in}$ [equation not preserved in the transcript]
  - Total switching activity of the overall design [equation not preserved in the transcript]
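The slide's formula for $S_{in}$ is not preserved here, so the following is only a generic sketch of how input transitions can be counted from a simulation trace. It assumes input vectors are represented as integers and that a transition is a single bit flip, so the count between two vectors is their Hamming distance. The function names and the averaging scheme are hypothetical and are not claimed to match the LOPASS definition.

```python
def bit_transitions(prev: int, curr: int) -> int:
    """Number of bit positions that differ between two input vectors."""
    return bin(prev ^ curr).count("1")


def average_toggle_rate(trace: list[int], width: int) -> float:
    """Average fraction of input bits that toggle per step of a trace.

    trace: successive input vectors seen by one functional unit
    width: bit width of the input (assumed fixed)
    """
    if len(trace) < 2:
        return 0.0
    toggles = sum(bit_transitions(a, b) for a, b in zip(trace, trace[1:]))
    return toggles / (width * (len(trace) - 1))


if __name__ == "__main__":
    # Hypothetical 4-bit trace: all bits flip once, then nothing changes.
    print(average_toggle_rate([0b0000, 0b1111, 0b1111], width=4))
```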

High-Level Power Estimation (cont.)
- Resource library characterization
  - DesignWare libraries from Synopsys
  - Different resource versions for implementing the same operation type
  - Resource characterization flow

High-Level Power Estimator
- Both static and dynamic power need to be considered
- Dynamic power: $P_{dynamic} = P_{LUT} + P_{REG} + P_{LW} + P_{GW}$
- Static power: $P_{static} = P_{s\_LUT} + P_{s\_FF} + P_{s\_LB} + P_{s\_GB}$
- $P_{LUT} = N_{LUT} \cdot S \cdot E_{LUT} \cdot f$
- $P_{REG} = N_{REG} \cdot S \cdot E_{REG} \cdot f$
- $P_{LW}, P_{GW} = 0.5 \cdot f \cdot S \cdot V_{dd}^2 \cdot C_{wire}$
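The dynamic power terms above translate directly into arithmetic. The sketch below evaluates them with hypothetical numbers; it assumes $S$ is a switching activity, $E_{LUT}$ and $E_{REG}$ are energies per switching event, $f$ is the clock frequency, and that LW and GW denote two classes of wires that each get their own $C_{wire}$. These readings and all numeric values are assumptions for illustration, not figures from the presentation.

```python
def lut_power(n_lut: int, s: float, e_lut: float, f: float) -> float:
    """P_LUT = N_LUT * S * E_LUT * f."""
    return n_lut * s * e_lut * f


def reg_power(n_reg: int, s: float, e_reg: float, f: float) -> float:
    """P_REG = N_REG * S * E_REG * f."""
    return n_reg * s * e_reg * f


def wire_power(f: float, s: float, vdd: float, c_wire: float) -> float:
    """P_LW or P_GW = 0.5 * f * S * Vdd^2 * C_wire."""
    return 0.5 * f * s * vdd ** 2 * c_wire


def dynamic_power(p_lut: float, p_reg: float, p_lw: float, p_gw: float) -> float:
    """P_dynamic = P_LUT + P_REG + P_LW + P_GW."""
    return p_lut + p_reg + p_lw + p_gw


if __name__ == "__main__":
    f = 100e6  # hypothetical 100 MHz clock
    total = dynamic_power(
        lut_power(100, 0.25, 2e-12, f),
        reg_power(50, 0.25, 1e-12, f),
        wire_power(f, 0.2, 1.0, 1e-12),
        wire_power(f, 0.2, 1.0, 4e-12),
    )
    print(f"estimated dynamic power: {total:.6f} W")
```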

Power Optimization Engine
- FPGAs have an abundance of distributed registers
- No efficient support for wide MUXes
- Uses simulated annealing with hill climbing to gradually reduce overall power
- [Figure: power optimization engine]
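For readers unfamiliar with simulated annealing, here is a generic, minimal sketch of the technique applied to a toy binding problem. It is not the LOPASS engine: the cost function (type changes between consecutive operations on the same functional unit, used as a crude stand-in for power), the move (rebinding one operation), and the cooling parameters are all hypothetical. The key step is the acceptance rule, which sometimes accepts a worse solution so the search can climb out of local minima.

```python
import math
import random

OPS = ["add", "mul", "add", "mul", "add", "mul"]  # hypothetical operation types
N_FUS = 2  # hypothetical number of functional units


def cost(binding: tuple[int, ...]) -> int:
    """Toy power proxy: count operation-type changes on each FU."""
    total = 0
    for fu in range(N_FUS):
        types = [OPS[i] for i, b in enumerate(binding) if b == fu]
        total += sum(1 for a, b in zip(types, types[1:]) if a != b)
    return total


def neighbor(binding: tuple[int, ...], rng: random.Random) -> tuple[int, ...]:
    """Move: rebind one randomly chosen operation to a random FU."""
    i = rng.randrange(len(binding))
    changed = list(binding)
    changed[i] = rng.randrange(N_FUS)
    return tuple(changed)


def simulated_annealing(initial, t_start=5.0, t_end=1e-2, cooling=0.9,
                        moves_per_temp=40, seed=0):
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            cand = neighbor(current, rng)
            delta = cost(cand) - current_cost
            # Always accept improvements; accept uphill moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling
    return best, best_cost


if __name__ == "__main__":
    start = (0,) * len(OPS)  # everything bound to FU 0
    print("initial cost:", cost(start))
    print("annealed:", simulated_annealing(start))
```

Because the best solution seen so far is tracked separately, the returned cost can never exceed the initial cost, whatever the random sequence.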

Multiplexer Optimization for Interconnect Reduction
- Register binding
  - Cofamily-based algorithm
- Port assignment
  - Port assignment algorithm
- Definitions
  - DFG: $G = (V, A)$
  - Compatibility graph: $G_c = (V_c, A_c)$

Register Binding
- Given a compatibility graph $G_c = (V_c, A_c)$, find a subset of $A_c$ that covers all vertices in $V_c$ such that the total sum of the weights of all edges is minimum
- Calculate minimum weighted cofamilies of a partially ordered set (POSET)
- POSET concepts: chain, antichain, k-family, k-cofamily
- Theorem: Register binding on a compatibility graph $G_c$ into $k$ registers is equivalent to finding $k$ disjoint chains in the POSET.
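To make the chain view of register binding concrete, the sketch below builds a compatibility relation from variable lifetimes and checks whether a proposed binding assigns a chain to each register. It assumes each variable has a lifetime interval `(start, end)` with `start < end`, and that two variables are compatible when one lifetime ends no later than the other begins. The variable names, lifetimes, and helper functions are hypothetical; edge weights and the minimum-weight selection described on the slide are not modeled.

```python
def compatibility_edges(lifetimes: dict[str, tuple[int, int]]) -> set[tuple[str, str]]:
    """Directed edge (u, v) when u's lifetime ends no later than v's begins.

    Under the stated assumption start < end, this relation is a strict
    partial order on the variables.
    """
    return {
        (u, v)
        for u, (_, end_u) in lifetimes.items()
        for v, (start_v, _) in lifetimes.items()
        if u != v and end_u <= start_v
    }


def is_valid_binding(chains: list[list[str]], edges: set[tuple[str, str]]) -> bool:
    """Each register holds a chain: consecutive variables must be compatible.

    Checking consecutive pairs is enough because the relation is transitive.
    """
    return all((a, b) in edges for chain in chains for a, b in zip(chain, chain[1:]))


if __name__ == "__main__":
    # Hypothetical lifetimes in control steps.
    lifetimes = {"a": (0, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5)}
    edges = compatibility_edges(lifetimes)
    print(sorted(edges))
    print(is_valid_binding([["a", "c"], ["b", "d"]], edges))  # two registers
    print(is_valid_binding([["a", "b"], ["c", "d"]], edges))  # lifetimes overlap
```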

Register Binding (cont.)
- Find the minimum weighted k-cofamily in the POSET
  - Convert the POSET to a network flow graph, the split graph
  - Find the minimum-cost flow for this split graph
  - Cost of each edge [equation not preserved in the transcript]

Cost Function Formulation
- A MUX occurs in two situations
  - When more than one register feeds data to the same port
  - When more than one FU produces results that are stored into the same register
- The cost function [equation not preserved in the transcript] is defined in terms of
  - $N_{mux}$: number of MUXes saved/wasted
  - $T_{r\text{-}f}$: total connections between registers and fan-out FUs
  - $T_{fu}$: total fan-out FUs involved
  - $\alpha$ and $\beta$: positive scaling constants
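The two MUX situations can be checked mechanically from a connection list. The sketch below assumes a MUX is needed at any sink (an FU input port or a register) that is driven by at least two distinct sources, and reports the number of MUX inputs at each such sink. The connection format, the names, and the example netlist are hypothetical; the slide's cost function itself is not reproduced.

```python
from collections import defaultdict


def mux_inputs(connections: list[tuple[str, str]]) -> dict[str, int]:
    """Map each sink needing a MUX to its number of distinct sources.

    connections: (source, sink) pairs, e.g. register -> FU port or
                 FU -> register. Repeated pairs are counted once.
    """
    sources_per_sink: dict[str, set[str]] = defaultdict(set)
    for source, sink in connections:
        sources_per_sink[sink].add(source)
    return {
        sink: len(sources)
        for sink, sources in sources_per_sink.items()
        if len(sources) > 1  # assumption: a single source needs no MUX
    }


if __name__ == "__main__":
    # Hypothetical datapath connections.
    connections = [
        ("r1", "add0.a"), ("r2", "add0.a"),  # two registers feed one port
        ("r3", "add0.b"),                    # single source, no MUX
        ("add0", "r4"), ("mul0", "r4"),      # two FUs write one register
        ("r1", "add0.a"),                    # duplicate connection
    ]
    print(mux_inputs(connections))
```

A binding or port assignment that lowers these counts reduces MUX width, which is the quantity the interconnect optimization targets.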

Port Assignment
- Technique for reducing MUX connections
- [Figures: Case 1, Case 2]

Experimental Results
- Power estimation
  - Comparison between estimated values and those reported by fpgaEva_LP2
  - Estimated wire length deviates by 13.7% from the reported value
  - Estimated total power deviates by 14.1% from the reported value
- Multiplexer optimization
  - Comparison of the k-cofamily algorithm with the bipartite and left-edge algorithms
  - 24.7% better than the bipartite algorithm
  - 29.6% better than the left-edge algorithm

Experimental Results (cont.)
- LOPASS compared to SPARK
  - 9.1% better in terms of latency optimization
- LOPASS compared to Synopsys Behavioral Compiler
  - 57.3% reduction in CLBs
  - 61.6% reduction in total power consumption
  - 10.6% reduction in critical delay
- LOPASS compared to Impulse C
  - On average, 77.1% reduction in multipliers and 27.9% in LEs
  - 44.1% and 31.1% reduction in dynamic and total power, respectively

Conclusion
- LOPASS, a low-power architectural synthesis system for FPGA designs, is presented
- It includes three major components
  - A flexible high-level power estimator
  - A simulated-annealing-based optimization engine
  - A k-cofamily-based register binding algorithm
- LOPASS is 61.6% better on power consumption and 10.6% better on clock period compared to Synopsys BC
- LOPASS is 31.1% better on power consumption, with an 11.8% penalty on clock period, compared to Impulse C

Thank You!