Post RTL structures/flows targeting low power
Srinivas R Jammula and Naveen M Kumar, Intel Corporation, Bangalore, India

Presentation transcript:

Post RTL structures/flows targeting low power
Srinivas R Jammula, Naveen M Kumar, Ambar Mukherji
Intel Corporation, Bangalore, India

DAC Motivation
• Power reduction is important for both high-performance and battery-life scenarios
• Traditionally, optimization steps in the design flow prioritize timing over power

Scope of work
• Our primary focus is the flow from synthesis to tape-in
• Power is addressed by choosing power-friendly design structures
• These are complemented with new power-friendly flows
[Figure callout: "Our focus area"]

Tech1 :: Latch movement in memory
• Moved latches from the output to the input of the decoder
  – 2^n latches reduced to n latches
[Figure: design flow (Logic Synthesis, Floorplan, Placement, Clock Tree, Routing, Post Route Opt, Layout Finishing) with "Genram latch movement [RTL]" marked; before-swap and after-swap views]
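The latch count on this slide follows from the decoder shape: an n-bit address decodes to 2^n one-hot lines, so latching after the decoder needs 2^n latches while latching before it needs only n. The sketch below is a behavioural illustration under that assumption, not the authors' RTL; `decode`, `LatchBank` and `compare` are hypothetical names introduced here.

```python
def decode(addr):
    """n-to-2^n one-hot decoder; addr is a tuple of n bits, MSB first."""
    index = int("".join(str(b) for b in addr), 2)
    return tuple(1 if i == index else 0 for i in range(2 ** len(addr)))


class LatchBank:
    """Bank of level-sensitive latches: follows data while enabled, else holds."""

    def __init__(self, width):
        self.width = width          # number of latches in the bank
        self.state = (0,) * width

    def update(self, data, enable):
        if enable:
            self.state = tuple(data)
        return self.state


def compare(n, stimulus):
    """Drive both structures with the same (addr, enable) sequence."""
    out_bank = LatchBank(2 ** n)    # latches after the decoder
    in_bank = LatchBank(n)          # latches before the decoder
    matches = []
    for addr, enable in stimulus:
        after = out_bank.update(decode(addr), enable)
        before = decode(in_bank.update(addr, enable))
        matches.append(after == before)
    return out_bank.width, in_bank.width, matches


# The first cycle is enabled so both banks start from a captured address;
# the reset states differ (no line active vs. line 0 active).
stimulus = [((0, 1, 1), True), ((1, 0, 0), False), ((1, 1, 0), True)]
print(compare(3, stimulus))  # (8, 3, [True, True, True])
```

For n = 3 the two structures select the same line on every cycle of this stimulus while the latch count drops from 8 to 3. The model ignores timing, so it says nothing about the setup margin the decoder delay now consumes after the latch.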

Tech2 :: Sequential clustering/multi-bit
• Sequential clustering: flops are pulled together, which reduces clock routing
• Single flops are replaced by dual/quad flops, so clocks are shared
[Figure: design flow (Logic Synthesis, Floorplan, Placement, Clock Tree merge, Clock Tree split, Routing, Post Route Opt, Layout Finishing)]
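To make the clock-sharing idea concrete, the sketch below greedily groups nearby flops into quad, dual or single banks and counts the resulting clock sinks. It is a toy illustration only: the slides do not describe the clustering algorithm, and the function name `cluster_flops`, the Manhattan-distance threshold and the coordinates are all assumptions made here.

```python
def cluster_flops(flops, max_dist=10.0, sizes=(4, 2)):
    """Greedily bank flops that sit close together.

    flops: dict mapping flop name -> (x, y) placement.
    Returns a list of tuples; each tuple is one bank sharing a clock pin.
    """
    remaining = dict(flops)
    banks = []
    while remaining:
        seed = min(remaining)                 # deterministic seed choice
        sx, sy = remaining.pop(seed)
        # Unassigned neighbours within max_dist, nearest first.
        near = sorted(
            (abs(x - sx) + abs(y - sy), name)
            for name, (x, y) in remaining.items()
            if abs(x - sx) + abs(y - sy) <= max_dist
        )
        bank = [seed]
        for size in sizes:                    # try the largest bank first
            if len(near) >= size - 1:
                bank += [name for _, name in near[: size - 1]]
                break
        for name in bank[1:]:
            del remaining[name]
        banks.append(tuple(bank))
    return banks


flops = {
    "f0": (0, 0), "f1": (1, 0), "f2": (0, 1), "f3": (1, 1),
    "f4": (50, 50), "f5": (51, 50),
    "f6": (100, 0),
}
banks = cluster_flops(flops)
print(banks)                                  # one quad, one dual, one single
print(len(flops), "clock sinks ->", len(banks))
```

Here seven single-bit clock sinks collapse to three. A production flow would also weigh timing slack, scan-chain order and the bank cells actually available in the library, none of which this sketch models.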

Tech3 :: Clocks L1/L2 swap & Low Vt
• Power-friendly structure obtained by swapping the clock-AND gate
• Clock tree built with low-Vt cells instead of high-Vt cells
[Figure: design flow (Logic Synthesis, Floorplan, Placement, Clock Tree merge, Clock Tree split, Routing, Post Route Opt, Layout Finishing) with "Clock L1/L2 Swap" marked; CTS before and after the swap, showing Clk Source, Clk Buffer, Clk AND and flops at levels L1 and L2; legend: Low Power, Medium Power, High Power]

Results
• Power reduction achieved across the entire design implementation phase
  – Significantly enhanced performance per watt
• The final figures for all three techniques are tabulated below

Feature       Clock Cdyn savings   Design leakage savings   Timing impact
Technique 1   4%                   +0.25%                   Negligible
Technique 2   5%                   (not listed)             Negligible
Technique 3   15%                  -3%                      Negligible

Summary/Next Steps
• There is considerable scope to improve current EDA tools to optimize for low power
• Can these optimization parameters become part of the cost function of the tool suite?
  – To reach a more globally optimal solution
• There is scope for micro-architectural improvements
  – For example, clustering was effective due to native data flow
  – Improve the data path partitioning