Nathaniel McVicar Corey Olson Jimmy Xu

Presentation transcript:

Low Power Functional Unit for use in Coarse Grained Reconfigurable Array
Nathaniel McVicar, Corey Olson, Jimmy Xu

Outline
- Functional Unit: Shifter, ALU, MADD
- Design Flow (all modules): VCS, Design Compiler, PrimeTime, Encounter & Cadence, v2lvs
- UPF Tutorial
- Results: dynamic power consumption of modules, power down/up timing, VDD scaling

FU TopLevel
- Main Units: ALU, MADD, Barrel Shifter
- Supporting Modules: output muxes, clock gating registers, crossbar

IBM 65nm PDK
- Process: cmos10lpe (low power process; very low leakage in power analysis)
- Standard cells: cp65npksdst_tt1p2v25c

Shifter Specs
- 32-bit shifter with 5 shift bits
- Bi-directional shifting
- Logical and arithmetic shifting
- Purely combinational design
- 1 GHz target frequency: want it as fast as possible; need to be power aware during synthesis

Shifter Design
[Figure: shifter block diagram. Labels: input X[31:0], X[30:0], fill values 31{X[31]} and 31'b0, controls LEFT / LOGICAL, shift bit S[4], output Z.]
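The shifter specification can be illustrated with a small behavioral model. This is only a sketch under assumptions: the function name `shift32`, its argument order, and the stage-by-stage structure (one stage per shift bit, S[4] down to S[0]) are illustrative choices and are not taken from the slides, which show a gate-level diagram that did not survive extraction.

```python
MASK32 = 0xFFFFFFFF


def shift32(x, s, left, logical):
    """Behavioral model of a 32-bit bidirectional shifter with 5 shift bits.

    x       -- 32-bit operand
    s       -- shift amount, 0..31 (only the low 5 bits are used)
    left    -- True for a left shift, False for a right shift
    logical -- True for logical, False for arithmetic (matters only for right shifts)
    """
    x &= MASK32
    # An arithmetic right shift fills with copies of the sign bit X[31];
    # every other mode fills with zeros.
    fill_bit = 0 if (left or logical) else (x >> 31)
    # One conditional stage per shift bit: 16, 8, 4, 2, 1 positions.
    for stage in (4, 3, 2, 1, 0):
        if (s >> stage) & 1:
            n = 1 << stage
            if left:
                x = (x << n) & MASK32
            else:
                fill = (((1 << n) - 1) << (32 - n)) if fill_bit else 0
                x = (x >> n) | fill
    return x


if __name__ == "__main__":
    print(hex(shift32(0x80000000, 4, left=False, logical=False)))
    print(hex(shift32(0x80000000, 4, left=False, logical=True)))
```

A model like this is mainly useful as a reference when checking simulation results for the synthesized netlist.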

ALU Specs
- 32-bit ALU
- Supports 15 instructions
- Combinational design
- 1 GHz target frequency: on critical path; want it as fast as possible; need to be power aware during synthesis

ALU Design Methodologies
- Muxed Output: simple functions with muxed output; gate off functions not in use; more gates; higher leakage, lower switching
- Hardware Reuse: do everything with the adder; cannot gate the adder; fewer gates; lower leakage, higher switching
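The "do everything with the adder" idea can be sketched as a behavioral model in which the operands are conditioned before a single adder, and the logic functions are taken from the adder's per-bit propagate and generate terms. This is an assumption-laden illustration: the function name, the five operation names, and the way results are selected are hypothetical and do not reproduce the 15 instructions or the control encoding of the actual design.

```python
MASK32 = 0xFFFFFFFF


def alu_hardware_reuse(a, b, op):
    """Compute op on 32-bit operands with one adder plus input conditioning."""
    a &= MASK32
    b &= MASK32
    carry_in = 0
    if op == "sub":
        # Two's-complement subtraction: invert B and inject a carry-in of 1.
        b = ~b & MASK32
        carry_in = 1
    propagate = a ^ b  # per-bit sum ignoring carries
    generate = a & b   # per-bit carry generation
    if op == "xor":
        return propagate
    if op == "and":
        return generate
    if op == "or":
        return propagate | generate
    if op in ("add", "sub"):
        return (a + b + carry_in) & MASK32
    raise ValueError("unsupported op: " + op)


if __name__ == "__main__":
    print(hex(alu_hardware_reuse(5, 7, "sub")))
```

Because every operation passes through the same operand conditioning and adder inputs, none of that logic can be gated off, which matches the trade-off listed on the slide: fewer gates and lower leakage, but more switching.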

ALU Design 1
[Figure: ALU block diagram. Labels: operands A and B, controls setA, clearA, flipA, flipB, clearB, signals P and G, adder (+), Control block, sel[1:0], output Z.]

Power Results (values in parentheses: Syn. Model)
- Switching: 630 uW (3.55 mW)
- Interconnect: 1.14 mW (3.94 mW)
- Leakage: 135 nW (530 nW)
- Total: 1.77 mW (7.5 mW)

ALU Design 2
[Figure: ALU block diagram. Labels: operands A and B, latches with enables (en), adder (+), O, Control block, sel[1:0], output Z.]

Power Results (values in parentheses: Syn. Model)
- Switching: 655 uW (3.55 mW)
- Interconnect: 1.21 mW (3.94 mW)
- Leakage: 160 nW (530 nW)
- Total: 1.87 mW (7.5 mW)

MADD Specs
- 32-bit multiply-add unit
- 2-cycle pipelined module; add input arrives on second cycle
- 1 GHz target frequency: most power hungry module in design; need to be power aware during synthesis; ideally would run as fast as possible; may need to trade speed for power (~700 MHz)

MADD Design
[Figure: MADD pipeline. Labels: inputs A, B, C and CLK; Heterogeneous Booth Enc; PP Generation; CSA Tree Stage 1; pipeline registers (D/Q, clocked by CLK); CSA Tree Stage 2; Final Adder; output Z.]
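A functional sketch of a Booth-encoded multiply-add may help in reading the block diagram. It assumes plain radix-4 Booth recoding, signed operands, and a result truncated to 32 bits; the slides name a "heterogeneous" Booth encoder and do not state the output width, so none of these choices should be read as the actual design. Function names are illustrative, and the carry-save tree is modeled as an ordinary sum.

```python
def to_signed(x, width=32):
    """Interpret the low `width` bits of x as a two's-complement integer."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def booth_radix4_digits(b, width=32):
    """Recode a signed multiplier into radix-4 Booth digits in {-2, ..., 2}."""
    b &= (1 << width) - 1
    digits = []
    prev = 0  # bit to the right of the current pair (b[-1] = 0)
    for i in range(0, width, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def madd(a, b, c, width=32):
    """Return (a * b + c) modulo 2**width using Booth partial products."""
    a_signed = to_signed(a, width)
    # One partial product per Booth digit, weighted by 4**i.
    partials = [(d * a_signed) << (2 * i)
                for i, d in enumerate(booth_radix4_digits(b, width))]
    # A CSA tree would reduce these in hardware; here they are simply summed.
    return (sum(partials) + to_signed(c, width)) & ((1 << width) - 1)


if __name__ == "__main__":
    print(madd(3, 5, 7))
```

Radix-4 recoding halves the number of partial products (16 instead of 32 for a 32-bit multiplier), which is one reason Booth encoding is common in multipliers of this kind.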

VCS
- Testbenches written to verify functionality using VCS
- Random input vectors used for data
- Instructions/shift encodings tested sequentially

Design Compiler
- Compile to standard cell library: cp65npksdst_tt1p2v25c from IBM's cmos10lpe; compile to others for corner analysis (ff, 1p0v,…); control target frequency and synthesize for power
- Reports created: Power (inaccurate, but use as a baseline); Area (reports number of gates in design); Timing (design can't always meet timing)

DC Example

    # standard cells that you synthesize to
    set target_library <libname>.db
    set link_library <libname>.db

    # prepare and synthesize
    analyze -f verilog <my_verilog_file>.v
    elaborate <my_toplevel>
    current_design <my_toplevel>
    link
    uniquify
    compile_ultra -gate_clock
    compile_ultra -incremental

    # check for errors in the synthesized design (timing violations, cell warnings, ...)
    check_design
    report_constraint -all_violators

    # write the output file in verilog netlist format
    write -f verilog -output <filename>.vh

    # output the timing, power, and cell reports
    redirect timing.rep { report_timing }
    redirect power.rep { report_power }
    redirect cell.rep { report_cell }

DC Example Output

    Operating Conditions: TT1P2V25C   Library: cp65npksdst_tt1p2v25c
    Wire Load Model Mode: enclosed

    Design        Wire Load Model            Library
    ------------------------------------------------
    Alu           B0.1X0.1          cp65npksdst_tt1p2v25c

    Global Operating Voltage = 1.2
    Power-specific unit information :
        Voltage Units = 1V
        Capacitance Units = 1.000000pf
        Time Units = 1ns
        Dynamic Power Units = 1mW    (derived from V,C,T units)
        Leakage Power Units = 1nW

      Cell Internal Power  = 433.2152 uW   (51%)
      Net Switching Power  = 409.2202 uW   (49%)
                             ---------
    Total Dynamic Power    = 842.4354 uW  (100%)

    Cell Leakage Power     = 129.3405 nW

PrimeTime
- Power analysis: reports breakdown of power consumption (internal switching, intermediate nodes switching, leakage); more detailed breakdown available (memory, clock network, register, combinational)
- Timing check: redundant at this stage
- No functional verification: use simulator for functionality (vcs, ncsim)

PT Example

    # setup
    set link_library <libname>.db
    read_verilog <netlist>.vh
    current_design <my_toplevel>
    link

    # for a design without an existing clock input
    create_clock -name clock -period <period>

    # toggle_count is prob of switching, static is prob of being a 1
    set_switching_activity -toggle_count 0.25 -static_probability 0.5 <INPUT>

    # get the power analysis and write details to Alu.rpt
    check_power
    update_power
    report_power > Alu.rpt

PT Example Output

    Attributes
    ----------
        i  -  Including register clock pin internal power
        u  -  User defined power group

                    Internal    Switching   Leakage     Total
    Power Group     Power       Power       Power       Power      (     %)  Attrs
    --------------------------------------------------------------------------------
    io_pad          0.0000      0.0000      0.0000      0.0000     ( 0.00%)
    memory          0.0000      0.0000      0.0000      0.0000     ( 0.00%)
    black_box       0.0000      0.0000      0.0000      0.0000     ( 0.00%)
    clock_network   0.0000      0.0000      0.0000      0.0000     ( 0.00%)  i
    register        0.0000      0.0000      0.0000      0.0000     ( 0.00%)
    combinational   9.606e-04   1.053e-03   1.295e-07   2.014e-03  (100.00%)
    sequential      0.0000      0.0000      0.0000      0.0000     ( 0.00%)

      Net Switching Power  = 1.053e-03   (52.30%)
      Cell Internal Power  = 9.606e-04   (47.70%)
      Cell Leakage Power   = 1.295e-07   ( 0.01%)
                             ---------
    Total Power            = 2.014e-03  (100.00%)
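When collecting numbers from many such reports, it can help to parse the summary lines and confirm that the components add up to the reported total. The sketch below assumes the summary lines have the form shown in the example output and that the values are in watts; the regular expression and the function name `parse_power` are illustrative and are not part of PrimeTime.

```python
import re

# Summary lines copied from the example report above.
REPORT = """\
  Net Switching Power  = 1.053e-03   (52.30%)
  Cell Internal Power  = 9.606e-04   (47.70%)
  Cell Leakage Power   = 1.295e-07   ( 0.01%)
Total Power            = 2.014e-03  (100.00%)
"""

# "<name> = <number>" at the start of a line; the percentage is ignored.
LINE = re.compile(r"^\s*(?P<name>[A-Za-z ]+?)\s*=\s*(?P<value>[0-9.eE+-]+)", re.M)


def parse_power(text):
    """Return a dict mapping each summary label to its value as a float."""
    return {m["name"]: float(m["value"]) for m in LINE.finditer(text)}


if __name__ == "__main__":
    power = parse_power(REPORT)
    parts = (power["Net Switching Power"]
             + power["Cell Internal Power"]
             + power["Cell Leakage Power"])
    # The report rounds to four significant digits, so allow a small tolerance.
    print(abs(parts - power["Total Power"]) < 1e-6)
```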

Encounter Features
- Place and route
- Control the power and ground to all cells
- Extract parasitic capacitances
- Stream out gds for use with Cadence

ALU Encounter Example

Encounter Failures
- Difficult to use
- Impossible to save netlist views: still need to use Cadence tools to generate SPICE netlist
- Unable to extract parasitics: could still do this with Cadence

Cadence Features
- Read in a Verilog netlist
- Stream in standard cell layouts and schematics
- Stream in gds from Encounter
- Create SPICE netlist

ShiftLR Cadence Example

Cadence Failures
- Unable to properly stream in standard cell schematics
- Unable to create netlist from schematic
- Unable to run LVS or extract parasitics
- Solution: v2lvs

v2lvs
- Generates a SPICE netlist from a synthesized Verilog netlist
- Include SPICE definitions of standard cells
- Run HSPICE simulations for power down/up sequence and VDD scaling

v2lvs Example

Verilog:
    SEN_EO2_S_0P5 U2120 ( .A1(pprow4[11]), .A2(pprow5[9]), .X(n566) );
    SEN_EO2_S_0P5 U2121 ( .A1(pprow4[13]), .A2(pprow5[11]), .X(n567) );
    SEN_EO2_S_0P5 U2122 ( .A1(pprow2[13]), .A2(pprow7[3]), .X(n568) );
    SEN_EO2_S_0P5 U2123 ( .A1(pprow2[15]), .A2(pprow7[5]), .X(n569) );

v2lvs:
    v2lvs -i -v ../synthesis/ShiftLR.vh -s0 VSS -s1 VDD -s design_model.inc -o ShiftLR.sp -lsr cp65npksdst.lvs

HSPICE:
    XU2120 n566 pprow4[11] pprow5[9] SEN_EO2_S_0P5
    XU2121 n567 pprow4[13] pprow5[11] SEN_EO2_S_0P5
    XU2122 n568 pprow2[13] pprow7[3] SEN_EO2_S_0P5
    XU2123 n569 pprow2[15] pprow7[5] SEN_EO2_S_0P5
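The mapping in this example, from a Verilog instance with named ports to a SPICE subcircuit call with positional nets, can be sketched as follows. This is not how v2lvs is implemented. The `PIN_ORDER` table is a hypothetical stand-in for the pin order that the real tool reads from the standard-cell SPICE definitions, with the order (X, A1, A2) inferred from the HSPICE lines above. The sketch handles only single-line instances with simple net names.

```python
import re

# Hypothetical pin order per cell, standing in for the .SUBCKT definitions.
PIN_ORDER = {"SEN_EO2_S_0P5": ["X", "A1", "A2"]}

# "<cell> <instance> ( <port connections> );"
INSTANCE = re.compile(r"(\w+)\s+(\w+)\s*\((.*)\)\s*;")
# ".<pin>(<net>)"
PIN = re.compile(r"\.(\w+)\(\s*([^()\s]+)\s*\)")


def verilog_to_spice(line):
    """Convert one gate-level Verilog instance into a SPICE X-element line."""
    match = INSTANCE.search(line)
    if match is None:
        raise ValueError("not a Verilog instance: " + line)
    cell, inst, connections = match.groups()
    nets = dict(PIN.findall(connections))
    # SPICE connects nets by position, so reorder them to match the subcircuit.
    ordered = [nets[pin] for pin in PIN_ORDER[cell]]
    return "X{} {} {}".format(inst, " ".join(ordered), cell)


if __name__ == "__main__":
    src = "SEN_EO2_S_0P5 U2120 ( .A1(pprow4[11]), .A2(pprow5[9]), .X(n566) );"
    print(verilog_to_spice(src))
```

The key point is the switch from named to positional connections, which is why the SPICE definitions of the standard cells have to be available to the converter.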

HSPICE
- Created simulation test-bench for power measurement using vector input
- Adds potential VDD scaling and gating

Final Power Results

Synthesis Matters
At 1 GHz, MADD power is very dependent on synthesis options.

                   Internal    Switching   Leakage    Total
    Naïve          11.2 mW     7.16 mW     1.07 uW    18.3 mW
    Constrained    7.77 mW     4.56 mW     0.59 uW    12.3 mW
    Ultra          4.08 mW     1.88 mW     0.30 uW    5.96 mW
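The size of the effect is easier to see as a relative reduction against the naïve run. The short calculation below uses only the Total column from this slide; the dictionary keys and the function name are illustrative.

```python
# Total MADD power at 1 GHz, in mW, copied from the table above.
TOTAL_MW = {"Naive": 18.3, "Constrained": 12.3, "Ultra": 5.96}


def reduction_percent(baseline_mw, value_mw):
    """Percentage by which value_mw is lower than baseline_mw."""
    return 100.0 * (1.0 - value_mw / baseline_mw)


if __name__ == "__main__":
    for name in ("Constrained", "Ultra"):
        saving = reduction_percent(TOTAL_MW["Naive"], TOTAL_MW[name])
        print(f"{name}: {saving:.1f}% lower total power than Naive")
```

With these figures the constrained run comes out roughly 33% below the naïve run, and the ultra run roughly 67% below it.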

Synthesis Matters contd.
The lower-power synthesis options have trouble reducing clock and register power.

                   Clock     Register   Comb
    Naïve          9.95%     13.0%      77.05%
    Constrained    12.7%     14.8%      72.5%
    Ultra          27.4%     12.9%      58.5%
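The percentages can be turned into approximate absolute numbers to see where the savings come from. This sketch assumes that the shares on this slide apply to the total MADD power figures on the preceding slide (18.3, 12.3, and 5.96 mW), which the slides do not state explicitly. The names used are illustrative.

```python
# Totals in mW from the preceding slide; shares in percent from this slide.
TOTAL_MW = {"Naive": 18.3, "Constrained": 12.3, "Ultra": 5.96}
SHARE_PERCENT = {
    "Naive": {"clock": 9.95, "register": 13.0, "comb": 77.05},
    "Constrained": {"clock": 12.7, "register": 14.8, "comb": 72.5},
    "Ultra": {"clock": 27.4, "register": 12.9, "comb": 58.5},
}


def absolute_breakdown(name):
    """Return the per-group power in mW for one synthesis run."""
    total = TOTAL_MW[name]
    return {group: total * pct / 100.0
            for group, pct in SHARE_PERCENT[name].items()}


if __name__ == "__main__":
    for name in TOTAL_MW:
        parts = absolute_breakdown(name)
        print(name, {group: round(mw, 2) for group, mw in parts.items()})
```

Under that assumption, clock power moves only from about 1.8 mW (naïve) to about 1.6 mW (ultra), while combinational power falls from about 14 mW to about 3.5 mW, so most of the saving comes from the combinational logic.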

Power-up time results W=0.6um M=1

Power-up time results contd. W=0.6um M=12

Power-up time results contd. W=6um M=12

Power-up time results contd. W=6um M=120

Power-up time results contd.
- Iavg during power-down = 10.66 uA
- Pavg = 12.792 uW
- Power-up delay = 9.4 ps
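The reported average power is consistent with multiplying the average current by the supply voltage. The check below assumes VDD = 1.2 V, taken from the tt1p2v25c library corner named earlier in the deck; the slide itself does not say which supply value was used.

```python
I_AVG_A = 10.66e-6  # average current during power-down, in amperes
VDD_V = 1.2         # assumed supply voltage, in volts

# Average power drawn from the supply: P = I * V.
p_avg_w = I_AVG_A * VDD_V

if __name__ == "__main__":
    print(f"Pavg = {p_avg_w * 1e6:.3f} uW")
```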

Voltage Scaling - ALU

Voltage Scaling – ShiftLR

Results
- Significantly reduced power for all modules
- Explored voltage scaling
- Implemented power-up / power-down sleep logic

Intangibles
- Gained significant insight, through reading, into the current state of the art for low power FPGA and CGRA design
- Gained practical knowledge working with the design tool chain of a commercial PDK

Questions?