Nathaniel McVicar, Corey Olson, Jimmy Xu


1 Low-Power Functional Unit for Use in a Coarse-Grained Reconfigurable Array (CGRA)

2 Outline
Functional Unit: Shifter, ALU, MADD
Design Flow (all modules): VCS, Design Compiler, PrimeTime, Encounter & Cadence, v2lvs
UPF Tutorial
Results: dynamic power consumption of modules, power-down/up timing, VDD scaling

3 FU Top Level
Main units: ALU, MADD, barrel shifter
Supporting modules: output muxes, clock-gating registers, crossbar

4 IBM 65nm PDK
Process: cmos10lpe, a low-power process (very low leakage in power analysis)
Standard cells: cp65npksdst_tt1p2v25c (typical corner, 1.2 V, 25 °C)

5 Shifter Specs
32-bit shifter with a 5-bit shift amount
Bidirectional shifting
Logical and arithmetic shifting
Purely combinational design
1 GHz target frequency: it should be as fast as possible, but synthesis still needs to be power-aware

6 Shifter Design
[Figure: shifter datapath. Labels: inputs X[31:0] and X[30:0], sign fill 31{X[31]}, zero fill 31'b0; controls LEFT / LOGICAL and S[4]; output Z]
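The exact mux arrangement on this slide cannot be recovered from the transcript. As a hedged behavioral sketch, the Python below models one common structure for such a shifter: a five-stage logarithmic barrel shifter in which shift bit S[k] enables a shift by 2**k. It checks the model against Python's own shift operators. The function names are illustrative and are not taken from the authors' RTL.

```python
import random

MASK32 = 0xFFFFFFFF


def barrel_shift(x: int, s: int, left: bool, arithmetic: bool) -> int:
    """Five-stage log barrel shifter: stage k shifts by 2**k when S[k] is set."""
    x &= MASK32
    sign = x >> 31                      # X[31], used for arithmetic right shifts
    for k in range(4, -1, -1):          # S[4] (shift by 16) down to S[0] (shift by 1)
        if not (s >> k) & 1:
            continue                    # this stage's mux passes its input through
        amt = 1 << k
        if left:
            x = (x << amt) & MASK32     # zeros enter from the right
        else:
            fill = ((1 << amt) - 1) << (32 - amt) if (arithmetic and sign) else 0
            x = (x >> amt) | fill       # sign bits or zeros enter from the left
    return x


def reference_shift(x: int, s: int, left: bool, arithmetic: bool) -> int:
    """Golden model built on Python's integer shifts."""
    x &= MASK32
    if left:
        return (x << s) & MASK32
    if arithmetic and x >> 31:
        return ((x - (1 << 32)) >> s) & MASK32   # Python >> on negatives is arithmetic
    return x >> s


if __name__ == "__main__":
    rng = random.Random(0)
    for left in (False, True):
        for arithmetic in (False, True):
            for s in range(32):
                for _ in range(100):
                    x = rng.getrandbits(32)
                    assert barrel_shift(x, s, left, arithmetic) == reference_shift(x, s, left, arithmetic)
    print("barrel shifter matches reference")
```

Left logical and left arithmetic shifts are identical. Only right shifts therefore need the logical/arithmetic control.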

7 ALU Specs
32-bit ALU supporting 15 instructions
Combinational design
1 GHz target frequency: the ALU is on the critical path, so it should be as fast as possible, but synthesis still needs to be power-aware

8 ALU Design Methodologies
Muxed output: simple functions with a muxed output; functions not in use are gated off. More gates, so higher leakage but lower switching.
Hardware reuse: do everything with the adder, which cannot be gated. Fewer gates, so lower leakage but higher switching.

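9 ALU Design 1
[Figure: ALU datapath built around the adder (+). Labels: inputs A, B; operand controls setA, clearA, flipA, flipB, clearB; signals P and G; Control block with select sel[1:0]; output Z]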
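The control names on this slide suggest how one adder can serve many instructions: condition the operands, then pick among the adder's propagate (P = A xor B) and generate (G = A and B) signals or its sum. The sketch below assumes that reading. Its other assumptions are the order in which the set/clear/flip controls apply, a carry-in, and a 2-bit select over {P, G, P|G, sum}. The op table is illustrative and is not the authors' list of 15 instructions.

```python
MASK32 = 0xFFFFFFFF


def condition(v: int, set_: bool = False, clear: bool = False, flip: bool = False) -> int:
    """Operand conditioning, applied in the (assumed) order set, clear, flip."""
    if set_:
        v = MASK32
    if clear:
        v = 0
    if flip:
        v = ~v & MASK32
    return v & MASK32


def alu(a, b, setA=False, clearA=False, flipA=False, flipB=False, clearB=False,
        cin=0, sel="sum"):
    a = condition(a, setA, clearA, flipA)
    b = condition(b, False, clearB, flipB)   # the slide shows no setB control
    p = a ^ b                                 # propagate
    g = a & b                                 # generate
    outputs = {"p": p, "g": g, "p|g": p | g, "sum": (a + b + cin) & MASK32}
    return outputs[sel]                       # stands in for the sel[1:0] output mux


# Illustrative instruction table: every entry reuses the same adder datapath.
OPS = {
    "add":   dict(sel="sum"),
    "sub":   dict(flipB=True, cin=1, sel="sum"),              # A + ~B + 1
    "inc":   dict(clearB=True, cin=1, sel="sum"),
    "dec":   dict(clearB=True, flipB=True, sel="sum"),        # A + 0xFFFFFFFF
    "neg":   dict(flipA=True, clearB=True, cin=1, sel="sum"),  # ~A + 1
    "and":   dict(sel="g"),
    "or":    dict(sel="p|g"),
    "xor":   dict(sel="p"),
    "xnor":  dict(flipA=True, sel="p"),
    "nand":  dict(flipA=True, flipB=True, sel="p|g"),
    "nor":   dict(flipA=True, flipB=True, sel="g"),
    "notA":  dict(flipA=True, clearB=True, sel="p"),
    "passA": dict(clearB=True, sel="p"),
    "passB": dict(clearA=True, sel="p"),
}


def execute(op: str, a: int, b: int) -> int:
    return alu(a, b, **OPS[op])


print(execute("sub", 5, 3), hex(execute("nor", 0b1100, 0b1010)))
```

Every instruction drives the adder, so it cannot be gated off. This is the higher-switching side of the hardware-reuse trade-off on slide 8.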

10 Power Results (synthesis model in parentheses)
Switching: 630 uW (3.55 mW)
Interconnect: 1.14 mW (3.94 mW)
Leakage: 135 nW (530 nW)
Total: 1.77 mW (7.5 mW)

11 ALU Design 2
[Figure: ALU datapath with latches on inputs A and B, enabled by en from the Control block; adder (+) and output Z selected by sel[1:0]]
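Slide 11 places enabled latches in front of the ALU inputs. A common motivation for this, usually called operand isolation, is that a unit whose inputs hold still does not switch. The hedged sketch below counts bit toggles at the inputs of three hypothetical sub-units (adder, logic, shifter) over a stream of operations. It compares two cases: all units seeing every operand, and latches that open only for the selected unit. It counts input toggles only and ignores the latches' own power and leakage.

```python
import random

UNITS = ("add", "logic", "shift")   # hypothetical functional sub-blocks


def popcount(x: int) -> int:
    return bin(x).count("1")


def input_toggles(ops, isolate: bool) -> dict:
    """Count bit flips seen on each unit's A/B inputs over a stream of (unit, a, b)."""
    held = {u: (0, 0) for u in UNITS}   # value currently on each unit's inputs
    toggles = {u: 0 for u in UNITS}
    for unit, a, b in ops:
        for u in UNITS:
            if isolate and u != unit:
                continue                # latch closed: old operands are held
            old_a, old_b = held[u]
            toggles[u] += popcount(old_a ^ a) + popcount(old_b ^ b)
            held[u] = (a, b)
    return toggles


if __name__ == "__main__":
    rng = random.Random(42)
    stream = [(rng.choice(UNITS), rng.getrandbits(32), rng.getrandbits(32))
              for _ in range(10_000)]
    shared = sum(input_toggles(stream, isolate=False).values())
    gated = sum(input_toggles(stream, isolate=True).values())
    print(f"input toggles without latches: {shared}, with latches: {gated}")
```

Hamming distance obeys the triangle inequality, so in this model the isolated count never exceeds the shared count. Whether the whole ALU saves power also depends on the added latches.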

12 Power Results (synthesis model in parentheses)
Switching: 655 uW (3.55 mW)
Interconnect: 1.21 mW (3.94 mW)
Leakage: 160 nW (530 nW)
Total: 1.87 mW (7.5 mW)
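A quick arithmetic check of the two power tables, using only the slide values and unit scale factors. The measured components sum to the stated totals for both designs. For the synthesis model, only a switching term of 3.55 mW, not 3.55 uW, reproduces the 7.5 mW total.

```python
SCALE_TO_MW = {"nW": 1e-6, "uW": 1e-3, "mW": 1.0}


def total_mw(*terms):
    """Sum (value, unit) power terms and return the result in mW."""
    return sum(value * SCALE_TO_MW[unit] for value, unit in terms)


design1 = total_mw((630, "uW"), (1.14, "mW"), (135, "nW"))
design2 = total_mw((655, "uW"), (1.21, "mW"), (160, "nW"))
synth_mw = total_mw((3.55, "mW"), (3.94, "mW"), (530, "nW"))
synth_uw = total_mw((3.55, "uW"), (3.94, "mW"), (530, "nW"))

print(f"Design 1: {design1:.3f} mW, Design 2: {design2:.3f} mW")
print(f"Synthesis model, switching = 3.55 mW: {synth_mw:.2f} mW; = 3.55 uW: {synth_uw:.2f} mW")
```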

13 MADD Specs
32-bit multiply-add unit
2-cycle pipelined module; the add input arrives on the second cycle
1 GHz target frequency
Most power-hungry module in the design, so synthesis must be power-aware
Ideally it would run as fast as possible, but speed may need to be traded for power (~700 MHz)

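14 MADD Design
[Figure: two-stage MADD pipeline. Stage 1: inputs A and B, heterogeneous Booth encoding, partial-product (PP) generation, CSA tree, then D/Q pipeline registers (CLK). Stage 2: CSA tree with addend C, final adder, output Z]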
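The block names in this figure map onto textbook multiply-add structures. The sketch below is a bit-accurate behavioral model that rests on several assumptions:
- signed 32-bit operands and a 64-bit result;
- standard radix-4 (modified) Booth encoding in place of the slide's unspecified "heterogeneous" scheme;
- a stage-1 tree that fully reduces the partial products, so stage 2 only merges in C and performs the final add.

It shows why C can arrive a cycle late: C enters as one more carry-save input just before the final adder.

```python
import random

MASK64 = (1 << 64) - 1


def to_signed(v: int, n: int) -> int:
    """Interpret the low n bits of v as a two's-complement number."""
    v &= (1 << n) - 1
    return v - (1 << n) if v >> (n - 1) else v


def booth_radix4_digits(b: int, n: int = 32) -> list:
    """Radix-4 Booth recoding: n//2 digits in -2..2 with sum(d * 4**i) == signed b."""
    digits, prev = [], 0                    # prev is the implicit bit b[-1] = 0
    for i in range(0, n, 2):
        b0, b1 = (b >> i) & 1, (b >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def partial_products(a: int, b: int) -> list:
    """One shifted multiple of A per Booth digit, as 64-bit two's-complement words."""
    a_s = to_signed(a, 32)
    return [((d * a_s) << (2 * i)) & MASK64 for i, d in enumerate(booth_radix4_digits(b))]


def csa(x: int, y: int, z: int):
    """3:2 carry-save adder: x + y + z == s + c (mod 2**64)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & MASK64, c & MASK64


def csa_tree(words: list):
    """Reduce any number of words to a (sum, carry) pair with layers of 3:2 CSAs."""
    words = list(words)
    while len(words) > 2:
        nxt = []
        while len(words) >= 3:
            nxt.extend(csa(words.pop(), words.pop(), words.pop()))
        words = nxt + words                 # 0-2 leftovers pass to the next layer
    return tuple(words)


def stage1(a: int, b: int):
    """Cycle 1: Booth encoding, PP generation, CSA tree; result goes to pipeline registers."""
    return csa_tree(partial_products(a, b))


def stage2(s: int, c: int, addend: int) -> int:
    """Cycle 2: addend C arrives, one more CSA level, then the final carry-propagate add."""
    s2, c2 = csa(s, c, to_signed(addend, 32) & MASK64)
    return (s2 + c2) & MASK64


def madd(a: int, b: int, c: int) -> int:
    return stage2(*stage1(a, b), c)


def reference_madd(a: int, b: int, c: int) -> int:
    return (to_signed(a, 32) * to_signed(b, 32) + to_signed(c, 32)) & MASK64


if __name__ == "__main__":
    rng = random.Random(7)
    for _ in range(5000):
        a, b, c = (rng.getrandbits(32) for _ in range(3))
        assert madd(a, b, c) == reference_madd(a, b, c)
    print("Booth/CSA model matches a*b + c")
```

If Z is only 32 bits wide, it equals the low half of this result. That half is the same for signed and unsigned operands.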

15 VCS
Testbenches were written to verify functionality using VCS
Random input vectors were used for data; instruction and shift encodings were tested sequentially
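The same stimulus strategy can be sketched outside the simulator: sweep the encodings in order, pair each with random data, and compare a design against a golden model. This Python sketch makes two assumptions. First, 15 encodings (matching the ALU's instruction count) fit in a 4-bit field. Second, a Verilog testbench would load the vectors with $readmemh into a 68-bit-wide memory. The file layout and function names are hypothetical and do not describe the authors' testbench.

```python
import random


def make_vectors(num_encodings: int, per_encoding: int, seed: int = 0) -> list:
    """Sweep encodings in order; pair each with random 32-bit operands."""
    rng = random.Random(seed)                # fixed seed => reproducible stimulus
    return [(op, rng.getrandbits(32), rng.getrandbits(32))
            for op in range(num_encodings)   # encodings: tested sequentially
            for _ in range(per_encoding)]    # data: random vectors


def to_readmemh_lines(vectors: list) -> list:
    """Pack each vector as one 68-bit hex word: 4-bit opcode, then A and B."""
    return [f"{op:01x}{a:08x}{b:08x}" for op, a, b in vectors]


def mismatches(dut, golden, vectors: list) -> list:
    """Vectors on which the design under test disagrees with the golden model."""
    return [v for v in vectors if dut(*v) != golden(*v)]


if __name__ == "__main__":
    vecs = make_vectors(num_encodings=15, per_encoding=4)
    print("\n".join(to_readmemh_lines(vecs)[:3]))
```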

16 Design Compiler
Compile to a standard cell library:
cp65npksdst_tt1p2v25c from IBM's cmos10lpe
compile to other libraries for corner analysis (ff, 1p0v,…)
control the target frequency and synthesize for power
Reports created:
Power: inaccurate, but useful as a baseline
Area: reports the number of gates in the design
Timing: the design can't always meet timing
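Library names such as cp65npksdst_tt1p2v25c appear to encode the corner: process (tt), voltage (1p2v = 1.2 V), and temperature (25c). This reading is consistent with the TT1P2V25C operating condition and the 1.2 V global operating voltage in the DC report. A corner sweep is easier to script if the suffix is parsed. The sketch assumes that naming pattern, including a hypothetical "m" prefix for negative temperatures; check the PDK documentation before relying on it.

```python
import re

# Assumed suffix pattern: _<process corner><volts>p<fraction>v<temperature>c
CORNER_RE = re.compile(r"_(?P<proc>[fst]{2})(?P<vi>\d+)p(?P<vf>\d+)v(?P<temp>m?\d+)c$")


def parse_corner(lib_name: str):
    """Return (process, volts, temp_C) decoded from a library name's suffix."""
    m = CORNER_RE.search(lib_name)
    if m is None:
        raise ValueError(f"unrecognized corner suffix: {lib_name!r}")
    volts = float(f"{m['vi']}.{m['vf']}")
    temp = m["temp"]
    temp_c = -int(temp[1:]) if temp.startswith("m") else int(temp)
    return m["proc"].upper(), volts, temp_c


print(parse_corner("cp65npksdst_tt1p2v25c"))   # ('TT', 1.2, 25)
```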

17 DC Example
# standard cells that you synthesize to
set target_library <libname>.db
set link_library "* <libname>.db"
# prepare and synthesize
analyze -format verilog <my_verilog_file>.v
elaborate <my_toplevel>
current_design <my_toplevel>
link
uniquify
compile_ultra -gate_clock
compile_ultra -incremental
# check for errors in the synthesized design (timing violations, cell warnings,…)
check_design
report_constraint -all_violators
# write the output file in Verilog netlist format
write -format verilog -hierarchy -output <filename>.vh
# output the timing, power, and cell reports
redirect timing.rep { report_timing }
redirect power.rep { report_power }
redirect cell.rep { report_cell }

18 DC Example Output
Operating Conditions: TT1P2V25C   Library: cp65npksdst_tt1p2v25c
Wire Load Model Mode: enclosed
Design   Wire Load Model   Library
Alu      B0.1X             cp65npksdst_tt1p2v25c
Global Operating Voltage = 1.2
Power-specific unit information:
Voltage Units = 1V
Capacitance Units = pf
Time Units = 1ns
Dynamic Power Units = 1mW (derived from V,C,T units)
Leakage Power Units = 1nW
Cell Internal Power = uW (51%)
Net Switching Power = uW (49%)
Total Dynamic Power = uW (100%)
Cell Leakage Power = nW

19 PrimeTime
Power analysis: reports a breakdown of power consumption into internal power (switching of intermediate nodes), net switching, and leakage
A more detailed breakdown is available by group: memory, clock network, register, combinational
Timing check: redundant at this stage
No functional verification: use a simulator (VCS, ncsim) for functionality

20 PT Example
# setup
set power_enable_analysis true
set link_library "* <libname>.db"
read_verilog <netlist>.vh
current_design <my_toplevel>
link
# for a design without an existing clock input, create a virtual clock (a 1 ns period matches the 1 GHz target)
create_clock -name clock -period <period>
# toggle_count is the probability of switching; static_probability is the probability of being a 1
set_switching_activity -toggle_count 0.25 -static_probability 0.5 <INPUT>
# run the power analysis and write the details to Alu.rpt
check_power
update_power
report_power > Alu.rpt
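The two activity numbers in this script can be estimated from any simulated waveform. The sketch below computes them per clock cycle for a single net. Static probability is the fraction of cycles at 1, and the toggle rate is transitions per cycle. How PrimeTime interprets the value given to set_switching_activity (per clock period or per time unit) depends on its options, so treat the mapping to the script as an assumption.

```python
def activity(samples: list):
    """Static probability and toggles per cycle for one net sampled once per clock."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    static_probability = sum(samples) / len(samples)
    toggles = sum(prev != cur for prev, cur in zip(samples, samples[1:]))
    return static_probability, toggles / (len(samples) - 1)


# A net that is high half the time and changes on one cycle in four,
# roughly the 0.5 / 0.25 values used in the script above.
print(activity([0, 0, 0, 0, 1, 1, 1, 1] * 100 + [0]))
```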

21 PT Example Output
Attributes
----------
i - Including register clock pin internal power
u - User defined power group

Power Group     Internal  Switching  Leakage  Total Power  ( %)       Attrs
io_pad                                                     ( 0.00%)
memory                                                     ( 0.00%)
black_box                                                  ( 0.00%)
clock_network                                              ( 0.00%)   i
register                                                   ( 0.00%)
combinational   …e…       …e…        …e…      …e-03        (100.00%)
sequential                                                 ( 0.00%)

Net Switching Power = 1.053e-03 (52.30%)
Cell Internal Power = …e-04 (47.70%)
Cell Leakage Power  = 1.295e-07 ( 0.01%)
Total Power         = 2.014e-03 (100.00%)

22 Encounter Features
Place and route
Control power and ground routing to all cells
Extract parasitic capacitances
Stream out GDS for use with Cadence

23 ALU Encounter Example

24 Encounter Failures
Difficult to use
Impossible to save netlist views, so Cadence tools were still needed to generate the SPICE netlist
Unable to extract parasitics (this could still be done with Cadence)

25 Cadence Features
Read in a Verilog netlist
Stream in standard cell layouts and schematics
Stream in GDS from Encounter
Create a SPICE netlist

26 ShiftLR Cadence Example

27 Cadence Failures
Unable to properly stream in standard cell schematics
Unable to create a netlist from the schematic
Unable to run LVS or extract parasitics
Solution: v2lvs

28 v2lvs
Generates a SPICE netlist from a synthesized Verilog netlist
Include the SPICE definitions of the standard cells
Run HSPICE simulations of the power-down/up sequence and VDD scaling

29 v2lvs Example
Verilog:
SEN_EO2_S_0P5 U2120 ( .A1(pprow4[11]), .A2(pprow5[9]), .X(n566) );
SEN_EO2_S_0P5 U2121 ( .A1(pprow4[13]), .A2(pprow5[11]), .X(n567) );
SEN_EO2_S_0P5 U2122 ( .A1(pprow2[13]), .A2(pprow7[3]), .X(n568) );
SEN_EO2_S_0P5 U2123 ( .A1(pprow2[15]), .A2(pprow7[5]), .X(n569) );
v2lvs:
v2lvs -i -v ../synthesis/ShiftLR.vh -s0 VSS -s1 VDD -s design_model.inc -o ShiftLR.sp -lsr cp65npksdst.lvs
HSPICE:
XU2120 n566 pprow4[11] pprow5[9] SEN_EO2_S_0P5
XU2121 n567 pprow4[13] pprow5[11] SEN_EO2_S_0P5
XU2122 n568 pprow2[13] pprow7[3] SEN_EO2_S_0P5
XU2123 n569 pprow2[15] pprow7[5] SEN_EO2_S_0P5
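The per-instance part of this conversion can be sketched as a text transformation. Named Verilog port connections become a positional SPICE subcircuit call, with nets ordered to match the cell's subcircuit definition. The pin order below (X, A1, A2) is inferred from the example output rather than read from the cell library, and the parser only handles simple one-line instances. The real tool reads the cell definitions and handles the supply nets given by -s0/-s1.

```python
import re

# Assumed subcircuit pin order, inferred from the HSPICE lines on the slide.
PIN_ORDER = {"SEN_EO2_S_0P5": ("X", "A1", "A2")}

INSTANCE_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\((.*)\)\s*;\s*$")
PORT_RE = re.compile(r"\.(\w+)\(\s*([^()\s]*)\s*\)")


def instance_to_spice(line: str) -> str:
    """Convert one named-port Verilog cell instance into a SPICE X-element line."""
    m = INSTANCE_RE.match(line)
    if m is None:
        raise ValueError(f"not a simple cell instance: {line!r}")
    cell, name, ports = m.groups()
    nets = dict(PORT_RE.findall(ports))            # e.g. {"A1": "pprow4[11]", ...}
    ordered = [nets[pin] for pin in PIN_ORDER[cell]]
    return " ".join([f"X{name}", *ordered, cell])


print(instance_to_spice("SEN_EO2_S_0P5 U2120 ( .A1(pprow4[11]), .A2(pprow5[9]), .X(n566) );"))
```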

30 HSPICE
Created a simulation testbench for power measurement using vector inputs
Added support for VDD scaling and power gating
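Power from such a testbench comes from the supply current. Assume the simulator exports supply-current samples (as positive values) with their time points. Average current is then the integral of current over the window divided by its length, and average power is VDD times that. HSPICE can also compute this inside the simulator with a .MEASURE statement using the AVG function. The Python below is only a post-processing sketch.

```python
def average_current_and_power(times, currents, vdd):
    """Trapezoidal average of the supply current over the window, and P = VDD * Iavg."""
    if len(times) != len(currents) or len(times) < 2:
        raise ValueError("need matching time/current samples")
    charge = sum(0.5 * (i0 + i1) * (t1 - t0)
                 for t0, t1, i0, i1 in zip(times, times[1:], currents, currents[1:]))
    i_avg = charge / (times[-1] - times[0])
    return i_avg, vdd * i_avg


# Hypothetical 1 mA constant draw over 2 ns at the 1.2 V nominal supply.
print(average_current_and_power([0.0, 1e-9, 2e-9], [1e-3, 1e-3, 1e-3], 1.2))
```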

31 Final Power Results

32 Synthesis Matters
At 1 GHz, MADD power depends heavily on the synthesis options.
Synthesis     Internal   Switching   Leakage    Total
Naïve         11.2 mW    7.16 mW     1.07 uW    18.3 mW
Constrained   7.77 mW    4.56 mW     0.59 uW    12.3 mW
Ultra         4.08 mW    1.88 mW     0.30 uW    5.96 mW
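The table's totals translate directly into relative savings. The arithmetic below uses only the Total column; "Naive" is spelled without the diaeresis in code. Constrained synthesis saves about 33% relative to Naïve, and Ultra about 67%, roughly a 3x reduction.

```python
totals_mw = {"Naive": 18.3, "Constrained": 12.3, "Ultra": 5.96}


def reduction_vs(baseline: str, option: str) -> float:
    """Fractional power saving of `option` relative to `baseline`."""
    return 1 - totals_mw[option] / totals_mw[baseline]


for option in ("Constrained", "Ultra"):
    print(f"{option}: {reduction_vs('Naive', option):.1%} less power than Naive")
```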

33 Synthesis Matters (contd.)
The lower-power synthesis options have trouble reducing clock and register power.
Synthesis     Clock    Register   Combinational
Naïve         9.95%    13.0%      77.05%
Constrained   12.7%    14.8%      72.5%
Ultra         27.4%    12.9%      58.5%
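Percentage shares can hide what happens in absolute terms. The sketch assumes the shares are fractions of the total MADD power from slide 32, and converts them to mW. Note that the Ultra shares sum to 98.8%, so a small part of that total is not broken out. Under this assumption, clock power only drops from about 1.82 mW (Naïve) to about 1.63 mW (Ultra). Over the same change, combinational power falls from about 14.1 mW to about 3.5 mW.

```python
totals_mw = {"Naive": 18.3, "Constrained": 12.3, "Ultra": 5.96}
shares_pct = {                      # clock, register, combinational (% of total)
    "Naive": (9.95, 13.0, 77.05),
    "Constrained": (12.7, 14.8, 72.5),
    "Ultra": (27.4, 12.9, 58.5),
}


def absolute_mw(option: str) -> tuple:
    """Convert percentage shares into mW using the slide 32 totals."""
    return tuple(totals_mw[option] * pct / 100 for pct in shares_pct[option])


for option in totals_mw:
    clock, reg, comb = absolute_mw(option)
    print(f"{option:12s} clock {clock:5.2f} mW  register {reg:5.2f} mW  comb {comb:5.2f} mW")
```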

34 Power-up time results W=0.6um M=1

35 Power-up time results contd.
W=0.6um M=12

36 Power-up time results contd.
W=6um M=12

37 Power-up time results contd.
W=6um M=120
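Slides 34 to 37 sweep W and M. In SPICE, M is a device multiplier, so the total effective width is W × M. Assume these parameters size the power-gating (sleep) transistor. A first-order model then treats its on-resistance as inversely proportional to total width, so the RC time to charge the gated supply scales the same way. The sketch only tabulates that scaling and does not replace the HSPICE results. Larger sleep devices also cost area and off-state leakage.

```python
SWEEP = [(0.6, 1), (0.6, 12), (6.0, 12), (6.0, 120)]   # (W in um, M) from slides 34-37


def effective_width_um(w_um: float, m: int) -> float:
    return w_um * m


def relative_on_resistance(w_um: float, m: int, ref=(0.6, 1)) -> float:
    """First-order R_on ~ 1 / (W * M), normalized to the smallest device."""
    return effective_width_um(*ref) / effective_width_um(w_um, m)


for w, m in SWEEP:
    print(f"W={w} um, M={m}: total width {effective_width_um(w, m):.1f} um, "
          f"relative R_on (and RC wake-up time) {relative_on_resistance(w, m):.4f}")
```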

38 Power-up time results contd.
Iavg during power-down = … uA
Pavg = … uW
Power-up delay = 9.4 ps
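A power-up delay such as the 9.4 ps figure is usually measured between two threshold crossings. HSPICE can do this with a .MEASURE TRIG/TARG statement. The post-processing sketch below makes its assumptions explicit: an active-high wake signal, a gated ("virtual") supply node, and 50% of VDD as both thresholds. The signals and thresholds the authors actually used are not stated.

```python
def rising_crossing(times, values, threshold):
    """First time a waveform rises through `threshold`, linearly interpolated."""
    for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]):
        if v0 < threshold <= v1:
            return t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
    return None


def power_up_delay(times, wake, virtual_vdd, vdd):
    """Delay from the wake control reaching 50% VDD to the gated supply reaching 50% VDD."""
    t_trig = rising_crossing(times, wake, 0.5 * vdd)
    t_targ = rising_crossing(times, virtual_vdd, 0.5 * vdd)
    if t_trig is None or t_targ is None:
        return None
    return t_targ - t_trig


# Hypothetical waveform sampled every 1 ps.
print(power_up_delay([0, 1, 2, 3, 4], [0, 0, 1.2, 1.2, 1.2], [0, 0, 0, 0.6, 1.2], 1.2))
```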

39 Voltage Scaling - ALU

40 Voltage Scaling – ShiftLR

41 Results
Significantly reduced power for all modules
Explored voltage scaling
Implemented power-up/power-down sleep logic

42 Intangibles
Gained significant insight, through reading, into the current state of the art in low-power FPGA and CGRA design
Gained practical knowledge of the design tool chain for a commercial PDK

43 Questions?

