
Low Power Functional Unit for use in Coarse Grained Reconfigurable Array Nathaniel McVicar Corey Olson Jimmy Xu.


1 Low Power Functional Unit for use in Coarse Grained Reconfigurable Array Nathaniel McVicar Corey Olson Jimmy Xu

2 Outline
  Functional Unit
    Shifter
    ALU
    MADD
  Design Flow (all modules)
    VCS
    Design Compiler
    PrimeTime
    Encounter & Cadence
    v2lvs
  UPF Tutorial
  Results
    Dynamic Power consumption of modules
    Power Down/Up timing
    VDD Scaling

3 FU TopLevel
  Main Units
    ALU
    MADD
    Barrel Shifter
  Supporting Modules
    Output Muxes
    Clock gating registers
    Crossbar

4 IBM 65nm PDK
  Process - cmos10lpe
    low power process
    very low leakage in power analysis
  Standard cells
    cp65npksdst_tt1p2v25c

5 Shifter Specs
  32-bit shifter with 5 shift bits
  Bi-directional shifting
  Logical and arithmetic shifting
  Purely combinational design
  1GHz target frequency
  Want it as fast as possible
  Need to be power aware during synthesis

6 Shifter Design
  [Figure: shifter datapath. Labels: inputs X[31:0] and X[30:0], fill values 31'b0 and 31{X[31]}, controls LEFT / LOGICAL, stage selects S[4] S[3] S[2] S[1] S[0], output Z]
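
The shifter described above can be sketched as a behavioural model. This is a minimal Python sketch, not the authors' RTL: it assumes one mux stage per select bit S[4]..S[0] (shifting by 16, 8, 4, 2, 1), zero fill for left and logical-right shifts, and sign fill from X[31] for arithmetic-right shifts. The function name `barrel_shift` and its argument names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def barrel_shift(x, s, left, arithmetic):
    """Shift the 32-bit value x by s (0..31) through five mux stages."""
    x &= MASK32
    # Sign fill only applies to arithmetic right shifts (assumed behaviour).
    fill = (x >> 31) & 1 if (arithmetic and not left) else 0
    for stage in range(4, -1, -1):          # S[4] down to S[0]
        amount = 1 << stage                 # 16, 8, 4, 2, 1
        if (s >> stage) & 1:
            if left:
                x = (x << amount) & MASK32
            else:
                # Build 'amount' copies of the fill bit in the top positions.
                top = (((1 << amount) - 1) << (32 - amount)) if fill else 0
                x = (x >> amount) | top
    return x

print(hex(barrel_shift(0x80000000, 4, left=False, arithmetic=True)))
```

Each stage either passes its input through or shifts it by a fixed power of two, so any shift amount from 0 to 31 is covered with five 2:1 mux layers.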

7 ALU Specs
  32-bit ALU supporting 15 instructions
  Combinational design
  1GHz target frequency
  On critical path
  Want it as fast as possible
  Need to be power aware during synthesis

8 ALU Design Methodologies
  Muxed Output
    Simple functions with muxed output
    Gate off functions not in use
    More gates
    Higher leakage, lower switching
  Hardware Reuse
    Do everything with the adder
    Cannot gate the adder
    Fewer gates
    Lower leakage, higher switching

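9 ALU Design 1
  [Figure: ALU block diagram. Labels: inputs A and B, operand controls flipA, flipB, clearA, clearB, setA, adder (+) with P and G signals, output Z, Control block driving sel[1:0]]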
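
The "do everything with the adder" idea can be illustrated with a small behavioural sketch. The control labels in the figure (flipA, flipB, clearA, clearB, P, G) suggest operands are conditioned before a single shared adder, but the control table below is an assumption made for illustration, not the deck's actual encoding. The setA control is omitted, and the names `CONTROL`, `precondition` and `adder_alu` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical control table: op -> (flipA, flipB, clearA, clearB, carry_in, sel)
CONTROL = {
    "add": (0, 0, 0, 0, 0, "sum"),
    "sub": (0, 1, 0, 0, 1, "sum"),   # A + ~B + 1
    "inc": (0, 0, 0, 1, 1, "sum"),   # A + 0 + 1
    "neg": (1, 0, 0, 1, 1, "sum"),   # ~A + 0 + 1
    "and": (0, 0, 0, 0, 0, "g"),     # generate term  G = A & B
    "xor": (0, 0, 0, 0, 0, "p"),     # propagate term P = A ^ B
}

def precondition(value, flip, clear):
    """Apply clear first, then an optional bitwise inversion."""
    value = 0 if clear else value & MASK32
    return value ^ MASK32 if flip else value

def adder_alu(a, b, op):
    flip_a, flip_b, clear_a, clear_b, carry_in, sel = CONTROL[op]
    a = precondition(a, flip_a, clear_a)
    b = precondition(b, flip_b, clear_b)
    p = a ^ b                                  # propagate, reused as XOR
    g = a & b                                  # generate, reused as AND
    total = (a + b + carry_in) & MASK32        # the single shared adder
    return {"sum": total, "p": p, "g": g}[sel]

print(hex(adder_alu(5, 7, "sub")))
```

Because every operation passes through the same adder inputs, the adder toggles even when only P or G is selected, which is the "higher switching" cost named on the methodology slide.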

10 Power Results
                               (Syn. Model)
  Switching:      630 uW       (3.55 uW)
  Interconnect:   1.14 mW      (3.94 mW)
  Leakage:        135 nW       (530 nW)
  Total:          1.77 mW      (7.5 mW)

11 ALU Design 2
  [Figure: ALU block diagram. Labels: inputs A and B, adder (+), enable (en) latches, outputs O and Z, Control block driving sel[1:0]]

12 Power Results
                               (Syn. Model)
  Switching:      655 uW       (3.55 uW)
  Interconnect:   1.21 mW      (3.94 mW)
  Leakage:        160 nW       (530 nW)
  Total:          1.87 mW      (7.5 mW)

13 MADD Specs
  32 bit multiply-add unit
  2 cycle pipelined module
  Add input arrives on second cycle
  1 GHz target frequency
  most power hungry module in design
  need to be power aware during synthesis
  ideally would run as fast as possible
  may need to trade speed for power (~700MHz)

14 MADD Design
  [Figure: pipelined multiply-add datapath. Labels, in order: inputs A and B, CLK, Heterogeneous Booth Enc, PP Generation, CSA Tree Stage 1, D/Q Registers (CLK), input C, CSA Tree Stage 2, Final Adder, output Z]
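
To make the Booth-encoding and partial-product steps concrete, here is a minimal behavioural sketch. It assumes plain radix-4 Booth recoding of a signed 32-bit multiplier and keeps only the low 32 bits of the result. The slide's "Heterogeneous" encoder and the carry-save trees are not modelled (Python's `sum` stands in for the CSA tree and final adder), and the names `to_signed`, `booth_digits` and `madd` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value, bits=32):
    """Interpret the low 'bits' bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def booth_digits(b, bits=32):
    """Radix-4 Booth recoding: one digit in {-2..2} per pair of bits."""
    b &= (1 << bits) - 1
    digits, prev = [], 0                    # prev is bit b[2i-1], b[-1] = 0
    for i in range(0, bits, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def madd(a, b, c):
    """Return the low 32 bits of a * b + c for signed 32-bit inputs."""
    a_signed = to_signed(a)
    # Stage 1 (assumed): one partial product per Booth digit, weighted by 4**i.
    partial_products = [d * a_signed * (4 ** i)
                        for i, d in enumerate(booth_digits(b))]
    # Stage 2 (assumed): the addend C joins the reduction before the final add.
    return (sum(partial_products) + to_signed(c)) & MASK32

print(madd(3, 5, 7))
```

Radix-4 recoding halves the number of partial products (16 instead of 32 for a 32-bit multiplier), which is why it is a common first step in front of a carry-save reduction tree.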

15 VCS
  Testbenches written to verify functionality using VCS
  random input vectors used for data
  instructions/shift encodings tested sequentially

16 Design Compiler
  Compile to standard cell library
    cp65npksdst_tt1p2v25c from IBM's cmos10lpe
    compile to others for corner analysis (ff, 1p0v,…)
  control target frequency and synthesize for power
  Reports created
    Power: inaccurate, but use as a baseline
    Area: reports number of gates in design
    Timing: design can't always meet timing

17 DC Example
  # standard cells that you synthesize to
  set target_library.db
  set link_library.db
  # prepare and synthesize
  analyze -f verilog.v
  elaborate
  current_design
  link
  uniquify
  compile_ultra -gate_clock
  compile_ultra -incremental
  # check for errors in the synthesized design (timing violations, cell warnings,…)
  check_design
  report_constraint -all_violators
  # write the output file in verilog netlist format
  write -f verilog -output.vh
  # output the timing or power or cell report
  redirect timing/power/cell.rep { report_timing/cell/power }

18 DC Example Output
  Operating Conditions: TT1P2V25C   Library: cp65npksdst_tt1p2v25c
  Wire Load Model Mode: enclosed
  Design    Wire Load Model    Library
  Alu       B0.1X0.1           cp65npksdst_tt1p2v25c
  Global Operating Voltage = 1.2
  Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = pf
    Time Units = 1ns
    Dynamic Power Units = 1mW (derived from V,C,T units)
    Leakage Power Units = 1nW
  Cell Internal Power = uW (51%)
  Net Switching Power = uW (49%)
  Total Dynamic Power = uW (100%)
  Cell Leakage Power = nW

19 PrimeTime
  power analysis
    reports breakdown of power consumption
      internal switching
      intermediate nodes switching
      leakage
    more detailed breakdown available
      memory, clock network, register, combinational
  timing check - redundant at this stage
  no functional verification
    use simulator for functionality
      vcs, ncsim

20 PT Example
  # setup
  link_library.db
  read_verilog.vh
  current_design
  link
  # for a design without an existing clock input
  create_clock -name clock -period
  # toggle_count is prob of switching, static is prob of being a 1
  set_switching_activity -toggle_count 0.25 -static_probability 0.5
  # get the power analysis and write details to Alu.rpt
  check_power
  update_power
  report_power > Alu.rpt

21 PT Example Output
  Attributes
    i - Including register clock pin internal power
    u - User defined power group
                  Internal   Switching  Leakage    Total
  Power Group     Power      Power      Power      Power      (     %)  Attrs
  io_pad                                                      (  0.00%)
  memory                                                      (  0.00%)
  black_box                                                   (  0.00%)
  clock_network                                               (  0.00%)  i
  register                                                    (  0.00%)
  combinational   9.606e-04  1.053e-03  1.295e-07  2.014e-03  (100.00%)
  sequential                                                  (  0.00%)
  Net Switching Power = 1.053e-03 (52.30%)
  Cell Internal Power = 9.606e-04 (47.70%)
  Cell Leakage Power = 1.295e-07 ( 0.01%)
  Total Power = 2.014e-03 (100.00%)

22 Encounter Features
  Place and Route
  Control the power and ground to all cells
  Extract parasitic capacitances
  stream out gds for use with Cadence

23 ALU Encounter Example

24 Encounter Failures
  difficult to use
  impossible to save netlist views
    still need to use cadence tools to generate SPICE netlist
  unable to extract parasitics
    could still do this with Cadence

25 Cadence Features
  read in a verilog netlist
  stream in standard cell layouts and schematics
  stream in gds from Encounter
  create SPICE netlist

26 ShiftLR Cadence Example

27 Cadence Failures
  unable to properly stream in standard cell schematics
  unable to create netlist from schematic
  unable to run LVS or extract parasitics
  Solution: v2lvs

28 v2lvs
  generates a SPICE netlist from a synthesized verilog netlist
  include SPICE definitions of standard cells
  run HSPICE simulations for power down/up sequence and VDD scaling

29 v2lvs Example
  Verilog:
    SEN_EO2_S_0P5 U2120 (.A1(pprow4[11]),.A2(pprow5[9]),.X(n566) );
    SEN_EO2_S_0P5 U2121 (.A1(pprow4[13]),.A2(pprow5[11]),.X(n567) );
    SEN_EO2_S_0P5 U2122 (.A1(pprow2[13]),.A2(pprow7[3]),.X(n568) );
    SEN_EO2_S_0P5 U2123 (.A1(pprow2[15]),.A2(pprow7[5]),.X(n569) );
  v2lvs:
    v2lvs -i -v ../synthesis/ShiftLR.vh -s0 VSS -s1 VDD -s -o ShiftLR.sp -lsr cp65npksdst.lvs
  HSPICE:
    XU2120 n566 pprow4[11] pprow5[9] SEN_EO2_S_0P5
    XU2121 n567 pprow4[13] pprow5[11] SEN_EO2_S_0P5
    XU2122 n568 pprow2[13] pprow7[3] SEN_EO2_S_0P5
    XU2123 n569 pprow2[15] pprow7[5] SEN_EO2_S_0P5
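
The Verilog-to-SPICE mapping shown in the example can be mimicked with a few lines of Python. This is only an illustration of the text transformation, not how v2lvs is implemented. The pin order in `PIN_ORDER` is copied from the HSPICE lines in the example (output X first, then A1 and A2) and is otherwise an assumption; in a real flow it would come from the cell's subcircuit definition. The function name `verilog_instance_to_spice` is hypothetical.

```python
import re

# Matches: <cell> <instance> ( <port connections> );
INSTANCE_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*\((.*)\)\s*;\s*$")
# Matches one named connection such as .A1(pprow4[11])
PIN_RE = re.compile(r"\.(\w+)\(\s*([^()\s]+)\s*\)")

# Assumed pin order, taken from the example output rather than a real library.
PIN_ORDER = {"SEN_EO2_S_0P5": ["X", "A1", "A2"]}

def verilog_instance_to_spice(line):
    """Turn one structural Verilog instance into a SPICE subcircuit call."""
    match = INSTANCE_RE.match(line)
    if match is None:
        raise ValueError("not a cell instance: " + line)
    cell, instance, body = match.groups()
    nets_by_pin = dict(PIN_RE.findall(body))
    nets = [nets_by_pin[pin] for pin in PIN_ORDER[cell]]
    return "X{} {} {}".format(instance, " ".join(nets), cell)

print(verilog_instance_to_spice(
    "SEN_EO2_S_0P5 U2120 (.A1(pprow4[11]),.A2(pprow5[9]),.X(n566) );"))
```

Run on the first Verilog line of the example, the sketch reproduces the first HSPICE line: the instance name gains an `X` prefix, nets are listed positionally, and the cell name moves to the end.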

30 HSPICE
  Created simulation test-bench for power measurement using vector input
  Adds potential VDD scaling and gating

31 Final Power Results

32 Synthesis Matters
  At 1 GHz, MADD power very dependent on synthesis options
                 Internal   Switching  Leakage    Total
  Naïve          11.2 mW    7.16 mW    1.07 uW    18.3 mW
  Constrained    7.77 mW    4.56 mW    0.59 uW    12.3 mW
  Ultra          4.08 mW    1.88 mW    0.30 uW    5.96 mW
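
The size of the savings is easier to see as a percentage. The short sketch below uses only the total-power figures from the table above; the dictionary and function names are hypothetical.

```python
# Total MADD power at 1 GHz in mW, copied from the table above.
TOTAL_POWER_MW = {"naive": 18.3, "constrained": 12.3, "ultra": 5.96}

def reduction_vs_naive(option):
    """Fractional power saving of a synthesis option relative to Naive."""
    return 1.0 - TOTAL_POWER_MW[option] / TOTAL_POWER_MW["naive"]

for option in ("constrained", "ultra"):
    print("{}: {:.1%} lower total power".format(option, reduction_vs_naive(option)))
```

With these numbers the Constrained run comes out roughly 33% below Naive and the Ultra run roughly 67% below.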

33 Synthesis Matters contd.
  The lower power synthesis options have trouble reducing clock and register power
                 Clock     Register  Comb
  Naïve          9.95%     13.0%     77.05%
  Constrained    12.7%     14.8%     72.5%
  Ultra          27.4%     12.9%     58.5%

34 Power-up time results W=0.6um M=1

35 Power-up time results contd. W=0.6um M=12

36 Power-up time results contd. W=6um M=12

37 Power-up time results contd. W=6um M=120

38 Power-up time results contd.
  I avg during power-down = uA
  P avg = uW
  Power-up Delay = 9.4ps

39 Voltage Scaling - ALU

40 Voltage Scaling – ShiftLR 1.2V 1.0V 0.8V 0.6V
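
As a rough guide to what these supply voltages imply, first-order CMOS dynamic power scales with the square of VDD at a fixed frequency and activity. The sketch below applies that textbook relation to the four voltages listed. It is an estimate under that assumption, not the deck's measured HSPICE data, and it ignores leakage and the lower maximum frequency at reduced VDD. The function name is hypothetical.

```python
NOMINAL_VDD = 1.2  # volts, the nominal supply of the tt1p2v25c corner

def relative_dynamic_power(vdd, nominal_vdd=NOMINAL_VDD):
    """First-order estimate: dynamic power is proportional to VDD squared."""
    return (vdd / nominal_vdd) ** 2

for vdd in (1.2, 1.0, 0.8, 0.6):
    print("{:.1f} V -> {:.2f}x nominal dynamic power".format(
        vdd, relative_dynamic_power(vdd)))
```

Under this model, dropping from 1.2 V to 0.6 V cuts dynamic power to about a quarter of its nominal value.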

41 Results
  Significantly reduced power for all modules
  Explored voltage scaling
  Implemented power-up / power-down sleep logic

42 Intangibles
  Gained significant insight into the current state-of-the-art for low power FPGA and CGRA design, through reading
  Gained practical knowledge working with the design tool chain of a commercial PDK

43 Questions?
