Lecture 16: Power Reduction Techniques, November 5, 2013. ECE 636 Reconfigurable Computing, Lecture 16: Power Reduction Techniques for FPGAs.


1 ECE 636 Reconfigurable Computing, Lecture 16: Power Reduction Techniques for FPGAs

2 Overview
°FPGAs are generally considered power hungry compared to their ASIC and processor counterparts
  -Mostly due to unused interconnect
°Recent area of extensive research
°Device techniques
  -Voltage scaling
  -Sleep mode
°Software techniques
  -Reduced switching
  -Reduced capacitance

3 Dynamic Power
°Dynamic power is required to charge and discharge load capacitances when transistors switch.
°One cycle involves a rising and a falling output.
°On a rising output, charge $Q = C V_{DD}$ is required.
°On a falling output, the charge is dumped to GND.
[Figure: switching circuit showing the short-circuit current and the charge/discharge current. Courtesy: Harris]
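The charge bookkeeping on this slide leads directly to the energy per transition. A short derivation follows, assuming an ideal lumped load C that swings rail-to-rail; the interpretation of f as the number of transitions per second is an assumption, chosen so that the result has the same form as the dynamic power expression used later in the lecture.

```latex
\begin{align}
E_{\text{supply}} &= Q\,V_{DD} = C\,V_{DD}^{2}
  && \text{(drawn from the supply on a rising output)} \\
E_{\text{stored}} &= \tfrac{1}{2}\,C\,V_{DD}^{2}
  && \text{(left on the load; the other half is dissipated in the pull-up)} \\
E_{\text{transition}} &= \tfrac{1}{2}\,C\,V_{DD}^{2}
  && \text{(dissipated per rising or falling transition)} \\
P_{\text{dynamic}} &= \tfrac{1}{2}\,C\,V_{DD}^{2}\,f
  && \text{(if } f \text{ counts transitions per second)}
\end{align}
```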

4 Dynamic Power
°Short-circuit power is less than 10% of dynamic power

5 FPGA Static Power Consumption
°Junction leakage
°Gate oxide leakage
°Subthreshold leakage

6 FPGA Static Power Consumption
°Junction leakage
  -Small fraction of leakage
°Gate oxide leakage
  -Increases exponentially as Vgs increases
°Subthreshold leakage
  -When Vgs < Vt there is still some source-drain current
  -Increases exponentially as Vt decreases
  -Decreases exponentially as Vgs decreases
[Figure: technology trend. Courtesy: Nowak]
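The exponential dependence of subthreshold current on Vgs and Vt can be illustrated with the usual first-order model. This is a minimal sketch: the prefactor `i0`, the slope factor `n`, and the two threshold values are hypothetical numbers chosen for illustration, not figures from the lecture, and the model ignores drain-voltage effects.

```python
import math

THERMAL_VOLTAGE = 0.026  # kT/q near room temperature, in volts


def subthreshold_current(vgs, vt, i0=1e-7, n=1.5):
    """First-order model: I = i0 * exp((Vgs - Vt) / (n * kT/q)).

    i0 and n are hypothetical fitting parameters, not lecture data.
    """
    return i0 * math.exp((vgs - vt) / (n * THERMAL_VOLTAGE))


# Leakage of an "off" transistor (Vgs = 0) for two hypothetical thresholds.
low_vt = subthreshold_current(0.0, 0.3)
high_vt = subthreshold_current(0.0, 0.4)

# Raising Vt by 100 mV cuts the leakage by a constant multiplicative factor.
ratio = low_vt / high_vt
print(f"leakage ratio (low Vt / high Vt): {ratio:.1f}")
```

Under these assumed parameters a 100 mV increase in Vt reduces leakage by roughly an order of magnitude, which is the motivation for using high-Vt transistors where speed does not matter.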

7 FPGA Power Reduction Goals
°Dynamic power goals
  -Reduce Vdd along non-critical paths
  -Low-swing signalling
  -Use CAD approaches to limit long high-toggle paths
  -$P_{dynamic} = 0.5 \cdot C \cdot V_{DD}^{2} \cdot f$
°Static power goals
  -Cut off Vdd for unused transistors
  -Use high-Vt transistors for SRAM cells
  -Various other voltage biasing techniques
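Because the supply voltage enters the dynamic power expression quadratically, even a modest reduction on non-critical paths pays off. The sketch below evaluates the slide's formula for two supply voltages; the capacitance, frequency, and voltage values are hypothetical and serve only to show the scaling.

```python
def dynamic_power(c_farads, vdd, f_hz):
    """P_dynamic = 0.5 * C * Vdd^2 * f, the expression given on the slide."""
    return 0.5 * c_farads * vdd ** 2 * f_hz


# Hypothetical example values (not from the lecture).
C = 1e-12   # 1 pF of switched capacitance
F = 100e6   # 100 MHz

p_nominal = dynamic_power(C, 1.5, F)
p_scaled = dynamic_power(C, 1.2, F)

# Fraction of dynamic power saved by lowering Vdd from 1.5 V to 1.2 V.
saving = 1.0 - p_scaled / p_nominal
print(f"saving: {saving:.0%}")
```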

8 Traditional Routing Switch
[Figure: traditional routing switch with level-restoring buffer. Courtesy: Anderson]

9 Proposed Switch Designs: Anderson
°Based on 3 observations:
  -Routing switch inputs are tolerant to weak-1 signals (level-restoring buffers).
  -There is considerable slack in FPGA designs, so many switches can be slowed down.
  -Most routing switches feed other routing switches, so they can produce weak-1 logic signals.

10 "Basic" Switch Design
Mode operation:
°High-speed: MNX and MPX ON
°Low-power: MNX ON, MPX OFF
°Sleep: MNX OFF, MPX OFF
[Figure: basic switch circuit with node V_VD]

11 High-Speed Mode
Mode operation:
°High-speed: MNX and MPX ON
°Low-power: MNX ON, MPX OFF
°Sleep: MNX OFF, MPX OFF
In high-speed mode:
°V_VD = V_DD
°Output swing: rail-to-rail

12 Low-Power Mode
Mode operation:
°High-speed: MNX and MPX ON
°Low-power: MNX ON, MPX OFF
°Sleep: MNX OFF, MPX OFF
In low-power mode:
°V_VD = V_DD - V_TH
°Output swing: GND to (V_DD - V_TH)

13 Sleep Mode
Mode operation:
°High-speed: MNX and MPX ON
°Low-power: MNX ON, MPX OFF
°Sleep: MNX OFF, MPX OFF
[Figure: basic switch circuit in sleep mode, with node V_VD]
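The three operating modes of the basic switch amount to a small truth table over the two transistors MNX and MPX. The sketch below encodes that table and the resulting V_VD values stated on the slides; the function name `v_vd` and the example voltages are illustrative assumptions, and sleep mode returns `None` because the slides give no V_VD value for it.

```python
from enum import Enum


class Mode(Enum):
    HIGH_SPEED = "high-speed"
    LOW_POWER = "low-power"
    SLEEP = "sleep"


# (MNX on?, MPX on?) for each mode, as listed on the slides.
SWITCH_STATE = {
    Mode.HIGH_SPEED: (True, True),
    Mode.LOW_POWER: (True, False),
    Mode.SLEEP: (False, False),
}


def v_vd(mode, vdd, vth):
    """Return V_VD for a mode, or None in sleep mode (no value on the slides)."""
    mnx_on, mpx_on = SWITCH_STATE[mode]
    if mpx_on:
        return vdd          # high-speed: V_VD = V_DD, rail-to-rail swing
    if mnx_on:
        return vdd - vth    # low-power: V_VD = V_DD - V_TH, reduced swing
    return None             # sleep: both transistors off


# Hypothetical voltages for illustration only.
for mode in Mode:
    print(mode.value, v_vd(mode, vdd=1.2, vth=0.3))
```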

14 Leakage Power Results: Anderson
[Figure: bar chart of % leakage power reduction vs. high-speed mode for the "Basic" switch. Bar labels: LP mode, sleep mode, LP mode (+unused fanout), LP mode (+used fanout), traditional switch. Data labels in extraction order: 36, 60.8, 39.7, 38.7, 0.3.]

15 Region Constrained Placement
°Rather than just focusing on routing, consider constraining logic
°Most circuits exhibit locality
Gayasen: FPGA’2004

16 Region Constrained Placement
°Several issues to consider
°Size of sleep transistor
  -Too large: increases leakage, area
  -Too small: affects logic performance
°Size of region
  -Too large: possibly unused resources, complicates placement
  -Too small: sleep transistors take up too much room
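The region-size trade-off can be made concrete by counting how many regions contain no used logic block and can therefore be put to sleep. This is a toy sketch under stated assumptions: the fabric is a small square grid, regions are square tiles, and the 4x4 placement is hypothetical rather than taken from the cited work.

```python
def sleepable_regions(used, region):
    """Count square regions (region x region blocks) with no used block.

    used: 2D list of booleans, True where a logic block is occupied.
    Returns (regions that can sleep, total regions).
    """
    rows, cols = len(used), len(used[0])
    total = asleep = 0
    for r0 in range(0, rows, region):
        for c0 in range(0, cols, region):
            total += 1
            tile = [used[r][c]
                    for r in range(r0, min(r0 + region, rows))
                    for c in range(c0, min(c0 + region, cols))]
            if not any(tile):
                asleep += 1
    return asleep, total


# Hypothetical 4x4 fabric with four occupied blocks.
used_cells = {(0, 0), (0, 1), (1, 0), (2, 2)}
grid = [[(r, c) in used_cells for c in range(4)] for r in range(4)]

for size in (1, 2, 4):
    asleep, total = sleepable_regions(grid, size)
    print(f"region {size}x{size}: {asleep} of {total} regions can sleep")
```

In this toy placement the share of the fabric that can be gated falls from 75% to 50% to 0% as the region grows, while smaller regions would need more sleep transistors, which is the tension the slide describes.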

17 Experimental Flow: RCP
°Different region sizes considered for flow
°Area constraints for portions of design determined by hand
°May encourage designers to create granular designs

18 Power Savings: RCP
°Note significant reduction in leakage power savings as region size increases
°Bottom curve primarily due to luck

19 Performance Limitation: RCP
°Performance limited by use of regions
°Nearly 10% clock frequency reduction for many designs

20 Low-swing Signalling
°The techniques examined so far adjust the supply voltage
°It is also possible to modify wire signalling to reduce voltage swing
°Most of an FPGA is made up of interconnect
°Approach targets dynamic power consumption
George and Rabaey: 1997

21 Low-swing Signalling
°Interconnect swing is at 0.8V while rest of circuit operates at 1.5V
°Cascode circuitry used at sink to overcome slow speed issues
°50% energy savings at cost of 25% delay

22 Alternate approach: Modifying FPGA CAD
°FPGA architecture modifications impact all designs, even those that don’t care about power
°Can placement and routing be modified to consider dynamic power?
  -Need to know which signals are high toggle
  -Attempt to minimize length of high-toggle wires
  -Minimize impact on performance and area
°Techniques fit well into our previous work on placement and routing
Lamoureux and Wilton

23 Modifying FPGA CAD Placement
°Previous cost metrics for annealing considered bounding box wire length and timing costs
°Include additional term which considers signal switching activity

24 FPGA Placement for Power
°Previous cost metrics for annealing considered bounding box wire length and timing costs
°Include additional term which considers signal switching activity
°Post-route energy reduced by 3.0%
°Power decreased by 7% but delay increases by 4%
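One way to picture the extra annealing term is to weight each net's bounding-box wire length by its switching activity. The sketch below is an illustrative cost function under that assumption; the `Net` fields, the weights `w_timing` and `w_power`, and the example nets are hypothetical, and the exact formulation and normalization used in the cited work may differ.

```python
from dataclasses import dataclass


@dataclass
class Net:
    bb_width: int     # bounding-box width in logic-block units
    bb_height: int    # bounding-box height in logic-block units
    activity: float   # estimated switching activity of the net


def wire_cost(nets):
    """Classic bounding-box wire-length cost."""
    return sum(n.bb_width + n.bb_height for n in nets)


def power_cost(nets):
    """Activity-weighted wire length: long, busy nets cost the most."""
    return sum(n.activity * (n.bb_width + n.bb_height) for n in nets)


def placement_cost(nets, timing_cost, w_timing=0.5, w_power=0.5):
    """Wire length and timing, plus the additional switching-activity term."""
    return ((1.0 - w_timing) * wire_cost(nets)
            + w_timing * timing_cost
            + w_power * power_cost(nets))


# Two hypothetical placements with identical total wire length.
short_busy_net = [Net(2, 2, 0.9), Net(6, 6, 0.1)]
long_busy_net = [Net(6, 6, 0.9), Net(2, 2, 0.1)]

print(placement_cost(short_busy_net, timing_cost=10.0))
print(placement_cost(long_busy_net, timing_cost=10.0))
```

A wire-length-only annealer sees the two placements as equal; the activity term breaks the tie in favour of the one that keeps the high-toggle net short.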

25 FPGA Routing Modifications for Power
°Original routing cost function takes congestion b(n) and delay(n) into account
°Augment with factor that takes net activity into account
°Minimize length of most active nets, even in the presence of congestion
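The routing change can be sketched as a node cost that blends delay, congestion, and a capacitance term scaled by net activity. This is one plausible form, written as an assumption rather than the exact formula from the cited work: `crit` is a timing criticality in [0, 1], `act_crit` is a hypothetical activity criticality in [0, 1], and `congestion` stands in for the b(n)-based congestion cost.

```python
def routing_node_cost(delay, congestion, capacitance, crit, act_crit):
    """Cost of routing a connection through node n.

    With act_crit = 0 this reduces to the delay/congestion trade-off;
    as act_crit grows, low-capacitance nodes are preferred even if congested.
    """
    non_timing = act_crit * capacitance + (1.0 - act_crit) * congestion
    return crit * delay + (1.0 - crit) * non_timing


# Two hypothetical candidate nodes for a non-timing-critical connection.
short_but_congested = dict(delay=1.0, congestion=5.0, capacitance=1.0)
long_but_free = dict(delay=1.0, congestion=1.0, capacitance=4.0)

for act in (0.0, 0.9):
    a = routing_node_cost(crit=0.0, act_crit=act, **short_but_congested)
    b = routing_node_cost(crit=0.0, act_crit=act, **long_but_free)
    choice = "short/congested" if a < b else "long/free"
    print(f"activity criticality {act}: prefer {choice}")
```

A low-activity net takes the uncongested detour, while a high-activity net is steered onto the short, low-capacitance node despite the congestion.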

26 FPGA Routing for Power Results
°Potential benefits somewhat limited by placement
°Note that most nets have low activity
°Power is decreased by 6% but delay increased by 4%
°Energy savings of about 3%

27 FPGA Embedded Memory Blocks
°Embedded memory blocks (EMBs) are important parts of FPGAs
°Consume roughly 14% of Altera Stratix II dynamic power*
  -Increasing in recent designs
* Stratix II Low Power Applications Note, 2005

28 Embedded Memory Block Port Internal View
[Figure: internal view of an EMB port. Labels: Write Data, Write Enable, Write Buffers, Column Mux, Sense Amps, Row Decode, Read Data, Read Enable, Address, Latch, Clk, Clk Enable, MClk, RAM cell, BIT, Bit Line Pre-charge.]
°Reducing clocking saves dynamic power

29 Power Optimization #1
°Convert EMB read enable/write enable signals to associated read/write clock enable signals
°Limitations
  -Each port has read or write enable control signal
  -Embedded memory block has read enable input
[Figure: before/after schematics of an EMB with Clock, Data, Write Address, Read Address, Wren, Rden, and Q ports, showing the write enable/read enable and write/read clock enable connections, with Vcc tie-offs.]

30 Implementation
°Conversion mode
  -Ties off R/W enable to RAM clock enables
  -Doesn’t make transform if CE already present on port
°Combining mode
  -AND user RAM clock enables with derived R/W clock enables
  -Could impact performance
[Figure: Write Enable and User-defined Write Clk Enable combined into Combined Write Clk Enable]
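The effect of combining mode can be shown by counting the cycles in which a write port is actually clocked. This is a behavioural sketch with hypothetical per-cycle traces; it models only the AND of the user clock enable with the write enable and ignores any timing impact of the extra gate.

```python
def clocked_cycles(clk_enable, write_enable, combine):
    """Count cycles in which the write port receives an active clock enable."""
    count = 0
    for ce, we in zip(clk_enable, write_enable):
        effective_ce = (ce and we) if combine else ce
        count += bool(effective_ce)
    return count


def writes_performed(clk_enable, write_enable):
    """A write happens only when both the clock enable and write enable are set."""
    return sum(bool(ce and we) for ce, we in zip(clk_enable, write_enable))


# Hypothetical 8-cycle trace: port always enabled, but only two real writes.
user_ce = [True] * 8
wren = [False, True, False, False, False, True, False, False]

before = clocked_cycles(user_ce, wren, combine=False)
after = clocked_cycles(user_ce, wren, combine=True)
print(f"clocked cycles before: {before}, after: {after}")
print(f"writes performed: {writes_performed(user_ce, wren)}")
```

The number of writes is unchanged, but the port is clocked only in the cycles that need it, which is where the dynamic power saving comes from.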

31 FPGA RAM Processing
°FIFOs and shift registers converted into logical RAMs
°Logical RAMs mapped to RAM blocks
[Figure: CAD flow. Labels: FIFO, shift register, RAM specification; create logical memory; logical RAMs/logic; logical-to-physical RAM processing; RAM blocks/logic; memory/logic placement; placed memory.]

32 Mapping RAM to EMBs
°Implementation choice can impact design area, performance, and power.
°Some mappings may require multiple EMBs
[Figure: a user-defined (logical) memory, 4K deep x 4 wide (16K bits), and the physical (EMB) memories it can map to: 4K-bit M4K blocks or a 512K MRAM.]

33 Memory Organization
°Each EMB can be configured to have different depth and width (e.g. Stratix II M4K)
°All hold 4K bits
°Slightly lower power consumption for wider EMB configurations (not including routing)
[Figure: example configurations: 4K words deep x 1 bit wide, 512 words deep x 8 bits wide, 128 words deep x 32 bits wide.]

34 Area and Delay Optimal Mapping
°Configure each EMB to be as deep as possible
°Number of address bits on each EMB is the same as on the logical memory
°Area and performance efficient: no external logic needed
°Power inefficient: all EMBs must be active during each logical RAM access
[Figure: vertical slicing. A logical memory 4K words deep and 4 bits wide (Addr[0:11], Data[0:3]) is built from four EMBs, each 4K words deep and 1 bit wide; 4 EMBs are active during an access.]

35 Alternative Mapping
°Configure EMB to have the width of the logical RAM (e.g. 1Kx4)
  -Allows shutdown of some RAMs each cycle
  -But adds some logic
°Saves RAM power, adds combinational logic and register power
[Figure: horizontal slicing (more power efficient). A logical memory 4K words deep and 4 bits wide is built from four EMBs, each 1K deep x 4 wide, using Addr[0:9], an address decoder on Addr[10:11], and Data[0:3]; 1 EMB is active during an access.]
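The horizontal slicing example splits the 12-bit logical address into an EMB select field and a within-EMB address. The sketch below shows that split for four 1K x 4 blocks; the function and variable names are illustrative, and the one-hot enable list is an assumption about how the decoder output would gate the blocks.

```python
def horizontal_slice(addr):
    """Split a 12-bit logical address for four 1K x 4 EMBs.

    Returns (emb_select, emb_addr): Addr[10:11] picks the EMB,
    Addr[0:9] is the address inside that EMB.
    """
    if not 0 <= addr < 4096:
        raise ValueError("address must fit in 12 bits")
    emb_select = addr >> 10
    emb_addr = addr & 0x3FF
    return emb_select, emb_addr


def emb_enables(addr, num_embs=4):
    """One-hot enables: only the selected EMB is active for this access."""
    select, _ = horizontal_slice(addr)
    return [i == select for i in range(num_embs)]


for a in (0, 1023, 1024, 4095):
    print(a, horizontal_slice(a), emb_enables(a))
```

Only one of the four enables is set per access, in contrast to vertical slicing where every EMB sees every address.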

36 RAM Slicing - Example
°Power reduction available with different slicing
[Figure: dynamic power (mW, axis 0 to 140) of a 4Kx32 memory versus maximum EMB depth (128, 256, 512, 1K, 2K, 4K). Annotations: "Best range", "Multiplexer Power Increasing", "EMB Power Increasing".]

37 Power Optimization #2: Power-aware RAM Partitioning
°Algorithm considers possible logical to physical RAM mappings
[Figure: CAD flow. Labels: FIFO, shift register; create logical memory; power-aware physical RAM processing; power library; insert decode and mux logic; memory/logic placement; completed placement.]
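A power-aware partitioner can be pictured as enumerating the possible EMB depths for a logical RAM and scoring each with a power library. The sketch below does this for 4K-bit blocks; the per-EMB and per-mux-bit power constants are hypothetical stand-ins for a real power library, and the model assumes that only one row of EMBs is clocked per access.

```python
def candidate_mappings(depth, width, emb_bits=4096,
                       emb_depths=(128, 256, 512, 1024, 2048, 4096)):
    """Yield (emb_depth, emb_width, rows, cols) for each EMB configuration."""
    for d in emb_depths:
        w = emb_bits // d
        rows = -(-depth // d)   # ceiling division: EMBs stacked in depth
        cols = -(-width // w)   # ceiling division: EMBs side by side in width
        yield d, w, rows, cols


def estimated_power(rows, cols, width, p_emb=4.0, p_mux_bit=0.05):
    """Hypothetical power model in arbitrary units (not lecture data)."""
    active_embs = cols                          # one row is clocked per access
    mux_cost = (rows - 1) * width * p_mux_bit   # output mux grows with rows
    return active_embs * p_emb + mux_cost


def best_mapping(depth, width):
    """Pick the mapping with the lowest estimated power."""
    return min(candidate_mappings(depth, width),
               key=lambda m: estimated_power(m[2], m[3], width))


for d, w, rows, cols in candidate_mappings(4096, 32):
    power = estimated_power(rows, cols, 32)
    print(f"max depth {d:4d}: {rows:2d} rows x {cols:2d} cols, power {power:.1f}")
print("selected max depth:", best_mapping(4096, 32)[0])
```

With these assumed constants the estimate is lowest at an intermediate depth: shallow configurations pay for multiplexing and deep ones pay for keeping many EMBs active.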

38 Experimental Approach
°40 designs evaluated
°Quartus 5.1
°Mapped to smallest possible device and target max frequency
°Simulation with test vectors
°Power analysis with PowerPlay

39 Memory Power
°21.0% average reduction for all techniques (9.7% with convert/combine)

40 Overall Core Dynamic Power
°6.8% average power reduction for all techniques (2.6% with convert/combine)
[Figure: % dynamic power reduction per design (designs 1 to 39 on the x-axis, y-axis from -5 to 35) for two series: enable convert/combine, and enable convert/combine + mem partition.]

41 Design Performance
°1.0% average performance loss for all techniques (0.1% for enable convert/combine)
[Figure: "Average Design Clock Frequency", % frequency improvement per design (y-axis from -30 to 10) for two series: enable convert/combine, and enable convert/combine + mem partition.]

42 Results Summary
°Almost 7% core dynamic power reduction across all designs
  -Some designs benefit more than others
°Minimal clock frequency hit for most designs

                        Enable convert   Enable convert/combine   Enable convert/combine + mem partition
Core dynamic power      -1.8%            -2.6%                    -6.8%
Memory dynamic power    -6.3%            -9.7%                    -21.0%
Max clk freq            -0.1%            -0.2%                    -1.0%
LUT count               0.0%             0.1%                     0.7%

43 Impact of Multiple Embedded Memory Blocks
°Rerun 40 designs but only allow one type of target EMB for each mapping
°All designs targeted to Stratix II EP2S180
°Significant power impact for most designs versus EP2S180 target with no restrictions

                        M512      M4K      M-RAM
Designs completed       23        38       4
Core dynamic power      40.4%     6.6%     47.3%
Memory power            279.5%    33.3%    754.0%
Max clk freq.           -2.2%     0.6%     -1.0%
LUT count               0.4%      -0.5%    0.0%

44 Summary
°Key to reducing RAM power is keeping clocks disabled.
°Movement of read/write enables to clock enables limits dynamic activity
°Power-aware RAM partitioner attempts to select power-optimal mapping, combined with clock enable enhancement
°Overall
  -About 21% average memory power reduction (10% with enable convert/combine)
  -About 7% average dynamic power reduction (3% with enable convert/combine)
  -Diversity of EMBs reduces power by 33%

45 Summary
°FPGA power consumption is under consideration at numerous levels: architecture, circuit, CAD, and physical
°FPGA companies are just now embracing power-aware CAD; power-aware architectures are on the way
°Many circuit-level techniques still possible
°RTL CAD synthesis techniques provide a promising area for exploration

