Can we Save Energy if we allow Errors in Computing? Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign.

Can we Save Energy if we allow Errors in Computing? Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign jhpatel@illinois.edu Auburn University February 18, 2016

2 Outline Why Energy versus Accurate Computing? An example of Energy versus Accuracy Tradeoff Challenge to the Accepted Tradeoff Model A Thought Experiment Two Real Experiments Potential exploits of the new hypothesis Power savings in large data-centers Predicting early failures in microprocessors

3 Why Energy vs. Accuracy? Manufacturing Process Variations unavoidable Gate Oxide thickness (T OX ) Random Doping Fluctuations (RDF) Device geometry, Lithography (W, L) Resulting Transistor Threshold Variations ( σ V T ) Designing for Worst Case for error-free computing Wastes Energy, Performance and Yield Designing for Typical Case may cause errors Tolerating occasional errors saves Energy, gives higher Performance and improves Yield In Future technologies, Errors are inevitable

4 Within-Die Variation Example “Teraflop Research Chip” Courtesy Intel, ISSCC 2010 80 Identical Cores But different Fmax Within a single chip

5 Designing with a Perfect Process S 63 S 62 S 61 S 60 S3S3 S2S2 S1S1 S0S0 15ps60ps45ps30ps 960ps Carry Propagations Slack time/ Guard band/ Safety margin 960ps 40ps A 1GHz 64-bit Adder 1000ps A 0 B 0 A 1 B 1 A 63 B 63 Full Adder S C 15ps

6 Modeling Process Variations Measured Data from Fabrication Analytical Process Models Die-to-Die and Within-Die Variations Full Adder S C 15ps±2ps   15ps 20ps 10ps

7 Worst Case Design S 63 S 62 S 61 S 60 S3S3 S2S2 S1S1 S0S0 ≈15ps≈60ps≈45ps≈30ps 960 ± 128ps Carry Propagations 830ps A 64-bit Adder with Process Variations 1090ps 260ps Increase Voltage to meet 1ns cycle time. Costs More Energy. Increase Cycle Time to 1.1ns. Costs in Performance. Throw away the Slow Chips! Loss in Yield and Revenue A 0 B 0 A 1 B 1 A 63 B 63 Full Adder S C 15 ± 2ps

8 Saving Energy and Performance S 63 S 62 S 61 S 60 S3S3 S2S2 S1S1 S0S0 ≈15ps≈60ps≈45ps≈30ps ≈960ps Carry Propagations ≈960ps A 64-bit Adder 1000ps Errors in most significant bits on some computations If we can live with these errors, we case save Energy, Performance and Yield. 830ps 1090ps A 0 B 0 A 1 B 1 A 63 B 63

9 Energy Savings with Errors Register ALU Register Clock 1 GHz 960 ps ±130 ps ALU Outputs 0ps 960ps 1 2  Errors Per Min/Sec  Voltage Reduction 830ps 1090ps

10 Basics of Power and Energy Energy to Switch a Logic Gate Output ½ CV 2 Joules  C is capacitance as seen by the Gate Output  V is the Supply Voltage Energy spent by the chip during one Clock Cycle kV 2 Joules  k is a constant reflecting average chip Activity and Total Gate Capacitance Power for the chip running at F cycles/sec kV 2 F Joules/sec or Watts

11 Power or Energy? Common Pitfalls Energy = kV 2 Joules per Clock Cycle Power = kV 2 F Joules/sec A Task that takes N Cycles to compute, consumes kV 2 N Joules Independent of Frequency!! Some Observations Modern Chips also have Static or DC power, S. Power = S + kV 2 F Joules/sec N-cycle Task, completes in N/F sec and consumes (S + kV 2 F)*(N/F) = SN/F + kV 2 N Joules Lowering Power by Frequency increases Energy!

12 Energy Savings with Errors Register ALU Register Clock 1 GHz 960 ps ±130 ps ALU Outputs 0ps 960ps 1 2  Errors Per Min/Sec  Voltage Reduction 830ps 1090ps

13 Protecting against Errors If the error rate from added delays remains relatively small, we can utilize some of the established techniques Circuit Level – iROC, Razor, etc. Error Coding – Parity codes, Arithmetic codes, Residue codes, Parity prediction, Algorithm-based fault-tolerance, TMR etc. Time Redundancy – RESO What if the error rate is massive? Are massive errors possible in a good chip?

14 Signal Propagations Combinational Logic Present State Next State FFs clock 0 900 960 1000 ps Calculate from timing analysis How many flip-flops on critical-paths? Threshold

15 Consider a 1-Ghz chip with a million flip-flops Let us divide the 1ns Clock period in to 1000 bins Put a FF in bin p if the longest path at its input has a delay of p picoseconds How many FFs are in bins 900ps to 950ps? Path Delay in picoseconds → How many flip-flops on critical-paths?

16 Consider a 1-Ghz chip with a million flip-flops Let us divide the 1ns Clock period in to 1000 bins Put a FF in bin p if the longest path at its input has a delay of p picoseconds How many FFs are in bins 900ps to 950ps? Path Delay in picoseconds → How many flip-flops on critical-paths?

17 Signal Propagations Combinational Logic Present State Next State FFs 1M FFs clock 0 900 960 1000 ps 200,000? 100,000? 50,000? How many flip-flops on critical-paths?

18 A Thought Experiment Let us conservatively assume 100,000 ffs are on critical paths (10% of total) Now Reduce the supply Voltage, thus increasing Delay Assume just 10% of critical ffs get its inputs late this cycle This implies 10,000 flip-flops produce errors in a single clock cycle! Massive number of errors result in a few clock cycles

19 Do your own Analysis Analytical Procedure Estimate number (n) of FFs on critical paths from timing analysis or synthesis report - Compute percent (p) of FFs that switch per cycle from functional simulation p X n = errors/cycle For example: Total Number of Flip-Flops: 400,000 5% of these are on critical paths: n = 20,000 FFs Only 5% of these switch every cycle: p = 5% Error Rate = p X n = 1000 errors/cycle In 10 consecutive clock cycles: 10,000 errors!

20 A new hypothesis  Errors/ Cycle 10 4 10 6 NEW (large chips) 1 2 OLD (small chips)  Errors Per Min/sec Zero errors Massive errors Reduce Supply Voltage 

21 Hypothesis of Critical Operation Point In large CMOS circuits there exists a Critical Voltage V C for a fixed Frequency F and ambient temperature T, such that Any voltage below V C, massive errors result Any voltage above V C, no delay related errors occur No tradeoff is possible between Energy and Errors in large chips No practical ways exist to handle massive errors! No error handling needed above V C

22 Experiments to disprove the hypothesis Subject a large chip to slowly decreasing supply voltage, keeping the Frequency fixed. At each step, exercise the chip extensively and monitor continuously for any errors

23 Experiments on Microprocessors PowerPC 750 2.5V, 233MHz Voltage reduced in steps of 10 mV A C-program monitored the errors while stressing the processor Pentium-M (1.308V, 2GHz) down to (0.700V, 0.798GHz) For each of the 9 different frequencies, Voltage reduced in steps of 16 mV Third-party program to keep the cpu 100% busy and report errors (more like a power virus!)

24 Experiment to find which of these two?  Errors/ Cycle 10 4 10 6 NEW Decrease Supply Voltage  1 2 OLD  Errors Per Min/sec Decrease Supply Voltage 

25 Experimental Set-Up for PowerPC A Single-Board-Computer with PowerPC 750 233MHz, 2.5V Power Supply A Hewlett-Packard E3631A Power Supply  Digital control in units of 10 miliVolts steps A Blow-Drier to raise the ambient temperature A Program written to stress all major functional blocks Tried to maximize execution rate (load) Tried to maximize logic switching rate Every operation was checked against known good values and instantly reported for any error

26 “Stressing” PowerPC 750 (233 MHz)

27 Results of Lowering Supply Voltage Power PC-750 μ P Nominal Supply Voltage of 2.5 V is reduced in steps of 1/100 th Volt with clock frequency constant at 233MHz No Data Error was ever Observed at user visible Registers!

28 Intel Pentium-M Processor “Speed step” technology with 9 different speeds Rated at 2GHz at core voltage of 1.308V Experiment Choose one of the nine frequencies F1, …, F9 While keeping cpu 100% busy reduce the voltage in steps of 16mV Monitor for errors No errors observed – only crashes!

29 Experiment on Pentium-M No errors! Catastrophic failure!

30 Some Remarks on Experiment Possible explanation for the observations A modern processor has a large number of flip- flops that are not user visible  e.g. Pre-fetch buffers, history tables, reservation stations, write buffers, and state controllers for everything from moving instructions and data to controlling a cache Control Logic fails simultaneously with ALU datapath Massive errors in control and data in a single cycle Instruction flow is completely disrupted. Therefore no error could be reported. Catastrophic failure!

31 Personal Remarks CMOS technology is robust now and will continue to be so for the foreseeable future Process Variation related errors if any, must be massive No industry can survive with massive failures Process variations must remain bounded within some reasonable limits Moore’s Law continues to hold! 45nm with (HiK+MG) has lower RDF and T OX variations than 65nm [Kelin J. Kuhn, Reducing Variation in Advanced Logic Technologies: Approaches to Process and Design for Manufacturability of Nanoscale CMOS, IEDM 2007.]

32 Exploiting Process Variations If the “critical operation point hypothesis” holds Above critical supply voltage V C no errors occur. In large data-centers with 1000’s of microprocessors, operating each with the lowest V C for a given frequency can save lots of power As the number of cores approach 100 or more, it would be imperative to use different voltage- frequency pair (F C, V C ) for each core on the same die.

33 Dynamic Power Savings in Pentium-M

34 Energy Saving Ideas for Microprocessors Simplify CPU Remove Floating Point Unit Remove Divider Remove Multiplier Reduce Word Width  From 32 bits to 16 or 8 bits Simplify Memory Hierarchy Remove Virtual Addresses Manage Memory Hierarchy explicitly

35 Accuracy vs. Quality Energy and Quality of Solution Tradeoff All Iterative Algorithms that improve the Quality of Solution with more iterations (more Energy)  Simulated Annealing, Genetic, Steepest Decent All Algorithms that use Grid of various dimensions, improve the Quality of Solution with finer Grid, or more Resolution (more Energy)  Image Processing, Weather Modeling, Device Physics Simulation Signal Processing Signal/Noise Ratio, Less Power, More Errors

36 Final Observations For Large Chips and Data-centers There is no possible Trade-off between Energy and Inaccurate Computing Must avoid confusion between Energy and Power Frequency can not affect the Energy of a Task  Lower Frequency implies Higher Energy! Critical Voltage is easy to determine for a chip Aging of Chips can be easily detected All future researchers must keep in mind the large number of flips-flops on Critical Paths!

37 Questions? Comments?

Can we Save Energy if we allow Errors in Computing? Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign.

Similar presentations

Presentation on theme: "Can we Save Energy if we allow Errors in Computing? Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Can we Save Energy if we allow Errors in Computing? Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign.

Similar presentations

Presentation on theme: "Can we Save Energy if we allow Errors in Computing? Janak H. Patel Department of Electrical and Computer Engineering University of Illinois at Urbana-Champaign."— Presentation transcript:

Similar presentations

About project

Feedback