EENG449b/Savvides Lec 10.1 2/17/05 February 17, 2005 Prof. Andreas Savvides Spring 2005 EENG 449bG/CPSC 439bG.


1 EENG449b/Savvides Lec 10.1 2/17/05 February 17, 2005 Prof. Andreas Savvides Spring 2005 http://www.eng.yale.edu/courses/eeng449bG EENG 449bG/CPSC 439bG Computer Systems Lecture 11 Power Issues and DVS

2 EENG449b/Savvides Lec 10.2 2/17/05 Announcements Reading reference for this lecture –J. Pouwelse, K. Langendoen, H. Sips, “Dynamic Voltage Scaling on a Low Power Microprocessor”, posted on the class website Midterm date discussion & conflicts with other classes

3 EENG449b/Savvides Lec 10.3 2/17/05 Why worry about power? Intel vs. Duracell No Moore’s Law in batteries: 2-3%/year growth [Chart: improvement compared to year 0 (1x to 16x) over time (0 to 6 years) for Processor (MIPS), Hard Disk (capacity), Memory (capacity) and Battery (energy stored)]

4 EENG449b/Savvides Lec 10.4 2/17/05 Current Battery Technology is Inadequate Example: 20-watt battery »NiCd weighs 0.5 kg, lasts 1 hr, and costs $20 »Comparable Li-Ion lasts 3 hrs, but costs > 4x more

5 EENG449b/Savvides Lec 10.5 2/17/05 Comparison of Energy Sources Assume 1mW Average as definition of “Scavenged Energy”

6 EENG449b/Savvides Lec 10.6 2/17/05 Trends in Total Power Consumption Frightening: proportional to area & frequency DEC 21164 source : arpa-esto microprocessor power dissipation

7 EENG449b/Savvides Lec 10.7 2/17/05 Power Metrics in Microprocessors nJ/Instruction –Mostly for processors with the same instruction sets –Does not capture the effect of operand size (e.g. 8-bit addition vs. 32-bit addition operations) MIPS/Watt mA – common among component data sheets Remember:

8 EENG449b/Savvides Lec 10.8 2/17/05 Modeling the Battery Behavior Theoretical capacity of battery is decided by the amount of the active material in the cell »batteries often modeled as buckets of constant energy e.g. halving the power by halving the clock frequency is assumed to double the computation time while maintaining constant computation per battery life In reality, delivered or nominal capacity depends on how the battery is discharged »discharge rate (load current) »discharge profile and duty cycle »operating voltage and power level drained

9 EENG449b/Savvides Lec 10.9 2/17/05 Battery Capacity Current in “C” rating: load current normalized to battery’s capacity »e.g. a discharge current of 1C for a capacity of 500 mA-hrs is 500 mA from [Powers95]
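
The "C" rating convention above can be sketched in a few lines of Python. The function name `c_rate_to_current_ma` is hypothetical and used only for illustration; the C/20 case uses the rating point mentioned for secondary cells.

```python
def c_rate_to_current_ma(c_rate: float, capacity_mah: float) -> float:
    """Convert a discharge rate expressed in units of C into a load current in mA."""
    return c_rate * capacity_mah


# Slide example: 1C on a 500 mA-hr battery corresponds to 500 mA.
one_c = c_rate_to_current_ma(1.0, 500.0)

# A C/20 rating point on the same battery corresponds to 25 mA.
c_over_20 = c_rate_to_current_ma(1 / 20, 500.0)

print(one_c, c_over_20)
```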

10 EENG449b/Savvides Lec 10.10 2/17/05 Battery Capacity vs. Discharge Current Amount of energy delivered is decreased as the current (rate at which power is drawn) is increased »rated as ampere hours or watt hours when discharged at a specific rate to a specific cut-off voltage –primary cells rated at a current which is 1/100th of the capacity in ampere hours (C/100) –secondary cells are rated at C/20 or C/10 At high currents, the diffusion process that moves new active material from electrolytes to the electrode cannot keep up »concentration of active material at cathode drops to zero, and cell voltage goes down below cut-off »even though active material in cell is not exhausted!

11 EENG449b/Savvides Lec 10.11 2/17/05 Battery Energy Consumers

12 EENG449b/Savvides Lec 10.12 2/17/05 Power Supply Where does the Power Go? Battery DC-DC Converter Communication Radio Modem RF Transceiver Processing Programmable µPs & DSPs (apps, protocols etc.) Memory ASICs Peripherals Disk Display

13 EENG449b/Savvides Lec 10.13 2/17/05 Power Consumption for a Computer with Wireless NIC

14 EENG449b/Savvides Lec 10.14 2/17/05 Energy Consumption of Wireless NICs (Wavelan)
2 Mbps (Bronze)     Specs       Measured
Sleep Mode          9 mA        14 mA
Idle Mode           --------    178 mA
Receive Mode        280 mA      200 mA
Transmit Mode       330 mA      280 mA
11 Mbps (Silver)    Specs       Measured
Sleep Mode          10 mA       10 mA
Idle Mode           --------    156 mA
Receive Mode        180 mA      190 mA
Transmit Mode       280 mA      284 mA

15 EENG449b/Savvides Lec 10.15 2/17/05 Example: Power Consumption for Compaq’s iPAQ 206MHz StrongArm SA-1110 processor 320x240 resolution color TFT LCD Touch screen 32MB SDRAM / 16MB Flash memory USB/RS-232/IrDA connection Speaker/Microphone Lithium Polymer battery PCMCIA card expansion pack & CF card expansion pack * Note: the CPU is in the idle state most of the time. Audio, IrDA, RS232 power is measured when each part is idling. Etc includes CPU, flash memory, touch screen and all other devices. Frontlight brightness was 16.

16 EENG449b/Savvides Lec 10.16 2/17/05 Microprocessor Power Consumption CMOS Circuits (Used in most microprocessors) Dynamic Component Digital circuit switching inside the processor Static Component Bias and leakage currents O(1mW) Static Dynamic

17 EENG449b/Savvides Lec 10.17 2/17/05 Power Consumption in Digital CMOS Circuits - current constantly drawn from the power supply - determined by fabrication technology - short circuit current due to the DC path between the supply rails during output transitions - load capacitance at the output node - clock frequency - power supply voltage
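
A commonly used model of CMOS power dissipation that is consistent with the annotations above can be written as follows. The symbol names and the switching-activity factor $\alpha$ are assumptions made for this sketch, not notation taken from the slide.

```latex
P_{total} = \underbrace{I_{standby} V_{dd} + I_{leakage} V_{dd}}_{\text{static}}
          + \underbrace{I_{sc} V_{dd} + \alpha\, C_L V_{dd}^{2} f}_{\text{dynamic}}
```

Here $I_{standby}$ is the current constantly drawn from the supply, $I_{leakage}$ is set by the fabrication technology, $I_{sc}$ is the short-circuit current during output transitions, $C_L$ is the load capacitance at the output node, $f$ is the clock frequency and $V_{dd}$ is the supply voltage. The quadratic dependence on $V_{dd}$ in the last term is what makes lowering the voltage attractive.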

18 EENG449b/Savvides Lec 10.18 2/17/05 Dynamic Voltage Scaling What can you do to conserve power on a processor? Dynamic power consumption is the dominant component Example: Transmeta’s Crusoe processor

19 EENG449b/Savvides Lec 10.19 2/17/05 DVS on Low Power Processor Maximum gain when voltage is lowered BUT lower voltage increases circuit delay CMOS transistor threshold voltage Transistor gain factor Dynamic Power Component Number of gates Load capacitance of gate k Propagation delay
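
The quantities listed above are commonly related by the pair of first-order expressions below, given here as a sketch. The symbols $M$, $C_k$, $V_T$, $k$ and the exponent $a$ are assumptions chosen for illustration and may differ from the notation on the original slide.

```latex
P_{dynamic} = \sum_{k=1}^{M} C_k \, f \, V_{dd}^{2}
\qquad
\tau \;\propto\; \frac{V_{dd}}{k\,(V_{dd} - V_T)^{a}}
```

$M$ is the number of gates, $C_k$ the load capacitance of gate $k$, $V_T$ the transistor threshold voltage, $k$ the transistor gain factor and $\tau$ the propagation delay; $a$ is about 2 in the classic long-channel model. Lowering $V_{dd}$ cuts $P_{dynamic}$ quadratically but increases $\tau$, so the clock frequency has to be lowered along with the voltage.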

20 EENG449b/Savvides Lec 10.20 2/17/05 Voltage Scaling on LART Dynamically lower the processor voltage and frequency to reduce power consumption LART wearable board –StrongARM 1100 Processor 190MHz –Various I/O capabilities –32 MB volatile memory –4 MB non-volatile memory –Programmable voltage regulator

21 EENG449b/Savvides Lec 10.21 2/17/05 Processor Envelope At 1.5V Max clock frequency 251MHz Min frequency the processor functions correctly is 59MHz

22 EENG449b/Savvides Lec 10.22 2/17/05 LART Power Measurement Note the measurement setup at different levels on the board. Always provide hooks for measurement, testing and debugging during your design, both for software and hardware! Total Power Consumption on the LART Platform Based on dhrystone benchmark

23 EENG449b/Savvides Lec 10.23 2/17/05 System Support Requirements To manage DVS effectively, the computation requirements must be known in advance Predictive scheme –Try to learn that behavior based on the computation profile Better scheme: Applications should be power aware Processor frequency and scaling should be changed without much delay –This is specific to each processor –150us for the LART processor

24 EENG449b/Savvides Lec 10.24 2/17/05 Example: Power Aware Video Playback Annotate an H.263 video decoder with information on the clock speed required to decode a known video sequence Using a 12.6s video, 15fps Power consumption measurements for LART –No-DVS: 198mW for CPU, 207mW for memory subsystem –DVS: 100mW for CPU and 204mW for the memory subsystem –2X improvement for the CPU, but only 25% improvement when the memory subsystem is included
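
The two improvement figures quoted above follow directly from the measured numbers. The short Python sketch below reproduces the arithmetic; the variable and function names are illustrative only.

```python
# Power figures from the slide, in mW (H.263 playback on LART).
cpu_no_dvs, mem_no_dvs = 198.0, 207.0
cpu_dvs, mem_dvs = 100.0, 204.0


def fractional_saving(before: float, after: float) -> float:
    """Fraction of power saved when going from `before` to `after`."""
    return (before - after) / before


# CPU alone: 198 mW / 100 mW, i.e. roughly a 2x improvement.
cpu_ratio = cpu_no_dvs / cpu_dvs

# CPU plus memory subsystem: 405 mW down to 304 mW, i.e. roughly 25% saved.
total_saving = fractional_saving(cpu_no_dvs + mem_no_dvs, cpu_dvs + mem_dvs)

print(f"CPU improvement: {cpu_ratio:.2f}x")
print(f"System-level saving: {total_saving:.1%}")
```

The memory subsystem barely benefits from DVS here (207 mW versus 204 mW), which is why the system-level saving is much smaller than the CPU-only figure.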

25 EENG449b/Savvides Lec 10.25 2/17/05 LART Memory Performance Memory access is optimal when high resolution memory access timing is available For LART the optimal memory pattern: –148MHz –92 MB/s memory bandwidth –Power consumption 514.2mW –Energy cost 5.6mJ/MB
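
The energy cost per megabyte quoted above is the power divided by the bandwidth. A minimal Python check, with an illustrative function name:

```python
def energy_per_mb_mj(power_mw: float, bandwidth_mb_per_s: float) -> float:
    """Energy per transferred megabyte in mJ: (mJ/s) / (MB/s) = mJ/MB."""
    return power_mw / bandwidth_mb_per_s


# Slide figures for the LART memory access pattern: 514.2 mW at 92 MB/s.
cost = energy_per_mb_mj(514.2, 92.0)
print(f"{cost:.1f} mJ/MB")
```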

26 EENG449b/Savvides Lec 10.26 2/17/05 Memory Subsystem Power Consumption – Read Operation Power consumption Memory Bandwidth Optimal memory access waveforms

27 EENG449b/Savvides Lec 10.27 2/17/05 Energy breakdown for read (based on 1MB read) Regulator Loss-factor

28 EENG449b/Savvides Lec 10.28 2/17/05 Power Breakdown for H.263 Decoder

29 EENG449b/Savvides Lec 10.29 2/17/05 Reducing Power Consumption is a multilevel task! Physical layer –Technology – reduce the surface of CMOS circuits Architecture/IC level –Several optimizations in the design (e.g. parallelism and pipelining) –Provide hooks for software driven power management (e.g. different power modes and clock speeds) OS Level –Smart schedulers, interval schedulers, DVS Application Level –Power aware applications that warn the OS and the hardware about the features needed during application lifetime –Sleep modes and DVS driven by applications Network Level –Networked devices may be able to apply low duty cycles, in which some of the devices are asleep and others are awake

30 EENG449b/Savvides Lec 10.30 2/17/05 Conclusions Interval based schedulers not so efficient –Interval-scheduler – reduce voltage after a pre-specified idle period is detected Better leverage of DVS when the processor is aware of the application requirements –Illustrated with the H.263 decoder Monitor different power consumption profiles across different sections of the platform and use them to make clever decisions about power-management What is missing: –Comments on power regulator efficiencies…

31 EENG449b/Savvides Lec 10.31 2/17/05 Announcements Need to start deciding on the final projects. We need to discuss these with you individually at the end of class One page detailed proposal by March 3 This should include –1 paragraph motivation and description of your project –1 paragraph on the approach you are going to use and the tools –1 paragraph on evaluation »What is the strategy you will use to evaluate the performance of your project.

32 EENG449b/Savvides Lec 10.32 2/17/05 DVS Example Consider a processor with DVS Frequency range 59 – 250MHz Supply voltage range 0.8V (@59MHz) to 1.5V (@250MHz) Assume that the processor can compute at 1 MIPS per MHz.

33 EENG449b/Savvides Lec 10.33 2/17/05 DVS Example 1 What is the maximum energy saving the processor can achieve with dynamic voltage scaling? What is missing?

34 EENG449b/Savvides Lec 10.34 2/17/05 Task Execution Energy Cost A certain task needs to run on the processor. The task requires 200 Million Instructions to complete. Which power level will be the most efficient?
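
One way to approach the two questions above is sketched below in Python. It assumes that energy per cycle scales with the square of the supply voltage, that one instruction completes per cycle (1 MIPS per MHz), and that static power, regulator losses and the unknown switched capacitance can be ignored. These are simplifying assumptions of the sketch, and the function names are illustrative.

```python
V_MAX, F_MAX_MHZ = 1.5, 250.0   # highest operating point from the example
V_MIN, F_MIN_MHZ = 0.8, 59.0    # lowest operating point (lower end of the range)


def relative_energy_per_cycle(v: float, v_ref: float = V_MAX) -> float:
    """Dynamic energy per cycle relative to the reference voltage (E ~ C * V^2)."""
    return (v / v_ref) ** 2


def execution_time_s(million_instructions: float, freq_mhz: float) -> float:
    """Run time at 1 MIPS per MHz: millions of instructions / MHz = seconds."""
    return million_instructions / freq_mhz


# Maximum dynamic-energy saving per instruction: lowest vs. highest voltage.
max_saving = 1.0 - relative_energy_per_cycle(V_MIN)

# The 200-million-instruction task at both ends of the range.
t_fast = execution_time_s(200.0, F_MAX_MHZ)
t_slow = execution_time_s(200.0, F_MIN_MHZ)

print(f"max dynamic energy saving: {max_saving:.1%}")
print(f"run time at 250 MHz: {t_fast:.2f} s, at 59 MHz: {t_slow:.2f} s")
```

Under these assumptions the lowest voltage is always the most energy-efficient setting for a fixed instruction count, at the cost of a run time more than four times longer. Whether that is acceptable depends on the task's deadline, and the picture changes once static power is included.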

35 EENG449b/Savvides Lec 10.35 2/17/05 Power Consumption on Embedded Processors Different core I/O from Peripheral I/O – numbers here –Cores scaling down to 0.8V. 1.8V devices are becoming common –General Purpose I/O interfaces still at 3.0 – 3.3V »Makes power supply harder, additional regulator inefficiency Sleep modes and associated cost of sleep and recovery SA-1100 modes –Need time and energy to transition between states

36 EENG449b/Savvides Lec 10.36 2/17/05 Example: SA-1100 CPU RUN IDLE –CPU stopped when not in use –Monitoring for interrupts SLEEP –Shutdown on-chip activity [State diagram: RUN 400 mW, IDLE 50 mW, SLEEP 0.16 mW; transition times 90 µs, 10 µs, 90 µs, 160 ms]

37 EENG449b/Savvides Lec 10.37 2/17/05 Duty Cycling: Exploiting Sleep Modes Imagine a processor with max power consumption 120mW Power supply voltage 2.5V We need to power the device from a 2000mAh battery for 1 year Sleep mode draws 20uA current What is the duty cycle the device needs to operate at to last for at least 1 year?

38 EENG449b/Savvides Lec 10.38 2/17/05 Duty cycling 1 year has 365 x 24 = 8760 hours
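
The duty-cycle question can be worked through as below. This Python sketch assumes an ideal battery that delivers its full rated capacity regardless of load, and ignores transition costs between sleep and active states as well as regulator losses. The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours


def max_duty_cycle(battery_mah: float, lifetime_h: float,
                   active_ma: float, sleep_ma: float) -> float:
    """Largest active fraction d such that the average current
    d * active + (1 - d) * sleep fits within the battery budget."""
    avg_budget_ma = battery_mah / lifetime_h
    return (avg_budget_ma - sleep_ma) / (active_ma - sleep_ma)


# Active current: 120 mW at 2.5 V is 48 mA. Sleep current: 20 uA = 0.020 mA.
active_ma = 120.0 / 2.5
duty = max_duty_cycle(2000.0, HOURS_PER_YEAR, active_ma, 0.020)

print(f"average current budget: {2000.0 / HOURS_PER_YEAR:.4f} mA")
print(f"maximum duty cycle: {duty:.3%}")
```

The budget works out to roughly 0.23 mA on average, so the processor can be active for only about 0.43% of the time.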

39 EENG449b/Savvides Lec 10.39 2/17/05 Voltage Reduction is Better Example: task with 100ms deadline, requires 50ms CPU time at full speed –normal system gives 50ms computation, 50ms idle/stopped time –half speed/voltage system gives 100ms computation, 0ms idle –same number of CPU cycles, but energy is reduced to 1/4 [Figure: speed vs. time for tasks T1 and T2, with and without idle time; same work, lower energy]
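
The factor of 1/4 in the example above can be derived in a few lines. The sketch assumes that dynamic power dominates ($P \propto C V^2 f$), that the voltage can be halved together with the frequency, and that the idle/stopped time costs no energy; the symbols are chosen here for illustration.

```latex
\begin{aligned}
E_{full} &= C V^{2} f \cdot 50\,\text{ms} \\
E_{half} &= C \left(\tfrac{V}{2}\right)^{2} \left(\tfrac{f}{2}\right) \cdot 100\,\text{ms}
          = \tfrac{1}{8}\, C V^{2} f \cdot 100\,\text{ms}
          = \tfrac{1}{4}\, E_{full}
\end{aligned}
```

Power drops by a factor of 8, but the task runs twice as long, so the energy for the same number of cycles drops by a factor of 4. Halving the frequency alone would leave the energy per cycle unchanged.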

40 EENG449b/Savvides Lec 10.40 2/17/05 Problem with Voltage Reduction Voltage gets dictated by the tightest (critical) timing constraint » not a problem if latency not important –throughput can always be improved by pipelining, parallelism etc. »but, real systems have bursty throughput and latency critical tasks Solution: dynamically vary the voltage!

41 EENG449b/Savvides Lec 10.41 2/17/05 Varying the Supply Voltage [Figure: normalized power vs. normalized workload for fixed supply and variable supply; from [Gutnik96] (VLSI Symposium)]

42 EENG449b/Savvides Lec 10.42 2/17/05 XYZ Node Frequency Scaling

43 EENG449b/Savvides Lec 10.43 2/17/05 Code Optimizations for Low Power High-level operations (e.g. C statement) can be compiled into different instruction sequences »different instructions & ordering have different power Instruction Selection –Select a minimum-power instruction mix for executing a piece of high level code Instruction Packing & Dual Memory Loads – Two on-chip memory banks »Dual load vs. two single loads »Almost 50% energy savings

44 EENG449b/Savvides Lec 10.44 2/17/05 Code Optimizations for Low Power (contd.) Reorder instructions to reduce switching effect at functional units and I/O buses –E.g. Cold scheduling minimizes instruction bus transitions Operand swapping –Swap the operands at the input of multiplier –Result is unaltered, but power changes significantly! Other standard compiler optimizations –Intermediate level: Software pipelining, dead code elimination, redundancy elimination –Low level: Register allocation and other machine specific optimizations Use processor-specific instruction styles –e.g. on ARM the default int type is ~ 20% more efficient than char or short as the latter result in sign or zero extension –e.g. on ARM the conditional instructions can be used instead of branches

45 EENG449b/Savvides Lec 10.45 2/17/05 Minimizing Memory Access Costs Reduce memory access, make better use of registers –Register access consumes far less power than memory access Straightforward way: minimize the number of read-write operations Cache optimizations –Reorder memory accesses to improve cache hit rates Can use existing techniques for high-performance code generation

46 EENG449b/Savvides Lec 10.46 2/17/05 Low-power Software Strategies Code running on CPU –Code optimizations for low power Code accessing memory objects –SW optimizations for memory Data flowing on the buses –I/O coding for low power Compiler controlled power management CPU Cache Memory

47 EENG449b/Savvides Lec 10.47 2/17/05 How can power consumption be reduced at the circuit design level inside a processor?

48 EENG449b/Savvides Lec 10.48 2/17/05 Example: Reference Datapath from “Digital Integrated Circuits” by Rabaey – Critical path delay: $T_{adder} + T_{comparator} = 25$ ns – Frequency: $f_{ref} = 40$ MHz – Total switched capacitance = $C_{ref}$ – $V_{dd} = V_{ref} = 5$ V – Power for reference datapath = $P_{ref} = C_{ref} V_{ref}^2 f_{ref}$

49 EENG449b/Savvides Lec 10.49 2/17/05 Parallel Datapath from “Digital Integrated Circuits” by Rabaey – The clock rate can be reduced by 2x with the same throughput: $f_{par} = f_{ref}/2 = 20$ MHz – Total switched capacitance = $C_{par} = 2.15 C_{ref}$ – $V_{par} = V_{ref}/1.7$ – $P_{par} = (2.15 C_{ref})(V_{ref}/1.7)^2 (f_{ref}/2) \approx 0.36 P_{ref}$

50 EENG449b/Savvides Lec 10.50 2/17/05 Pipelined Datapath from “Digital Integrated Circuits” by Rabaey – $f_{pipe} = f_{ref}$, $C_{pipe} = 1.1 C_{ref}$, $V_{pipe} = V_{ref}/1.7$ – Voltage can be dropped while maintaining the original throughput – $P_{pipe} = C_{pipe} V_{pipe}^2 f_{pipe} = (1.1 C_{ref})(V_{ref}/1.7)^2 f_{ref} \approx 0.37 P_{ref}$
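
The parallel and pipelined power figures can be reproduced with the $P = C V^2 f$ model used for the reference datapath. The Python sketch below uses the rounded voltage divisor of 1.7 quoted on the slides; the function name is illustrative.

```python
def relative_power(c_ratio: float, v_divisor: float, f_ratio: float) -> float:
    """Power relative to the reference datapath, using P = C * V^2 * f.

    c_ratio   -- switched capacitance relative to C_ref
    v_divisor -- factor by which the supply voltage is reduced (V_ref / v_divisor)
    f_ratio   -- clock frequency relative to f_ref
    """
    return c_ratio * (1.0 / v_divisor) ** 2 * f_ratio


# Parallel datapath: 2.15x capacitance, V_ref / 1.7, half the clock rate.
p_par = relative_power(2.15, 1.7, 0.5)

# Pipelined datapath: 1.1x capacitance, V_ref / 1.7, same clock rate.
p_pipe = relative_power(1.1, 1.7, 1.0)

print(f"parallel:  {p_par:.3f} x P_ref")
print(f"pipelined: {p_pipe:.3f} x P_ref")
```

With the divisor rounded to 1.7 the results come out at roughly 0.37 and 0.38, slightly above the 0.36 and 0.37 on the slides. The small gap is presumably due to rounding of the voltage ratio. Either way, both architectures cut power to a little over a third of the reference at the same throughput.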

51 EENG449b/Savvides Lec 10.51 2/17/05 Datapath Architecture-Power Trade-off Summary

52 EENG449b/Savvides Lec 10.52 2/17/05 Back to Processor Architecture: ARM Performance Some possible avenues of optimizing performance and power consumption on the ARM –Use the on-chip cache –Write code in 16-bit mode assembly »Need only one memory access to fetch an instruction –Execution in RAM vs. Flash –Write code in assembly Refer to the ARM assembly language handout for more references

