Zehan Cui, Yan Zhu, Yungang Bao, Mingyu Chen Institute of Computing Technology, Chinese Academy of Sciences July 28, 2011.


1 Zehan Cui, Yan Zhu, Yungang Bao, Mingyu Chen Institute of Computing Technology, Chinese Academy of Sciences July 28, 2011

2 Motivation Design & Implementation Experiments Conclusion & Work in Progress

3 Motivation Design & Implementation Experiments Conclusion & Work in Progress

4 Watts/Server [source: The Problem of Power Consumption in Servers, Intel, 2009]. CPU no longer dominates the system power. [source: Barroso et al., The Datacenter as a Computer, 2009]

5 Measurement is the basis. Low power Hardware Software model measurement

6 Component-Level: ATX-based method accuracy. Directly powered through ATX wires. Modern motherboards mostly have dedicated ATX wires for the processor. VRM (Voltage Regulation Module) loss. Usually deduced from multiple ATX wires. Platform dependent.

7 Motivation Design & Implementation Experiments Conclusion & Work in Progress

8 Disk & CPU: similar to other ATX-based methods. Memory & add-in card devices: wrapper-based methods. Advantages: accurate (direct measurement); easy to use (no deduction needed); portable (multi-platform). Diagram labels: Power Supply, Current Sensor.

9 Prototype Disk power CPU power Memory power

10 Motivation Design & Implementation Experiments Conclusion & Work in Progress

11

12 401.bzip2 from SPEC CPU2006

13 The more frequently we measure power, the more detail we get. Observation: 5,000 samples/s is an appropriate sampling frequency at component level.
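A minimal sketch of why the sampling rate matters, assuming a synthetic power trace (the 2 W baseline, the 6 W burst, and the 10 samples/s comparison rate are invented for illustration, not measured values): a 2 ms burst is visible at 5,000 samples/s but falls between samples at a much lower rate.

```python
def power_at(t):
    """Hypothetical power trace in watts: 2 W baseline with a 2 ms burst to 6 W."""
    return 6.0 if 0.5005 <= t < 0.5025 else 2.0


def sample(rate_hz, duration_s=1.0):
    """Sample the trace at a fixed rate and return the list of readings."""
    n = int(rate_hz * duration_s)
    return [power_at(i / rate_hz) for i in range(n)]


fast = sample(5000)  # 5,000 samples/s, the rate named on the slide
slow = sample(10)    # 10 samples/s, an arbitrary low rate for comparison

peak_fast = max(fast)  # the burst is captured by several samples
peak_slow = max(slow)  # the burst falls between two samples and is missed
```

With the low rate, the trace looks flat even though the component briefly drew three times its baseline power.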

14 Higher BW, but lower power. Lower BW, higher power.

15 Malloc 512MB; access in different strides. Time: 6.5 times longer. Power: slightly lower. Energy: 5.9 times higher. Two causes: row conflict; lots of TLB misses. Increase row buffer hit rate; large pages may be more efficient. What is the relationship between performance and power?
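The three ratios on this slide are tied together by energy = average power x time. A short worked check, using only the two ratios quoted on the slide (the function name is hypothetical):

```python
def implied_power_ratio(energy_ratio, time_ratio):
    """Energy = average power x time, so power ratio = energy ratio / time ratio."""
    return energy_ratio / time_ratio


ratio = implied_power_ratio(energy_ratio=5.9, time_ratio=6.5)
print(f"average power ratio: {ratio:.2f}")  # prints "average power ratio: 0.91"
```

An average power of roughly 0.91 times the faster case agrees with "slightly lower": nearly all of the extra energy comes from the longer run time.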

16 64MB memory; random vs. sequential. Jump at least 64B to eliminate cache hits. Large pages (2MB) to eliminate TLB misses. Load/Store_Unit % = LSU_stall_time / CPU_Cycle. Observation: it seems that DRAM power is already proportional to bandwidth. But the fact is that …
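A rough sketch of this setup, not the authors' benchmark: it generates sequential or random byte offsets spaced 64 B apart inside a 64 MB buffer and computes the Load/Store Unit ratio defined on the slide. The function names are hypothetical, and how the stall and cycle counts are read from hardware counters is left open.

```python
import random

LINE = 64                # jump at least 64 B between accesses, as on the slide
BUF = 64 * 1024 * 1024   # 64 MB buffer, as on the slide


def offsets(pattern, count, seed=0):
    """Return `count` byte offsets, sequential or random, on 64 B boundaries."""
    slots = BUF // LINE
    if pattern == "sequential":
        return [(i % slots) * LINE for i in range(count)]
    rng = random.Random(seed)  # a different seed gives a different random pattern
    return [rng.randrange(slots) * LINE for _ in range(count)]


def lsu_utilization(lsu_stall_cycles, cpu_cycles):
    """Load/Store_Unit % = LSU_stall_time / CPU_Cycle, returned as a fraction."""
    return lsu_stall_cycles / cpu_cycles
```

For example, `offsets("sequential", 3)` gives `[0, 64, 128]`, and `lsu_utilization(25, 100)` gives `0.25`.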

17 Use different SEEDs to generate different random access patterns; power varies less than 1.1%. Observation: DRAM power is highly correlated to two factors: Load/Store Unit utilization and sequential vs. random access. We can build memory power models based on these two factors rather than bandwidth.
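One way such a two-factor model could look, as a sketch only: fit a separate line of power against Load/Store Unit utilization for each access pattern. The slide reports a correlation and does not say the relationship is linear, so the linear form is an assumption, and the (utilization, watts) pairs below are placeholders invented for illustration, not measurements.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


# Placeholder (LSU utilization, watts) pairs, invented for illustration only.
samples = {
    "sequential": ([0.1, 0.3, 0.5], [2.2, 2.6, 3.0]),
    "random": ([0.1, 0.3, 0.5], [2.5, 3.5, 4.5]),
}

# One fitted line per access pattern: the second factor selects the model.
models = {pattern: fit_line(xs, ys) for pattern, (xs, ys) in samples.items()}


def predict(pattern, lsu_utilization):
    """Predict DRAM power in watts from access pattern and LSU utilization."""
    a, b = models[pattern]
    return a * lsu_utilization + b
```

Keeping one model per access pattern reflects the observation that sequential and random accesses draw different power at similar bandwidth.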

18 Motivation Design & Implementation Experiments Conclusion & Work in Progress

19 We use a hybrid approach: ATX-based for CPU/disk; wrapper card for DRAM/… 5 kHz is an appropriate sampling frequency to disclose fine-grained power behavior. DRAM power is highly correlated to Load/Store Unit utilization, rather than bandwidth.

20 Upgrade current system Support DDR3 Support Large memory capacity Support 40 simultaneous measuring channels Use FPGA to collect measured data Correlate the measured power data with high-level semantics information

21 Thanks & Questions?

22 Backup

23 A wrapper card already exists; we only made several small modifications. Diagram labels: Current Sensor, Power Supply, Signals.

24 DIMM slot Motherboard DIMM: Dual-Inline Memory Module Normal

25 With our initial wrapper card DIMM slot Motherboard DIMM Wrapper Card

26

27

28 Diagram labels: Bank 0, Row Decoder, Column Decoder, Sense Amps, ODT, Receivers, Drivers, Registers, Write FIFO. Banks: independent arrays; asynchronous (independent of memory bus speed). I/O circuitry: runs at bus speed; clock sync/distribution; bus drivers and receivers; buffering/queueing. On-Die Termination: required by bus electrical characteristics for reliable operation; resistive element that dissipates power when the bus is active. [Source: H. David et al., Memory Power Management via Dynamic Voltage/Frequency Scaling, ICAC, 2011]

29 Can be approximately divided into: background power (considered to be stable); bank power (active/precharge; related to the frequency of row operations); I/O power (burst; proportional to bandwidth); termination power (termination resistors; proportional to bandwidth).
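The four-way split above can be written as a simple additive model. This is a sketch under stated assumptions: the function name, parameter names, and per-operation and per-byte energy coefficients are hypothetical and would have to be calibrated for a real device.

```python
def dram_power(background_w, row_ops_per_s, joules_per_row_op,
               bandwidth_bytes_per_s, io_joules_per_byte, term_joules_per_byte):
    """Split DRAM power (watts) into the four components named on the slide."""
    bank = row_ops_per_s * joules_per_row_op            # scales with row operations
    io = bandwidth_bytes_per_s * io_joules_per_byte     # scales with bandwidth
    term = bandwidth_bytes_per_s * term_joules_per_byte # scales with bandwidth
    return {
        "background": background_w,                     # treated as constant
        "bank": bank,
        "io": io,
        "termination": term,
        "total": background_w + bank + io + term,
    }
```

Only the bank term depends on the rate of row operations rather than on bandwidth, which is why two workloads with the same bandwidth can still draw different power.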

30 P = U * I. Diagram labels: DC Voltage, DC Current, CSA (Current-Sense Amplifier), ADC or DMM, Collector (PC), Data. Doesn't fluctuate too much: less than 2% on our platform.
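A minimal sketch of the P = U * I computation on the collector side. It assumes the current is sensed across a shunt resistor and amplified by the CSA, so that I = V_out / (gain * R_shunt); the gain, the shunt value, and all function names are hypothetical, since the slide does not give the circuit details.

```python
def current_from_csa(v_out, gain, r_shunt_ohm):
    """Recover current in amps from a current-sense amplifier output voltage."""
    return v_out / (gain * r_shunt_ohm)


def power_samples(voltages, csa_outputs, gain, r_shunt_ohm):
    """Per-sample power in watts: P = U * I for paired voltage and CSA readings."""
    return [u * current_from_csa(v, gain, r_shunt_ohm)
            for u, v in zip(voltages, csa_outputs)]


def energy_joules(powers, sample_rate_hz):
    """Approximate energy as the sum of power samples times the sample period."""
    return sum(powers) / sample_rate_hz
```

For example, with an assumed gain of 50 and a 0.01 ohm shunt, a CSA output of 1.0 V corresponds to 2 A, and at 12 V that is 24 W.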

31 Possible reason for the non-proportional random-access power on slide 17: when bandwidth is low, auto-precharge (caused by refresh) makes every access need an ACTIVE command, so the bank power is proportional to bandwidth. When bandwidth is high, some accesses may hit in the row buffer and need fewer ACTIVE commands, so the slope of the bank power increase is lower than before.

