
Power and Performance Modeling in a Virtualized Server System. M. Pedram and I. Hwang, Department of Electrical Engineering, Univ. of Southern California.


2 Power and Performance Modeling in a Virtualized Server System
M. Pedram and I. Hwang
Department of Electrical Engineering, Univ. of Southern California

3 Outline
- Study and its purpose
- Motivation
- Concepts and Definitions
- Methodology
- Experimental system setup
- Simulation results
- Power modeling
- Performance modeling
- Power-performance tradeoff analysis

4 Motivation
- Modern data centers consume large amounts of energy and put a lot of stress on the aging power grid
- The average utilization level of enterprise servers in a typical data center is quite low
- Service level agreements (SLAs) between clients and data center operators are critically important, especially for hosting centers
- Accurate models of power and performance are essential for dynamic resource provisioning and allocation as well as power/thermal management

5 Virtualization
- Virtualization loosens the tight bond between software and hardware by introducing a hypervisor between the OS and the hardware
- The same hardware can then serve the needs of different software servers: Oracle, MS SQL Server, Exchange, Dynamics CRM, etc.
- Different operating systems can also run side by side, e.g., MS SQL Server 2008 on Windows 2008 Server and Oracle on Linux, all on the same hardware
- Resources are better utilized as a result, since virtual machines can easily be added or migrated
- Examples include Microsoft Hyper-V, VMware ESX Server 3.5, and the Linux Xen hypervisor

6 Virtualization Cont’d

7 Concepts and Definitions
- Processor: An integrated circuit possibly containing multiple cores, caches, and memory and other I/O interfaces
- Hypervisor (Virtual Machine Monitor, VMM): Hardware platform virtualization software that lets different operating systems run on the same hardware at the same time
- The VMM is responsible for CPU scheduling and memory partitioning among the virtual machines running on the hardware; it is also the software that allocates basic machine resources, including CPU time and memory
- Fully-virtualized: A virtual machine that is not aware of being virtualized (needs no OS support and is fully transparent, but has high overhead)
- Para-virtualized: A virtual machine that runs on a hypervisor and is aware of being virtualized

8 Definitions Cont’d
- Domain 0, the Privileged Domain (Dom0): Privileged guest running on the hypervisor with direct hardware access and guest management responsibilities
- Domain U, Unprivileged Domain Guests (DomU): One or more unprivileged guests running on the hypervisor; they have no direct access to hardware
- Virtual machine (VM): Same as a domain
- (Physical) CPU: A physical core in a processor
- Virtual CPU (VCPU; one or more per VM): May be a process, request, or job that must run on a CPU
- Each CPU manages a local run queue of runnable VCPUs; this queue is sorted by VCPU priority
- CPU utilization: Sum of the utilizations of all CPUs in the system, e.g., 400% for 4 physical CPUs that are fully utilized

9 Methodology

10 Workload Generator
- Generates tasks to load the system
- TCP/IP-based structure
- Mimics web-based services
- Schedules various types of tasks: CPU/IO/MEM bound, inter-arrival time, etc.

11 Experimental System Setup
- Each processor is equipped with "Demand Based Switching" power management
- We cut the (12 V) power line for the processor chip and measure DC power dissipation
- Model: Intel Dual Xeon E5400 series; code name: Harpertown
- Two-processor system; each processor is quad-core
- The chip supports two frequency levels: 2.0 GHz and 2.33 GHz
- Two cores in a socket must run at exactly the same frequency
[Figure: measurement setup; labels: Processors, Voltage regulator]

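12 Test Cases: Taxonomy
- Variables: # of vCPUs, set of active pCPUs, and their frequency levels (L = low, H = high)

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) | Processor 2 (cores 1, 3, 5, 7) |
|---|---|---|---|---|
| 1 | 4 | 0,2,4,6 | LLLL | |
| 2 | 4 | 0,2,4 | LLLL | |
| 7 | 4 | 0,2,4 | HHHH | |
| 13 | 1 | 0 | LLLL | |
| 18 | 4 | 0,1 | HHHH | HHHH |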

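13 Complete Set of Test Cases

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) | Processor 2 (cores 1, 3, 5, 7) |
|---|---|---|---|---|
| 1 | 4 | 0,2,4,6 | LLLL | |
| 2 | 4 | 0,2,4 | LLLL | |
| 3 | 4 | 0,2 | LLLL | |
| 4 | 4 | 0,4 | LLLL | |
| 5 | 4 | 0 | LLLL | |
| 6 | 4 | 0,2,4,6 | HHHH | |
| 7 | 4 | 0,2,4 | HHHH | |
| 8 | 4 | 0,2 | HHHH | |
| 9 | 4 | 0,4 | HHHH | |
| 10 | 4 | 0 | HHHH | |
| 11 | 4 | 0,2 | LLHH | |
| 12 | 4 | 0,4 | LLHH | |
| 13 | 1 | 0 | LLLL | |
| 14 | 1 | 0 | HHHH | |
| 15 | 4 | 0,2 | LLLL | LLLL |
| 16 | 4 | 0,1 | HHHH | LLLL |
| 17 | 4 | 0,2 | HHHH | LLLL |
| 18 | 4 | 0,1 | HHHH | HHHH |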

14 Power Measurement Results
- Linear model
- Cluster 1: cases 1-5, 11 (low frequency)
- Cluster 2: cases 6-10, 12 (high and mixed frequency)

| Case | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) |
|---|---|---|
| 1 | 0,2,4,6 | LLLL |
| 2 | 0,2,4 | LLLL |
| 3 | 0,2 | LLLL |
| 4 | 0,4 | LLLL |
| 5 | 0 | LLLL |
| 11 | 0,2 | LLHH |
| 6 | 0,2,4,6 | HHHH |
| 7 | 0,2,4 | HHHH |
| 8 | 0,2 | HHHH |
| 9 | 0,4 | HHHH |
| 10 | 0 | HHHH |
| 12 | 0,4 | LLHH |

15 Power Analysis
Observation 1: When all active CPUs are running at the low frequency level, the processor power dissipation is nearly independent of which subset of CPUs is used by the running domain (cf. cases 1 through 5, and 11).
- Even if some of the inactive CPUs are running at the high frequency, they make no difference (case 3 vs. case 11)
- Core consolidation is ineffective for power saving

| Case | a | b | R squared |
|---|---|---|---|
| 1 | 0.046 | 12.65 | 0.998 |
| 2 | 0.046 | 12.65 | 0.999 |
| 3 | 0.047 | 12.39 | 0.999 |
| 4 | 0.049 | 12.40 | 0.998 |
| 5 | 0.049 | 12.28 | 0.991 |
| 11 | 0.047 | 12.38 | 0.997 |

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) |
|---|---|---|---|
| 1 | 4 | 0,2,4,6 | LLLL |
| 2 | 4 | 0,2,4 | LLLL |
| 3 | 4 | 0,2 | LLLL |
| 4 | 4 | 0,4 | LLLL |
| 5 | 4 | 0 | LLLL |
| 11 | 4 | 0,2 | LLHH |

16 Power Analysis Cont’d
Observation 2: When all active CPUs are running at the high frequency, processor power dissipation is only weakly dependent on the subset of CPUs used by the running domain (cf. cases 6 through 10).
- The power slope increases as the # of pCPUs decreases
- There is a small change in the power offset, but this is mainly caused by regression error
- Core consolidation is ineffective for power saving

| Case | a | b | R squared |
|---|---|---|---|
| 6 | 0.088 | 14.44 | 0.993 |
| 7 | 0.091 | 13.98 | 0.997 |
| 8 | 0.095 | 13.58 | 0.996 |
| 9 | 0.096 | 13.62 | 0.997 |
| 10 | 0.127 | 12.48 | 0.996 |

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) |
|---|---|---|---|
| 6 | 4 | 0,2,4,6 | HHHH |
| 7 | 4 | 0,2,4 | HHHH |
| 8 | 4 | 0,2 | HHHH |
| 9 | 4 | 0,4 | HHHH |
| 10 | 4 | 0 | HHHH |

17 Power Analysis Cont’d
Observation 3: Power dissipation when the active CPUs are running at different frequency levels is similar to that of the case with the same # of active CPUs all running at the high frequency level (cf. cases 9 and 12).
- Power slopes are nearly the same
- Only a small difference in the power offsets

| Case | a | b | R squared |
|---|---|---|---|
| 9 | 0.096 | 13.62 | 0.997 |
| 12 | 0.097 | 11.74 | 0.998 |

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) |
|---|---|---|---|
| 9 | 4 | 0,4 | HHHH |
| 12 | 4 | 0,4 | LLHH |

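18 Power Analysis Cont’d
Observation 4: The idle power dissipation of the processor chip is somewhat higher for the higher frequency level (cf. cases 1-5 vs. 6-10).

| Case | a | b | R squared | Case | a | b | R squared |
|---|---|---|---|---|---|---|---|
| 1 | 0.046 | 12.65 | 0.998 | 6 | 0.088 | 14.44 | 0.993 |
| 2 | 0.046 | 12.65 | 0.999 | 7 | 0.091 | 13.98 | 0.997 |
| 3 | 0.047 | 12.39 | 0.999 | 8 | 0.095 | 13.58 | 0.996 |
| 4 | 0.049 | 12.40 | 0.998 | 9 | 0.096 | 13.62 | 0.997 |
| 5 | 0.049 | 12.28 | 0.991 | 10 | 0.127 | 12.48 | 0.996 |

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) | Processor 2 (cores 1, 3, 5, 7) |
|---|---|---|---|---|
| 1 | 4 | 0,2,4,6 | LLLL | |
| … | | | | |
| 5 | 4 | 0 | LLLL | |
| 6 | 4 | 0,2,4,6 | HHHH | |
| … | | | | |
| 10 | 4 | 0 | HHHH | |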

19 Power Analysis Cont’d
Observation 5: Processor consolidation is helpful in reducing power dissipation (cf. case 17 vs. case 18).
- Case 17 corresponds to cases 8 and 9: Processor 2 is not being used, so it only increases the offset power
- Case 18 corresponds to case 10: the settings of the two processors are identical, and the maximum possible utilization of case 18 is 200%, which is twice that of case 10

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) | Processor 2 (cores 1, 3, 5, 7) |
|---|---|---|---|---|
| 17 | 4 | 0,2 | HHHH | LLLL |
| 18 | 4 | 0,1 | HHHH | HHHH |

20 Power Model
- Linear model
- Cluster 1: cases 1-5, 11 (low frequency); independent of parameters
- Cluster 2: cases 6-10, 12 (high and mixed frequency); dependent on parameters
- Low f → High f (for same λ)

| Case | a | b | R squared |
|---|---|---|---|
| 1 | 0.046 | 12.65 | 0.998 |
| 2 | 0.046 | 12.65 | 0.999 |
| 3 | 0.047 | 12.39 | 0.999 |
| 4 | 0.049 | 12.40 | 0.998 |
| 5 | 0.049 | 12.28 | 0.991 |
| 11 | 0.047 | 12.38 | 0.997 |
| 6 | 0.088 | 14.44 | 0.993 |
| 7 | 0.091 | 13.98 | 0.997 |
| 8 | 0.095 | 13.58 | 0.996 |
| 9 | 0.096 | 13.62 | 0.997 |
| 10 | 0.127 | 12.48 | 0.996 |
| 12 | 0.097 | 11.74 | 0.998 |
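The fitted coefficients on this slide can be evaluated with a small helper. The sketch below assumes the linear model has the form P = a·u + b, with u the total CPU utilization in percent and P in watts. The slides list only a, b, and R squared, so the load variable, the units, and the name `predict_power` are assumptions made for illustration.

```python
# Fitted (a, b) pairs copied from the Power Model slide.
# Assumption: P = a * u + b, where u is total CPU utilization in percent
# and P is in watts; the slides do not state the variable or the units.
POWER_COEFFS = {
    1: (0.046, 12.65), 2: (0.046, 12.65), 3: (0.047, 12.39),
    4: (0.049, 12.40), 5: (0.049, 12.28), 11: (0.047, 12.38),
    6: (0.088, 14.44), 7: (0.091, 13.98), 8: (0.095, 13.58),
    9: (0.096, 13.62), 10: (0.127, 12.48), 12: (0.097, 11.74),
}


def predict_power(case: int, utilization_pct: float) -> float:
    """Evaluate the assumed linear model for one test case."""
    a, b = POWER_COEFFS[case]
    return a * utilization_pct + b


if __name__ == "__main__":
    # Compare a low-frequency and a high-frequency case at the same load.
    for case in (1, 6):
        print(case, round(predict_power(case, 100.0), 2))
```

At zero load the prediction reduces to the offset b, which is how the idle-power comparison between the two clusters can be read off the table.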

21 Power Model Cont’d
- Power dissipation as a function of the frequency level
- Offsets are excluded because there is no difference in the power offsets of cases 1 and 6
- P(case 6) / P(case 1) = 0.088 / 0.046 = 1.913
- Freq(case 6) / Freq(case 1) = 2.33 / 2.0 = 1.165
- Low f → High f (for same λ)

| Case | a | b | R squared |
|---|---|---|---|
| 1 | 0.046 | 12.65 | 0.998 |
| 6 | 0.088 | 14.44 | 0.993 |

| Case | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) |
|---|---|---|
| 1 | 0,2,4,6 | LLLL |
| 6 | 0,2,4,6 | HHHH |
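The two ratios on this slide can be reproduced directly from the fitted slopes and the two frequency levels. This is only a worked check of the slide's arithmetic; the variable names are illustrative.

```python
# Slopes of the linear power model for case 1 (low f) and case 6 (high f),
# and the two supported frequency levels, all taken from the slides.
slope_low, slope_high = 0.046, 0.088
freq_low_ghz, freq_high_ghz = 2.0, 2.33

slope_ratio = slope_high / slope_low        # ratio of power slopes
freq_ratio = freq_high_ghz / freq_low_ghz   # ratio of clock frequencies

print(f"slope ratio: {slope_ratio:.3f}")
print(f"frequency ratio: {freq_ratio:.3f}")
```

The slope ratio is considerably larger than the frequency ratio, so the load-dependent part of the power grows faster than linearly with frequency between the two levels.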

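22 Unified Power Model – High Freq.
- Additional required parameter: # of active CPUs

| Cluster | Case | a | b | R squared |
|---|---|---|---|---|
| Cluster 1 | 1-5, 11 | 0.047 | 12.55 | 0.998 |
| Cluster 2 | 6 | 0.089 | 14.41 | 0.993 |
| Cluster 2 | 7 | 0.090 | 14.06 | 0.996 |
| Cluster 2 | 8 | 0.097 | 13.57 | 0.996 |
| Cluster 2 | 9 | 0.097 | 13.57 | 0.998 |
| Cluster 2 | 10 | 0.127 | 12.49 | 0.996 |
| Cluster 2 | 12 | 0.097 | 13.06 | 0.953 |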

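23 Extension to Multi-Processors
- Estimated (a*, b*): from the power model
- Measured (a, b): from experimental results
- Linear relationship between power and the # of processors at the same utilization level
- a* = average of the slopes
- b* = sum of the offsets

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) | Processor 2 (cores 1, 3, 5, 7) |
|---|---|---|---|---|
| 15 | 4 | 0,2 | LLLL | LLLL |
| 16 | 4 | 0,1 | HHHH | LLLL |
| 17 | 4 | 0,2 | HHHH | LLLL |
| 18 | 4 | 0,1 | HHHH | HHHH |

| Case | a* (estimated) | b* (estimated) | a (measured) | b (measured) |
|---|---|---|---|---|
| 15 | 0.049 | 24.80 | 0.046 | 24.60 |
| 16 | 0.088 | 24.76 | 0.085 | 24.58 |
| 17 | 0.095 | 25.86 | 0.091 | 25.65 |
| 18 | 0.127 | 24.96 | 0.128 | 24.63 |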
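The composition rule stated on this slide (a* is the average of the slopes, b* is the sum of the offsets) can be sketched as follows. The example treats case 18 as two processors that each behave like case 10, which is how the slides describe it. The function name `combine_processors` is hypothetical, and which single-processor case stands in for each processor in the other rows is not spelled out in the slides, so only case 18 is shown.

```python
from typing import List, Tuple


def combine_processors(models: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine per-processor linear models (a, b) into a system-level (a*, b*).

    Rule from the slide: a* = average of the slopes, b* = sum of the offsets.
    """
    if not models:
        raise ValueError("at least one processor model is required")
    slopes = [a for a, _ in models]
    offsets = [b for _, b in models]
    a_star = sum(slopes) / len(slopes)
    b_star = sum(offsets)
    return a_star, b_star


# Case 10 coefficients (one active core at high frequency), from the slides.
case10 = (0.127, 12.48)

# Case 18: both processors configured identically, each like case 10.
a_star, b_star = combine_processors([case10, case10])
print(round(a_star, 3), round(b_star, 2))
```

The result agrees with the estimated row for case 18 in the table (a* = 0.127, b* = 24.96), against measured values of 0.128 and 24.63.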

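24 Performance Model
- Queuing theory result, average wait time: [equation not preserved in the transcript]
- Our performance model: [equation not preserved in the transcript]

| Case | c | d | e | R squared |
|---|---|---|---|---|
| 1 | 0.013 | 404.5 | 0.045 | 0.997 |
| 2 | 0.007 | 290.0 | 0.048 | 0.997 |
| 3, 4 | 0.014 | 199.4 | 0.044 | 0.991 |
| 5 | 0.053 | 109.0 | 0.038 | 0.996 |
| 6 | 0.012 | 407.5 | 0.042 | 0.998 |
| 7 | 0.009 | 294.0 | 0.042 | 1.000 |
| 8, 9 | 0.017 | 204.9 | 0.036 | 0.949 |
| 10 | 0.058 | 111.1 | 0.029 | 0.989 |

| Case | # of vCPUs | ID of pCPUs |
|---|---|---|
| 1 | 4 | 0,2,4,6 |
| 2 | 4 | 0,2,4 |
| 3 | 4 | 0,2 |
| 4 | 4 | 0,4 |
| 5 | 4 | 0 |
| 6 | 4 | 0,2,4,6 |
| 7 | 4 | 0,2,4 |
| 8 | 4 | 0,2 |
| 9 | 4 | 0,4 |
| 10 | 4 | 0 |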

25 Performance Analysis
- Observation 1: At the same normalized utilization level, the response time of cases with a smaller number of active CPUs is much higher than that of cases with a larger number of active CPUs (cases 5 & 10 vs. cases 1 & 6)
- Normalized utilization = total utilization / (# of active CPUs)

| Case | # of vCPUs | # of pCPUs | Freq |
|---|---|---|---|
| 1 | 4 | 4 | Low |
| 2 | 4 | 3 | Low |
| 3, 4 | 4 | 2 | Low |
| 5 | 4 | 1 | Low |
| 6 | 4 | 4 | High |
| 7 | 4 | 3 | High |
| 8, 9 | 4 | 2 | High |
| 10 | 4 | 1 | High |
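The normalization used in this comparison is simple enough to state as code. The sketch below uses the slide's definition (total utilization divided by the number of active CPUs) with total utilization expressed in percent, as in the earlier CPU utilization definition. The function name and the sample loads are illustrative, not measurements from the study.

```python
def normalized_utilization(total_util_pct: float, active_cpus: int) -> float:
    """Total utilization (percent, summed over CPUs) per active CPU."""
    if active_cpus <= 0:
        raise ValueError("active_cpus must be positive")
    return total_util_pct / active_cpus


# Hypothetical loads: 200% spread over 4 active CPUs and 50% on 1 active CPU
# sit at the same normalized level, so their response times are comparable
# in the sense of Observation 1.
four_cpu_level = normalized_utilization(200.0, 4)
one_cpu_level = normalized_utilization(50.0, 1)
print(four_cpu_level, one_cpu_level)
```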

26 Performance Analysis Cont’d
- Observation 2: When the # of vCPUs is larger than the # of pCPUs, response time increases (cf. case 5 vs. case 13, or case 10 vs. case 14)
- The # of vCPUs can be a parameter of the model

| Case | # of vCPUs | ID of pCPUs | Processor 1 (cores 0, 2, 4, 6) | Processor 2 (cores 1, 3, 5, 7) |
|---|---|---|---|---|
| 5 | 4 | 0 | LLLL | |
| 13 | 1 | 0 | LLLL | |
| 10 | 4 | 0 | HHHH | |
| 14 | 1 | 0 | HHHH | |

27 Pareto Surface
- Cost function: energy × response time / task
- Selecting the best parameters: the best parameters must be chosen for each level of load intensity or CPU utilization
- Either case 1 or case 6 is the best one (the cases without consolidation)
- CPU consolidation within a physical machine is not a good idea from a power-savings perspective (but processor consolidation is effective)
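One way to read the cost function on this slide is as energy per task multiplied by response time, with the best configuration being the one that minimizes it at a given load. The sketch below follows that reading. The decomposition of energy per task into power divided by throughput, the function names, and all input numbers are assumptions for illustration; none of them come from the study's measurements.

```python
from typing import Dict, Tuple

# (power in watts, throughput in tasks per second, response time in seconds)
Config = Tuple[float, float, float]


def cost_per_task(power_w: float, throughput_tps: float, response_time_s: float) -> float:
    """Energy per task (J) times response time (s), an energy-delay style cost."""
    if throughput_tps <= 0:
        raise ValueError("throughput must be positive")
    energy_per_task_j = power_w / throughput_tps
    return energy_per_task_j * response_time_s


def best_config(configs: Dict[str, Config]) -> str:
    """Return the name of the configuration with the lowest cost."""
    return min(configs, key=lambda name: cost_per_task(*configs[name]))


# Hypothetical operating points at one load level.
candidates = {
    "config_a": (30.0, 10.0, 0.05),  # more power, shorter response time
    "config_b": (20.0, 10.0, 0.12),  # less power, longer response time
}
print(best_config(candidates))
```

Repeating the selection at each load level gives the per-load choice of parameters that the slide describes.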

28 Conclusion
- Accurate power/performance models for CPUs and servers in a virtualized computer system were derived through extensive simulations and hardware measurements
- These models can be used for better power and performance tradeoff analysis or dynamic management
- Future work will include deriving power/performance models for a more general virtualized system (multi-tiered service or multi-guest domains) using more advanced architectures (multi-core processor chips with hardware assists for virtualization)

