
1 “So What Do Those VMware Counters Really Mean and How Do They Help Me Identify My Problem?”
John Paul Enterprise Hosting Services Research and Development February 20, 2009

2 Acknowledgments and Presentation Goal
The material in this presentation was pulled from a variety of sources, much of which was graciously provided by VMware staff members. In particular I would like to thank Scott Drummonds for his "Interpreting ESXTop Statistics" document, Eric Heck (local VMware SE) for his input and review of the material, and the Performance Team at VMware, who shared their insight and valuable information with us during our recent visit to Palo Alto. I also acknowledge and thank the VMware staff for their permission to use their material in this presentation.
This presentation is intended to review the basics of performance analysis for the VI 3.5 virtual infrastructure, with detailed information on the tools and counters used. It presents a series of examples of how the performance counters report different types of resource consumption, with a focus on the key counters to observe.

3 Performance Analysis Basics
Agenda
- Performance Analysis Basics
- Types of Performance Counters
- Top Performance Counters
- Basic Performance Analysis Approach
- Native VMware Tools – Where Are We Looking?
- A Comparison of Esxtop and the Virtual Infrastructure Client
- A Quick Introduction to Esxtop
- A Quick Introduction to the Virtual Infrastructure Client
- Core Four Deep Dive - Performance Counters in Action
- Closing Thoughts
- Questions and Answers

4 Types of Performance Counters (a.k.a. statistics)
Performance Analysis Basics
- Static – Counters that don't change during runtime, for example MEMSZ (memsize), adapter queue depth, VM name. The static counters are informational and may not be essential during performance problem analysis.
- Dynamic – Counters that are computed dynamically, for example CPU load average and memory over-commitment load average.
- Calculated – Counters derived from the delta between two successive snapshots. The refresh interval (-d) determines the time between successive snapshots. For example (worked through in the sketch below):
  %CPU used = (CPU used time at snapshot 2 - CPU used time at snapshot 1) / time elapsed between snapshots
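To make the delta mechanics concrete, here is a minimal sketch (not VMware code; the snapshot values are made up) of how a calculated counter is derived from two successive snapshots:

```python
# Minimal sketch of a delta-based ("calculated") counter.
# Snapshot values are illustrative, not taken from a real host.

def percent_cpu_used(used_ms_t1, used_ms_t2, elapsed_ms):
    """%CPU used = (CPU used at snapshot 2 - CPU used at snapshot 1) / elapsed time."""
    if elapsed_ms <= 0:
        raise ValueError("snapshots must be separated by a positive interval")
    return 100.0 * (used_ms_t2 - used_ms_t1) / elapsed_ms

# 1,200 ms of CPU consumed during a 5,000 ms refresh interval (-d 5):
print(percent_cpu_used(41_200, 42_400, 5_000))  # 24.0 (%CPU used)
```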

5 Top Performance Counters to Use for Initial Problem Determination
Performance Analysis Basics
ESX Host:
- CPU (queuing): Average physical CPU utilization*, Peak physical CPU utilization*, CPU Time*, Processor Queue Length
- Memory (swapping): Average Memory Usage, Peak Memory Usage, Page Faults, Page Fault Delta*
- Disk (latency): Split IO/Sec, Disk Read Queue Length, Disk Write Queue Length, Average Disk Sector Transfer Time
- Network (queuing/errors): Total Packets/second, Bytes Received/second, Bytes Sent/second, Output Queue Length
Physical/Virtual Machine:
- CPU (queuing): PCPU%, %SYS, %RDY, Average physical CPU utilization, Peak physical CPU utilization, Physical CPU load average
- Memory (swapping): State (memory state), SWTGT (swap target), SWCUR (swap current), SWR/s (swap read/sec), SWW/s (swap write/sec), Consumed, Active (working set), Swapused (instantaneous swap), Swapin (cumulative swap in), Swapout (cumulative swap out), VMmemctl (balloon memory)
- Disk (latency, queuing): DiskReadLatency, DiskWriteLatency, CMDS/s (commands/sec), Bytes transferred/received/sec, Disk bus resets, ABRTS/s (aborts/sec), SPLTCMD/s (I/O split cmds/sec)
- Network (queuing/errors): %DRPTX (packets dropped – TX), %DRPRX (packets dropped – RX), MbTX/s (mb transferred/sec – TX), MbRX/s (mb transferred/sec – RX)
There are many performance counters within the various contexts. This chart indicates the key performance counters that we look at to identify performance or configuration issues. The immediate question to answer is "Is any process or task consuming an atypical amount of resources, based upon the application tier?" If nothing stands out, focus on the I/O subsystem, which is where many performance problems are found.
The CPU clock interrupt is not presented accurately to the virtual machine, so any calculated counters based on timing can be inaccurate; use those counters as guides only. The end user will not be prepared for that reality, and some proactive education on virtual machine performance monitoring could substantially cut down on the number of support calls.
* Remember that counters based upon the timer interrupt inside a VM can be inconsistent and should be used as general guides only.

6 A Review of the Basic Performance Analysis Approach
Performance Analysis Basics
- Identify the virtual context of the reported performance problem
  - Where is the problem being seen? ("When I do this here, I get that")
  - How is the problem being quantified? ("My function is 25% slower")
  - Apply a reasonability check ("Has something changed from the status quo?")
- Monitor the performance from within that virtual context
  - View the performance counters in the same context as the problem
  - Look at the ESX cluster level performance counters
  - Look for atypical behavior ("Is the amount of resources consumed characteristic of this particular application or task for the server processing tier?")
  - Look for repeat offenders! This happens often.
- Expand the performance monitoring to each virtual context as needed
  - Are other workloads influencing the virtual context of this particular application and causing a shortage of a particular resource?
  - Consider how a shortage is instantiated for each of the Core Four resources

7 A Comparison of Esxtop and the Virtual Infrastructure Client
Where Are We Looking?
- VIC gives a graphical view of both real-time and trend consumption
- VIC combines real-time reporting with short-term (1 hour) trending
- VIC can report on the virtual machine, ESX host, or ESX cluster
- VIC is a bit awkward for getting the needed counters into the same view
- Esxtop allows more concurrent performance counters to be shown
- Esxtop has a higher system overhead to run
- Esxtop can sample down to a 2 second sampling period
- Esxtop gives a detailed view of each of the Core Four
Recommendation – Use VIC to get a general view of system performance, but use Esxtop for detailed problem analysis.

8 A Brief Introduction to Esxtop
Where Are We Looking?
- Launched at the root level of the ESX host
- Fairly expensive in overhead, especially if the sampling rate increases
- Screens: c: cpu (default), m: memory, n: network, d: disk adapter, u: disk device (new in ESX 3.5), v: disk VM (new in ESX 3.5)
- Output can be piped to a file and then imported into the W2K System Monitor (a parsing sketch follows)
- Horizontal and vertical screen resolution limits the number of fields and entities that can be viewed, so choose your fields wisely
- Some of the rollups and counters may be confusing to the casual user
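Since batch-mode output is plain CSV, it can also be post-processed outside of System Monitor. A hedged sketch, assuming a capture produced with something like `esxtop -b -d 2 -n 100 > esxtop.csv`; the counter-name fragment below is hypothetical and varies by host and chosen fields:

```python
# Sketch: post-processing an esxtop batch-mode (-b) CSV capture.
# Assumes a file captured with:  esxtop -b -d 2 -n 100 > esxtop.csv
import csv

def counter_samples(path, name_fragment):
    """Return (header, samples) for the first column whose name contains name_fragment."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    idx = next(i for i, name in enumerate(header) if name_fragment in name)
    return header[idx], [float(row[idx]) for row in data]

# "% Ready" is a hypothetical column-name fragment; inspect row 0 for real names.
name, values = counter_samples("esxtop.csv", "% Ready")
print(f"{name}: avg {sum(values) / len(values):.2f} over {len(values)} samples")
```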

9 Using the Esxtop screen view
Where Are We Looking?
The header line shows the time, the uptime, and the number of running worlds; some fields are hidden from the view.
- Worlds = VMkernel processes (like W2K threads)
- ID = world identifier
- GID = world group identifier
- NWLD = number of worlds

10 Using the Esxtop screen view - expanding groups
Where Are We Looking?
Press the 'e' key to expand a group.
- In the rolled-up view, some stats are cumulative across all the worlds in the group
- The expanded view gives the breakdown per world
- A VM group consists of mks (mouse, keyboard, screen), vcpu, and vmx worlds; SMP VMs have additional vcpu and vmm worlds
- vmm0, vmm1 = virtual machine monitors for vCPU0 and vCPU1 respectively

11 A Brief Introduction to the Virtual Infrastructure Client
Where Are We Looking?
- Screens – CPU, Disk, Management Agent, Memory, Network, System
- vCenter collects performance metrics from the hosts that it manages and aggregates the data using a consolidation algorithm. The algorithm is optimized to keep the database size constant over time.
- vCenter does not display many counters on the trend/history screens
- Esxtop defaults to a 5 second sampling rate, while vCenter defaults to a 20 second rate
Default statistics collection periods, samples, and retention (the arithmetic is verified in the sketch below):
- Per hour (real-time): 20 second interval, 180 samples, kept 1 hour
- Per day: 5 minute interval, 288 samples, kept 1 day
- Per week: 30 minute interval, 336 samples, kept 1 week
- Per month: 2 hour interval, 360 samples, kept 1 month
- Per year: 1 day interval, 365 samples, kept 1 year
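The sample counts follow directly from dividing each retention period by its sampling interval:

```python
# Each rollup's sample count = retention period / sampling interval (seconds).
HOUR, DAY = 3_600, 86_400
rollups = [                      # (label, interval seconds, retention seconds)
    ("real-time", 20,        HOUR),        # -> 180 samples
    ("per day",   5 * 60,    DAY),         # -> 288 samples
    ("per week",  30 * 60,   7 * DAY),     # -> 336 samples
    ("per month", 2 * HOUR,  30 * DAY),    # -> 360 samples (30-day month)
    ("per year",  DAY,       365 * DAY),   # -> 365 samples
]
for label, interval, retention in rollups:
    print(f"{label:<9}: {retention // interval} samples")
```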

12 The Effect of Sampling Rates/Times on ESXTop and VIC Counters
The same load measured three ways: ESXTop reports Total CPU Usage = 14.18%, the VIC Performance tab reports 10.27%, and the VIC Summary tab reports 3.68% (the Summary tab also shows total CPU in GHz). The differences come from the different sampling rates and averaging windows.

13 Virtual Infrastructure Client – CPU Screen
Performance Counters in Action. The screenshot highlights the controls used to change screens and to change settings.

14 Virtual Infrastructure Client – Disk Screen
Performance Counters in Action

15 Virtual Infrastructure Client – Change Settings Screen
Performance Counters in Action

16 A Comparison of Memory Counters in ESXTop and VIC on a 24GB Host
On the 24GB host, ESXTop reports Total Memory Usage = 51.87%, the VIC Performance tab reports 52.12%, and the VIC Summary tab reports 12.51GB (52.12%).

17 Resource Control Revisit – CPU Example
Performance Counters in Action
- Reservation (guarantees)
  - Minimum service level guarantee (in MHz)
  - Applies even when the system is overcommitted
  - Must pass admission control for start-up
- Shares (share the resources)
  - CPU entitlement is directly proportional to the VM's shares and depends on the total number of shares issued
  - Abstract number; only the ratio matters (see the sketch below)
- Limit
  - Absolute upper bound on CPU entitlement (in MHz)
  - Applies even when the system is not overcommitted
The slide diagram shows these on a scale from 0 MHz to total MHz, with the reservation at the bottom, the shares range in the middle, and the limit on top.
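Because only the share ratio matters, the split is easy to restate as code. A minimal sketch with made-up numbers (not a vSphere API), ignoring the reservations and limits that would clamp the result from below and above:

```python
# Sketch: proportional CPU entitlement from shares (illustrative numbers).
# Reservations and limits would bound these values from below and above.

def entitlements_mhz(contended_mhz, shares):
    total = sum(shares.values())
    return {vm: contended_mhz * s / total for vm, s in shares.items()}

# Three VMs contending for 4,200 MHz; only the 2:1:1 ratio matters.
print(entitlements_mhz(4_200, {"vm-a": 2_000, "vm-b": 1_000, "vm-c": 1_000}))
# {'vm-a': 2100.0, 'vm-b': 1050.0, 'vm-c': 1050.0}
```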

18 Types of Resources – The Core Four
Performance Counters in Action
Though the Core Four resources exist at both the ESX host and virtual machine levels, they are not the same in how they are instantiated and reported against.
- CPU – processor cycles (vertical), multi-processing (horizontal)
- Memory – allocation and sharing
- Disk (a.k.a. storage) – throughput, size, latencies, queuing
- Network – throughput, latencies, queuing
Though all resources are limited, ESX handles the resources differently: CPU is more strictly scheduled; memory is adjusted and reclaimed (more fluid) if based on shares; disk and network are fixed-bandwidth resources (except for queue depths).

19 CPU – Understanding PCPU versus VCPU
Performance Counters in Action
It is important to separate the physical CPU (PCPU) resources of the ESX host from the virtual CPU (VCPU) resources that ESX presents to the virtual machine.
- PCPU – The ESX host's processor resources are exposed only to ESX. The virtual machines are not aware of, and cannot report on, those physical resources.
- VCPU – ESX effectively assembles virtual CPU(s) for each virtual machine from the physical machine's processors/cores, based upon the type of resource allocation (ex. shares, guarantees, minimums).
- Scheduling – The virtual machine is scheduled to run inside the VCPU(s), with the virtual machine's reporting mechanism (such as W2K's System Monitor) reporting on the virtual machine's allocated VCPU(s) and remaining Core Four resources.

20 Performance Counters in Action
PCPU and VCPU Example – Two Virtual Machines
- Physical host physical resources: 2 sockets with four cores each (8 cores x 2.1GHz) = 16.8GHz CPU, 8 GB RAM
- Virtual machine logical resources: each virtual machine is defined as a uniprocessor, so each VCPU = 2.1GHz; memory allocation of 1GB per virtual machine
- Two virtual machines' allocated physical resources: 2 x 2.1GHz = 4.2GHz CPU, 2 GB RAM
- Remaining physical resources: 6 x 2.1GHz = 12.6GHz CPU (minus virtualization overhead), 6 GB RAM (minus virtualization overhead)
- Limits, maximums, and shares all affect real resources (the arithmetic is restated in the sketch below)
The idea of different contexts/levels of virtualization is key to understanding and correcting performance issues. The new administrator of the virtual infrastructure needs to internalize the contexts of virtualization and be able to move quickly within the contexts in order to correctly analyze and correct any problems.
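Restating the slide's arithmetic (all figures come from the slide itself):

```python
# The slide's example restated: 2 sockets x 4 cores at 2.1 GHz per core.
cores, ghz_per_core, host_ram_gb = 8, 2.1, 8
vms, ram_per_vm_gb = 2, 1            # two uniprocessor VMs, 1 GB each

host_ghz = cores * ghz_per_core      # 16.8 GHz
alloc_ghz = vms * ghz_per_core       # 4.2 GHz (one 2.1 GHz vCPU per VM)
print(f"host:      {host_ghz:.1f} GHz, {host_ram_gb} GB")
print(f"allocated: {alloc_ghz:.1f} GHz, {vms * ram_per_vm_gb} GB")
print(f"remaining: {host_ghz - alloc_ghz:.1f} GHz, "
      f"{host_ram_gb - vms * ram_per_vm_gb} GB (both minus virtualization overhead)")
```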

21 CPU – Key Question and Considerations
Performance Counters in Action
Is there a lack of CPU resources for the VCPU(s) of the virtual machine or for the PCPU(s) of the ESX host?
- Allocation – The CPU allocation for a specific workload can be constrained by the resource settings: the number of CPUs, the amount of shares, or limits. The key field at the virtual machine level is CPU queuing; at the ESX level it is Ready to Run (%RDY in Esxtop).
- Capacity – The virtual machine's CPU can be constrained by a lack of sufficient capacity at the ESX host level, as evidenced by the PCPU/LCPU utilization.
- Contention – The specific workload may be constrained by the consumption of workloads operating outside of their typical patterns.
- SMP CPU skewing – The movement towards lazy scheduling of SMP CPUs can cause delays if one vCPU gets too far "ahead" of the other. Look for higher %CSTP (co-schedule pending).

22 CPU Performance Counters 1
Performance Counters in Action
- PCPU(%) – Percentage of CPU utilization per physical CPU, and the total average physical CPU utilization.
- LCPU(%) – Percentage of CPU utilization per logical CPU. The percentages for the logical CPUs belonging to a package add up to 100 percent. This line appears only if hyperthreading is present and enabled.
- CCPU(%) – Percentages of total CPU time as reported by the ESX Server service console: us – percentage user time; sy – percentage system time; id – percentage idle time; wa – percentage wait time; cs/sec – context switches per second recorded by the service console.
- ID – Resource pool ID or virtual machine ID of the running world's resource pool or virtual machine, or world ID of the running world.
- GID – Resource pool ID of the running world's resource pool or virtual machine.
- NAME – Name of the running world's resource pool or virtual machine, or name of the running world.

23 CPU Performance Counters 2
Performance Counters in Action
- NWLD – Number of members in the running world's resource pool or virtual machine. If a group is expanded using the interactive command 'e' (see interactive commands), then NWLD for each of the resulting worlds is 1 (some resource pools, like the console resource pool, have only one member).
- %STATE TIMES – Set of CPU statistics made up of the following percentages. For a world, the percentages are a percentage of one physical CPU.
- %USED – Percentage of physical CPU used by the resource pool, virtual machine, or world. This is the combination of all of the cores, so the percentage can be greater than 100%.
- %RUN – Percentage of total time scheduled. This time does not account for hyperthreading and system time. On a hyperthreading-enabled server, %RUN can be twice as large as %USED.
- %SYS – Percentage of time spent in the ESX Server VMkernel on behalf of the resource pool, virtual machine, or world to process interrupts and to perform other system activities. This time is part of the time used to calculate %USED, above.
- %WAIT – Percentage of time the resource pool, virtual machine, or world spent in the blocked or busy wait state. This percentage includes the percentage of time the resource pool, virtual machine, or world was idle.

24 CPU Performance Counters 3
Performance Counters in Action
- %IDLE – Percentage of time the resource pool, virtual machine, or world was idle. Subtract this percentage from %WAIT, above, to see the percentage of time the resource pool, virtual machine, or world was waiting for some event.
- %RDY – Percentage of time the resource pool, virtual machine, or world was ready to run.
- %OVRLP – Percentage of system time spent during scheduling of a resource pool, virtual machine, or world on behalf of a different resource pool, virtual machine, or world while the resource pool, virtual machine, or world was scheduled. This time is not included in %SYS. For example, if virtual machine A is currently being scheduled and a network packet for virtual machine B is processed by the ESX Server VMkernel, the time spent appears as %OVRLP for virtual machine A and %SYS for virtual machine B.
- %CSTP – Percentage of time a resource pool spends in a ready, co-deschedule state.
- %MLMTD – Percentage of time the ESX Server VMkernel deliberately did not run the resource pool, virtual machine, or world because doing so would violate the resource pool, virtual machine, or world's limit setting. Even though the resource pool, virtual machine, or world is ready to run when it is prevented from running in this way, the %MLMTD time is not included in %RDY time.
- EVENT COUNTS/s – Set of CPU statistics made up of per-second event rates. These statistics are for VMware internal use only.

25 CPU Performance Counters 4
Performance Counters in Action
- CPU ALLOC – Set of CPU statistics made up of the following CPU allocation configuration parameters.
- AMIN – Resource pool, virtual machine, or world attribute Reservation.
- AMAX – Resource pool, virtual machine, or world attribute Limit. A value of -1 means unlimited.
- ASHRS – Resource pool, virtual machine, or world attribute Shares.
- STATS – Parameters and statistics. These statistics are applicable only to worlds and not to virtual machines or resource pools.
- AFFINITYBIT – Bit mask showing the current scheduling affinity for the world.
- HTSHARING – Current hyperthreading configuration.
- CPU – The physical or logical processor on which the world was running when resxtop (or esxtop) obtained this information.
- HTQ – Indicates whether the world is currently quarantined or not. N means no and Y means yes.
- TIMER – Timer rate for this world.

26 Performance Counters in Action
Esxtop CPU screen (c)
- PCPU = physical CPU/core
- CCPU = console CPU (CPU 0)
- Press the 'f' key to choose fields

27 Performance Counters in Action
Idle state on test bed (CPU view) – ESXTop virtual machine view.

28 Idle State on Test Bed – GID 32 Expanded
Performance Counters in Action
Screenshot annotations: rolled-up GID vs. expanded GID (five worlds); the cumulative wait % includes the total idle % (wait includes idle).

29 High CPU within one virtual machine caused by affinity - Esxtop
Case Studies - CPU
One virtual CPU is fully used; one physical CPU is fully used.

30 High CPU within one virtual machine (affinity) - vCenter
Case Studies - CPU
Views of the ESX host and of the VM.

31 Add a Second Virtual Machine – MAX CPU (affinity)
Case Studies - CPU
Two physical CPUs fully used; two virtual CPUs fully used; Ready to Run is acceptable.

32 Add a Third Virtual Machine – MAX CPU (affinity)
Case Studies - CPU
Three physical CPUs fully used; three virtual CPUs fully used; Ready to Run is acceptable.

33 Add a Fourth Virtual Machine – MAX CPU (affinity)
Case Studies - CPU
Four physical CPUs fully used; four virtual CPUs fully used; Ready to Run is acceptable.

34 Four Virtual Machines – MAX CPU (no affinity)
Case Studies - CPU
Four physical CPUs fully used; four virtual CPUs fully used; Ready to Run is acceptable.

35 SMP Implementation WITHOUT CPU Constraints
Case Studies - CPU
Four physical CPUs fully used; one 2-CPU SMP VM; four virtual CPUs fully used; Ready to Run is acceptable.

36 SMP Implementation WITHOUT CPU Constraints - VIC
Case Studies - CPU

37 SMP Implementation WITHOUT CPU Constraints - Esxtop
Case Studies - CPU
Four physical CPUs fully used; one 2-CPU SMP VM; four virtual CPUs fully used; Ready to Run is acceptable.

38 SMP Implementation with Mild CPU Constraints
Case Studies - CPU
Four physical CPUs fully used; one 2-CPU SMP VM (7 NWLD); four virtual CPUs heavily used; Ready to Run indicates severe problems.

39 SMP Implementation with Severe CPU Constraints
Case Studies - CPU
Four physical CPUs fully used; two 2-CPU SMP VMs (7 NWLD); four virtual CPUs fully used; Ready to Run indicates severe problems.

40 SMP Implementation with Severe CPU Constraints
Case Studies - CPU
Four physical CPUs fully used; two 2-CPU SMP VMs (7 NWLD); four virtual CPUs fully used; Ready to Run indicates severe problems.

41 SMP Implementation with Severe CPU Constraints
Case Studies - CPU

42 Memory – Separating the machine and guest memory
Case Studies - Memory
It is important to note that some statistics refer to guest physical memory while others refer to machine memory. "Guest physical memory" is the virtual-hardware physical memory presented to the VM. "Machine memory" is the actual physical RAM in the ESX host. In the figure, two VMs are running on an ESX host, where each block represents 4 KB of memory and each color represents a different set of data on a block. Inside each VM, the guest OS maps virtual memory to its guest physical memory. The ESX kernel maps the guest physical memory to machine memory. Due to ESX page-sharing technology, guest physical pages with the same content can be mapped to the same machine page.

43 A Brief Look at Ballooning
Case Studies - Memory
- The W2K balloon driver is located in VMware Tools
- ESX sets a balloon target for each workload at start-up and as workloads are introduced/removed
- The balloon driver expands its memory consumption, requiring the virtual machine's operating system to reclaim memory based on its own algorithms
- Ballooning routinely takes minutes to reach the target
- The returned memory is then available for ESX to use
Key ballooning fields:
- SZTGT – determined by reservation, limit, and memory shares
- SWCUR = 0 – no swapping in the past
- SWTGT = 0 – no swapping pressure
- SWR/s, SWW/s = 0 – no swapping activity currently

44 ESX Memory Sharing - The “Water Bed Effect”
Case Studies - Memory
ESX handles memory shares on an ESX host and across an ESX cluster with a result similar to a single water bed, or a room full of water beds, depending upon the action and the memory allocation type:
- Initial ESX boot (i.e., "lying down on the water bed") – ESX sets a target working size for each virtual machine, based upon the memory allocations or shares, and uses ballooning to pare back the initial allocations until those targets are reached (if possible).
- Steady state (i.e., "minor position changes") – The host gets into a steady state with small adjustments made to memory allocation targets. Memory "ripples" occur during steady state, with the amplitude dependent upon the workload characteristics and consumption by the virtual machines.
- New event (i.e., "second person on the bed") – The host receives additional workload via a newly started virtual machine, or VMotion moves a virtual machine to the host through a manual step, maintenance mode, or DRS. ESX pares back the target working size of that virtual machine while the other virtual machines lose CPU cycles that are directed to the new workload.
- Large event (i.e., "jumping across water beds") – The cluster has a major event that causes a substantial movement of workloads to or between multiple hosts. Each of the hosts has to reach a steady state, or to have DRS determine that the workload is not a current candidate for the existing host and move it to another host that has reached a steady state with available capacity. Maintenance mode is another major event.

45 Memory – Key Question and Considerations
Case Studies - Memory
Is the memory allocation for each workload high enough to prevent swapping at the virtual machine level, yet low enough not to constrain other workloads or the ESX host?
- HA/DRS/maintenance mode regularity – How often do the workloads in the cluster get moved between hosts? Each movement impacts the receiving (negative) and sending (positive) hosts, with maintenance mode causing a rolling wave of impact across the cluster, depending upon the timing.
- Allocation type – Each of the allocation types has its drawbacks, so tread carefully when choosing one. One size seldom fits all needs.
- Capacity/swapping – The virtual machine's memory can be constrained due to a lack of sufficient capacity at the ESX host level. Look for regular swapping at the ESX host level as an indicator of a memory capacity issue, but be sure to notice memory leaks that artificially force a memory shortage situation.

46 Memory Performance Counters 1
Case Studies - Memory
- PMEM (MB) – Displays the machine memory statistics for the server. All numbers are in megabytes. total – total amount of machine memory in the server; cos – amount of machine memory allocated to the ESX Server service console (ESX Server 3 only); vmk – amount of machine memory being used by the ESX Server VMkernel; other – amount of machine memory being used by everything other than the ESX service console (ESX Server 3 only) and the ESX Server VMkernel; free – amount of machine memory that is free; state – current machine memory availability state. Possible values are high, soft, hard, and low. High means that the machine memory is not under any pressure and low means that it is.
- VMKMEM (MB) – Displays the machine memory statistics for the ESX Server VMkernel. All numbers are in megabytes. managed – total amount of machine memory managed by the ESX Server VMkernel; min free – minimum amount of machine memory that the ESX Server VMkernel aims to keep free; rsvd – total amount of machine memory currently reserved by resource pools; ursvd – total amount of machine memory currently unreserved.

47 Memory Performance Counters 2
Case Studies - Memory
- COSMEM (MB) – Displays the memory statistics as reported by the ESX Server service console (ESX Server 3 only). All numbers are in megabytes. free – amount of idle memory; swap_t – total swap configured; swap_f – amount of swap free; r/s – rate at which memory is swapped in from disk; w/s – rate at which memory is swapped to disk.
- NUMA (MB) – Displays the ESX Server NUMA statistics. This line appears only if the ESX Server host is running on a NUMA server. All numbers are in megabytes. For each NUMA node in the server, two statistics are displayed: the total amount of machine memory in the NUMA node that is managed by the ESX Server, and the amount of machine memory in the node that is currently free (in parentheses).
- PSHARE (MB) – Displays the ESX Server page-sharing statistics. All numbers are in megabytes. shared – amount of physical memory that is being shared; common – amount of machine memory that is common across worlds; saving – amount of machine memory that is saved because of page sharing.

48 Memory Performance Counters 3
Case Studies - Memory
- SWAP (MB) – Displays the ESX Server swap usage statistics. All numbers are in megabytes. curr – current swap usage; target – where the ESX Server system expects the swap usage to be; r/s – rate at which memory is swapped in by the ESX Server system from disk; w/s – rate at which memory is swapped to disk by the ESX Server system.
- MEMCTL (MB) – Displays the memory balloon statistics. All numbers are in megabytes. curr – total amount of physical memory reclaimed using the vmmemctl module; target – total amount of physical memory the ESX Server host attempts to reclaim using the vmmemctl module; max – maximum amount of physical memory the ESX Server host can reclaim using the vmmemctl module.
- AMIN – Memory reservation for this resource pool or virtual machine.
- AMAX – Memory limit for this resource pool or virtual machine. A value of -1 means unlimited.
- ASHRS – Memory shares for this resource pool or virtual machine.
- AMLMT – Memory limit for this resource pool or virtual machine.

49 Memory Performance Counters 4
Case Studies - Memory
- AUNITS – Type of units shown (KB).
- N%L – Current percentage of memory allocated to the virtual machine or resource pool that is local.
- MEMSZ (MB) – Amount of physical memory allocated to a resource pool or virtual machine.
- SZTGT (MB) – Amount of machine memory the ESX Server VMkernel wants to allocate to a resource pool or virtual machine.
- TCHD (MB) – Working set estimate for the resource pool or virtual machine.
- %ACTV – Percentage of guest physical memory that is being referenced by the guest. This is an instantaneous value.
- %ACTVS – Percentage of guest physical memory that is being referenced by the guest. This is a slow moving average.
- %ACTVF – Percentage of guest physical memory that is being referenced by the guest. This is a fast moving average.
- %ACTVN – Percentage of guest physical memory that is being referenced by the guest. This is an estimation.
- MCTL – Memory balloon driver installed or not. N means no, Y means yes.
- MCTLSZ (MB) – Amount of physical memory reclaimed from the resource pool by way of ballooning.
- MCTLTGT (MB) – Amount of physical memory the ESX Server system can reclaim from the resource pool or virtual machine by way of ballooning.

50 Memory Performance Counters 5
Case Studies - Memory
- MCTLMAX (MB) – Maximum amount of physical memory the ESX Server system can reclaim from the resource pool or virtual machine by way of ballooning. This maximum depends on the guest operating system type.
- SWCUR (MB) – Current swap usage by this resource pool or virtual machine.
- SWTGT (MB) – Target where the ESX Server host expects the swap usage by the resource pool or virtual machine to be.
- SWR/s (MB) – Rate at which the ESX Server host swaps in memory from disk for the resource pool or virtual machine.
- SWW/s (MB) – Rate at which the ESX Server host swaps resource pool or virtual machine memory to disk.
- CPTRD (MB) – Amount of data read from the checkpoint file.
- CPTTGT (MB) – Size of the checkpoint file.
- ZERO (MB) – Resource pool or virtual machine physical pages that are zeroed.
- SHRD (MB) – Resource pool or virtual machine physical pages that are shared.
- SHRDSVD (MB) – Machine pages that are saved because of resource pool or virtual machine shared pages.
- OVHD (MB) – Current space overhead for the resource pool.

51 Memory Performance Counters 6
Case Studies - Memory
- OVHDMAX (MB) – Maximum space overhead that might be incurred by the resource pool or virtual machine.
- OVHDUW – Current space overhead for a user world.
- RESP? – Memory responsive?
- NHN – Current home node for the resource pool or virtual machine. This statistic is applicable only on NUMA systems. If the virtual machine has no home node, a dash (-) is displayed.
- NRMEM (MB) – Current amount of remote memory allocated to the virtual machine or resource pool. This statistic is applicable only on NUMA systems.
- GST_NDx (MB) – Guest memory allocated for a resource pool on NUMA node x. This statistic is applicable only on NUMA systems.
- OVD_NDx (MB) – VMM overhead memory allocated for a resource pool on NUMA node x. This statistic is applicable only on NUMA systems.

52 Esxtop memory screen (m)
Case Studies - Memory
The screen header shows physical memory (PMEM) split into VMkernel memory, COS memory, and the PCI hole. Possible memory states: high, soft, hard, and low.
- VMKMEM – memory managed by the VMkernel
- COSMEM – memory used by the Service Console

53 Esxtop memory screen (m)
Case Studies - Memory
The screen shows swapping activity in the Service Console and VMkernel swapping activity.
- SZTGT – size target, determined by reservation, limit, and memory shares
- SWTGT – swap target; SWTGT = 0 means no swapping pressure
- SWCUR – currently swapped; SWCUR = 0 means no swapping in the past
- MEMCTL – balloon driver
- SWR/S – swap reads/sec; SWW/S – swap writes/sec; both at 0 means no swapping activity currently

54 VIC Memory Screen – Summary Tab
Case Studies - Memory
- Requested memory – the amount of physical memory (in MB) requested for a guest
- Host memory usage – the amount of physical memory (in MB) allocated to a guest
- Guest memory usage – the amount of physical memory (in MB) actively used by a guest

55 ESXTop and VIC Summary Tab – Memory Screens
Screenshot annotations: the virtual machines' requested memory, target working set size, and memory actively used; the virtual machines have previously swapped.

56 Idle State on Test Bed – Memory View
Case Studies - Memory

57 Memory View at Steady State of 3 Virtual Machines – Memory Shares
Case Studies - Memory
Screenshot annotations: most memory is not reserved; one virtual machine has just powered on; the other VMs are at a memory steady state; no VM swapping or swap targets.

58 Ballooning and Swapping in Progress – Memory View
Case Studies - Memory
Screenshot annotations: possible memory states are high, soft, hard, and low; the VMs show different size targets due to different amounts of up time; ballooning is in effect; mild swapping is occurring.

59 Memory Reservations – Effect on New Loads
Case Studies - Memory
What size virtual machine with reserved memory can be started?
- Three VMs, each with 2GB of reserved memory
- 6GB of "free" physical memory due to memory sharing over 20 minutes, but only 666MB of unreserved memory
- A fourth virtual machine with >512MB of reserved memory can't start
- A fourth virtual machine with 512MB of reserved memory starts

60 Memory Shares – Effect on New Loads
Case Studies - Memory
- Three VMs, each with a 2GB allocation
- 5.9GB of "free" physical memory, 6GB of unreserved memory
- A fourth virtual machine with a 2GB memory allocation starts successfully

61 Storage – Key Question and Considerations
Storage Considerations
Is the bandwidth and configuration of the storage subsystem sufficient to meet the desired latency for the target workloads? If the latency target is not being met, further analysis may be very time consuming.
- Queuing – Queuing can happen at any point along the storage path, but it is not necessarily a bad thing if the latency meets requirements.
- Storage path configuration and capacity – It is critical to know the configuration of the storage path and the capacity of each node along that path. The number of active vmkernel commands must be less than or equal to the maximum queue depth of any of the storage path components while processing the target storage workload.
- Virtual machines per LUN – The number of outstanding active vmkernel commands per virtual machine, times the number of virtual machines on a specific LUN, must be less than the queue depth of that adaptor.
- The latency vs. throughput relationship (a worked example follows):
  Throughput (MB/sec) = (Outstanding IOs / latency (msec)) * block size (KB)
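Worked through with hypothetical numbers, the relationship looks like this (KB per millisecond is numerically equal to MB per second, so the units cancel cleanly):

```python
# Throughput (MB/s) = (outstanding IOs / latency in ms) * block size in KB.
def throughput_mb_s(outstanding_ios, latency_ms, block_kb):
    return outstanding_ios / latency_ms * block_kb

# Hypothetical workload: 32 outstanding IOs, 10 ms latency, 8 KB blocks.
print(throughput_mb_s(32, 10, 8))   # 25.6 MB/s
# Halving latency doubles throughput for the same queue depth:
print(throughput_mb_s(32, 5, 8))    # 51.2 MB/s
```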

62 Storage – More Questions
Storage Considerations
- How fast can the individual disk drive process a request?
- Based upon the block size and type of I/O (sequential read, sequential write, random read, random write), what type of configuration (RAID, number of physical spindles, cache) is required to match the I/O characteristics and workload demands for average and peak throughput?
- Can the network storage (SAN frame) handle the I/O rate down each path and aggregated across the internal bus, frame adaptors, and front-end processors?
In order to answer these questions we need to better understand the underlying design, considerations, and basics of the storage subsystem.

63 Back-end Storage Design Considerations
Storage Considerations
- Capacity – What is the storage capacity needed for this workload/cluster? Disk drive size (146GB, 300GB); number of disk drives needed within a single logical unit (ex. LUN).
- IOPS rate – How many I/Os per second are expected at the needed latency? Number of physical spindles per LUN; impact of sharing physical disk drives between LUNs; configuration (ex. cache) and speed of the disk drive.
- Availability – How many disk drives or storage components can fail at one time? Type of RAID chosen; amount of redundancy built into the storage solution.
- Cost – Delivered cost per byte at the required speed and availability.
Many options are available for each design consideration. The cumulative requirements for capacity, IOPS rate, and availability often dictate the final decisions on the choice of each component and the overall solution.

64 Storage from the Ground Up – Physical Disk Types
Storage Considerations
- SCSI (Small Computer Systems Interface)
- Fiber Channel (FC) – allows a queuing depth of 16
- S-ATA (Serial Advanced Technology Attachment) – no command queuing, lower cost; especially good for sequential I/O (ex. backup)
- SAS (Serial Attached SCSI)
- Solid State Disk (SSD)

65 Storage from the Ground Up – Configuration Options
Storage Considerations
- Latency – The average time it takes for the requested sector to rotate under the read/write head after a completed seek. 5400 RPM (5.5ms), 7200 RPM (4.2ms), 10,000 RPM (3ms), 15,000 RPM (2ms). Average disk latency = 1/2 * rotation time (verified below).
- Seek time – The time it takes for the read/write head to find the physical location of the requested data on the disk. Average seek time: 8-10 ms.
- Average access time – The time it takes for a disk device to service a request. This includes seek time, latency, and command processing overhead time.
- Host transfer rate – The speed at which the host can transfer data across the disk interface.
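The rotational-latency figures follow directly from the spindle speed (half a revolution on average); a one-liner verifies them:

```python
# Average rotational latency = half a revolution = (60,000 ms/min / RPM) / 2.
for rpm in (5_400, 7_200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: {60_000 / rpm / 2:.1f} ms average rotational latency")
# 5400: ~5.6 ms, 7200: ~4.2 ms, 10000: 3.0 ms, 15000: 2.0 ms
```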

66 The Effect of Rotational Speed on Transfer Time
Storage Considerations
15,000 RPM disk:
- Average seek: 3.6ms (1/3 stroke)
- Average rotational latency: 2.0ms
- Command and data transfer: 0.2ms
- Total random access time = 3.6 + 2.0 + 0.2 = 5.8ms
- 1 / 0.0058 sec = ~172 I/Os per second per disk; at 8KB per I/O that is about 1.4 MB/second
10,000 RPM disk:
- Average seek: 4.7ms (1/3 stroke)
- Average rotational latency: 3.0ms
- Command and data transfer: <1ms
- Total random access time = 8.1ms
- 1 / 0.0081 sec = ~123 I/Os per second per disk; at 8KB per I/O that is about 984 KB/sec
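The same arithmetic as a reusable sketch; the 0.4 ms transfer figure for the 10K drive is an assumption chosen so the slide's 8.1 ms total holds:

```python
# Random access time = seek + rotational latency + command/transfer overhead.
def per_disk_rates(seek_ms, rot_latency_ms, xfer_ms, io_kb=8):
    access_ms = seek_ms + rot_latency_ms + xfer_ms
    iops = 1000.0 / access_ms               # one disk, fully random workload
    return access_ms, iops, iops * io_kb    # (ms, IOPS, KB/s at io_kb per IO)

print(per_disk_rates(3.6, 2.0, 0.2))  # 15K RPM: (5.8, ~172 IOPS, ~1379 KB/s)
print(per_disk_rates(4.7, 3.0, 0.4))  # 10K RPM: (8.1, ~123 IOPS, ~988 KB/s)
```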

67 An Example of the Benefits of Faster Drives
An example comparing RAID 5 configurations built from faster or short-stroked drives:
- 10K RPM, 146GB drives, 17 drives: 2482 GB, 700 IOPS, not short-stroked
- 15K RPM drives, 11 drives: 1606 GB
- 15K RPM, 73GB drives: 1241 GB
- 300GB drives, 16 drives, short-stroked: 2064 GB, 763 IOPS
- 300GB drives, 14 drives, short-stroked: 2044 GB, 772 IOPS

68 Interaction of Transfer Rate, Block Size, and IOPS - Read
The chart plots transfer rate and IOPS against block size: as block size increases, IOPS decrease while throughput increases.

69 Storage from the Ground Up – Common RAID Types/Uses
Storage Considerations
- RAID 0 – Stripes data across two or more disks with no parity. Poor redundancy, good performance.
- RAID 1 – Data is duplicated across disks (mirrored). Good redundancy, good performance on reads, reduced capacity/price.
- RAID 5 – Data blocks are striped across disks with parity data distributed across the physical disks. Good redundancy, performance penalty on writes, high capacity/price.
- RAID 6 – Data blocks are striped across disks with two copies of parity data distributed across the physical disks. Survives two simultaneous disk failures at the cost of a larger write penalty.
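A rough sketch of the capacity/write-cost trade-off, using the standard rule-of-thumb write penalties (back-end IOs per host write: RAID 0 = 1, RAID 1 = 2, RAID 5 = 4, RAID 6 = 6); the drive count and size here are made up, not taken from the slides:

```python
# Usable capacity and rule-of-thumb write penalty for common RAID levels.
def raid_profile(level, drives, drive_gb):
    usable_drives = {0: drives, 1: drives / 2, 5: drives - 1, 6: drives - 2}
    write_penalty = {0: 1, 1: 2, 5: 4, 6: 6}   # back-end IOs per host write
    return usable_drives[level] * drive_gb, write_penalty[level]

for level in (0, 1, 5, 6):
    usable_gb, penalty = raid_profile(level, drives=8, drive_gb=150)
    print(f"RAID {level}: {usable_gb:.0f} GB usable, {penalty} IOs per write")
```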

70 Example: Creating Four 150GB LUNs Using RAID 1 + 0
Storage Considerations
- Eight total physical disks; each physical disk has 150GB of available space
- Effective usable capacity of 600GB (4 x 150GB) after mirroring
- Each LUN spans four physical disk drives for performance needs
- Each physical disk has four LUNs allocated

71 Example: Creating Four 150GB LUNs Using RAID 5
Storage Considerations
- Four total physical disks; each physical disk has 150GB of available space
- Effective raw disk capacity of 600GB (4 x 150GB)
- Each LUN spans three physical disk drives for performance needs
- Each physical disk has four LUNs allocated

72 The Impact of LUN Data Striping on Performance
Storage Considerations
- Each workload independently makes requests of the physical disks
- Higher disk utilization usually results in slower response time
- The faster the physical disk responds, the less negative impact on response time
- High usage of a single disk results in hot spots, with resultant data moves by the storage subsystem to different physical disks
- "Short stroking" (using the outer part of the disk) may be needed for high I/O rates

73 Network Storage Components That Can Affect Performance/Availability
Storage Considerations
- Size and use of cache
- Number of independent internal data paths and buses
- Number of front-end interfaces and processors
- Types of interfaces supported (ex. Fiber Channel and iSCSI)
- Number and type of physical disk drives available
- MetaLUN expansion – MetaLUNs allow for the aggregation of LUNs; the system typically re-stripes data when a MetaLUN is changed, with some performance degradation during re-striping
- Storage virtualization – aggregation of storage arrays behind a presented mount point/LUN; movements between disk drives and tiers are controlled by storage management; changes of physical drives and configuration may be transient and severe

74 SAN Storage Infrastructure – Areas to Watch/Consider
Storage Considerations
Let's take a look at some of the components of the overall SAN storage infrastructure. Each of these components is a potential candidate for performance degradation, and as such needs to be carefully designed AND tested. Do not assume that the SAN storage infrastructure is a black box.
Components along the path: HBA speed, fiber bandwidth, disk response, FA CPU speed, RAID configuration, number of spindles in the LUN/array, block size, disk speeds, world queue length, LUN queue length, storage adapter queue length, cache size/type.
Some things to look for:
- Speed of the front-end CPU of the SAN
- RAID configuration of the back-end storage devices
- Distribution of the load across multiple frame adaptors
- Internal paths inside the storage cabinet

75 Storage Queuing – The Key Throttle Points
Storage Considerations
The diagram marks the key throttle points along the storage path: the execution throttle at each HBA port and the LUN queue length (LQLEN).

76 Storage I/O – The Key Throttle Point Definitions
Storage Considerations
- Storage adapter queue length (AQLEN) – The number of outstanding vmkernel active commands that the adapter is configured to support. This is not settable; it is a parameter passed from the adapter to the kernel.
- LUN queue length (LQLEN) – The maximum number of permitted outstanding vmkernel active commands to a LUN. (This would be the HBA queue depth setting for an HBA.) This is set in the storage adapter configuration via the command line.
- World queue length (WQLEN) – The maximum number of permitted outstanding vmkernel active requests to a LUN from any single virtual machine (min: 1, max: 256, default: 32). Configuration -> Advanced Settings -> Disk -> Disk.SchedNumReqOutstanding. VMware recommends not changing this!
- Execution throttle (not a displayed counter) – The maximum number of permitted outstanding vmkernel active commands that can be executed on any one HBA port (min: 1, max: 256, default: ~16, depending on vendor). This is set in the HBA driver configuration.
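Combining these definitions with the virtual-machines-per-LUN rule of thumb from the key-questions slide, here is a hedged sanity check with illustrative numbers:

```python
# Rule of thumb: outstanding commands per VM x VMs sharing the LUN
# should stay at or below the LUN queue depth (LQLEN).
def lun_queue_check(vms_on_lun, outstanding_per_vm, lqlen):
    demand = vms_on_lun * outstanding_per_vm
    return demand, demand <= lqlen

demand, ok = lun_queue_check(vms_on_lun=5, outstanding_per_vm=8, lqlen=32)
print(f"peak demand {demand} vs LQLEN 32 -> {'ok' if ok else 'queuing likely'}")
# peak demand 40 vs LQLEN 32 -> queuing likely
```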

77 Queue Length Rules of Thumb
Storage Considerations
- For a lightly-loaded system, the average queue length should be less than 1 per spindle, with occasional spikes up to 10. If the workload is write-heavy, the average queue length above a mirrored controller should be less than 0.6 per spindle, and less than 0.3 per spindle above a RAID-5 controller.
- For a heavily-loaded system that isn't saturated, the average queue length should be less than 2.5 per spindle, with infrequent spikes up to 20. If the workload is write-heavy, the average queue length above a mirrored controller should be less than 1.5 per spindle, and less than 1 above a RAID-5 controller.

78 Storage Performance Counters 1
Case Studies - Storage
- %USD – Percentage of the queue depth used by ESX Server VMkernel active commands. This statistic is applicable only to worlds and devices.
- ABRTS/s – Number of commands aborted per second.
- ACTV – Number of commands in the ESX Server VMkernel that are currently active. This statistic is applicable only to worlds and devices.
- AQLEN – Storage adapter queue depth. Maximum number of ESX Server VMkernel active commands that the adapter driver is configured to support.
- BLKSZ – Block size in bytes.
- CID – Storage adapter channel/controller ID. This ID is visible only if the corresponding adapter, channel, and target are expanded.
- CMDS/s – Number of commands issued per second.
- DAVG/cmd – Average device latency per command, in milliseconds.
- DAVG/rd – Average device read latency per read operation, in milliseconds.
- DAVG/wr – Average device write latency per write operation, in milliseconds.
- Device – Storage device name. This name is visible only if the corresponding world is expanded to devices.
- DQLEN – Storage device queue depth. The maximum number of ESX Server VMkernel active commands that the device is configured to support.

79 Storage Performance Counters 2
Case Studies - Storage
- GAVG/cmd – Average guest operating system latency per command, in milliseconds.
- GAVG/rd – Average guest operating system read latency per read operation, in milliseconds.
- GAVG/wr – Average guest operating system write latency per write operation, in milliseconds.
- GID – Resource pool ID of the running world's resource pool.
- ID – Resource pool ID of the running world's resource pool, or the world ID of the running world.
- KAVG/cmd – Average ESX Server VMkernel latency per command, in milliseconds.
- KAVG/rd – Average ESX Server VMkernel read latency per read operation, in milliseconds.
- KAVG/wr – Average ESX Server VMkernel write latency per write operation, in milliseconds.
- LID – Storage adapter channel target LUN ID. This ID is visible only if the corresponding adapter, channel, and target are expanded.
- LOAD – Ratio of ESX Server VMkernel active commands plus ESX Server VMkernel queued commands to the queue depth. This statistic is applicable only to worlds and devices.
- LQLEN – LUN queue depth. Maximum number of ESX Server VMkernel active commands that the LUN is allowed to have.
- MBREAD/s – Megabytes read per second.
- MBWRTN/s – Megabytes written per second.
- NAME – Name of the running world's resource pool, or name of the running world.

80 Storage Performance Counters 3
Case Studies - Storage
- NCHNS – Number of channels.
- NDV – Number of devices.
- NLUNS – Number of LUNs.
- NPH – Number of paths/partitions.
- NTGTS – Number of targets.
- NUMBLKS – Number of blocks of the device. Valid only if the corresponding world is expanded to devices.
- NVMS – Number of worlds.
- NWD – Number of worlds.
- PAECMD/s – Number of PAE (Physical Address Extension) commands per second.
- PAECP/s – Number of PAE copies per second. This statistic is applicable only to paths.
- PARTITION – Partition ID. This ID is visible only if the corresponding device is expanded to partitions.
- PATH – Path name. This name is visible only if the corresponding device is expanded to paths.
- QAVG/cmd – Average queue latency per command, in milliseconds.
- QAVG/rd – Average queue latency per read operation, in milliseconds.
- QAVG/wr – Average queue latency per write operation, in milliseconds.

81 Storage Performance Counters 4
Case Studies - Storage
- QUED – Number of commands in the ESX Server VMkernel that are currently queued.
- READS/s – Number of read commands issued per second.
- RESETS/s – Number of commands reset per second.
- SHARES – Number of shares.
- SPLTCMD/s – Number of split commands per second.
- SPLTCP/s – Number of split copies per second.
- WID – Storage adapter channel target LUN world ID. This ID is visible only if the corresponding adapter, channel, target, and LUN are expanded.
- WQLEN – World queue depth. The maximum number of ESX Server VMkernel active commands that the world is allowed to have. This is a per-LUN maximum for the world (or a per-device maximum when the corresponding device is expanded to worlds).
- WRITES/s – Number of write commands issued per second.

82 Esxtop disk adapter screen (d)
Case Studies - Storage
Host bus adapters (HBAs) – includes SCSI, iSCSI, RAID, and FC-HBA adapters. Latency stats come from the device, the kernel, and the guest:
- DAVG/cmd – average latency (ms) from the device (LUN)
- KAVG/cmd – average latency (ms) in the VMkernel
- GAVG/cmd – average latency (ms) in the guest

83 Esxtop disk device screen (u)
Case Studies - Storage Esxtop disk device screen (u) LUNs in C:T:L format (Controller: Target: LUN)

84 Esxtop disk VM screen (v)
Case Studies - Storage Esxtop disk VM screen (v) running VMs

85 Test Bed Storage Configuration – Device Allocation
Case Studies - Storage
For performance analysis purposes, an IO request from an application in a virtual machine traverses multiple levels of queues, each associated with a resource, in the guest OS, the VMkernel, and the physical storage. (Note that the physical storage could be an FC or IP SAN or a disk array.) Each queue has an associated latency, dictated by its size and whether the IO load is low or high, which affects the throughput and latency seen by applications inside VMs.

86 Test Bed Storage Configuration – vmhba0
Case Studies - Storage

87 Test Bed Storage Configuration – vmhba1
Case Studies - Storage

88 Test Bed Idle State– Device Adapter View
Case Studies - Storage
Screenshot annotations: average device latency per command; storage adapter maximum queue length; world maximum queue length; LUN maximum queue length.

89 Moderate load on two virtual machines
Case Studies - Storage
Acceptable latency from the disk subsystem; commands are queued, BUT….

90 Heavier load on two virtual machines
Case Studies - Storage
Virtual machine latency is consistently above 20ms, so performance could start to be an issue; commands are queued and are exceeding maximum queue lengths, BUT….

91 Heavy load on four virtual machines
Case Studies - Storage
Virtual machine latency is consistently above 60ms for some VMs, so performance will be an issue; commands are queued and are exceeding maximum queue lengths, AND….

92 Artificial Constraints on Storage
Case Studies - Storage
Two contrasting captures: good throughput with low device latency, versus bad throughput with high device latency (cache disabled), indicating a problem with the disk subsystem.

93 Network – Key Question and Considerations
Case Studies - Network
Is the bandwidth and configuration of the network sufficient to meet the desired throughput without network errors?
- Configuration – Is the network configuration known, and is the network correctly designed for the needed throughput?
- Error indicators – Dropped packets, or low throughput without errors, indicate a problem.
- Network counters – These are fairly high-level counters; most network analysis is done external to the system.

94 Network Performance Counters 1
Case Studies - Network
- PORT – Virtual network device port ID.
- UPLINK – Y means the corresponding port is an uplink; N means it is not.
- UP – Y means the corresponding link is up; N means it is not.
- SPEED – Link speed in megabits per second.
- FDUPLX – Y means the corresponding link is operating in full duplex; N means it is not.
- USED – Virtual network device port user.
- DTYP – Virtual network device type. H means hub, S means switch.
- PKTTX/s – Number of packets transmitted per second.
- PKTRX/s – Number of packets received per second.
- MbTX/s – Megabits transmitted per second.
- MbRX/s – Megabits received per second.
- %DRPTX – Percentage of transmit packets dropped.
- %DRPRX – Percentage of receive packets dropped.

95 Esxtop network screen (n)
Case Studies - Network
The screen lists the service console NIC, physical NICs, and virtual NICs. Every entity is attached to a port on the virtual switch (Port ID); DNAME is the switch the port belongs to.
- PKTTX/s – packets transmitted/sec
- PKTRX/s – packets received/sec
- MbTX/s – transmit throughput in Mbits/sec
- MbRX/s – receive throughput in Mbits/sec

96 Test Bed Idle State – Network View
Case Studies - Network

97 Test Bed Under Heavy Network Load
Case Studies - Network
Heavy transmit and receive loads; no packets are being dropped despite the load, BUT….

98 Closing Thoughts
- Know the key counters to look at for each type of resource
- Be careful about which resource allocation technique you use for CPU and RAM; one size may NOT fit all
- Consider the impact of events such as maintenance on the performance of a cluster
- Set up a simple test bed where you can create simple loads to become familiar with the various performance counters and tools
- Compare your test bed analysis and performance counters with the development and production clusters
- Know your storage subsystem components and configuration, given the large impact they can have on overall performance
- Take the time to learn how the various components of the virtual infrastructure work together

99 Performance Resources
- The performance community
- Performance analysis techniques whitepaper
- Performance web page for white papers
- VROOM! – the VMware performance blog

100 John Paul – johnathan.paul@siemens.com
Questions?

