© 2010 VMware Inc. All rights reserved Enterprise Management.

2 Why Automated Operations, now more than ever?

3 What We Are Hearing From You Configuration & Compliance Performance & Capacity “I need a more integrated, simpler approach to ensure the performance, capacity, and health of our virtual environment” “In the past we’ve just over-provisioned as that was the safest way to CYA. But now management is asking for usage reports and capacity plans before they allow us to buy more infrastructure.” “We are constantly preparing for or responding to an audit. We basically shut down normal IT Ops, other than emergencies, during each quarterly audit period.” “We don’t really have good visibility with our servers – we don’t know what patches have been applied or when. It would be great to know what percentage we are patched each week.”

4 Managing Performance/Capacity in vSphere: the basics. Is it healthy? Every VM & ESX performing well? CPU, RAM, Network, Disk? Are they behaving as expected? Any fault on any component? Is it enough? Enough CPU, RAM, Network, Disk? Future risk? Time remaining? Capacity remaining? Where are the “Stress points” in time? Is it optimised? Which VMs need adjustment? What are my key ratios? How much can I claim back from “fat” VMs? How many more VMs can I add without impacting performance?

5 Deep understanding of vCenter is required. Yes, buy more RAM. ESXi has 32 GB RAM. It is highly used.

6 VMware’s Approach to Automated Operations

7 vCenter Operations Solution: Bringing Together 3 Disciplines. Purpose Built Capacity Planning & Analysis: integrated capacity analysis and forecasting; decision support & automation via views, alerts, reports; VM right sizing and capacity reclamation. Automated Configuration & Compliance: automated patching and provisioning; comprehensive change tracking to isolate root cause; single-click rollback to remediate and return to normal. Patented Performance Analytics: self-learning of “normal” performance conditions; service health baseline and trending; smart alerts of impending performance degradation.

8 Threshold: a shift in mindset needed. vCenter sets “static” thresholds, which can be misleading. During peak, it is common for a VM to reach high utilisation; a static threshold will generate alerts when it should not, and the vSphere admin quickly learns to ignore them, defeating the purpose of alerting to begin with. During non-peak, it might be abnormal for a VM to reach even 50% utilisation; a static threshold will not generate alerts when it should. vCenter only sets a high threshold. Do you set a static threshold for when CPU or RAM utilisation drops below 5%? A drop in IOPS across an entire storage array might be a sign of a terrible day ahead. vCenter will not alert when these happen: utilisation drops from 75% to 1% when it should not; utilisation changes from 5% to 70% when it should not. We need to plot both an upper range and a lower range. But each VM differs, and the same VM differs depending on day and time. Intelligence is required to analyse each metric and its expected “normal” behaviour.
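
The difference between a static high threshold and a learned upper and lower range can be sketched as follows. This is a minimal illustration only, not the patented analytics in vCenter Operations: the band of mean plus or minus k standard deviations, the function names, and the sample data are all assumptions made for the example.

```python
from statistics import mean, stdev

def static_alert(value, high=90.0):
    """Static rule: alert only when utilisation exceeds a fixed high threshold."""
    return value > high

def learn_range(history, k=3.0):
    """Hypothetical 'normal' band: mean +/- k standard deviations of past samples."""
    m, s = mean(history), stdev(history)
    return m - k * s, m + k * s

def dynamic_alert(value, history, k=3.0):
    """Alert when the value falls outside the learned band, in either direction."""
    low, high = learn_range(history, k)
    return value < low or value > high

# Illustrative utilisation (%) for the same hour on previous days.
peak_history = [72.0, 75.0, 78.0, 74.0, 76.0]
off_peak_history = [4.0, 5.0, 6.0, 5.0, 5.0]

# Drop from ~75% to 1%: the static rule stays silent, the dynamic rule fires.
print(static_alert(1.0), dynamic_alert(1.0, peak_history))
# Jump from ~5% to 70%: still below the static threshold, but abnormal.
print(static_alert(70.0), dynamic_alert(70.0, off_peak_history))
```

Keeping a separate history per hour of day (and day of week) is what lets the same VM have a different "normal" at peak and off-peak times.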

9 How vCenter Operations Delivers a New Model for Operations: an integrated approach and patented analytics to transform how IT ensures service levels in dynamic environments. Tightly integrated with vSphere; self-learns “normal” conditions using patented analytics; aggregates underlying metrics into Workload, Capacity, Health scores; powerful visibility and drill-down from datacenter to component level; smart alerts of impending performance degradation; integration of 3rd-party monitoring tools.

10 Technical Deep Dive Presentation

11 vCenter Operations Management Suite 5.0  VMware vCenter Operations Manager Key part of the VMware vCenter Operations Management Suite  CapacityIQ Merged with vC Ops CIQ gets VCOPs features  Dashboard New Badges (11 – Up from 3) Improved Details Page  Greater Emphasis on the Datastore (First Class Object) Performance Management and Capacity Management  New Integrations VCM  vC Ops vC Ops  Chargeback vCD  vC Ops

12 vCenter Operations Mgr. – High Level Architecture [Diagram labels: vCenter Operations Manager vApp with a UI VM and an Analytics VM; components shown: vSphere WebApp, Custom WebApp, Admin WebApp, OpenVPN, Postgres DB (rolled-up capacity data), Capacity Analytics, Performance Analytics, Collector, ActiveMQ, FSDB, Postgres DB (metric data); data sources: VMware Cloud / vCenter, vCenter Configuration Manager, 3rd-party data sources; front ends: vC Ops Mgr vSphere UI, vC Ops Mgr Custom UI; vCenter communications over SSL.]

13 Brand new UI in vCOps5. Updates to the 1.0 Skittles View: Operations Badges; Relationship to the Datastore; Left Pane Navigation Drives Focus (e.g. Datastore); New World Object; Multi vCenter Support.

14 Dashboards & Badges

15 vC Ops vSphere UI – Unified Dashboard. Launching pad: click to drill down. Focused on problems: click to drill into details; almost everything is clickable. Main themes: Health, Risk, Efficiency. New concepts: Faults, Weekly Stress Profile, Reclaimable Waste, Density.

16 vC Ops vSphere UI – Two Different Users Immediate problems What is happening right now? What do I need to pay attention to? Operations Short and Long Term Capacity Forward Looking Are there areas that I should be concerned about from a capacity perspective? Have I deployed my VI in the most efficient manner?

17 vC Ops Default UI – Major and Minor Badges. Major badges (x 3): high-level understanding, calculated from the scores of the minor badges. Minor badges (x 8): specifics and guidance.

18 Operations: Major Badge – Health. “How is this object doing right now?” Identifies current problems in the system: issues that need to be resolved immediately to avoid problems. High Health is good (100-0). Heatmap: provides a quick view of many objects at once; shows the Health of all parent and child objects; go back in time (6 hours) and see the “weather” of the virtual infrastructure. The Health score is calculated from its minor badges: Workload, Anomalies, Faults.

19 Operations: Health Minor Badge – Workload. Measures how hard an object is working. High Workload is bad (0-100 or more!). Workload is demand divided by effective capacity, expressed as a percentage. As workload approaches (and exceeds) 100%: performance problems; the object is starved of resources. Focused attention on CPU, Memory, Disk I/O, Network I/O. Improved Network and Disk I/O calculations: idle networks and storage no longer show High Workload, which limits erroneous 100% Workload scores.
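
The Workload definition above (demand as a percentage of effective capacity) can be sketched as below. The resource names, units, and numbers are hypothetical; how vC Ops combines the per-resource values into a single badge score is not stated on the slide, so the sketch stops at per-resource percentages.

```python
def workload_pct(demand, effective_capacity):
    """Workload = demand / effective capacity, as a percentage (may exceed 100)."""
    if effective_capacity <= 0:
        raise ValueError("effective capacity must be positive")
    return 100.0 * demand / effective_capacity

# Hypothetical (demand, effective capacity) pairs for one host.
resources = {
    "cpu_mhz": (18000, 20000),
    "memory_mb": (36000, 32000),
}

scores = {name: workload_pct(d, c) for name, (d, c) in resources.items()}
print(scores)  # memory demand exceeds capacity, so its workload is above 100
```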

20 Operations: Health Minor Badge – Anomalies. Measures how normally this object is behaving. This is what the vC Ops 1.x Health score was, but now inverted. Derived from the number of metrics that are outside their “Normal” trended ranges: learns dynamic ranges of “Normal” for each metric; identifies metric abnormalities. Low Anomalies is good (0-100): zero means the object is performing exactly the way vC Ops expects it to for that time of day and day of the week; a high number of anomalies is usually an indication of a problem. Anomalies chart: current number of abnormal metrics; Problem/Noise threshold. Crossing the problem threshold will increase the Anomalies score. Does not generate an alert in this vSphere UI.

21 Workload and Anomalies Workload and Anomalies together tell you a lot…  Workload High & Anomalies Low Workload – Object is Running Hot Workload – Potentially Starving for Resources Anomalies – Normal Behavior for this timeframe Work with users to determine if more resources are needed  Workload High & Anomalies High Workload – Object is Running Hot Workload – Potentially Starving for Resources Anomalies – Abnormal behavior for this timeframe Something is amiss!!! Immediate Attention!!!

22 Operations: Health Minor Badge – Faults  Measures the degree of faults or problems the object is experiencing Pulled from active vCenter events  VMware specific knowledge of which vCenter Events affect Availability and Performance (examples): Loss of redundancy in NICs or HBAs Memory checksum errors HA failover problems  Low Faults is good (0-100) Each fault has a default score (e.g. 25, 50, 75, 100) Highest individual Fault Score drives the Fault object Score  Best Practices: Do not change the Faults Threshold Use Alerts View to manage Faults  Faults shown in Widget
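
The rule that the highest individual fault score drives the object's Faults score amounts to taking a maximum. A minimal sketch follows; the fault identifiers echo the slide's examples, but the score assigned to each one is an assumption for illustration, not VMware's actual mapping.

```python
# Hypothetical fault catalogue: names follow the slide's examples,
# scores are illustrative picks from the 25/50/75/100 defaults.
DEFAULT_FAULT_SCORES = {
    "nic_redundancy_lost": 25,
    "hba_redundancy_lost": 50,
    "ha_failover_problem": 75,
    "memory_checksum_error": 100,
}

def fault_badge_score(active_faults, scores=DEFAULT_FAULT_SCORES):
    """Return the highest score among active faults, or 0 when none are active."""
    return max((scores[f] for f in active_faults), default=0)

print(fault_badge_score([]))
print(fault_badge_score(["nic_redundancy_lost", "ha_failover_problem"]))
```

Because the score is a maximum rather than a sum, many low-severity faults never outrank a single severe one.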

23 Capacity Planning: Major Badge – Risk  Are there future risks to my systems and VI?  Identifies potential problems that could eventually hurt the performance  Low Risk is good (0-100)  Risk Score is calculated from its Minor Badges Time Remaining Capacity Remaining Stress  Risk Chart Shows Risk score over the last 7 days

24 Capacity Planning: Risk Minor Badge – Time Remaining  Measures time remaining before each resource type reaches its capacity CPU Memory Disk Network I/O  Early warning of upcoming provisioning needs Avoid future performance issues  High Time Remaining is good (100-0)  Graph shows resource utilization trends

25 Capacity Planning: Risk Minor Badge – Capacity Remaining. Measures how many more VMs can be placed on the object. Percentage of total VM “slots” remaining: based on the average size of the VMs on the object (e.g. VM profile); each object has its OWN VM profile size: Host, Cluster, Datacenter, etc. High Capacity Remaining is good (100-0): zero means no room left for more VMs. In the example shown, 333 more VMs correlates to 77% Capacity Remaining for this object.

26 Capacity Remaining Calculation. Determine the capacity-constraining resource: the Dashboard chart does not show which resource is the limiting one; you must drill into the Details chart. Deployed or Powered On VMs: Deployed/Powered Off VMs only use disk space resources; Powered On VMs use ALL 4 resources. Calculation for the example shown: the limiting resource is Disk Space, with 333 VMs available; use the Deployed VM number of 99 to calculate the percentage of space remaining. Capacity Remaining = 333 / (333 + 99) = 77%.
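
The arithmetic in this example can be reproduced with a short sketch. Only the disk space figure (333 remaining slots) and the deployed VM count (99) come from the slide; the other resource names and slot counts are hypothetical values chosen so that disk space is the limiting resource.

```python
def limiting_resource(vm_slots_by_resource):
    """Return (resource, slots) for the resource with the fewest VM slots left."""
    name = min(vm_slots_by_resource, key=vm_slots_by_resource.get)
    return name, vm_slots_by_resource[name]

def capacity_remaining_pct(slots_remaining, vms_counted):
    """Remaining slots as a percentage of remaining plus already-counted VMs."""
    total = slots_remaining + vms_counted
    return 100.0 * slots_remaining / total if total else 0.0

# Only disk_space=333 is from the slide; the other entries are made up.
slots = {"cpu": 610, "memory": 450, "network_io": 900, "disk_space": 333}
resource, remaining = limiting_resource(slots)

# Disk space is consumed by all deployed VMs, so the deployed count (99) is used.
pct = capacity_remaining_pct(remaining, 99)
print(resource, remaining, round(pct))
```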

27 Capacity Planning: Risk Minor Badge – Stress. Stress measures long-term or chronic workload: Workload shows an instantaneous value; Stress looks over a longer period of time. Quickly find and resolve: undersized objects; population contention. Low Stress is good (0-100). The Stress score encompasses a six (6) week period: workloads > 70% = “Stressed”; the threshold is configurable. The chart shows a weekly breakdown of Stress for each day/hour, averaged over the last six (6) weeks.

28 Stress Calculation. The Stress score is a percentage, based on the area of Workload above the “Stress Line” threshold compared to the total capacity of the object. The Stress line is configured in the vC Ops Configuration Wizard. Stress Score = (Stress area / Stress Zone) * 100. Example: the Stress line is 70% Workload; 12% of the area is above the 70% threshold; the Stress score is 12. [Chart labels: Workload Line, Stress Zone, axis 0 to 100 with the Stress line at 70, 6 Weeks, 12%.]
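
One way to read the formula Stress Score = (Stress area / Stress Zone) * 100 is sketched below. It rests on two assumptions that the slide does not confirm: the Stress Zone is taken to be the band between the Stress line and 100% workload over the whole period, and workload samples above 100% are capped at 100. The sample data is invented so that the result matches the slide's example score of 12.

```python
def stress_score(workload_samples, stress_line=70.0):
    """Percentage of the stress zone (stress_line..100) filled by workload.

    Assumes evenly spaced samples, so areas reduce to sums of heights.
    """
    if not workload_samples:
        return 0.0
    zone_height = 100.0 - stress_line
    if zone_height <= 0:
        raise ValueError("stress_line must be below 100")
    # Area above the stress line, with each sample capped at 100 (assumption).
    stress_area = sum(max(0.0, min(w, 100.0) - stress_line)
                      for w in workload_samples)
    stress_zone = zone_height * len(workload_samples)
    return 100.0 * stress_area / stress_zone

# Invented data: 3 fully saturated samples and 22 quiet ones.
samples = [100.0] * 3 + [50.0] * 22
print(stress_score(samples))
```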

29 Stress Configuration – Host or Cluster  Access via Configuration Widget Stressed Cluster and Host Undersized VM  Stress Line CPU and/or Memory Workload  Stress Threshold When should an object appear on the Stress Reports Does not affect Badge Score Object is stressed if its degree stressed is greater than the % Stressed threshold Determines the Stress line for a physical resource (viz. CPU, Memory)

30 Stress Configuration – Undersized VM Detection A cluster or host is identified as stressed if its degree stressed is greater than the % Stressed threshold Use Any or All thresholds for detection Determines the Stress line for a physical resource (viz. CPU, Memory)

31 Workload, Anomalies and Stress Adding Stress Badge can tell you even more…  Workload High & Anomalies Low & Stress High Workload – Object is Running Hot Workload – Potentially Starving for Resources Anomalies – Normal Behavior for this timeframe Stress – Object is often running under high Workload Add resources!!!  Workload High & Anomalies Low & Stress Low Workload – Object is Running Hot Workload – Potentially Starving for Resources Anomalies – Normal Behavior for this timeframe Stress – Object usually has enough resources Not likely a big problem…a cyclical workload spike?
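
The badge combinations discussed on the Workload/Anomalies slides can be written as a small decision function. This is a sketch of the guidance as stated, taking "high" and "low" as ready-made booleans; the cut-off values that make a badge "high" are not given on the slides, and combinations the slides do not discuss are returned as such rather than guessed.

```python
def interpret(workload_high, anomalies_high, stress_high=None):
    """Map badge states to the guidance given on the slides.

    stress_high=None means the Stress badge is not being considered.
    """
    if not workload_high:
        return "Combination not covered by these slides"
    if anomalies_high:
        return "Something is amiss: immediate attention"
    if stress_high is None:
        return "Work with users to determine if more resources are needed"
    if stress_high:
        return "Add resources"
    return "Not likely a big problem; possibly a cyclical workload spike"

print(interpret(True, True))
print(interpret(True, False, stress_high=True))
print(interpret(True, False, stress_high=False))
```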

32 Capacity Planning: Major Badge – Efficiency  Are there optimization opportunities in my systems?  How to run a leaner datacenter  Save $$$ by better utilizing resources  High Efficiency is good (100-0)  Efficiency Score calculated from Minor Badges Reclaimable Waste Density  Graph Depicts VMs by Percent Optimal – Optimally Provisioned VMs Waste – Over Provisioned VMs Stress – Under Provisioned VMs Not used in Efficiency Calculation (see Risk)  Three Resources Considered CPU Memory Disk Space  Note: VMs can appear in Stress and Waste

33 Capacity Planning: Efficiency Minor Badge – Reclaimable Waste  Measures the over-provisioning for an object  It identifies the amount of reclaimable resources CPU Memory Disk  Low Reclaimable Waste is good (0-100)  Reclaimable Waste = Reclaimable Capacity / Deployed Capacity Score depicts the MAX of the CPU, Memory and Disk calculation Disk calculation can also include old snapshots and templates  Graph shows breakdown of the Waste section of the Efficiency Badge pie chart % Idle VMs (based on configured settings) % Powered Off VMs % Oversized VMs
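
The Reclaimable Waste formula (reclaimable capacity divided by deployed capacity, taking the maximum across CPU, Memory and Disk) can be sketched as follows. The quantities and units are hypothetical.

```python
def reclaimable_waste_pct(resources):
    """Return the largest reclaimable/deployed percentage across resources.

    `resources` maps a resource name to (reclaimable, deployed) amounts.
    Resources with nothing deployed are skipped.
    """
    ratios = [100.0 * reclaimable / deployed
              for reclaimable, deployed in resources.values()
              if deployed > 0]
    return max(ratios, default=0.0)

# Hypothetical figures: (reclaimable, deployed) per resource.
cluster = {
    "cpu_ghz": (10, 100),     # 10% reclaimable
    "memory_gb": (64, 256),   # 25% reclaimable
    "disk_gb": (400, 1000),   # 40% reclaimable
}
print(reclaimable_waste_pct(cluster))
```

Here the disk figure sets the score, which is the kind of result old snapshots and templates can produce.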

34 Efficiency Configuration – Powered-Off & Idle VMs. Access via the Configuration Widget. Powered-Off threshold: based on % time. Idle VM detection: based on % time AND all or one of the following thresholds: CPU, Disk I/O, Network I/O. A VM is listed as Powered-Off if its total powered-off time in a given time interval exceeds the given % Time Powered-Off threshold. A VM is listed as Idle if the total time during which all or any of the resource usage is below the specified thresholds, in a given time interval, exceeds the % time threshold.
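
The two detection rules can be sketched over a list of evenly spaced samples. This is an illustration under assumptions: the sample layout, threshold values, and function names are hypothetical, and "all or any" is modelled by passing Python's built-in `all` or `any`.

```python
def pct_true(flags):
    """Percentage of samples for which the flag is true."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0

def is_powered_off(powered_on_flags, pct_time_threshold):
    """Powered-Off: share of time spent powered off exceeds the threshold."""
    return pct_true(not on for on in powered_on_flags) > pct_time_threshold

def is_idle(samples, thresholds, pct_time_threshold, combine=all):
    """Idle: share of time with usage below thresholds exceeds the threshold.

    combine=all requires every resource to be below its limit in a sample;
    combine=any requires at least one.
    """
    below = (combine(s[r] < limit for r, limit in thresholds.items())
             for s in samples)
    return pct_true(below) > pct_time_threshold

# Hypothetical usage samples and limits (units are illustrative).
quiet = {"cpu": 1, "disk_io": 5, "net_io": 2}
busy = {"cpu": 50, "disk_io": 5, "net_io": 2}
samples = [quiet] * 9 + [busy]
limits = {"cpu": 5, "disk_io": 20, "net_io": 10}

print(is_idle(samples, limits, 80))               # 90% of samples are quiet
print(is_idle(samples, limits, 95))               # stricter % time threshold
print(is_idle(samples, limits, 95, combine=any))  # any-resource variant
print(is_powered_off([True] * 2 + [False] * 8, 75))
```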

35 Powered-off VMs

36 Idle Virtual Machines

37 Efficiency Configuration – Oversized VMs  Access via Configuration Widget  Oversized Detection CPU and/or Memory Workload  Oversized Threshold What percentage of Oversized is acceptable When should an object be reported An Object is oversized if its degree oversized is greater than the % Oversized threshold For the given time interval, CapacityIQ first calculates if a physical resource (viz. CPU, Memory) is over-sized based on the configurable Utilization Less Than threshold.

38 Oversized VMs - Calculation. % Oversized Threshold = Area in Blue / Area of Grey Box. The higher the ratio (i.e. the more blue), the greater the over-sizing.

39 Capacity Planning: Efficiency Minor Badge – Density  Contrasts Actual vs. Ideal Density  Identify Optimal Resource Deployment Before Contention Occurs  Greater Consolidation  $$$  High Density is good (100-0)  Measures consolidation ratios: VMs/Host Ratios vCPU/Physical CPU Ratios vMem/Physical Memory Ratios
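
The three consolidation ratios named above are simple quotients, sketched below with a hypothetical inventory. How vC Ops derives the "ideal" density and turns the comparison into a 0-100 score is not described on the slide, so only the actual ratios are computed.

```python
def density_ratios(hosts, vms):
    """Compute VMs/host, vCPU/physical CPU and vMem/physical memory ratios."""
    if not hosts:
        raise ValueError("at least one host is required")
    pcpu = sum(h["pcpu"] for h in hosts)
    pmem = sum(h["mem_gb"] for h in hosts)
    return {
        "vms_per_host": len(vms) / len(hosts),
        "vcpu_per_pcpu": sum(v["vcpu"] for v in vms) / pcpu,
        "vmem_per_pmem": sum(v["mem_gb"] for v in vms) / pmem,
    }

# Hypothetical cluster: 2 hosts, 20 identical VMs.
hosts = [{"pcpu": 16, "mem_gb": 128}] * 2
vms = [{"vcpu": 4, "mem_gb": 8}] * 20
ratios = density_ratios(hosts, vms)
print(ratios)
```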

40 vC Ops Badges – Standard vs. Advanced

41 vC Ops Default UI – Badge Thresholds. Adjust levels to user-defined settings. Access via the Configuration Widget. Set Infrastructure and VM thresholds separately: capacity problems for a Host require more “warning” than for a VM. Disable a color threshold by clicking the level off.

42 What Updates When. All Operations screens update in real time. Planning Summary, Views and Reports update in real time. All other Planning functions are updated at midnight: Badges, Alerts, Environment View, Dashboards.

43 Operations Tab

44 Operations: Environment. Updates to the 1.0 Skittles View: Operations Badges; Relationship to the Datastore; Left Pane Navigation Drives Focus (e.g. Datastore); New World Object; Multi vCenter Support.

45 Operations: Scoreboard  Identical to the 1.0 Scoreboard View

46 Operations: Details  Detail – Common Widgets Easier Navigation via Dropdown

47 Operations: Details  Health Badge Focus Overview of the 3 Minor Health Badges

48 Operations: Details. Workload Badge Focus: Host Example. Improved legends and keys. Scroll down for new graphs for Disk and Network I/O. Individual objects color-coded to match badge score.

49 Operations: Details  Workload Badge Focus : VM Example Reserved, Limits and Entitlement Highlighted on Graphs

50 Operations: Details  Workload Badge Focus : Datastore Example Space Available Throughput IOPS Latency

51 Operations: Details  Anomalies Badge Focus Subset of the Anomalies for an object Help with any troubleshooting efforts Visualize magnitude and impact

52 Operations: Details  Fault Badge Focus Details of vCenter Faults

53 Operations: Events. Updates to the 1.0 Events View: Choose Badge; for which objects should I show Alerts and Events?; Overlay Badge Alerts; Overlay Change Events; Health Score Line.

54 Operations: All Metrics  New Metrics Available Badge Metrics Capacity Planning Metrics

55 Planning Tab

56 Planning: Environment. Updates to the 1.0 Skittles View: Planning Badges; Relationship to the Datastore; Left Pane Navigation Drives Focus (e.g. Datastore); New World Object; Multi vCenter Support.

57 Planning: Scoreboard  Identical to the 1.0 Scoreboard View

58 Planning: Summary  “Classic CapIQ” Dashboard rolled up under Summary tab Summary view context sensitive to object selected  Network I/O trending and forecasting Usable Capacity supports Network I/O  What-if Modeling allows CPU & Memory Reservations and Limits configuration

59 Planning: Views  Reports Organized by “Badge” 5 different categories – one for each minor badge under Risk and Efficiency  New List Reports VM List Datastores List Datastores Waste List  Views associated with Datastores

60 Planning: Events. Identical to the Operations: Events tab: Choose Badge; for which objects should I show Alerts and Events?; Overlay Badge Alerts; Overlay Change Events; Risk Score Line.

61 Configuration Widget: Planning & Reports – Summary, Views & Reports Defines the time interval in Dashboard Defines the time interval in Trend Views Defines the time interval in Non-trend Views e.g. various Summary views Defines the buckets for Distribution Views e.g. Configured Host Capacity – Distribution, Host Utilization – Distribution Defines the period and intervals for the CSV/PDF reports

62 Configuration Widget: Planning & Reports – Capacity and Time Remaining Defines resources used for time remaining calculations Defines capacity value to be used for calculations

63 Configuration Widget: Planning & Reports – Usable Capacity. Defines capacity to be set aside as reserved capacity: CPU, Memory, Disk I/O, Disk Space, Network I/O. Defines if current or historical capacity values are used for calculations. [Diagram labels: Total capacity, Reserved capacity, Usable capacity, Used capacity, Remaining capacity.]

64 Configuration Widget: Planning & Reports – Usage Calculation. By default, CapacityIQ calculates capacity usage based on all 24 hours of data every day. Use specific hours and days to match the business-week workload, so that off-peak usage does not skew the data.

65 Alerts Tab

66 Smart Alerts – Overview. New alerting functionality. Smart Alerts are available in EACH vC Ops Suite edition. Different types of Smart Alerts. Custom UI Alerts: can show vSphere UI Badge Alerts; alerts driven by Problem/Noise Threshold; Anomaly Breaches; KPI Threshold Breaches; very useful for groups of objects (e.g. Application Monitoring). vSphere UI Badge Alerts: threshold based; driven by Badge Color Change Thresholds; only alert on Minor Badges (Workload YES, Health NO); good for alerts on single objects (e.g. VM).

67 Smart Alerts - Configuration  Enable/Disable Alerts by Specific Badge Definitions  Create alerts on vCenter faults Subset of events from vCenter are considered faults VMware best practices and knowledge  Enable Infrastructure and VM Alert separately  Access via Configuration Widget Disable threshold level to disable the alert Turn off “Workload Orange” – No Alert

68 Smart Alerts View Alert Volume History Specific Alerts

69 Smart Alerts – Operator Functions  Alerts go from Active to Inactive based on badge level changes  Take Ownership  Release Ownership  Cancel Fault will deactivate Fault Alerts if they are no longer valid  Suspend (Min) or Suppress (Days) Temporarily remove Alert from list

70 Smart Alerts – Usability Filter to view specific Badges Filter on column values Add and Remove columns Search for specific alerts

71 Smart Alerts Details  Double click on an alert to see the details  Details view differs based on the alert type (e.g. Workload vs. Anomalies)

72 Smart Alerts – External Notification Configuration  Configure via the Administration UI  SNMP Notifications All alerts are streamed to the source Filtering must occur on the Destination System  SMTP Notifications Create Email Rules for filtering

73 Smart Alerts – Email Notification Rules  Configure via the Notification Widget  Create Email Rules via Notification Widget  Configure Email address Alert Types Criticality Levels Object Children

74 Analysis Tab

75 Analysis – Heatmaps  Heatmaps like in vC Ops Std 1.0  We now have the Capacity badges and metrics available in the heatmaps  Examples: Which Clusters are Healthy and have available Capacity? Which hosts have a Low Workload and a low Density?

76 Reports Tab

77 Reports  CapIQ Reports merged into Reports Tab  Only Reports related to vSphere Capacity, even in Ent Plus

78 Schedule and Publish Report  Per-User Scheduling  Publish via email

79 Reports Settings

80 vCM  vC Ops Integration

81 vCM  vC Ops : Change Events Correlated with Performance Overview  Integration between vCM and vC Ops Mgr for change events  Overlay Guest OS configuration changes from vCM in vC Ops performance trend graphs  Launch in context into vCM to see full details of changes and potentially remediate them Benefits  Enable Operations to quickly understand and resolve performance issues arising from configuration changes (reduce MTTR)  Drive efficient & effective troubleshooting by correlating Guest OS configuration changes w/ VM performance degradations

82 vCM Events in vC Ops – Event Collected. vC Ops does not pull in every event from vCenter: only events that could affect health or workload (vSphere knowledge!). The adapter only pulls in change events for Guest OSs: no ESX/i host configuration changes (these come from the vCenter Adapter); the Guest OS has to be managed by vCM. Events collected: Reboot, Software Install/Uninstall, Windows Registry, IP/Networking changes, Device Driver changes, Memory/CPU changes, Windows Firewall, Patches.

83 vCM Change Events Correlated with Performance  Launches to the Master Change Log view in vCM for the change in question  Rollback the change (if possible)

84 Deployment Architecture - Requirements. vCenter 4.0 u2 & above. vCM 5.4.1, configured to collect from vCenter VMs. vC Ops Mgr 5.0: collect from the same vCenter; IE (7, 8 or 9*) is required for launch-in-context. vCM Adapter: pre-installed on vC Ops Mgr vApps; install separately for the non-vApp (Enterprise Plus Linux or Windows installers). vC Ops Management Suites: Enterprise Plus; Enterprise (could be enabled if and only if a la carte full vCM functionality is added for some VMs; otherwise there is no Guest OS data to gather). * There are no known issues with IE9 in compatibility mode. [Diagram labels: vCenter, vCM, vCM DB, vC Ops Mgr Collector, vCM Adapter, VMware Adapter.]

85 Packaging and Licensing

86 vCenter Operations Management Suite Packaging (Automated Operations Management). Standard Edition, for smaller vSphere environments: VC Ops Mgr 5.0 – Std. Advanced Edition, for larger vSphere environments: VC Ops Mgr 5.0 (incl. CapIQ). Enterprise Edition, for virtual and cloud infrastructure: VC Ops Mgr 5.0 (incl. CapIQ), VC Infra Navigator **, VCM for vSphere **, Chargeback Mgr. Enterprise Plus Edition, for hybrid cloud and heterogeneous environments: VC Ops Mgr 5.0 (incl. CapIQ), VC Infra Navigator **, VC Configuration Mgr, Chargeback Mgr. ** Not available a-la-carte. Slide callouts: New SKU, New Name.

87 Thank You
