Cloud Computing Term Project: Cloud Monitoring and Scaling. Merged CD Group: Pankaj Kumar, Qinglan Zhang, Sagar Davasam, Sowjanya Puligadda, Wei Liu
Overview. What we have done: OpenStack installation; creating and managing VM instances; monitoring tools (Ganglia and Zenoss). Future work: comparing the monitoring tools and presenting the union of data metrics from Ganglia and Zenoss; performance analysis and VM scaling; controlling and computing availability.
Architecture diagram of the setup: a cloud for monitoring, scaling and performance measurement. Host 1 (192.168.2.120), Host 2 (192.168.2.121) and Host 3 (192.168.2.122) each run Ubuntu 12.04 with OpenStack. On each host, VM-1 (Ubuntu 12.04 cloud OS) runs the Ganglia and Zenoss tools, and VM-2 and VM-3 are set up similarly.
OpenStack installation. We used the DevStack multi-node installation (http://devstack.org/guides/multinode-lab.html): configure networking and NTP; add a user and set up SSH; configure the controller and slave (compute) nodes.
Issues and best practices. The DevStack script does not work smoothly out of the box. The problems we solved were: MySQL conflicts; an invalid download link; permission denied errors (fixed with su and chown -R stack:stack devstack/). Best practices: do not reboot after the DevStack installation; install VMs through the OpenStack dashboard.
Create and manage VM instances: verify the Identity Service installation; verify the Image Service installation; enable networking; generate a keypair and choose a flavor; launch the instance.
Scaling: analyze the performance results recorded by Ganglia and Zenoss; set thresholds to scale out or shrink back; try dynamic strategies to scale automatically.
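The threshold idea above can be sketched as a single decision per monitoring period. This is a minimal illustration, not the project's implementation: it assumes average CPU usage (as a fraction) is the only trigger, and the threshold values, min_vms and max_vms are hypothetical. Keeping a gap between the scale-out and shrink-back thresholds stops the system from flapping between sizes when usage sits near one threshold.

```python
def scaling_action(avg_cpu: float, n_vms: int,
                   scale_out_at: float = 0.80, scale_in_at: float = 0.30,
                   min_vms: int = 1, max_vms: int = 10) -> int:
    """Return the VM count for the next period from the average CPU usage
    (fraction 0..1) of the current period. Threshold values are hypothetical."""
    if not scale_in_at < scale_out_at:
        raise ValueError("scale_in_at must be below scale_out_at")
    if avg_cpu > scale_out_at and n_vms < max_vms:
        return n_vms + 1   # overloaded: scale out by one VM
    if avg_cpu < scale_in_at and n_vms > min_vms:
        return n_vms - 1   # underused: shrink back by one VM
    return n_vms           # inside the band: keep the current size
```

A dynamic strategy could replace the fixed thresholds with values learned from the Ganglia and Zenoss history.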
Availability: monitor the availability and test it in different situations: shutting down a server, shutting down a single VM, shutting down multiple VMs, and turning VMs back on.
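One way to turn these tests into numbers is to probe each VM at a fixed interval and count how often it answers, which matches Zenoss's definition of availability as the percentage of time a device is available. The sketch below assumes equally spaced up/down probe results and that the VMs are interchangeable replicas, so the service counts as up when at least one VM answers; the probing mechanism itself is left out.

```python
from typing import Dict, List


def availability(samples: List[bool]) -> float:
    """Fraction of equally spaced probes that found the target up."""
    if not samples:
        raise ValueError("no samples")
    return sum(samples) / len(samples)


def service_availability(per_vm: Dict[str, List[bool]]) -> float:
    """Fraction of probe intervals in which at least one VM was up.
    Assumes the VMs are interchangeable replicas of the same service."""
    series = list(per_vm.values())
    if not series:
        raise ValueError("no VMs")
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("all VMs must have the same number of samples")
    return availability([any(s[t] for s in series) for t in range(length)])
```

In the "shut down single VM" test, that VM's own availability drops, while the service availability stays high as long as another replica keeps answering.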
Ganglia vs. Zenoss
Zenoss performance metrics. Aggregate reports: CPU usage, free memory, free swap, network input and output. Availability: the percentage of time that a device or component is considered available. CPU utilization: shows monitored interfaces, devices, load average and % utilization.
Zenoss performance metrics. Filesystem utilization: shows total bytes, used bytes, free bytes and percentage utilization for each device. Interface utilization: shows the traffic through all network interfaces monitored by the system, namely the interface's rated bandwidth, average input traffic, average output traffic, total average traffic across the interface, and % utilization of bandwidth.
Zenoss performance metrics. Memory utilization: provides system-wide information about memory usage for devices in the system, namely total memory, available memory, cached memory, buffered memory and percentage of memory utilized.
Zenoss extended monitoring: Apache web server, DNS, File Transfer Protocol (FTP), Internet Relay Chat (IRC), Jabber instant messaging, LDAP response time, MySQL database, Network News Transfer Protocol (NNTP), Network Time Protocol (NTP), ONC (system remote procedure call), webpage response time (HTTP), VMware esxtop, Xen virtual hosts.
Ganglia vs. Zenoss: Zenoss can discover hosts and start monitoring them automatically, whereas Ganglia requires an agent to run on every host to gather information. Ganglia does not monitor events; Zenoss reports the event count and event queue length. Zenoss can monitor services, but Ganglia cannot. The status of a running task can be retrieved with Zenoss, but Ganglia can not be used for
Zenoss Monitoring System Architecture
Zenoss features: Zenoss offers visibility over the entire IT stack, from network devices to applications: automatic discovery, inventory via a CMDB, availability monitoring, easy-to-read performance graphs, sophisticated alerting, and an easy-to-use web portal.
Why both Ganglia and Zenoss? The additional data and parameters from both tools make decision making easier, more robust and more efficient, and make the resulting decisions and actions more reliable.
Project architecture diagram: the Ganglia and Zenoss tools run on all VMs (Ubuntu 12.04 cloud OS). The decision-making module takes the union of data read by Ganglia and Zenoss and checks whether read_availability < demanded_availability; if yes, the scaling algorithm runs and the VM provisioning module scales out to meet availability by adding VMs.
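The "union of data" step can be sketched as a merge of two per-VM metric maps. This is an assumption about how the union might be formed, not the project's code: the metric names are hypothetical, and when both tools report the same metric the sketch keeps both readings under source-prefixed keys instead of silently picking one.

```python
from typing import Dict, Mapping


def union_metrics(ganglia: Mapping[str, float],
                  zenoss: Mapping[str, float]) -> Dict[str, float]:
    """Merge one VM's readings from both tools into a single dict.
    Metrics seen by only one tool keep their name; metrics seen by both are
    stored as 'ganglia.<name>' and 'zenoss.<name>' so neither is lost."""
    merged: Dict[str, float] = {}
    for name in set(ganglia) | set(zenoss):
        if name in ganglia and name in zenoss:
            merged[f"ganglia.{name}"] = ganglia[name]
            merged[f"zenoss.{name}"] = zenoss[name]
        elif name in ganglia:
            merged[name] = ganglia[name]
        else:
            merged[name] = zenoss[name]
    return merged
```

Keeping both readings lets the decision-making module compare the tools, or average them, as a later step.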
How to scale: scale up (vertically) or scale out (horizontally)? Scale up: add resources to a single node. Scale out: add more nodes to a system. We decided to scale out, adding VMs on the nodes in the system. When do we scale the system? When system availability falls below our expectation.
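A possible sketch of the scale-out decision: scale only when the measured availability is below the demanded one, then place each new VM on the host currently running the fewest VMs. The spreading policy is our assumption, not something the project specifies. It is motivated by the "shut down server" test, because spreading VMs limits how many are lost when one host goes down. The host names reuse the IPs from the setup diagram, and n_new would come from the scaling algorithm.

```python
from typing import Dict, List


def plan_scale_out(read_availability: float, demanded_availability: float,
                   vms_per_host: Dict[str, int], n_new: int) -> List[str]:
    """Return the hosts on which to start n_new VMs, or [] if no scaling is needed.
    Each new VM goes to the host with the fewest VMs (ties broken by name)."""
    if read_availability >= demanded_availability or n_new <= 0:
        return []
    if not vms_per_host:
        raise ValueError("no hosts available")
    counts = dict(vms_per_host)  # work on a copy
    placement: List[str] = []
    for _ in range(n_new):
        host = min(counts, key=lambda h: (counts[h], h))
        placement.append(host)
        counts[host] += 1
    return placement
```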
Our goal: add the minimum number of VMs needed to reach “three nines” availability (assuming that adding VMs enhances availability).
Availability % | Downtime per year | Downtime per month* | Downtime per week
90% ("one nine") | 36.5 days | 72 hours | 16.8 hours
95% | 18.25 days | 36 hours | 8.4 hours
97% | 10.96 days | 21.6 hours | 5.04 hours
98% | 7.30 days | 14.4 hours | 3.36 hours
99% ("two nines") | 3.65 days | 7.20 hours | 1.68 hours
99.5% | 1.83 days | 3.60 hours | 50.4 minutes
99.8% | 17.52 hours | 86.23 minutes | 20.16 minutes
99.9% ("three nines") | 8.76 hours | 43.8 minutes | 10.1 minutes
99.95% | 4.38 hours | 21.56 minutes | 5.04 minutes
99.99% ("four nines") | 52.56 minutes | 4.32 minutes | 1.01 minutes
99.999% ("five nines") | 5.26 minutes | 25.9 seconds | 6.05 seconds
99.9999% ("six nines") | 31.5 seconds | 2.59 seconds | 0.605 seconds
99.99999% ("seven nines") | 3.15 seconds | 0.259 seconds | 0.0605 seconds
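Each row of the table is just (1 - availability) multiplied by the length of the period. The sketch below computes the yearly and weekly columns. The month column is left out because the table does not use a single month length: 72 hours at 90% corresponds to a 30-day month, while 43.8 minutes at 99.9% corresponds to one twelfth of a 365-day year.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760 h, non-leap year
HOURS_PER_WEEK = 7 * 24    # 168 h


def downtime_hours(availability: float, period_hours: float) -> float:
    """Allowed downtime in a period; availability is a fraction (0.999 = 99.9%)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


def nines(availability: float) -> float:
    """Number of nines, e.g. about 3 for 0.999."""
    return -math.log10(1.0 - availability)


if __name__ == "__main__":
    for a in (0.90, 0.99, 0.999, 0.9999):
        print(f"{a:.2%}: {downtime_hours(a, HOURS_PER_YEAR):8.2f} h/year, "
              f"{downtime_hours(a, HOURS_PER_WEEK) * 60:8.2f} min/week")
```

For the "three nines" goal this gives 8.76 hours per year and about 10.08 minutes per week, in line with the table.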
Our algorithm. Initialize several VMs and collect the monitoring data: CPU, network, I/O and memory usage percentages, plus availability. Then run our algorithm to make the decision:
1. Calculate the initial weight of each factor.
2. Calculate the number of VMs needed, using $(W_c, W_n, W_i, W_m) = f(U_c, U_n, U_i, U_m, N_i, N_{i+1}, A_i, A_{i+1})$, where $W_c, W_n, W_i, W_m$ are the weights of CPU, network, I/O and memory; $U_c, U_n, U_i, U_m$ are the usages of CPU, network, I/O and memory; and $N_i, N_{i+1}, A_i, A_{i+1}$ are the number of VMs and the availability in training rounds $i$ and $i+1$.
3. Launch the algorithm periodically to make decisions, and revise the algorithm itself based on the data collected by Ganglia and Zenoss (machine learning: semi-supervised approach).
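Step 3 needs the training data to be kept and refreshed between runs. A minimal sketch, assuming a sliding window of recent rounds (the window size and class names are hypothetical): each round stores $N_i$, the four usages and $A_i$, and only consecutive pairs whose VM count changed are used. When $N_{i+1} = N_i$ the factor $(N_{i+1} - N_i)/N_{i+1}$ is zero and the pair says nothing about the weights.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class TrainingRound:
    """Readings for one training round i (fractions in 0..1)."""
    n_vms: int                                 # N_i
    usage: Tuple[float, float, float, float]   # (U_c, U_n, U_i, U_m)
    availability: float                        # A_i


class TrainingHistory:
    """Sliding window of recent rounds so the weights can be re-fitted every period."""

    def __init__(self, max_rounds: int = 10) -> None:
        self._rounds: Deque[TrainingRound] = deque(maxlen=max_rounds)

    def record(self, rnd: TrainingRound) -> None:
        if not 0.0 <= rnd.availability < 1.0:
            raise ValueError("availability must be in [0, 1) so that 1 - A > 0")
        if rnd.n_vms < 1:
            raise ValueError("need at least one VM")
        self._rounds.append(rnd)  # the oldest round drops out when the window is full

    def pairs(self) -> List[Tuple[TrainingRound, TrainingRound]]:
        """Consecutive (i, i+1) pairs whose VM count changed; each yields one equation."""
        rounds = list(self._rounds)
        return [(a, b) for a, b in zip(rounds, rounds[1:]) if a.n_vms != b.n_vms]

    def ready(self, n_weights: int = 4) -> bool:
        """True once there are at least as many equations as unknown weights."""
        return len(self.pairs()) >= n_weights
```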
Current problem: how can we obtain the function f? There are two possible ways: 1. Derive the function theoretically, then use the collected data to verify and revise it. This is hard to accomplish at the current stage. 2. Collect the data first, guess the form of the function, and use a tool (MATLAB) to fit it; after training on the data, improve the accuracy and revise the algorithm. This is the approach we will use. Example next.
Guess and example. Starting from $(W_c, W_n, W_i, W_m) = f(U_c, U_n, U_i, U_m, N_i, N_{i+1}, A_i, A_{i+1})$, we guess a general function to calculate the weights:
$$\log_{10}\frac{1 - A_i}{1 - A_{i+1}} = \frac{N_{i+1} - N_i}{N_{i+1}}\left(W_c U_{ci} + W_n U_{ni} + W_i U_{ii} + W_m U_{mi}\right)$$
where $W_c, W_n, W_i, W_m$ are the weights of CPU, network, I/O and memory; $U_c, U_n, U_i, U_m$ are the usages of CPU, network, I/O and memory; and $N_i, N_{i+1}, A_i, A_{i+1}$ are the number of VMs and the availability in training rounds $i$ and $i+1$. For example, for i = 1, 2, 3, 4 (each equation pairs round i with round i+1) we get 4 equations, from which the 4 weights can be calculated.
Data from Ganglia and Zenoss:
Round | Number of VMs | CPU usage | Network usage | I/O usage | Memory usage | Availability
Training 1 | 1 | 80% | 50% | 90% | 60% |
Training 2 | 2 | 75% | 43% | 88% | 52% | 92%
Training 3 | 3 | 69% | 38% | 70% | 48% | 96%
Training 4 | 4 | 35%, 65% (columns not recoverable from the source) | 97.8%
Expected | ? | / | / | / | / | 99.9% (goal, $A_{\text{expected}}$)
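Because the guessed function is linear in the weights, each pair of rounds gives one linear equation $r_i \cdot (W_c U_{ci} + W_n U_{ni} + W_i U_{ii} + W_m U_{mi}) = y_i$, with $r_i = (N_{i+1} - N_i)/N_{i+1}$ and $y_i = \log_{10}\frac{1 - A_i}{1 - A_{i+1}}$. The sketch below stands in for the MATLAB fitting step with a swapped-in technique: a least-squares solve via the normal equations plus Gaussian elimination, in plain Python. With exactly four independent equations it reproduces the exact solution, and with more rounds it returns the best fit. The function names and the round format are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Round = Tuple[int, float, Sequence[float]]  # (N_i, A_i, (U_ci, U_ni, U_ii, U_mi))


def build_equations(rounds: List[Round]) -> Tuple[List[List[float]], List[float]]:
    """One equation in (W_c, W_n, W_i, W_m) per consecutive pair (i, i+1)."""
    rows, rhs = [], []
    for (n0, a0, u0), (n1, a1, _) in zip(rounds, rounds[1:]):
        r = (n1 - n0) / n1
        rows.append([r * u for u in u0])
        rhs.append(math.log10((1.0 - a0) / (1.0 - a1)))
    return rows, rhs


def solve_square(m: List[List[float]], v: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(m)
    aug = [list(row) + [b] for row, b in zip(m, v)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: training rounds are not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_weights(rounds: List[Round]) -> List[float]:
    """Least-squares weights from (A^T A) w = A^T y; needs at least 4 equations."""
    rows, rhs = build_equations(rounds)
    if len(rows) < 4:
        raise ValueError("need at least 5 rounds (4 equations) for 4 weights")
    k = len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(rows, rhs)) for i in range(k)]
    return solve_square(ata, aty)
```

The normal equations are the simplest route but square the conditioning of the problem. A QR-based solver would be more robust once many noisy rounds are collected.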
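Example and guess. After we get the weights, we decide how many VMs should be added. Starting from
$$\log_{10}\frac{1 - A_i}{1 - A_{i+1}} = \frac{N_{i+1} - N_i}{N_{i+1}}\left(W_c U_{ci} + W_n U_{ni} + W_i U_{ii} + W_m U_{mi}\right)$$
and setting $A_{i+1} = A_{\text{expected}}$, solving for $N_{i+1}$ gives
$$N_{i+1} = \frac{N_i}{1 - \dfrac{\log_{10}\frac{1 - A_i}{1 - A_{\text{expected}}}}{W_c U_{ci} + W_n U_{ni} + W_i U_{ii} + W_m U_{mi}}}$$
with $A_{\text{expected}} = 99.9\%$, where $N_{i+1}$ is the total number of VMs after scaling.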
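The closed form for $N_{i+1}$ can be turned into a small function. The sketch is ours, not the project's code. It makes two details explicit. First, the result is rounded up to a whole VM, which keeps the predicted availability at or above the target under the model. Second, when the log term is at least as large as the weighted usage there is no finite solution, because $(N_{i+1} - N_i)/N_{i+1}$ is always below 1. The weights are assumed to come from the fitting step, in the order (W_c, W_n, W_i, W_m).

```python
import math
from typing import Sequence


def required_vms(n_i: int, a_i: float, a_expected: float,
                 weights: Sequence[float], usage: Sequence[float]) -> int:
    """Total VM count N_{i+1} needed to reach a_expected under the guessed model.
    weights = (W_c, W_n, W_i, W_m), usage = (U_ci, U_ni, U_ii, U_mi) as fractions."""
    if not 0.0 < a_expected < 1.0:
        raise ValueError("a_expected must be in (0, 1)")
    if a_i >= a_expected:
        return n_i  # target already met: no scale-out
    s = sum(w * u for w, u in zip(weights, usage))  # weighted usage term
    if s <= 0:
        raise ValueError("weighted usage must be positive")
    gap = math.log10((1.0 - a_i) / (1.0 - a_expected))  # > 0 since a_i < a_expected
    ratio = gap / s
    if ratio >= 1.0:
        # (N_{i+1} - N_i) / N_{i+1} < 1 for every finite N_{i+1}: no solution
        raise ValueError("target not reachable under this model")
    return math.ceil(n_i / (1.0 - ratio))  # round up to whole VMs
```

The number of VMs to provision is then required_vms(...) minus the current count $N_i$.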
OpenStack installation on the Controller Node
OpenStack installation on Compute Nodes
Instance on Compute node (CirrOS)
Instance on Compute node (Ubuntu)
Zenoss Installation on Ubuntu VM
Zenoss Login
Ganglia Installation