Presentation on theme: "State Monitoring in Cloud Datacenters Shicong Meng (Student Member, IEEE) Ling Liu (Senior Member, IEEE) Ting Wang (Student Member, IEEE) IEEE Transactions."— Presentation transcript:
State Monitoring in Cloud Datacenters. Shicong Meng (Student Member, IEEE), Ling Liu (Senior Member, IEEE), Ting Wang (Student Member, IEEE). IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 9, September 2011. Presented by Kartik Babu Boga.
INTRODUCTION A datacenter is a facility for housing computer systems and their associated components, such as telecommunication and storage systems. A cloud datacenter is an advancement of the datacenter that provisions system resources and computing on demand. A cloud application delivers Software as a Service (SaaS) over the Internet. Ex. 1) The National Climatic Data Center (NCDC) is a public data center that maintains the world's largest archive of weather information. Ex. 2) Amazon's Elastic Compute Cloud (EC2) is designed to make web-scale computing easier for developers.
The scale of cloud datacenters and the diversity of application-specific metrics pose significant challenges for both the system and data aspects of datacenter monitoring, for the following reasons. Event capturing: the tremendous number of events raises a number of system-level issues. Resource consumption: large-scale monitoring involves processing large amounts of data. Reliability: system failures raise system-level issues in datacenter monitoring.
Challenges in Large-Scale Monitoring Distributed aggregation: it is hard to summarize the voluminous monitored values. Shared aggregation: some monitoring tasks share similarities, yet they are performed in isolation, leading to unnecessary resource consumption. In this paper, we study state monitoring at cloud datacenters, which can be viewed as a cloud state management issue.
State Monitoring A key challenge for efficient state monitoring is meeting two demanding objectives: a high level of correctness, which ensures a zero or very low error rate, and high communication efficiency, which requires minimal communication cost.
Consider detecting whether the overall request rate of a distributed application deviates from its normal state. We refer to this type of monitoring as state monitoring. State monitoring is widely used in many applications. Examples are: traffic engineering, quality of service, botnet detection. One intuitive state monitoring approach is instantaneous state monitoring, which triggers a state alert whenever a predefined threshold is violated. Example: in Internet applications, whose monitored values often contain momentary bursts, this approach causes frequent and unnecessary state alerts.
Instantaneous State Monitoring An intuitive approach to state monitoring. It triggers a state alert whenever a predefined threshold is violated. Most of the existing work deals with this approach.
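As a minimal sketch of the instantaneous approach described on this slide, the following Python function raises an alert at every time unit where the aggregate of the monitored values exceeds the threshold. The function name, the sample request rates, and the choice of a sum as the aggregate are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def instantaneous_alerts(values_per_node: Sequence[Sequence[float]],
                         threshold: float) -> List[int]:
    """Return every time unit at which the summed value exceeds the threshold."""
    alerts = []
    # zip(*...) turns per-node series into per-time-unit snapshots.
    for t, snapshot in enumerate(zip(*values_per_node)):
        if sum(snapshot) > threshold:
            alerts.append(t)
    return alerts


# Hypothetical request rates for two nodes; node_a has a one-unit burst.
node_a = [10, 12, 95, 11, 10]
node_b = [20, 18, 22, 19, 21]
print(instantaneous_alerts([node_a, node_b], threshold=100))  # [2]
```

A single momentary burst at time unit 2 is enough to trigger an alert here, which is the behavior that makes this approach noisy.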
WIndow-based StatE monitoring (WISE) In this paper, we introduce the concept of window-based state monitoring and devise a distributed WISE framework for cloud datacenters. WISE triggers state alerts only when the state violation is continuous within a time window. The basic WISE may not scale well in the presence of a larger number of monitoring nodes. Thus, we present an improved window-based monitoring approach that improves our basic approach along several dimensions. We develop a set of optimization techniques to optimize the performance of the fully distributed WISE. We also compare the original WISE with the improved WISE on various aspects. Our results suggest that the improved WISE is more desirable for large-scale datacenter monitoring.
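The window-based alert condition can be illustrated with a small centralized sketch: an alert fires only when the aggregate value has exceeded the threshold for a full window of consecutive time units. This is not the distributed WISE algorithm itself; the function name, the `window` parameter, and the sample data are assumptions chosen for illustration.

```python
from typing import List, Sequence


def window_alerts(aggregate: Sequence[float], threshold: float,
                  window: int) -> List[int]:
    """Return time units that end a run of at least `window` consecutive violations."""
    if window < 1:
        raise ValueError("window must be at least 1")
    alerts = []
    run = 0  # length of the current streak of consecutive violations
    for t, value in enumerate(aggregate):
        run = run + 1 if value > threshold else 0
        if run >= window:
            alerts.append(t)
    return alerts


# Hypothetical aggregate series: one isolated burst, then a sustained violation.
aggregate = [30, 117, 30, 105, 110, 120, 31]
print(window_alerts(aggregate, threshold=100, window=3))  # [5]
print(window_alerts(aggregate, threshold=100, window=1))  # [1, 3, 4, 5]
```

With a window of 3, the isolated burst at time unit 1 is ignored and only the sustained violation produces an alert; with a window of 1 the check degenerates to instantaneous monitoring.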
Three Unique Contributions of WISE 1) Employs a novel distributed state monitoring algorithm that achieves communication efficiency. 2) Uses distributed parameter tuning and a cost model to reduce communication cost. 3) Develops a set of optimization techniques.
Problem Description The focus of existing work is to find optimal local threshold values that minimize the overall communication cost. As monitored values often contain momentary bursts and outliers, instantaneous state monitoring is prone to causing frequent and unnecessary state alerts, which could further lead to unnecessary countermeasures. The problem is challenging because careful handling of monitoring windows at distributed nodes is required to ensure both communication efficiency and monitoring correctness. We start with the most intuitive approach, applying the instantaneous monitoring algorithm.
Approach Three technical developments form the core of the WISE monitoring approach. 1) The WISE monitoring algorithm: reports partial information on local violation series at the monitor node side to save communication cost. 2) The monitoring parameter tuning schemes: if a node often observes higher monitored values than other nodes, a centralized parameter tuning scheme is used to adjust its local monitoring parameters. Because the search space grows exponentially, we also develop a distributed parameter tuning scheme that avoids centralized information collection and parameter searching.
3) Performance optimization techniques: to further minimize the communication cost between a coordinator node and its monitor nodes, we use two techniques, the staged global poll and the termination message. To achieve the best communication efficiency, local monitoring parameters need to be tuned according to the given monitoring task and the monitored value distributions. The WISE monitoring algorithm guarantees monitoring correctness. Monitoring algorithm: WISE uses two separate algorithms, one for the monitor nodes and one for the coordinator node. Key concepts: filtering windows and skeptical windows.
Performance Evaluation WISE achieves a 50 to 90 percent reduction in communication cost compared with the instantaneous monitoring algorithm. The centralized parameter tuning scheme effectively improves communication efficiency. The optimization techniques further improve the communication efficiency of WISE. The actual gain is generally at the better end of that range when parameter tuning and the optimized subroutines are used. The centralized scheme suffers from scalability issues with a large number of monitor nodes.
Basic WISE and Improved WISE WISE scales better than the instantaneous approach in terms of communication overhead, and even more so with distributed parameter tuning. Although the distributed parameter tuning scheme performs somewhat worse than the centralized scheme, its scalability makes it the choice for large-scale distributed systems. Communication efficiency improves further when the two optimization techniques are used together with distributed parameter tuning.
Distributed Tuning versus Centralized Tuning The distributed scheme performs even better than the centralized scheme when the number of nodes is large. The distributed tuning scheme is a desirable alternative as it provides comparable communication efficiency and better scalability.
Related Research The early work by Dilman and Raz proposes a Simple Value scheme, which sets all local thresholds T_i to T/n, and an Improved Value scheme, which sets T_i to a value lower than T/n. Jain et al. discuss the challenges in implementing distributed triggering mechanisms for network monitoring, and they use local constraints of T/n to detect violations. The more recent work of Sharfman et al. presents a geometric approach for monitoring threshold functions. Kashyap et al. propose the most recent work in detecting distributed constraint violations.
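The reasoning behind local thresholds can be sketched as follows: if every node's value v_i stays at or below its local threshold T_i, and the T_i sum to at most T, then the global sum cannot exceed T, so nodes only need to communicate when a local threshold is violated. The Python below illustrates this with the Simple Value assignment T_i = T/n; the function names and sample values are hypothetical and are not from the cited papers.

```python
from typing import List, Sequence


def simple_value_thresholds(global_threshold: float, n: int) -> List[float]:
    """Split the global threshold T evenly: every node gets T_i = T / n."""
    return [global_threshold / n] * n


def has_local_violation(values: Sequence[float],
                        thresholds: Sequence[float]) -> bool:
    """True if any node exceeds its own local threshold."""
    return any(v > t for v, t in zip(values, thresholds))


T = 100.0
thresholds = simple_value_thresholds(T, 4)  # [25.0, 25.0, 25.0, 25.0]

quiet = [20.0, 24.0, 25.0, 10.0]   # no local violation, so sum <= T is guaranteed
skewed = [40.0, 10.0, 10.0, 10.0]  # node 0 violates locally, yet sum is only 70

print(has_local_violation(quiet, thresholds), sum(quiet) > T)    # False False
print(has_local_violation(skewed, thresholds), sum(skewed) > T)  # True False
```

The second case shows why a local violation is necessary but not sufficient for a global violation: it only signals that the global state has to be checked, and a skewed value distribution across nodes can make such checks frequent under even thresholds.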
Conclusion and Future Work The increasing use of consolidation and virtualization is driving the development of technologies for managing cloud applications and services. State monitoring is a crucial functionality for on-demand provisioning of resources and services in cloud datacenters. WISE is not only resilient to bursts and outliers but also saves communication cost. Experimental results show that WISE achieves a 50 to 90 percent reduction in communication cost.
The current results cover window-based state violation monitoring for one application running over a collection of distributed computing nodes. Future research includes scheduling multiple application state monitoring tasks and performing failure-resilient state monitoring.
References Amazon, Amazon Elastic Compute Cloud (Amazon EC2), 2008. Amazon CloudWatch Beta, http://aws.amazon.com/cloudwatch/, 2011. S. Meng, T. Wang, and L. Liu, Monitoring Continuous State Violation in Datacenters: Exploring the Time Dimension, Proc. IEEE 26th Int'l Conf. Data Eng. (ICDE), 2010. Wikipedia, Cloud computing, http://en.wikipedia.org/wiki/Cloud_computing