State Monitoring in Cloud Datacenters Shicong Meng (Student Member, IEEE) Ling Liu (Senior Member, IEEE) Ting Wang (Student Member, IEEE) IEEE Transactions.

Presentation transcript:

State Monitoring in Cloud Datacenters. Shicong Meng (Student Member, IEEE), Ling Liu (Senior Member, IEEE), Ting Wang (Student Member, IEEE). IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 9, September 2011. Presented by Kartik Babu Boga.

INTRODUCTION
A datacenter is a facility that houses computer systems and their associated components, such as telecommunication and storage systems. A cloud datacenter extends the datacenter to provide on-demand system resources and computing. A cloud application delivers Software as a Service (SaaS) over the Internet. Example 1: the National Climatic Data Center (NCDC) is a public data center that maintains the world's largest archive of weather information. Example 2: Amazon's Elastic Compute Cloud (EC2) is designed to make web-scale computing easier for developers.

The scale of cloud datacenters and the diversity of application-specific metrics pose significant challenges for both the system and data aspects of datacenter monitoring, for the following reasons. Event capturing: the tremendous number of events raises a number of system-level issues. Resource consumption: large-scale monitoring involves processing large amounts of data. Reliability: system failures raise system-level issues in datacenter monitoring.

Challenges in Large-Scale Monitoring
Distributed aggregation: it is hard to summarize the voluminous monitored values. Shared aggregation: some monitoring tasks share similarities, yet they are performed in isolation, leading to unnecessary resource consumption. In this paper, we study state monitoring at cloud datacenters, which can be viewed as a cloud state management issue.

State Monitoring
A key challenge for efficient state monitoring is meeting two demanding objectives: a high level of correctness, which ensures a zero or very low error rate, and high communication efficiency, which requires minimal communication cost.

Consider detecting whether the overall request rate of a distributed application deviates from its normal state. We refer to this type of monitoring as state monitoring. State monitoring is widely used in many applications; examples are traffic engineering, quality of service, and botnet detection. One intuitive approach is instantaneous state monitoring, which triggers a state alert whenever a predefined threshold is violated. Example: in Internet applications, this approach causes frequent and unnecessary state alerts.

Instantaneous State Monitoring
An intuitive approach to state monitoring. It triggers a state alert whenever a predefined threshold is violated. Most existing work deals with this approach.
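A minimal Python sketch of instantaneous state monitoring on a single stream of values. The function name, the sample request rates, and the threshold of 100 are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def instantaneous_alerts(values: Sequence[float], threshold: float) -> List[int]:
    """Return every time index at which the monitored value exceeds the threshold."""
    return [t for t, v in enumerate(values) if v > threshold]


# Hypothetical request rates; index 2 is a momentary burst.
request_rate = [40, 45, 120, 42, 110, 115, 118, 41]
print(instantaneous_alerts(request_rate, threshold=100))  # [2, 4, 5, 6]
```

Note that the one-sample burst at index 2 raises an alert just like the sustained violation at indices 4 to 6.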

WIndow-based StatE monitoring (WISE)
In this paper, we introduce the concept of window-based state monitoring and devise a distributed WISE framework for cloud datacenters. WISE triggers state alerts only when the state violation is continuous within a specified time window. The basic WISE may not scale well in the presence of a larger number of monitoring nodes. Thus we present an improved window-based monitoring approach that improves our basic approach along several dimensions. We develop a set of optimization techniques to optimize the performance of the fully distributed WISE. We also compare the original WISE with the improved WISE on various aspects. Our results suggest that the improved WISE is more desirable for large-scale datacenter monitoring.
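A minimal single-stream Python sketch of the window-based idea: an alert is raised only when the threshold has been exceeded for a whole window of consecutive samples. This is only an illustration of the alert semantics, not the distributed WISE algorithm; the function name, sample data, threshold, and window length of 3 are assumptions.

```python
from typing import List, Sequence


def window_alerts(values: Sequence[float], threshold: float, window: int) -> List[int]:
    """Return each index t at which the last `window` samples all exceed the threshold."""
    if window < 1:
        raise ValueError("window must be at least 1")
    alerts: List[int] = []
    run = 0  # length of the current streak of consecutive violations
    for t, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= window:
            alerts.append(t)
    return alerts


# Hypothetical request rates: a one-sample burst at index 2,
# then a sustained violation at indices 4 to 6.
request_rate = [40, 45, 120, 42, 110, 115, 118, 41]
print(window_alerts(request_rate, threshold=100, window=3))  # [6]
```

With a window of 1 the function reduces to a per-sample threshold check, so the window length controls how much burstiness is tolerated before an alert.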

3 Unique Contributions by WISE
Employs a novel distributed state monitoring algorithm that achieves communication efficiency. Uses distributed parameter tuning and a cost model to reduce communication cost. Develops a set of optimization techniques.

Example

Problem Description
The focus of existing work is to find optimal local threshold values that minimize the overall communication cost. As monitored values often contain momentary bursts and outliers, instantaneous state monitoring is prone to causing frequent and unnecessary state alerts, which could further lead to unnecessary countermeasures. The problem is challenging because careful handling of monitoring windows at distributed nodes is required to ensure both communication efficiency and monitoring correctness. We start with the most intuitive approach, applying the instantaneous monitoring algorithm.
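To make the role of local thresholds concrete, here is a simplified Python simulation of instantaneous monitoring with one coordinator and n monitor nodes. It is a sketch under stated assumptions, not the paper's algorithm: the function name, the message accounting (one message per local violation report, two per polled node), and the sample readings are all hypothetical.

```python
from typing import List, Sequence, Tuple


def monitor_with_local_thresholds(
    readings: Sequence[Sequence[float]],
    local_thresholds: Sequence[float],
    global_threshold: float,
) -> Tuple[List[int], int]:
    """Simulate local-threshold filtering; readings[t][i] is node i's value at time t.

    Returns (time steps with a global violation, total messages exchanged).
    """
    if sum(local_thresholds) > global_threshold:
        raise ValueError("local thresholds must sum to at most the global threshold")
    n = len(local_thresholds)
    alerts: List[int] = []
    messages = 0
    for t, row in enumerate(readings):
        violators = [i for i in range(n) if row[i] > local_thresholds[i]]
        if not violators:
            # Every value is within its local threshold, so the sum cannot
            # exceed the global threshold and no message is needed.
            continue
        messages += len(violators)            # local violation reports
        messages += 2 * (n - len(violators))  # assumed poll request + reply per remaining node
        if sum(row) > global_threshold:
            alerts.append(t)
    return alerts, messages


# Hypothetical readings for 3 nodes over 4 time steps.
readings = [[50, 60, 70], [120, 40, 50], [150, 110, 90], [80, 90, 95]]
print(monitor_with_local_thresholds(readings, [100, 100, 100], 300))  # ([2], 9)
```

In this toy run, 9 messages are exchanged, whereas having all 3 nodes report at all 4 time steps would cost 12. How the local thresholds are chosen determines how often local violations, and therefore messages, occur.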

Approach
Three technical developments form the core of the WISE monitoring approach. The WISE monitoring algorithm: reports partial information on local violation series at the monitor node side to save communication cost. The monitoring parameter tuning schemes: if a node often observes higher monitored values than other nodes, a centralized parameter tuning scheme is used. Because the search space grows exponentially, we also develop a distributed parameter tuning scheme that avoids centralized information collecting and parameter searching.

Performance optimization techniques: to further minimize the communication cost between a coordinator node and its monitor nodes, we use two techniques, the staged global poll and the termination message. To achieve the best communication efficiency, local monitoring parameters need to be tuned according to the given monitoring task and the monitored value distributions. The WISE monitoring algorithm guarantees monitoring correctness. Monitoring algorithm: WISE uses two separate algorithms, one for the monitor node and one for the coordinator node. Key concepts: filtering windows and skeptical windows.

Performance Evaluation
WISE achieves a reduction of 50 to 90 percent in communication cost compared with the instantaneous monitoring algorithm. The centralized parameter tuning scheme effectively improves communication efficiency. The optimization techniques further improve the communication efficiency of WISE. The actual gain is generally better (50 to 90 percent reduction in communication cost) with parameter tuning and optimized subroutines. The centralized scheme suffers from scalability issues with a large number of monitor nodes.

Basic WISE and Improved WISE
WISE scales better than the instantaneous approach in terms of communication overhead, and even better with distributed parameter tuning. Although the distributed parameter tuning scheme performs worse than the centralized scheme, its scalability makes it the choice for large-scale distributed systems. Communication efficiency improves further when the two optimization techniques are used together with distributed parameter tuning.

Distributed Tuning versus Centralized Tuning
The distributed scheme performs even better than the centralized scheme when the number of nodes is large. The distributed tuning scheme is a desirable alternative, as it provides comparable communication efficiency and better scalability.

Related Research
The early work [18] by Dilman and Raz proposes a Simple Value scheme, which sets all Ti to T/n, and an Improved Value scheme, which sets Ti to a value lower than T/n. Here T is the global threshold, n is the number of monitor nodes, and Ti is the local threshold of node i. Jain et al. [26] discuss the challenges in implementing distributed triggering mechanisms for network monitoring, and they use local constraints of T/n to detect violations. The more recent work of Sharfman et al. [20] presents a geometric approach for monitoring threshold functions. Kashyap et al. [22] propose the most recent work on detecting distributed constraint violations.
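A short worked step showing why local thresholds of T/n are safe for a sum constraint. The symbol v_i, the value monitored at node i, is notation introduced here for illustration; T is the global threshold, n the number of monitor nodes, and T_i the local threshold of node i.

```latex
\[
  v_i \le T_i = \frac{T}{n} \quad \text{for all } i = 1, \dots, n
  \qquad\Longrightarrow\qquad
  \sum_{i=1}^{n} v_i \;\le\; \sum_{i=1}^{n} T_i \;=\; n \cdot \frac{T}{n} \;=\; T .
\]
```

So as long as no node exceeds its local threshold, the global sum cannot exceed T and no communication is needed. The same argument holds for any local thresholds whose sum is at most T, which covers choosing T_i below T/n.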

Conclusion and Future Work
The increasing use of consolidation and virtualization is driving the development of technologies for managing cloud applications and services. State monitoring is a crucial functionality for on-demand provisioning of resources and services in cloud datacenters. WISE is not only resilient to bursts and outliers but also saves communication. Experimental results show that WISE achieves a 50 to 90 percent reduction in communication cost.

Current results monitor window-based state violations for one application running over a collection of distributed computing nodes. Future research includes scheduling the state monitoring tasks of multiple applications and performing failure-resilient state monitoring.

References
Amazon, Amazon Elastic Compute Cloud (Amazon EC2),
Amazon, Amazon CloudWatch Beta,
S. Meng, T. Wang, and L. Liu, "Monitoring Continuous State Violation in Datacenters: Exploring the Time Dimension," Proc. IEEE 26th Int'l Conf. Data Eng. (ICDE),

Questions???