Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control
Ira Cohen, Jeffrey S. Chase et al.

Introduction
- Networked systems continue to grow in scale.
- Complex behavior stems from the interaction of workload, software structure, hardware, traffic conditions, and system goals.
- A pervasive management system is needed to manage such a system.
- Examples? HP's OpenView, IBM's Tivoli (these aggregate metrics and display them graphically).

Introduction
- Two approaches to building self-managing systems; the first uses a priori models and event-condition-action rules.
- These are not based on observation of real systems. Disadvantages? They are difficult and costly to construct, and unreliable because they do not account for all system behavior.

Introduction
- The second approach: statistical learning techniques.
- These assume little to no domain knowledge, and are hence "general".
- Problem: we still have to identify techniques powerful enough to induce effective models that are efficient, accurate, and robust.

Goals
- Automatically analyze instrumentation data from network services in order to forecast, diagnose, and repair failure conditions.
- Tree-Augmented Naive Bayesian networks (TANs) are used as the basis for diagnosis and forecasting over system-level instrumentation in a 3-tier network service.
- TANs are widely used in various fields, but had not previously been applied in the context of computer systems.

Goals
- Analyzed data from 124 metrics gathered from a 3-tier e-commerce site under synthetic load, using httperf against Java PetStore as the platform.
- The TAN model selects a combination of metrics and threshold values that captures compliance with a Service Level Objective (SLO) on average response time (a labeling sketch follows below).
- Results are presented later.
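
As a concrete illustration of the SLO side of this, here is a minimal sketch of labeling measurement windows by comparing average response time against an SLO threshold. The window representation, threshold value, and label names are assumptions for illustration, not the authors' implementation.

```python
from statistics import mean

# Illustrative SLO threshold on average response time (value is made up).
SLO_THRESHOLD_MS = 100.0

def label_windows(windows):
    """windows: list of per-window lists of request response times in ms.
    Returns one SLO label per window."""
    labels = []
    for response_times in windows:
        avg = mean(response_times)
        labels.append("violation" if avg > SLO_THRESHOLD_MS else "compliance")
    return labels

# Example: two windows, the second violates the (illustrative) SLO.
print(label_windows([[40.0, 55.0, 60.0], [120.0, 180.0, 95.0]]))
```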

What is a TAN?
- A Bayesian network is an annotated directed acyclic graph encoding a joint probability distribution.
- In a naive Bayesian network, the state variable S is the only parent of all other vertices; all metrics are assumed independent given S.
- TANs also consider relationships among the metrics themselves, with the constraint that each metric has at most one parent other than S (a scoring sketch follows below).
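
To make the structure concrete, here is a minimal sketch of how a TAN classifier might score a sample of discretized metrics. The conditional probability tables, parent map, and smoothing floor are illustrative assumptions; the actual induction (structure and parameter learning) is not shown.

```python
import math

class TANSketch:
    """Illustrative TAN scoring: every metric has the state S as a parent and,
    except for the tree root, exactly one other metric as a parent."""

    def __init__(self, prior, parent, cpt):
        self.prior = prior    # {state: P(S = state)}
        self.parent = parent  # {metric: parent metric, or None for the tree root}
        self.cpt = cpt        # {metric: {(state, parent_value, value): probability}}

    def log_joint(self, state, sample):
        # log P(S) + sum over metrics of log P(m | S, parent(m));
        # unseen combinations get a small smoothing floor.
        total = math.log(self.prior[state])
        for metric, value in sample.items():
            pv = sample[self.parent[metric]] if self.parent[metric] is not None else None
            total += math.log(self.cpt[metric].get((state, pv, value), 1e-9))
        return total

    def classify(self, sample):
        # Choose the state ("violation" vs "compliance") with the highest posterior.
        return max(self.prior, key=lambda s: self.log_joint(s, sample))
```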

Why use a TAN?
- Based on the premise that a relatively small subset of metrics and threshold values is sufficient to approximate the distribution accurately.
- Outperforms generalized Bayesian networks and other alternatives in both cost and accuracy.

Why use a TAN?
- Useful for forecasting failures and violations: it is possible to induce models that predict SLO violations in the near future, even while the system is still stable.
- An automated controller can invoke the model directly to identify an impending violation and respond, e.g. by adjusting load or adding resources.
- The model is cheap to induce, so it is possible to maintain multiple models and refresh them periodically (a control-loop sketch follows below).
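
A rough sketch of the kind of control loop this slide hints at, under the assumption that models are cheap enough to re-induce periodically; the callback names and refresh policy are hypothetical.

```python
import time

def control_loop(induce_model, observe, remediate, refresh_every=50, poll_seconds=60):
    """Hypothetical controller: keep a history of labeled samples,
    periodically re-induce the TAN, and act on predicted violations."""
    history = []
    model = None
    while True:
        sample, label = observe()              # metrics + SLO indicator output for the last window
        history.append((sample, label))
        if model is None or len(history) % refresh_every == 0:
            model = induce_model(history)       # cheap to induce, so refresh often
        if model.classify(sample) == "violation":
            remediate(sample)                   # e.g. adjust load or add resources
        time.sleep(poll_seconds)
```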

Setup
- The system is a 3-tier web service: Apache, middleware (BEA WebLogic), and an Oracle database.
- Three servers, with HP OpenView collecting statistics.
- The load generator is httperf.
- An SLO indicator processes the logs to determine compliance.

Interpretability and Modifiability
- TANs offer other advantages: interpretability and modifiability.
- The influence of each metric can be quantified within the probabilistic model (a sketch follows below).
- The analysis catalogs each type of violation according to the metrics and values that correlate with observed instances; the strength of a metric's influence comes from the probability of its value occurring under the different states.
- This gives insight into the causes of violations and how to repair them.
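
One hedged way to express this probabilistic influence (not necessarily the paper's exact formula) is a log-likelihood ratio of a metric's observed value between the two states, reusing the illustrative cpt/parent structure from the TAN sketch above.

```python
import math

def metric_influence(cpt, parent, sample, metric):
    """Log-likelihood ratio of one metric's observed value between the two states.
    Positive values mean the observation is more probable under 'violation'."""
    pv = sample[parent[metric]] if parent[metric] is not None else None
    value = sample[metric]
    p_violation = cpt[metric].get(("violation", pv, value), 1e-9)
    p_compliance = cpt[metric].get(("compliance", pv, value), 1e-9)
    return math.log(p_violation / p_compliance)
```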

Workloads
- The workloads vary several characteristics: aggregate request rate, number of concurrent connections, and the fraction of data-intensive vs. app-intensive requests.
- This exercises the model-induction methodology by providing it with a wide range of (M, P) pairs, where M is a sample of values for the system metrics and P is a vector of app-level performance measurements (a sketch of assembling such pairs follows below).
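
A small sketch of how the (M, P) pairs might be assembled into a training set, reusing the labeling idea from earlier; the field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Observation:
    metrics: Dict[str, float]      # M: one sample of the ~124 system metrics
    performance: Dict[str, float]  # P: app-level measurements, e.g. average response time

def build_training_set(observations: List[Observation],
                       slo_threshold_ms: float) -> List[Tuple[Dict[str, float], str]]:
    """Pair each metric sample M with the SLO label derived from its performance vector P."""
    dataset = []
    for obs in observations:
        violated = obs.performance["avg_response_ms"] > slo_threshold_ms
        dataset.append((obs.metrics, "violation" if violated else "compliance"))
    return dataset
```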

Workloads
- RAMP: increasing concurrency.
- STEP: constant background traffic plus a step function of bursty, hour-long bursts.
- BUGGY: increasing aggregate request rate.

Results
- SLO thresholds were varied to explore their effect on the induced models and to evaluate model accuracy under varying conditions.
- A TAN classifier was trained and evaluated for each of 31 different SLO definitions.
- Baselines: the accuracy of a 60th-percentile SLO classifier (MOD) and of CPU utilization as a single metric.

Results
- Overall balanced accuracy (BA) of the TAN models is 87-94%, and 90+% for all experiments (a BA computation sketch follows below).
- False alarm rate is 6% for two of the experiments and 17% for BUGGY.
- A single metric (CPU) is not sufficient to capture the pattern of SLO violations; a small number of metrics (3-8) is sufficient.
- Results are sensitive to the workload and the SLO definition: MOD always has a high detection rate, but generates false alarms at an increasing rate as the SLO threshold increases.
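
For reference, balanced accuracy as usually defined averages the detection rate on violations with the true-negative rate on compliance periods (equivalently, one minus the false alarm rate); a minimal computation sketch:

```python
def balanced_accuracy(predictions, labels):
    """BA = 0.5 * (detection rate on violations + true-negative rate on compliance)."""
    violations = [p for p, y in zip(predictions, labels) if y == "violation"]
    compliances = [p for p, y in zip(predictions, labels) if y == "compliance"]
    detection = violations.count("violation") / len(violations) if violations else 0.0
    true_negative = compliances.count("compliance") / len(compliances) if compliances else 0.0
    return 0.5 * (detection + true_negative)

# Example: three violation windows (two detected) and two compliance windows (one false alarm).
print(balanced_accuracy(
    ["violation", "violation", "compliance", "violation", "compliance"],
    ["violation", "violation", "violation",  "compliance", "compliance"]))
# -> 0.5833...
```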

Conclusion
- TANs are attractive for self-managing systems.
- They build system models automatically, with no a priori knowledge required.
- They generalize to a wide range of conditions, zero in on the most relevant metrics, and are practical.

Conclusion
- Possible future work: adapt the approach to changing conditions and close the loop for automated diagnosis and control.
- Ultimately, the most successful model may be a hybrid of automatically induced models and a priori models.

Questions?