The State-Space Approach to Self-Management of Enterprise Systems Vibhore Kumar, Karsten Schwan Subu Iyer*, Yuan Chen*, Akhil Sahai* Georgia Institute.


1 The State-Space Approach to Self-Management of Enterprise Systems
Vibhore Kumar, Karsten Schwan, Subu Iyer*, Yuan Chen*, Akhil Sahai*
Georgia Institute of Technology; Hewlett-Packard Labs*

2 Outline
• Motivation: Enterprise Complexity
• Issues
• Solution Overview
• Policy-Driven Self-Management
• Dynamic SLA Decomposition
• Results
• Future Work

3 Enterprise Complexity: Some Facts
• From a survey conducted by Forrester Research:
  - Enterprises now devote 80% of their overall IT budget to maintenance and ongoing operations
  - More than half of the 347 participating companies used at least 3 database vendors
  - A major banking-industry client had 18 different travel and expense systems in the organization
• That a title like “VP of IT Governance” exists says a lot about the state of enterprise IT infrastructure

4 The Complexity Wall
“If we don’t get a handle on complexity, it will stop the expansion” - Paul Horn, Senior Vice President, IBM Research
“Our enterprise customers are working with enormous complexity” - Dick Lampman, Former Director, HP Labs

5 The Complexity Wall @ Worldspan
• Worldspan, one of our industry collaborators, provides services to the travel industry
• One of their airline ticket pricing/availability services is hosted on a farm of 1400 servers
• In 2006 alone, they processed around 9.6 billion messages
• Request rates and the mix of request types vary widely
• Several behaviors of their system are not well understood:
  - Effects of Ticket Geography
  - Effects of Cache Refresh Time
  - Effects of Time of Day
  - …

6 To Handle The Complexity…
• One must enable self-management of complex enterprise infrastructures, driven by high-level goals

7 Enterprise Self-Management: The Hurdles
• Enterprise systems are too big
  - The problem of Scale
• It is tough to relate high-level goals to low-level actions
  - The problem of Complex System Modeling
• The operating environment is very dynamic
  - The problem of Dynamism
• Administrators find it hard to trust black-box solutions
  - The problem of Trust & Tractability

8 Solution Overview: System State-Space
• System State Space (monitored system variables): V = (v_1, v_2, v_3, …, v_n)
• Variables of Interest: V_ø ⊂ V, e.g. Response-Time, QoI
• Controllable Variables: V_α ⊂ V, e.g. Allocated-Servers, Memory
• The aim is to establish a relation between V_ø and V_α under current operating conditions
[Figure: monitored component variables collected from the enterprise system form the system state space]

9 Simple Automated Operation
• SLO: “Response Time < 10 ms”
  - Event: SLO Violation
  - Condition: Bandwidth = 90 Mbps, Request Rate = 30
  - Action: set Allocated Servers to 3
• The function maps V_α → V_ø, given V − (V_α ∪ V_ø)
[Figure: plot with axes Allocated Servers (V_α), Request Rate, Bandwidth and Response Time (V_ø)]
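
The event/condition/action rule on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the `Policy` class, its field names, the variable names in the dictionaries, and the `apply_policy` helper are all hypothetical, and the condition is matched by exact equality for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical event-condition-action rule attached to one SLO."""
    slo_max_response_ms: float  # SLO: response time must stay below this value
    condition: dict             # non-controllable variables the rule applies to
    action: dict                # values to assign to the controllable variables


def apply_policy(policy: Policy, state: dict) -> Optional[dict]:
    """Return the action if the SLO is violated and the condition matches, else None."""
    # Event: the SLO is violated when response time is not below the limit.
    violated = state["response_time_ms"] >= policy.slo_max_response_ms
    # Condition: every listed variable must have exactly the listed value.
    matches = all(state.get(k) == v for k, v in policy.condition.items())
    return dict(policy.action) if violated and matches else None


policy = Policy(
    slo_max_response_ms=10,
    condition={"bandwidth_mbps": 90, "request_rate": 30},
    action={"allocated_servers": 3},
)
state = {"bandwidth_mbps": 90, "request_rate": 30, "response_time_ms": 12}
print(apply_policy(policy, state))
```

With the slide's values (response time 12 against a 10 ms limit), the rule fires and proposes three allocated servers; a state that already meets the SLO yields no action.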

10 Solution Overview: The Function
• Learn the function from observed system states
• But there are problems:
  - Different behavior in different sub-spaces
  - Large state space, |V| ≈ 10^2 to 10^3
[Figure: observed system states (v_1, v_2, …, v_n) feed machine learning; sub-spaces labeled CPU Bottleneck and Network Bottleneck]

11 Solution Overview: The Function
• We decided to model the system using multiple µ-models
• We intelligently partition the set of observed system states
  - partitions exhibit homogeneous behavior
  - partitions have a reduced number of relevant variables
• Partitioning & µ-Modeling solve two problems!
  - The problem of Scale
  - The problem of Complex System Modeling
[Figure: reduced number of relevant variables (out of v_1, v_2, …, v_n) in a µ-model]
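
A rough sketch of the partitioning idea follows. The slides do not say how partitions are found or how relevance is measured, so everything here is an assumption: a hand-written labelling rule stands in for the intelligent partitioning, Pearson correlation with the target stands in for relevance, and the variable names and toy data are invented for illustration.

```python
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def partition_states(states, key):
    """Group observed states by the partition label computed by `key`."""
    parts = defaultdict(list)
    for s in states:
        parts[key(s)].append(s)
    return dict(parts)


def relevant_variables(part, target, min_abs_corr=0.5):
    """Variables whose correlation with `target` inside this partition is strong."""
    names = [k for k in part[0] if k != target]
    ys = [s[target] for s in part]
    return sorted(n for n in names
                  if abs(pearson([s[n] for s in part], ys)) >= min_abs_corr)


# Toy data (percent utilisation): response time tracks CPU in one regime
# and network use in the other.
states = [
    {"cpu": 90, "net": 20, "rt": 20}, {"cpu": 95, "net": 20, "rt": 24},
    {"cpu": 99, "net": 20, "rt": 30},
    {"cpu": 30, "net": 80, "rt": 15}, {"cpu": 30, "net": 90, "rt": 19},
    {"cpu": 30, "net": 95, "rt": 22},
]


def label(s):
    # Hypothetical partitioning rule: whichever resource is busier is the bottleneck.
    return "cpu_bottleneck" if s["cpu"] > s["net"] else "network_bottleneck"


parts = partition_states(states, label)
for name, part in parts.items():
    print(name, relevant_variables(part, target="rt"))
```

Inside each partition only one variable still explains the response time, which is the sense in which a µ-model needs fewer relevant variables than a model of the whole state space.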

12 Solution Overview: µ-Models
• We use a Tree-Augmented Naïve Bayes (TAN) classifier to build µ-models
• The model returns the probability γ = Pr(V_α | V_desired)
• Find the assignment of values to the variables in V_α that maximizes the probability of moving the system to the desired state
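
The maximization step can be illustrated with a much simpler estimator than a TAN classifier. In the sketch below a plain frequency count over past observations stands in for the model: it estimates Pr(V_α | V_desired) as the share of each controllable-variable assignment among the observed states that match the desired state. The function name, variable names, and data are hypothetical.

```python
from collections import Counter


def best_assignment(observed, desired, controllable):
    """Estimate Pr(V_alpha | V_desired) by counting; return (assignment, gamma).

    observed:     list of dicts, one per past system state
    desired:      dict of variable -> value describing the desired state
    controllable: names of the variables in V_alpha
    """
    # Keep only the past states that match the desired state.
    matching = [s for s in observed
                if all(s.get(k) == v for k, v in desired.items())]
    if not matching:
        return None, 0.0
    # Relative frequency of each controllable assignment among those states.
    counts = Counter(tuple(s[c] for c in controllable) for s in matching)
    values, hits = counts.most_common(1)[0]
    return dict(zip(controllable, values)), hits / len(matching)


base = {"bandwidth_mbps": 90, "request_rate": 30}
observed = (
    [{**base, "allocated_servers": 3, "slo_met": True}] * 3
    + [{**base, "allocated_servers": 2, "slo_met": True}] * 1
    + [{**base, "allocated_servers": 2, "slo_met": False}] * 2
    + [{**base, "allocated_servers": 1, "slo_met": False}] * 1
)
desired = {**base, "slo_met": True}
assignment, gamma = best_assignment(observed, desired, ["allocated_servers"])
print(assignment, gamma)
```

Counting only works when the desired state has been observed often enough; a probabilistic model such as TAN is what lets the approach generalize beyond exact matches.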

13 Solution Approach: Dynamism
• As the system keeps running, more system states are generated, which can be incorporated into the µ-models
• µ-models are easier to update than monolithic system models
• As a result of a µ-model update:
  - Policy Invalidation
  - Policy Adaptation
  - New Policies can Result
• This addresses the problem of Dynamism

14 Solution Approach: Tractability & Trust
• Each self-management action that assigns values to variables in V_α is associated with a probability γ = Pr(V_α | V − V_ø)
• An action is taken only when γ > γ_threshold
• This threshold can be used to fine-tune self-management
• TANs can be easily understood by administrators
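
The threshold rule is small enough to state directly in code. This is only a sketch; the function name `gate_action` and the example values are hypothetical.

```python
from typing import Optional


def gate_action(assignment: dict, gamma: float, gamma_threshold: float) -> Optional[dict]:
    """Return the assignment only when its probability exceeds the threshold."""
    return assignment if gamma > gamma_threshold else None


action = {"allocated_servers": 3}
print(gate_action(action, gamma=0.75, gamma_threshold=0.6))  # confident enough: act
print(gate_action(action, gamma=0.75, gamma_threshold=0.9))  # too uncertain: no action
```

Raising the threshold makes the system act less often but with more confidence, which is one way an administrator could tune how much autonomy to grant.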

15 Outline
• Motivation: Enterprise Complexity
• Issues
• Solution Overview
• Policy-Driven Self-Management
• Dynamic SLA Decomposition
• Results
• Future Work

16 Policy-Driven Self-Management
• SLO: “Response Time < 10 ms”
  - Event: SLO Violation
  - Condition: Bandwidth = 90 Mbps, Request Rate = 30
  - Given the goal state (90,30,9), find the µ-model to use
  - Action: set Allocated Servers to 3
[Figure: plot with axes Allocated Servers, Request Rate, Bandwidth and Response Time, marking the Current State (90,30,12) and the Goal State (90,30,9)]
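
The slide does not say how the µ-model for a goal state is located. One plausible reading, sketched below purely as an assumption, is that each partition is summarized by a centroid in (Bandwidth, Request Rate, Response Time) space and the nearest centroid wins. The model names and centroid values are invented.

```python
import math


def select_model(goal, models):
    """Pick the model whose (hypothetical) partition centroid is nearest the goal state."""
    return min(models, key=lambda m: math.dist(goal, m["centroid"]))


# Hypothetical partitions, each summarized by a centroid in
# (bandwidth Mbps, request rate, response time ms) space.
models = [
    {"name": "network_bottleneck", "centroid": (20, 30, 25)},
    {"name": "cpu_bottleneck", "centroid": (85, 35, 10)},
]
goal_state = (90, 30, 9)
print(select_model(goal_state, models)["name"])
```

Plain Euclidean distance lets the variable with the largest numeric range dominate, so a real implementation would likely normalize each variable first.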

17 Dynamic SLA Decomposition
• Problem: determine sub-SLAs for components that lead to SLA conformance
• Sub-SLAs can be thought of as per-component ranges of values for controllable variables
• If each component adheres to its sub-SLA, then the SLA is not violated:
  conformance(SLA_1, SLA_2, …, SLA_n) ⇒ conformance(System SLA)
• Our techniques can handle SLA decomposition
[Figure: a System-Level SLA decomposed into SLA 1, SLA 2, SLA 3, SLA 4, SLA 5]
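
Treating sub-SLAs as per-component value ranges makes the conformance check easy to sketch. The component names, variables, and ranges below are hypothetical, and the sketch only checks given sub-SLAs; it does not show how the ranges are derived.

```python
def violations(sub_slas, measurements):
    """List (component, variable) pairs whose measured value is outside its range."""
    return [
        (component, variable)
        for component, ranges in sub_slas.items()
        for variable, (low, high) in ranges.items()
        if not low <= measurements[component][variable] <= high
    ]


def sub_slas_conform(sub_slas, measurements):
    """True when every component stays within all of its sub-SLA ranges."""
    return not violations(sub_slas, measurements)


# Hypothetical sub-SLAs: allowed ranges for controllable variables per component.
sub_slas = {
    "web": {"allocated_servers": (2, 4)},
    "db": {"memory_gb": (8, 16)},
}
ok = {"web": {"allocated_servers": 3}, "db": {"memory_gb": 12}}
bad = {"web": {"allocated_servers": 3}, "db": {"memory_gb": 4}}
print(sub_slas_conform(sub_slas, ok))
print(violations(sub_slas, bad))
```

Reporting which component is out of range, rather than a single yes/no, points an administrator at the sub-SLA that needs attention.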

18 Experimental Results: SOA Simulator
[Figure panels: Without Self-Management; With Self-Management]

19 Experimental Results: RUBiS over VMs
[Figure panels: Without Self-Management; With Self-Management. Annotations: Database Perturbation, Partition Change]

20 Conclusions & Future Work
• Our techniques are applicable to a variety of enterprise systems
• In our experiments, the techniques proved to be scalable and accurate
• Monitoring overheads can be reduced by taking inputs about relevant variables from the state-space partitions
• Future work: design and implement techniques that can proactively avoid SLA violations

21 Thank You!
References
[1] V. Kumar, K. Schwan, S. Iyer, Y. Chen, A. Sahai. The State-Space Approach to SLA-Based Management. In submission to NOMS 2008.
[2] V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan. iManage: Policy-Driven Self-Management for Enterprise-Scale Systems. Middleware 2007.
[3] V. Kumar, B. F. Cooper, G. Eisenhauer, K. Schwan. Enabling Policy-Driven Self-Management for Enterprise Systems. PBAC 2007, in conjunction with ICAC 2007.
[4] V. Kumar, et al. Implementing Diverse Messaging Models with Self-Managing Properties using IFLOW. ICAC 2006.

