Chapter 3 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University Building Dependable Distributed Systems.


Building Dependable Distributed Systems, Copyright Wenbing Zhao

Outline
- Recovery-oriented computing
  - Overview
  - Application-level fault detection
    - Structural behavior monitoring
    - Path shape analysis

Recovery-Oriented Computing
- On the availability of soft real-time systems
  - Availability = MTTF / (MTTF + MTTR)
  - MTTF: mean time to failure
  - MTTR: mean time to recover
  - Availability can be improved by increasing MTTF as well as by reducing MTTR
- Recovery-oriented computing focuses on reducing MTTR
  - Making fault detection faster and more accurate
  - Making recovery faster
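A minimal sketch of the availability formula above, comparing the two ways of improving availability. The MTTF and MTTR values (in hours) are made up for illustration and do not come from the slides.

```python
def availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


# Hypothetical numbers, in hours (assumptions for illustration only).
baseline = availability(mttf=1000.0, mttr=10.0)
doubled_mttf = availability(mttf=2000.0, mttr=10.0)      # fail half as often
faster_recovery = availability(mttf=1000.0, mttr=1.0)    # recover ten times faster

print(f"baseline:        {baseline:.6f}")
print(f"doubled MTTF:    {doubled_mttf:.6f}")
print(f"faster recovery: {faster_recovery:.6f}")
```

With these assumed numbers, cutting MTTR helps more than doubling MTTF, which is the motivation for focusing on recovery time.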

Fault Detection and Localization
- Fault detection: determine whether some component in the system has failed
- Fault localization: pinpoint the particular component that failed
- Low-level fault detection mechanism
  - Based on timeouts, probing each component periodically with a heartbeat message
  - Cannot detect many application-level faults
- Recovery-oriented computing focuses on application-level fault detection and localization
  - 75% of the recovery time is spent on application-level fault detection

Microreboot and System-Level Undo/Redo
- Microreboot: many problems can be fixed by simply restarting the faulty component
  - Works best with component-based systems
- For problems that cannot be fixed by a microreboot: perform a system-level undo, fix the problem, then carry out a system-level redo
  - Based on checkpointing and logging

System Model for Recovery-Oriented Computing
- Three-tier architecture
  - Separates application logic from data management
  - The middle tier is stateless or maintains only session state
- Component-based middleware
  - Java Platform, Enterprise Edition (Java EE, often referred to as J2EE)
  - Key component: Enterprise JavaBean (EJB)

Application-Level Fault Detection
- Fail-stop faults can be detected using timeouts
- Application-level faults can only be detected at the application level
- One plausible fault detection method: acceptance tests
  - Developers would have to write effective and efficient acceptance test routines
  - Not practical for Internet apps due to their scale, complexity, and rapid rate of change
- ROC-based approach: measure and monitor the structural behavior of an app
  - May detect app-level faults without a priori knowledge of the app details

Structural Behavior Monitoring
- Interaction patterns between different components reflect the app-level functionality
  - Each component implements a specific app function, e.g.,
    - A stateful session bean to manage a user's shopping cart
    - A set of singleton session beans to keep track of inventory
  - The internal structural behavior can be monitored to infer whether or not the app is functioning normally
  - To monitor: log the runtime path for each end-user request, including all incoming messages, outgoing messages, method invocations, etc.

Structural Behavior: Runtime Path Example
- Runtime path for a single end-user request
  - Spans 5 components
  - Consists of 10 events

Structural Behavior: Machine Learning
- Train reference models using machine learning
- Historical reference model: trained with aggregated runtime path data
  - Objective: anomaly detection based on historical behavior
  - May use a real workload as well as a synthetic workload that resembles the real one
- Peer reference model: trained with the most recent runtime path data
  - Objective: anomaly detection with respect to the peer components
  - Must be trained with a real workload
- Fault (anomaly) detection: compare observed patterns with those in the reference models

Component Interactions Modeling
- Focus on interactions between a component instance and all other component classes
  - More scalable: can cope with cases where there are many instances of each class
  - Suitable for using the chi-square test for anomaly detection

Component Interactions Modeling
- Given a system with n component classes, the interaction model for a component instance consists of a set of n-1 weighted links between the instance and all the other n-1 component classes
  - We assume that instances of the same class do not interact with each other
  - We assume that interactions are symmetric (i.e., request and reply)
  - The weight assigned to each link is the probability of the component instance interacting with the linked component class
  - The weights on all links sum to 1, i.e., the component instance interacts with the other component classes with probability 1

Component Interaction Model: Example
- Class A: web component, handles end-user requests
- Class B: app logic, handles conversations with end-users, 3 instances
- Class C and Class D: also app logic, representing shared state
- Class E: database server, persistent state

Component Interaction Model: Example
- Machine learning: determine the link weights from training data
- Training data
  - A issued 400 remote invocations on b1
  - b1 issued 300 local method invocations on C, and 300 invocations on D
  - What happened between C & E and between D & E is not important
- Link weight calculation
  - Total number of interactions at instance b1: 400 + 300 + 300 = 1000
  - P(b1-A) = 400/1000 = 0.4
  - P(b1-C) = 300/1000 = 0.3
  - P(b1-D) = 300/1000 = 0.3
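The link weight calculation above can be sketched as a simple normalization of interaction counts. The function name `link_weights` and the dictionary layout are illustrative assumptions; the counts are the training data from the slide.

```python
def link_weights(interaction_counts: dict) -> dict:
    """Turn per-class interaction counts for one instance into link probabilities."""
    total = sum(interaction_counts.values())
    if total == 0:
        raise ValueError("no interactions recorded for this instance")
    return {peer: count / total for peer, count in interaction_counts.items()}


# Training data for instance b1, as given on the slide.
b1_counts = {"A": 400, "C": 300, "D": 300}
weights = link_weights(b1_counts)

for peer, p in weights.items():
    print(f"P(b1-{peer}) = {p}")
```

Because the weights are normalized by the instance's own total, they sum to 1 as the model requires.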

Anomaly Detection
- Compare current behavior with the trained behavior using the chi-square test
  - Prepare the observed data as a histogram
  - Compare the distributions using the formula $D = \sum_{i=1}^{n} \frac{(o_i - e_i)^2}{e_i}$
    - $n$: number of cells in the histogram
    - $e_i$: expected frequency in cell $i$
    - $o_i$: observed frequency in cell $i$
    - If $e_i$ is 0, the cell should be pruned
- Each link is regarded as a cell
  - For an observation period of $m$ requests, the expected frequency for link $i$ is $e_i = m \cdot p_i$, where $p_i$ is the trained link weight
- No anomaly: ideally $D = 0$. In practice $D$ is nonzero due to randomness, and it follows a chi-square distribution
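A minimal sketch of the statistic described above: each link is a cell, the expected frequency is m times the trained link weight, and cells with zero expected frequency are pruned. The function name and the sample weights and counts (classes "X", "Y", "Z") are hypothetical, chosen only to show the mechanics.

```python
def chi_square_statistic(weights: dict, observed: dict) -> float:
    """Sum of (o_i - e_i)^2 / e_i over links, with e_i = m * p_i."""
    m = sum(observed.values())          # size of the observation period
    d = 0.0
    for link, p in weights.items():
        e = m * p                       # expected frequency for this link
        if e == 0:
            continue                    # prune cells with zero expected frequency
        o = observed.get(link, 0)       # observed frequency, 0 if never seen
        d += (o - e) ** 2 / e
    return d


# Hypothetical trained weights and observations (not from the slides).
trained = {"X": 0.5, "Y": 0.5, "Z": 0.0}
matching = chi_square_statistic(trained, {"X": 50, "Y": 50})
skewed = chi_square_statistic(trained, {"X": 60, "Y": 40})

print(matching)  # observed equals expected on every link
print(skewed)    # deviation of 10 on each of two links with e_i = 50
```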

Anomaly Detection: Chi-Square Test
- Anomaly detected: $D$ exceeds the $1-\alpha$ quantile of the chi-square distribution with $k = n - 1$ degrees of freedom, where $\alpha$ is the level of significance
  - A higher $\alpha$ => more sensitive => more false positives
- Level of significance: the probability of rejecting the null hypothesis in a statistical test when it is true

Anomaly Detection: Chi-Square Test: Example
- Observation period: 100 requests
  - A issued 45 requests on b1
  - b1 issued 35 invocations on C, and 20 invocations on D
- Link(A-b1): expected 100*0.4 = 40, observed 45
- Link(C-b1): expected 100*0.3 = 30, observed 35
- Link(D-b1): expected 100*0.3 = 30, observed 20
- $D = (45-40)^2/40 + (35-30)^2/30 + (20-30)^2/30 \approx 4.79$
- Chi-square test: there are 2 degrees of freedom (only 3 cells); for $\alpha = 0.1$, the 90% quantile is about 4.61 => anomaly detected
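The decision in this example can be reproduced with the standard library alone. This sketch relies on the fact that a chi-square distribution with 2 degrees of freedom has the closed-form CDF 1 - exp(-x/2), so its 1-α quantile is -2 ln(α); that shortcut only holds for 2 degrees of freedom, and a statistics library would be needed in general. The helper name is an assumption.

```python
import math


def chi_square_quantile_df2(alpha: float) -> float:
    """1 - alpha quantile of the chi-square distribution with 2 degrees of freedom."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -2.0 * math.log(alpha)


# Expected and observed link frequencies for instance b1, from the slide.
expected = {"A-b1": 40.0, "C-b1": 30.0, "D-b1": 30.0}
observed = {"A-b1": 45, "C-b1": 35, "D-b1": 20}

d = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)

threshold_10 = chi_square_quantile_df2(0.10)   # 90% quantile
threshold_05 = chi_square_quantile_df2(0.05)   # 95% quantile

anomaly_at_10 = d > threshold_10
anomaly_at_05 = d > threshold_05

print(f"D = {d:.2f}")
print(f"alpha = 0.10: threshold {threshold_10:.2f}, anomaly = {anomaly_at_10}")
print(f"alpha = 0.05: threshold {threshold_05:.2f}, anomaly = {anomaly_at_05}")
```

The same observation is flagged at a significance level of 0.10 but not at 0.05, which illustrates how a higher level makes the detector more sensitive.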

Path Shapes Modeling
- The shape of a runtime path is defined to be the ordered set of component classes
- A path shape is represented as a tree in which a node represents a component class
  - A directed edge represents the causal relationship between two adjacent nodes

Path Shapes Modeling
- A probabilistic context-free grammar (PCFG), in Chomsky Normal Form (CNF), is used for path shape modeling
  - A list of terminal symbols, T_k: the component classes in a path shape form the T_k
  - A list of nonterminal symbols, N_i, which denote the stages of the production rules
    - N_1: the start symbol, often denoted S
    - $: the end of a rule
    - All other nonterminal symbols are to be replaced by production rules (see below)
  - A list of production rules, N_i -> ζ_j (where ζ_j is a list of terminals and nonterminals)
  - A list of probabilities, R_ij = P(N_i -> ζ_j)

Path Shape Modeling: Example
- Path shapes for 4 end-user requests
- The call transits from A to B with 100% probability
  - R_1j: S -> A, p = 1.0
  - R_2j: A -> B, p = 1.0

Path Shape Modeling: Example
- For B, there are 3 possible transitions: to C with 25% probability, to D with 25%, and to both C and D with 50%
  - R_3j: B -> C, p = 0.25 | B -> D, p = 0.25 | B -> CD, p = 0.5
- Once a call reaches C or D, it must transit to E, hence:
  - R_4j: C -> E, p = 1.0
  - R_5j: D -> E, p = 1.0
- E is the last stop for all calls
  - R_6j: E -> $, p = 1.0
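One way to see where these probabilities come from is to count, for each component class, how often each transition appears in the training paths. The sketch below assumes a simplified encoding, which is not from the slides: each path shape is a dictionary mapping a class to the tuple of classes it calls. The four paths are one set consistent with the stated probabilities, and the sketch does not enforce Chomsky Normal Form.

```python
from collections import Counter, defaultdict


def estimate_rules(paths: list) -> dict:
    """Estimate P(lhs -> rhs) as the relative frequency of rhs among all uses of lhs."""
    counts = defaultdict(Counter)
    for path in paths:
        for lhs, rhs in path.items():
            counts[lhs][tuple(rhs)] += 1
    return {
        lhs: {rhs: n / sum(c.values()) for rhs, n in c.items()}
        for lhs, c in counts.items()
    }


# Four hypothetical end-user requests: one via C, one via D, two via both C and D.
common = {"S": ("A",), "A": ("B",), "E": ("$",)}
paths = [
    {**common, "B": ("C",), "C": ("E",)},
    {**common, "B": ("D",), "D": ("E",)},
    {**common, "B": ("C", "D"), "C": ("E",), "D": ("E",)},
    {**common, "B": ("C", "D"), "C": ("E",), "D": ("E",)},
]

rules = estimate_rules(paths)
for lhs, productions in rules.items():
    for rhs, p in productions.items():
        print(f"{lhs} -> {' '.join(rhs)}, p = {p}")
```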

Path Shape Modeling: Anomaly Detection
- The path shape of a new request can be checked to see whether it conforms to the grammar
- An anomaly is detected if a path shape does not conform to the grammar
- The PCFG by itself only detects faults; it does not pinpoint the root cause (fault localization)
  - Another method, such as a decision tree, is needed for localization
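A minimal sketch of the conformance check, using the example grammar from the preceding slides. The encoding (a path shape as a dictionary from a component class to the tuple of classes it calls) and the function name `conforms` are assumptions made for illustration; the sketch only reports whether a path is anomalous and makes no attempt at localization.

```python
# Example grammar: left-hand side -> {right-hand side: probability}.
GRAMMAR = {
    "S": {("A",): 1.0},
    "A": {("B",): 1.0},
    "B": {("C",): 0.25, ("D",): 0.25, ("C", "D"): 0.5},
    "C": {("E",): 1.0},
    "D": {("E",): 1.0},
    "E": {("$",): 1.0},
}


def conforms(path: dict, grammar: dict) -> bool:
    """Return True if every transition in the path is a rule with nonzero probability."""
    for lhs, rhs in path.items():
        if grammar.get(lhs, {}).get(tuple(rhs), 0.0) == 0.0:
            return False
    return True


# A request that follows S -> A -> B -> C -> E, which the grammar allows.
normal_path = {"S": ("A",), "A": ("B",), "B": ("C",), "C": ("E",), "E": ("$",)}
# A request in which B calls E directly, which the grammar has never seen.
odd_path = {"S": ("A",), "A": ("B",), "B": ("E",), "E": ("$",)}

normal_ok = conforms(normal_path, GRAMMAR)
odd_ok = conforms(odd_path, GRAMMAR)

print("normal path conforms:", normal_ok)
print("odd path conforms:", odd_ok)
```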