Measurements. Meir Kalech. Partially based on the slides of Brian Williams and Peter Struss.

Outline
Last lecture:
1. Justification-based TMS
2. Assumption-based TMS
3. Consistency-based diagnosis
Today's lecture:
1. Generation of tests/probes
2. Measurement Selection
3. Probabilities of Diagnoses

Generation of tests/probes
Test: a test vector that can be applied to the system.
- Assumption: the behavior of the components does not change between tests.
- Approaches select the test that can discriminate between faults of different components (e.g. [Williams]).
Probe: selection of the probe is based on:
- the predictions generated by each candidate at unknown measurable points
- the cost/risk/benefit of the different tests/probes
- the fault probability of the various components

Generation of tests/probes (II)
Approach based on entropy [de Kleer, 87, 92], using the a-priori probabilities of the faults (even a rough estimate).
Given a set D1, D2, ..., Dn of candidates to be discriminated:
1. Generate predictions from each candidate.
2. For each probe/test T, compute the a-posteriori probability p(Di | T(x)) for each possible outcome x of T.
3. Select the test/probe for which the distribution p(Di | T(x)) has minimal entropy (this is the test that, on average, best discriminates between the candidates).
A minimal sketch of this selection loop is given below.
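The following Python sketch illustrates steps 1-3 under simplifying assumptions: the candidate names, the prediction table, and the value domains are illustrative placeholders, and candidates that predict nothing for a probe spread their probability uniformly over the probe's m possible values (as discussed later in the lecture).

import math

def entropy(dist):
    # Shannon entropy (in bits) of a probability distribution given as a dict.
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def expected_entropy(probe, prior, predictions, domain):
    # Expected posterior entropy of the candidate set after measuring `probe`.
    #   prior       : dict candidate -> prior probability (normalized)
    #   predictions : dict (candidate, probe) -> predicted value, or None
    #   domain      : list of the m possible values of `probe`
    m = len(domain)
    h = 0.0
    for value in domain:
        # Candidates predicting `value` contribute all their mass to this
        # outcome; candidates predicting nothing contribute 1/m of their mass.
        weights = {}
        for c, p in prior.items():
            pred = predictions.get((c, probe))
            if pred == value:
                weights[c] = p
            elif pred is None:
                weights[c] = p / m
        p_value = sum(weights.values())
        if p_value > 0:
            posterior = {c: w / p_value for c, w in weights.items()}
            h += p_value * entropy(posterior)
    return h

def best_probe(probes, prior, predictions, domains):
    # Step 3: pick the probe whose outcome distribution leaves minimal entropy.
    return min(probes, key=lambda x: expected_entropy(x, prior, predictions, domains[x]))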

A Motivating Example
[Figure: circuit with multipliers M1, M2, M3 and adders A1, A2; inputs A..E, internal variables X, Y, Z, outputs F and G.]
Minimal diagnoses: {M1}, {A1}, {M2, M3}, {M2, A2}
Where to measure next? X, Y, or Z?
What measurement promises the most information?
Which values do we expect?

Outline
Last lecture:
1. Justification-based TMS
2. Assumption-based TMS
3. Consistency-based diagnosis
Today's lecture:
1. Generation of tests/probes
2. Measurement Selection
3. Probabilities of Diagnoses

Measurement Selection - Discriminating Variables
[Figure: the same circuit with multipliers M1, M2, M3 and adders A1, A2.]
Suppose single faults are more likely than multiple faults.
Then probes that help discriminate between {M1} and {A1} are the most valuable.

Discriminating Variables - Inspect the ATMS Labels!
[Figure: the circuit annotated with the ATMS labels of the predicted values.]
Predicted values and their labels (environments of assumptions, i.e. components assumed OK):
- x = 6, label {{M1}} (justification: inputs {A, C})
- x = 4, label {{M2, A1}, {M3, A1, A2}} (A1 = 10 and y = 6 imply x = 4; y = 6 may itself depend on M3 and A2)
- y = 6, label {{M2}, {M3, A2}} (justification: inputs {B, D}; or A2 = 12 and z = 6 imply y = 6)
- y = 4, label {{M1, A1}} (A1 = 10 and x = 6 imply y = 4)
- z = 6, label {{M3}} (justification: inputs {C, E})
- z = 8, label {{M1, A1, A2}} (A2 = 12 and y = 4 imply z = 8)
- F = 10 and G = 12 (the observations), label {{}} (the empty environment)
Observations are facts, not based on any assumption: such a node has the empty environment as its only minimal environment, so it is always derivable.
Note the difference: an empty label means the node is not derivable at all!
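As a small illustration of reading these labels, here is a Python sketch (the encoding of labels as sets of OK components is an assumption of this sketch): a diagnosis predicts a value if at least one environment in that value's label contains no broken component.

labels = {
    ("x", 6): [{"M1"}],
    ("x", 4): [{"M2", "A1"}, {"M3", "A1", "A2"}],
    ("y", 6): [{"M2"}, {"M3", "A2"}],
    ("y", 4): [{"M1", "A1"}],
    ("z", 6): [{"M3"}],
    ("z", 8): [{"M1", "A1", "A2"}],
}

def predicted_value(diagnosis, variable):
    # Return the value `diagnosis` predicts for `variable`, or None.
    broken = set(diagnosis)
    for (var, value), environments in labels.items():
        if var == variable and any(env.isdisjoint(broken) for env in environments):
            return value
    return None

print(predicted_value({"A1"}, "x"))        # 6: {A1} predicts x = 6
print(predicted_value({"M1"}, "x"))        # 4: {M1} predicts x = 4
print(predicted_value({"M2", "A2"}, "y"))  # 4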

Fault Predictions
No fault models are used.
Nevertheless, fault hypotheses make predictions!
E.g. the diagnosis {A1} implies OK(M1), and OK(M1) implies x = 6.
So if we measure x and observe x = 6, we can infer that A1, rather than M1, is the diagnosis.

Predictions of Minimal Fault Localizations
From the ATMS labels:
- X ≠ 6: M1 is broken.
- X = 6: {A1} is the only single fault.
- Y and Z have the same predicted values under {A1} and {M1}.
=> X is the best measurement.
In particular, X = 4 points to {M1} as the diagnosis, since M1 appears only in the label of x = 6.

Outline
Last lecture:
1. Justification-based TMS
2. Assumption-based TMS
3. Consistency-based diagnosis
Today's lecture:
1. Generation of tests/probes
2. Measurement Selection
3. Probabilities of Diagnoses

Probabilities of Diagnoses
Fault probability of the component (types): p_f
For instance, p_f(C_i) = 0.01 for all C_i ∈ {A1, A2, M1, M2, M3}
Normalization by α = Σ_FaultLoc p(FaultLoc)

Probabilities of Diagnoses - Example
Assumption: independent faults
Heuristic: minimal fault localizations only

Minimal fault localization | p(FaultLoc)/α | Prediction for X | Y | Z
{M1}                       | 0.495         | 4                | 6 | 6
{A1}                       | 0.495         | 6                | 6 | 6
{M2, A2}                   | 0.005         | 6                | 4 | 6
{M2, M3}                   | 0.005         | 6                | 4 | 8

The predictions follow from the ATMS labels above; the normalized probabilities are checked in the short computation below.
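A short Python check of the normalized probabilities in the table, assuming independent faults with p_f(C) = 0.01 for every component (the candidate encoding is illustrative):

P_FAULT = 0.01
COMPONENTS = {"A1", "A2", "M1", "M2", "M3"}
fault_locs = [{"M1"}, {"A1"}, {"M2", "A2"}, {"M2", "M3"}]

def prior(fault_loc):
    # p(FaultLoc): broken components fail, all other components behave correctly.
    p = 1.0
    for c in COMPONENTS:
        p *= P_FAULT if c in fault_loc else (1.0 - P_FAULT)
    return p

alpha = sum(prior(d) for d in fault_locs)        # normalize over the minimal
for d in fault_locs:                             # fault localizations only
    print(sorted(d), round(prior(d) / alpha, 3)) # 0.495, 0.495, 0.005, 0.005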

Entropy-based Measurement Proposal
[Figure: entropy of a coin toss as a function of the probability of it coming up heads.]
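A small Python illustration of the curve in the figure: the entropy of a coin toss peaks at p = 0.5 and drops to 0 as the outcome becomes certain.

import math

def coin_entropy(p):
    # Entropy (in bits) of a coin that comes up heads with probability p.
    return sum(q * math.log2(1.0 / q) for q in (p, 1.0 - p) if q > 0)

for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    print(p, round(coin_entropy(p), 3))   # 0.081, 0.811, 1.0, 0.811, 0.081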

The Intuition Behind the Entropy
The cost of locating a candidate with probability p_i is log(1/p_i) (a binary search through 1/p_i objects), i.e. the number of cuts needed to find an object.
Example:
- p(x) = 1/25: the number of cuts in the binary search will be log(25) ≈ 4.6
- p(x) = 1/2: the number of cuts in the binary search will be log(2) = 1
Here p_i is the probability of C_i being the actual candidate given a measurement outcome.

The Intuition Behind the Entropy
The expected cost of identifying the actual candidate through the measurement is

    H = Σ_i p_i log(1/p_i)

where the sum runs over the possible candidates, p_i is the probability that candidate C_i is faulty given the measured value, and log(1/p_i) is the cost of searching for it. The individual terms behave as follows:
1. p_i → 0: occurs infrequently and is expensive to find, so p_i log(1/p_i) → 0
2. p_i → 1: occurs frequently and is easy to find, so p_i log(1/p_i) → 0
3. p_i in between: p_i log(1/p_i) is largest; these candidates dominate the cost
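A quick numeric illustration of the three cases (the contribution p * log2(1/p) of a single candidate):

import math

def term(p):
    # Contribution of one candidate of probability p to the expected cost.
    return p * math.log2(1.0 / p) if p > 0 else 0.0

for p in (0.001, 0.01, 0.5, 0.99, 0.999):
    print(p, round(term(p), 3))   # 0.010, 0.066, 0.500, 0.014, 0.001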

The Intuition Behind the Entropy
The expected entropy after measuring x_i is

    H_e(x_i) = Σ_{k=1..m} p(x_i = v_ik) · H(x_i = v_ik)

where the sum runs over the m possible outcomes of measurement x_i, p(x_i = v_ik) is the probability that x_i takes the value v_ik, and H(x_i = v_ik) is the entropy of the candidate distribution if x_i = v_ik.
Intuition: the expected entropy of x_i = Σ (probability of each outcome) × (entropy given that outcome).

The Intuition Behind the Entropy
This expected entropy can be approximated by the cost function

    φ(x_i) = Σ_{k=1..m} p(x_i = v_ik) log p(x_i = v_ik) + p(U_i) log m

where U_i is the set of candidates that do not predict any value for x_i.
The goal is to find the measurement x_i that minimizes this function.
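A Python sketch of this scoring function; the exact functional form above is reconstructed from the slide's description, so treat the sketch as illustrative rather than definitive.

import math

def phi(outcome_probs, p_unknown, m):
    # outcome_probs : p(x_i = v_ik) for each of the m possible values
    # p_unknown     : p(U_i), the mass of candidates predicting nothing for x_i
    score = sum(p * math.log2(p) for p in outcome_probs if p > 0)
    if p_unknown > 0:
        score += p_unknown * math.log2(m)
    return score

# In the circuit example, measuring X gives p(x=4) = 0.495 and p(x=6) = 0.505,
# and every candidate predicts some value for X, so p(U_i) = 0.
print(round(phi([0.495, 0.505], 0.0, 2), 3))   # about -1.0  -> X is a good probe
print(round(phi([0.99, 0.01], 0.0, 2), 3))     # about -0.081 -> Y is a poor probe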

Entropy-based Measurement Proposal - Example
Proposal: measure the variable that minimizes the expected entropy: X.
x = 6 is predicted by the diagnoses {A1}, {M2, A2}, {M2, M3}, so p(x = 6) = 0.505.
A numeric check is given below.
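A small Python check of these numbers, using the normalized priors from the table above and the value each diagnosis predicts for X:

import math

prior     = {"{M1}": 0.495, "{A1}": 0.495, "{M2,A2}": 0.005, "{M2,M3}": 0.005}
predict_x = {"{M1}": 4,     "{A1}": 6,     "{M2,A2}": 6,     "{M2,M3}": 6}

def H(probs):
    # Entropy (in bits) of a list of probabilities.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

p_x6 = sum(p for c, p in prior.items() if predict_x[c] == 6)
print(round(p_x6, 3))                # 0.505, as on the slide

expected_h = 0.0
for value in (4, 6):
    support = {c: p for c, p in prior.items() if predict_x[c] == value}
    p_val = sum(support.values())
    expected_h += p_val * H([p / p_val for p in support.values()])
print(round(expected_h, 2))          # about 0.08 bits; measuring Y or Z would
                                     # leave roughly 1 bit of uncertainty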

Computing Posterior Probability
How do we update the probability of a candidate?
Given the measurement outcome x_i = u_ik, the probability of a candidate is computed via Bayes' rule:

    p(C_l | x_i = u_ik) = p(x_i = u_ik | C_l) · p(C_l) / p(x_i = u_ik)

Meaning: the probability that C_l is the actual candidate given the measurement x_i = u_ik.
p(C_l) is known in advance.
The normalization factor p(x_i = u_ik) is the sum of the probabilities of the candidates consistent with this measurement.

Computing Posterior Probability
How to compute p(x_i = u_ik | C_l)? Three cases:
1. If the candidate C_l predicts the output x_i = u_ik, then p(x_i = u_ik | C_l) = 1
2. If the candidate C_l predicts the output x_i ≠ u_ik, then p(x_i = u_ik | C_l) = 0
3. If the candidate C_l predicts no output for x_i, then p(x_i = u_ik | C_l) = 1/m (m is the number of possible values of x_i)
A small sketch of this update is given below.
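A minimal Python sketch of the Bayes update with the three likelihood cases (the candidate encoding and the choice m = 2 are illustrative):

def posterior(prior, predictions, observed, m):
    # predictions[c] is the value candidate c predicts for the measured
    # variable, or None if it predicts nothing; m is the number of values.
    def likelihood(c):
        predicted = predictions[c]
        if predicted is None:
            return 1.0 / m                            # case 3: no prediction
        return 1.0 if predicted == observed else 0.0  # cases 1 and 2

    unnormalized = {c: likelihood(c) * p for c, p in prior.items()}
    z = sum(unnormalized.values())   # p(x_i = u_ik), the normalization factor
    return {c: w / z for c, w in unnormalized.items()}

# Circuit example: observing x = 6 rules out {M1} and leaves {A1} dominant.
prior = {"{M1}": 0.495, "{A1}": 0.495, "{M2,A2}": 0.005, "{M2,M3}": 0.005}
predictions = {"{M1}": 4, "{A1}": 6, "{M2,A2}": 6, "{M2,M3}": 6}
print(posterior(prior, predictions, observed=6, m=2))
# {'{M1}': 0.0, '{A1}': 0.980..., '{M2,A2}': 0.0099..., '{M2,M3}': 0.0099...}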

Example
[Figure: a chain of inverters between input a and output e, each annotated with its initial probability of failure.]
Assume the input a = 1. What is the best next measurement, b or e?
Assume the next measurement points to a fault. Measuring closer to the input produces fewer conflicts:
- b = 1 implies A is faulty
- e = 0 implies some component is faulty

Example
On the other hand:
- Measuring further away from the input is more likely to produce a discrepant value.
- The larger the number of components involved, the more likely it is that one of them is faulty.
- The probability of finding a particular value outweighs the expected cost of isolating the candidate from a set.
=> The best next measurement is e.

Example
The entropy of measuring b, conditioned on all diagnoses consistent with the observation a, is

    H(b) = p(b = true | all diagnoses with observation a) · log(1 / p(b = true | all diagnoses with observation a))
         + p(b = false | all diagnoses with observation a) · log(1 / p(b = false | all diagnoses with observation a))

Example
Assume a = 1 and e = 0: then the next best measurement is c, which is equidistant from the previous measurements.
Assume a = 1 and e = 1 and p(A) = 0.025: then the next best measurement is b.