Sensitivity Analysis in Bayesian Networks
Adnan Darwiche, Computer Science Department, UCLA

Example: How will this parameter change affect the query Pr(BP = Low | HR = Low)? More generally, how will a parameter change impact the value of an arbitrary query Pr(y|e)?

Currently Pr(BP = Low | HR = Low) = 0.69, but experts believe it should be at least some higher threshold. How do we efficiently identify minimal parameter changes that satisfy this query constraint?

How do we choose among these candidate parameter changes? How do we measure and quantify change: in absolute terms, relative terms, or something else?

Naïve Bayes classifier: class variable C, attributes E. For the pregnancy example:

P      Pr(p)
yes    0.87
no     0.13

P      U      Pr(u|p)
yes    -ve    0.27
no     +ve    0.107

P      B      Pr(b|p)
yes    -ve    0.36
no     +ve    0.106

P      S      Pr(s|p)
yes    -ve    0.10
no     +ve    0.01
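To make the classifier concrete, here is a minimal Python sketch (mine, not from the slides) of how these CPTs yield a posterior over the class variable; the dictionary layout and function name are illustrative assumptions.

# Posterior for the naive Bayes pregnancy classifier above.
# Each test's CPT is stored as (fn, fp) = (Pr(test=-ve | P=yes), Pr(test=+ve | P=no)).
PRIOR_YES = 0.87
CPTS = {"U": (0.27, 0.107), "B": (0.36, 0.106), "S": (0.10, 0.01)}

def posterior_yes(evidence):
    """Pr(P = yes | evidence), where evidence maps a test name to '+ve' or '-ve'."""
    like_yes, like_no = PRIOR_YES, 1.0 - PRIOR_YES
    for test, value in evidence.items():
        fn, fp = CPTS[test]
        like_yes *= fn if value == "-ve" else 1.0 - fn
        like_no *= 1.0 - fp if value == "-ve" else fp
    return like_yes / (like_yes + like_no)

print(posterior_yes({"U": "-ve", "B": "-ve", "S": "-ve"}))  # ≈ 0.076

With these numbers, three negative tests leave Pr(P = yes) ≈ 0.076, above the 5% threshold that a later slide asks for.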

System Schematic

P: a power supply for the whole system.
T: transmitter; generates data onto the bus (generates no data when faulty).
R: receiver; receives data if available on the bus.
S: sensor; reflects the status of the data on its receiver (fails low).

[Schematic: one transmitter T feeding three receivers R, each with a sensor S, all powered by P; the reliability of each component is shown next to it.]

From global to local belief change

Goal: for each network parameter θ_x|u, compute the minimal amount of change that can enforce query constraints such as:
1. Pr(y|e) ≥ ε
2. Pr(y|e) − Pr(z|e) ≥ ε
3. Pr(y|e) / Pr(z|e) ≥ ε

Computing the partial derivative

For each network parameter θ_x|u (X binary), we introduce a meta parameter τ_x|u such that θ_x|u = τ_x|u and θ_x̄|u = 1 − τ_x|u. Our procedure requires us to first compute the partial derivative ∂Pr(e)/∂τ_x|u = ∂Pr(e)/∂θ_x|u − ∂Pr(e)/∂θ_x̄|u.

Parameter changes

Result: to ensure the constraint Pr′(y|e) ≥ ε, we need to change the meta parameter τ_x|u by an amount δ such that

    δ (∂Pr(y, e)/∂τ_x|u − ε ∂Pr(e)/∂τ_x|u) ≥ ε Pr(e) − Pr(y, e).

The solution is either δ ≥ q or δ ≤ q for a computed value q. If the required change is invalid (it would take the parameter outside [0, 1]), the parameter is irrelevant to the constraint. We compute the solution δ for every parameter in the Bayesian network.
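A sketch of this solution step, assuming an inference engine has already supplied Pr(y, e), Pr(e) and their derivatives with respect to τ_x|u; the function name and signature are mine, not Darwiche's.

def required_change(pr_ye, pr_e, d_pr_ye, d_pr_e, eps):
    """Solve  delta * (dPr(y,e)/dtau - eps * dPr(e)/dtau) >= eps*Pr(e) - Pr(y,e)
    for the change delta in the meta parameter tau_x|u.
    Returns (q, 'up') when the constraint holds for delta >= q,
    (q, 'down') when it holds for delta <= q, or None when the
    query constraint is insensitive to this parameter."""
    coeff = d_pr_ye - eps * d_pr_e
    rhs = eps * pr_e - pr_ye
    if coeff == 0.0:
        return None
    q = rhs / coeff
    return (q, "up") if coeff > 0 else (q, "down")

The caller still has to check validity: the change δ must keep τ_x|u inside [0, 1], otherwise the parameter cannot enforce the constraint.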

Complexity

Time complexity: O(n·2^w), where n is the number of variables and w is the treewidth. This is the same complexity as performing inference to compute Pr(e), the simplest possible query. The procedure is therefore very efficient: we pay no extra penalty for computing the partial derivatives.

Reasoning about Information

Understand the impact of new information on situation assessment and decision making. What new evidence would it take, and how reliable would it have to be, to confirm a particular hypothesis?

False positive: 10%. False negative: 30%.

We need the probability of pregnancy to be ≤ 5% given that all three tests are negative. Solution: get a better scanning test, reducing its false negative rate from 10% to 4.6%. Changes to the other tests are irrelevant!

Current Work…

Multiple parameters:
- within the same CPT
- across multiple CPTs
Multiple constraints

How will a parameter change impact the value of an arbitrary query Pr(y|e)?

Bounding the partial derivative

Key result: a network-independent bound on the partial derivative:

    |∂Pr(y|e)/∂τ_x|u| ≤ Pr(y|e) (1 − Pr(y|e)) / (Pr(x|u) (1 − Pr(x|u)))

[Figure: plot of the bound on the partial derivative.]

Bounding query change due to an infinitesimal parameter change

We apply an infinitesimal change ∂τ_x|u to the meta parameter τ_x|u, leading to a change ∂Pr(y|e) in the query (we assume τ_x|u ≤ 0.5). Result: the relative change in the query Pr(y|e) is at most double the relative change in the meta parameter:

    |∂Pr(y|e) / Pr(y|e)| ≤ 2 |∂τ_x|u / τ_x|u|

Bounding query change due to an arbitrary parameter change

One bound applies when the change in τ_x|u is positive and another when it is negative. Combining both results, we have:

    |ln O′(y|e) − ln O(y|e)| ≤ |ln O′(x|u) − ln O(x|u)|

where O(·) denotes odds under Pr and O′(·) denotes odds under Pr′.

Bounding query change: example

Tampering   θ          Tampering   θ′
True        0.02       True        0.036
False       0.98       False       0.964

Given the evidence e shown in the figure (one variable observed false, another true), Pr(fire | e) is given in the figure; the bound yields Pr′(fire | e) ∈ [0.016, 0.053], while the exact value is Pr′(fire | e) = 0.021.
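The bound can be applied mechanically, as in the sketch below. The original query value Pr(fire | e) is not preserved in this transcript, so the 0.03 used here is an assumed stand-in; it reproduces the slide's interval only approximately.

import math

def odds(p):
    return p / (1.0 - p)

def from_odds(o):
    return o / (1.0 + o)

def query_bounds(p_query, theta_old, theta_new):
    """Bound Pr'(y|e) after a parameter change theta_old -> theta_new,
    using |ln O'(y|e) - ln O(y|e)| <= |ln O'(x|u) - ln O(x|u)|."""
    d = abs(math.log(odds(theta_new)) - math.log(odds(theta_old)))
    o = odds(p_query)
    return from_odds(o * math.exp(-d)), from_odds(o * math.exp(d))

# Tampering example: theta 0.02 -> 0.036, assumed Pr(fire|e) = 0.03.
print(query_bounds(0.03, 0.02, 0.036))  # ≈ (0.017, 0.054)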

Permissible changes for Pr(x|u) to ensure robustness

The graphs show analytically that the permissible changes are smaller for non-extreme queries; moreover, we can afford larger absolute changes for non-extreme parameters. [Plots shown for Pr(y|e) = 0.9 and Pr(y|e) = 0.6.]

A probabilistic distance measure

Define D(Pr, Pr′) = ln max_w Pr′(w)/Pr(w) − ln min_w Pr′(w)/Pr(w), where w ranges over the worlds. D(Pr, Pr′) satisfies the three properties of a distance: positiveness, symmetry, and the triangle inequality.

[Table: the worlds w over variables A, B, C with Pr(A, B, C) and Pr′(A, B, C), the ratios Pr′(w)/Pr(w), and their maximum and minimum across worlds.]
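In code, the measure is a short function over explicit joint tables, in the spirit of the example above; representing each distribution as a dictionary from worlds to probabilities is my choice, not the slides'.

import math

def distance(pr, pr_new):
    """D(Pr, Pr') = ln max_w Pr'(w)/Pr(w) - ln min_w Pr'(w)/Pr(w),
    assuming Pr(w) > 0 for every world w."""
    ratios = [pr_new[w] / pr[w] for w in pr]
    return math.log(max(ratios)) - math.log(min(ratios))

pr     = {"abc": 0.25, "ab~c": 0.25, "a~bc": 0.25, "a~b~c": 0.25}
pr_new = {"abc": 0.30, "ab~c": 0.20, "a~bc": 0.25, "a~b~c": 0.25}
print(distance(pr, pr))       # 0.0
print(distance(pr, pr_new))   # ln(1.2) - ln(0.8) ≈ 0.405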

Distance Properties

The measure satisfies the properties of a distance:
1. Positiveness: D(Pr, Pr′) ≥ 0, and D(Pr, Pr′) = 0 iff Pr = Pr′.
2. Symmetry: D(Pr, Pr′) = D(Pr′, Pr).
3. Triangle inequality: D(Pr, Pr′) + D(Pr′, Pr′′) ≥ D(Pr, Pr′′).

Significance of the distance measure

Let Pr and Pr′ be two distributions, and let α and β be any two events. Write O(α|β) for the odds of α given β under Pr, and O′(α|β) for the same odds under Pr′. Key result: we have the tight bound

    e^(−D(Pr, Pr′)) ≤ O′(α|β) / O(α|β) ≤ e^(D(Pr, Pr′))

Bounding belief change

Given p = Pr(α|β) and d = D(Pr, Pr′), what can we say about Pr′(α|β)? From the odds bound:

    p·e^(−d) / (1 − p + p·e^(−d)) ≤ Pr′(α|β) ≤ p·e^d / (1 − p + p·e^d)
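A one-function sketch of this interval, obtained by pushing p through the odds bound; the function name is mine.

import math

def belief_bounds(p, d):
    """Interval for Pr'(a|b), given p = Pr(a|b) and d = D(Pr, Pr')."""
    lo = p * math.exp(-d) / (1.0 - p + p * math.exp(-d))
    hi = p * math.exp(d) / (1.0 - p + p * math.exp(d))
    return lo, hi

print(belief_bounds(0.5, 0.1))  # ≈ (0.475, 0.525)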

[Figure: plots of the bound on Pr′(α|β) as a function of p, for d = 0.1 and d = 1.]

Comparison with KL divergence

1. KL divergence is incomparable with our distance measure.
2. KL divergence cannot be used to guarantee a bound like the one our distance measure provides.
3. Our recent work suggests that KL divergence can be viewed as an average-case bound, and our distance measure as a worst-case bound.

Applications to Bayesian networks

What is the global impact of a local parameter change?

Distance between networks

N′ is obtained from N by changing the CPT of X from Θ_X|u to Θ′_X|u; N and N′ induce distributions Pr and Pr′. The distance can then be computed locally from the changed CPT row:

    D(Pr, Pr′) ≤ ln max_x θ′_x|u / θ_x|u − ln min_x θ′_x|u / θ_x|u

with equality in the typical case where the parent instantiation u has positive probability.

Jeffrey's Rule

Given a distribution Pr over worlds w, and given soft evidence on mutually exclusive and exhaustive events γ1, …, γn asserting new probabilities q1, …, qn: if w is a world that satisfies γi, then

    Pr′(w) = Pr(w) · qi / Pr(γi)

Example of Jeffrey's Rule

A piece of cloth: its color can be green, blue, or violet, and it may be sold or unsold the next day.

          Sold    Not sold
Green     0.12    0.18
Blue      0.12    0.18
Violet    0.32    0.08

Example of Jeffrey's Rule (continued)

Given soft evidence on the color: green 0.7, blue 0.25, violet 0.05. Scale each row by qi / Pr(γi), i.e. green × 0.7/0.3, blue × 0.25/0.3, violet × 0.05/0.4:

          Sold    Not sold
Green     0.28    0.42
Blue      0.10    0.15
Violet    0.04    0.01
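The cloth example is easy to replay in code; the joint table below is the one reconstructed above from the slide's column totals and scaling factors.

# Jeffrey's rule on the cloth example; worlds are (color, sold) pairs.
pr = {("green", True): 0.12, ("green", False): 0.18,
      ("blue", True): 0.12, ("blue", False): 0.18,
      ("violet", True): 0.32, ("violet", False): 0.08}
soft_evidence = {"green": 0.70, "blue": 0.25, "violet": 0.05}

def jeffrey(pr, soft_evidence):
    """Pr'(w) = Pr(w) * q_i / Pr(gamma_i), where gamma_i is w's color event."""
    marginal = {}
    for (color, _), p in pr.items():
        marginal[color] = marginal.get(color, 0.0) + p
    return {w: p * soft_evidence[w[0]] / marginal[w[0]] for w, p in pr.items()}

pr_new = jeffrey(pr, soft_evidence)
print(pr_new[("green", False)], pr_new[("blue", False)], pr_new[("violet", False)])
# ≈ 0.42, 0.15, 0.01, matching the updated table above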

Bound on Jeffrey's Rule

Distance between Pr and Pr′: D(Pr, Pr′) = ln max_i qi / Pr(γi) − ln min_i qi / Pr(γi). The odds bound above then limits the amount of belief change that the soft evidence can induce in any conditional probability.

From Numbers to Decisions

Probabilistic inference over the naïve Bayes network shown earlier (Pregnant? P with tests U, B, S and their CPTs), combined with a decision function that maps the test results (U, B, S) to Yes or No.

From Numbers to Decisions

An ordered decision diagram (ODD) combined with probabilistic inference: the diagram tests U, then B and S, terminating in a Yes or No decision. Example situation: U = +ve, B = −ve, S = −ve.

Binary Decision Diagram

[Diagram: a BDD over X1, X2, X3 with terminals 1 and 0.] Test-once property: each variable is tested at most once along any path from the root to a terminal.

Improving Reliability of Sensors

Decision rule: Yes if Pr(pregnant | tests) > 90%. Currently the urine test has false negative 27.0% and false positive 10.7%.
- Same decisions (in all situations) if the new test has: false negative 10%, false positive 5%.
- Different decisions (in some situations) if the new test has: false negative 5%, false positive 2.5%.
We can characterize these situations, compute their likelihood, and analyze their properties.
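For a classifier this small, the equivalence claim can be checked by brute force: enumerate all test-result combinations and compare decisions under the old and new error rates. A sketch under the assumptions of the pregnancy network above (the slides' own method uses the compiled ODD instead of enumeration):

from itertools import product

def decisions(cpts, prior_yes=0.87, threshold=0.90):
    """Map each (U, B, S) outcome to the Yes/No decision of the classifier."""
    out = {}
    for values in product(["+ve", "-ve"], repeat=3):
        like_yes, like_no = prior_yes, 1.0 - prior_yes
        for (fn, fp), v in zip(cpts, values):
            like_yes *= fn if v == "-ve" else 1.0 - fn
            like_no *= 1.0 - fp if v == "-ve" else fp
        out[values] = like_yes / (like_yes + like_no) > threshold
    return out

old = [(0.27, 0.107), (0.36, 0.106), (0.10, 0.01)]  # (fn, fp) for U, B, S
new = [(0.10, 0.05), (0.36, 0.106), (0.10, 0.01)]   # improved urine test
print(decisions(old) == decisions(new))  # True: same decisions in every situation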

Adding New Sensors

Add a new test (N) alongside U, B, and S. Decision rule: Yes if Pr(pregnant | tests) > 90%.
- Same decisions (in all situations) if the new test has: false negative 40%, false positive 20%.
- Different decisions (in some situations) if it has: false negative 20%, false positive 10%.
We can characterize these situations, compute their likelihood, and analyze their properties.

Equivalence of NB classifiers

Change the prior of P: the two classifiers are equivalent iff the prior of P in N′ lies in [0.684, 0.970).

[Diagram: two paths, Path 1 and Path 2, leading to sub-ODDs D1 and D2.]

[Diagram: when D1 = D2, Path 1 and Path 2 are directed to a single shared sub-ODD.]
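The merging shown above is the standard reduction used when constructing ordered decision diagrams: before creating a node, look it up in a unique table keyed by its variable and children, so equal sub-ODDs are stored once and shared. A generic hash-consing sketch (not the slides' specific construction for NB classifiers):

YES, NO = "Yes", "No"  # terminal nodes
unique = {}            # (var, children signature) -> node

def make_node(var, children):
    """ODD node testing `var`; `children` maps each value of var to a child."""
    kids = tuple(sorted((v, id(c)) for v, c in children.items()))
    if len({cid for _, cid in kids}) == 1:    # all children identical:
        return next(iter(children.values()))  # the test is redundant, skip it
    return unique.setdefault((var, kids), {"var": var, "children": children})

# Two construction paths that yield the same sub-ODD get the same object:
d1 = make_node("S", {"+ve": YES, "-ve": NO})
d2 = make_node("S", {"+ve": YES, "-ve": NO})
print(d1 is d2)  # True: D1 = D2 is stored once and shared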

Theoretical results of the algorithm

- Space complexity: the total number of nodes in the ODD is O(b^(n/2)).
- Time complexity: O(n·b^(n/2)).
- This improves greatly over the brute-force approach, which enumerates all O(b^n) instances (n attributes with b values each).

Experimental results of the algorithm

[Table: for each network (Tic-tac-toe, Votes, Spect with 22 attributes, Breast-cancer-w with 9, Hepatitis with 19, Kr-vs-kp with 36, Mushroom with 22): the number of attributes, the number of instances, the bound on the number of ODD nodes, and the actual number of nodes.]
