How do we choose among these candidate parameter changes? How do we measure and quantify change? Absolute? Relative? Or something else?
A. Darwiche
Naïve Bayes classifier: class variable C, attributes E. Example network (Pregnant? P, with urine test U, blood test B, scanning test S):
Pr(p=yes) = 0.87, Pr(p=no) = 0.13
Pr(u=-ve | p=yes) = 0.27 (false negative), Pr(u=+ve | p=no) = 0.107 (false positive)
Pr(b=-ve | p=yes) = 0.36 (false negative), Pr(b=+ve | p=no) = 0.106 (false positive)
Pr(s=-ve | p=yes) = 0.10 (false negative), Pr(s=+ve | p=no) = 0.01 (false positive)
System schematic:
P: power supply for the whole system
T: transmitter; generates data to the bus (generates no data when faulty)
R1-R3: receivers; receive data if available on the bus
S1-S3: sensors; reflect the status of data on the receiver (fail low)
The reliability of each component (e.g., .99, .999, .995) is shown next to it in the schematic.
Now Pr(BP = Low | HR = Low) = 0.69, but experts believe it should be >= 0.75. How do we efficiently identify minimal parameter changes that can help us satisfy this query constraint?
From global to local belief change. Goal: for each network parameter θ_{x|u}, compute the minimal amount of change that can enforce query constraints such as:
1. Pr(y|e) ≥ ε
2. Pr(y|e) − Pr(z|e) ≥ ε
3. Pr(y|e) / Pr(z|e) ≥ ε
Computing the partial derivative. For each network parameter θ_{x|u}, we introduce a meta parameter τ_{x|u} such that θ_{x|u} = τ_{x|u} and the co-varying parameter θ_{x̄|u} = 1 − τ_{x|u} (binary case). Our procedure requires us to first compute the partial derivative ∂Pr(e) / ∂τ_{x|u}.
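Because Pr(e) is a linear function of any single meta parameter (with its complement co-varying), the derivative can be recovered exactly from two evaluations. A minimal sketch, using the pregnancy network's numbers from the slides; the function name and network restriction to two nodes are mine:

```python
# A minimal sketch: Pr(e) is linear in a single meta parameter tau_{x|u},
# so its partial derivative equals Pr(e; tau=1) - Pr(e; tau=0).

def pr_evidence(tau):
    """Pr(U=+ve) in a 2-node network P -> U, where tau = Pr(U=+ve | P=yes).
    Changing tau co-varies Pr(U=-ve | P=yes) = 1 - tau."""
    pr_p = {True: 0.87, False: 0.13}            # prior on P (from the slides)
    pr_u_given_p = {True: tau, False: 0.107}    # Pr(U=+ve | P)
    return sum(pr_p[p] * pr_u_given_p[p] for p in (True, False))

# Exact derivative from two evaluations (valid because Pr(e) is linear in tau):
derivative = pr_evidence(1.0) - pr_evidence(0.0)
print(derivative)   # ≈ Pr(P=yes) = 0.87
```

Note that the derivative here equals the prior Pr(P=yes), since Pr(e) = Pr(P=yes)·τ + Pr(P=no)·Pr(U=+ve | P=no).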
Parameter changes. Result: to ensure the constraint Pr'(y|e) ≥ ε, we need to change the meta parameter τ_{x|u} by an amount Δτ satisfying a simple inequality; the solution has the form Δτ ≥ q or Δτ ≤ q for some computable q. If the required change is invalid (falls outside the unit interval), the parameter is irrelevant to the constraint. We have to compute such a solution for every parameter in the Bayesian network.
Complexity. Time complexity: O(n·2^w), where n is the number of variables and w is the treewidth. This is the same complexity as computing Pr(e), the simplest possible query. The procedure is therefore very efficient: we pay no extra penalty for computing the partial derivatives.
Reasoning about Information. Understand the impact of new information on situation assessment and decision making. What new evidence would it take, and how reliable should it be, to confirm a particular hypothesis?
False positive: 10%. False negative: 30%.
Bounding query change due to an infinitesimal parameter change. We apply an infinitesimal change ∂τ_{x|u} to the meta parameter τ_{x|u}, leading to a change ∂Pr(y|e) in the query (we assume τ_{x|u} ≤ 0.5). Result: the relative change in the query Pr(y|e) is at most double the relative change in the meta parameter τ_{x|u}:

|∂Pr(y|e)| / Pr(y|e)  ≤  2 |∂τ_{x|u}| / τ_{x|u}
Bounding query change due to an arbitrary parameter change. Treating a positive change in τ_{x|u} and a negative change in τ_{x|u} separately, and combining both results, we have:

| ln O'(y|e) − ln O(y|e) |  ≤  | ln [ τ'(1−τ) ] − ln [ τ(1−τ') ] |

where O(y|e) and O'(y|e) are the odds of y given e before and after changing τ = τ_{x|u} to τ'.
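The bound can be checked numerically. A sketch on a 2-node pregnancy network (numbers from the slides; the choice of changing Pr(U=+ve | P=yes) from 0.73 to 0.90 and querying Pr(P=yes | U=+ve) is my own illustrative assumption):

```python
import math

# Numeric check of the odds-change bound for an arbitrary parameter change.
def posterior_odds(theta):
    """Odds of P=yes given U=+ve in the 2-node network P -> U,
    where theta = Pr(U=+ve | P=yes)."""
    joint_yes = 0.87 * theta          # Pr(P=yes, U=+ve)
    joint_no = 0.13 * 0.107           # Pr(P=no,  U=+ve)
    return joint_yes / joint_no

theta, theta_new = 0.73, 0.90
query_change = abs(math.log(posterior_odds(theta_new)) -
                   math.log(posterior_odds(theta)))
bound = abs(math.log((theta_new * (1 - theta)) / (theta * (1 - theta_new))))
print(query_change <= bound)   # True
```

Here the log-odds change is ln(θ'/θ) ≈ 0.21, comfortably inside the bound of ≈ 1.20.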
Permissible changes for Pr(x|u) to ensure robustness. The graphs show analytically that the permissible changes are smaller for non-extreme queries; moreover, we can afford larger absolute changes for non-extreme parameters. [Plots shown for Pr(y|e) = 0.9 and Pr(y|e) = 0.6.]
A probabilistic distance measure. Define D(Pr, Pr') = ln max_w Pr'(w)/Pr(w) − ln min_w Pr'(w)/Pr(w). D(Pr, Pr') satisfies the three properties of a distance: positiveness, symmetry, and the triangle inequality.
Example: two joint distributions over A, B, C.

world     Pr(A,B,C)   Pr'(A,B,C)   Pr'(w)/Pr(w)
a  b  c     0.10        0.20          2.00
a  b  c̄     0.20        0.30          1.50
a  b̄  c     0.25        0.10          0.40
a  b̄  c̄     0.05        0.05          1.00
ā  b  c     0.05        0.10          2.00
ā  b  c̄     0.10        0.05          0.50
ā  b̄  c     0.10        0.10          1.00
ā  b̄  c̄     0.15        0.10          0.67

max_w Pr'(w)/Pr(w) = 2.00,  min_w Pr'(w)/Pr(w) = 0.40
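Computing the distance on this example is a one-liner. A sketch (the world ordering follows the ratio column of the table; one Pr entry, garbled in the extraction, is inferred from its ratio):

```python
import math

# D(Pr, Pr') = ln max_w Pr'(w)/Pr(w) - ln min_w Pr'(w)/Pr(w)
pr  = [0.10, 0.20, 0.25, 0.05, 0.05, 0.10, 0.10, 0.15]   # Pr(A,B,C)
pr2 = [0.20, 0.30, 0.10, 0.05, 0.10, 0.05, 0.10, 0.10]   # Pr'(A,B,C)

ratios = [q / p for p, q in zip(pr, pr2)]
distance = math.log(max(ratios)) - math.log(min(ratios))
print(round(distance, 3))   # ln 2.00 - ln 0.40 = ln 5 ≈ 1.609
```

Both columns sum to 1, and the max/min ratios (2.00 and 0.40) match the table.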
Significance of the distance measure. Let Pr and Pr' be two distributions, and let α and β be any two events. Write O(α|β) for the odds of α given β under Pr, and O'(α|β) for the odds under Pr'. Key result: we have the following tight bound:

e^{−D(Pr,Pr')}  ≤  O'(α|β) / O(α|β)  ≤  e^{D(Pr,Pr')}
Bounding belief change. Given Pr and Pr', let p = Pr(α|β) and d = D(Pr, Pr'). What can we say about Pr'(α|β)?
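Since Pr'(α|β) = O'(α|β) / (1 + O'(α|β)), the odds bound translates directly into a probability interval. A sketch of the derivation:

```latex
O(\alpha\mid\beta) = \frac{p}{1-p}, \qquad
e^{-d} \le \frac{O'(\alpha\mid\beta)}{O(\alpha\mid\beta)} \le e^{d}
\quad\Longrightarrow\quad
\frac{p\,e^{-d}}{p\,e^{-d} + 1 - p} \;\le\; \Pr{}'(\alpha\mid\beta) \;\le\; \frac{p\,e^{d}}{p\,e^{d} + 1 - p}.
```

The interval is tightest when p is extreme (near 0 or 1) and widest near p = 0.5.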
Bounding belief change. [Plots of the resulting bounds on Pr'(α|β) as a function of p, for d = 0.1 and d = 1.]
Comparison with KL-divergence:
1. KL-divergence is incomparable with our distance measure.
2. KL-divergence cannot be used to guarantee a bound similar to the one provided by our distance measure.
3. Our recent work suggests that KL-divergence can be viewed as an average-case bound, and our distance measure as a worst-case bound.
Applications to Bayesian networks: what is the global impact of a local parameter change?
A. Darwiche N’ is obtained from N by changing the CPT of X from X|u to ’ X|u. N and N’ induce distributions Pr and Pr’. Distance between networks
Jeffrey's Rule. Given a distribution Pr, and soft evidence specifying new probabilities q_1, …, q_n for mutually exclusive and exhaustive events γ_1, …, γ_n: if w is a world that satisfies γ_i, then

Pr'(w) = Pr(w) · q_i / Pr(γ_i)
Example of Jeffrey's Rule. A piece of cloth: its color can be one of green, blue, or violet, and it may be sold or unsold the next day.

color     sold    not sold    total
green     0.12    0.18        0.3
blue      0.12    0.18        0.3
violet    0.32    0.08        0.4
Example of Jeffrey's Rule. Given soft evidence green: 0.7, blue: 0.25, violet: 0.05, each world is scaled by the factor q_i / Pr(γ_i), i.e., 0.7/0.3, 0.25/0.3, and 0.05/0.4:

color     sold    not sold    total
green     0.28    0.42        0.7
blue      0.10    0.15        0.25
violet    0.04    0.01        0.05
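The update above can be sketched in a few lines (variable names are mine; numbers are from the cloth example):

```python
# Jeffrey's rule: scale each world satisfying gamma_i by q_i / Pr(gamma_i).
pr = {                      # Pr(color, sold) from the table
    ('green',  'sold'): 0.12, ('green',  'not'): 0.18,
    ('blue',   'sold'): 0.12, ('blue',   'not'): 0.18,
    ('violet', 'sold'): 0.32, ('violet', 'not'): 0.08,
}
soft = {'green': 0.7, 'blue': 0.25, 'violet': 0.05}   # new color marginals

marg = {c: sum(v for (col, _), v in pr.items() if col == c) for c in soft}
pr_new = {w: v * soft[w[0]] / marg[w[0]] for w, v in pr.items()}

sold = sum(v for (c, s), v in pr_new.items() if s == 'sold')
print(round(sold, 2))   # 0.42
```

The updated probability of selling the cloth is 0.28 + 0.10 + 0.04 = 0.42, matching the table.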
Bound on Jeffrey's Rule. The distance between Pr and Pr' is

D(Pr, Pr') = ln max_i q_i / Pr(γ_i)  −  ln min_i q_i / Pr(γ_i)

which then bounds the amount of belief change as before.
From Numbers to Decisions. The naïve Bayes network (Pregnant? P, with urine test U, blood test B, scanning test S) plus probabilistic inference yields a decision function mapping the test results U, B, S to a decision: yes or no. CPTs: Pr(p=yes) = 0.87, Pr(p=no) = 0.13; Pr(u=-ve | p=yes) = 0.27, Pr(u=+ve | p=no) = 0.107; Pr(b=-ve | p=yes) = 0.36, Pr(b=+ve | p=no) = 0.106; Pr(s=-ve | p=yes) = 0.10, Pr(s=+ve | p=no) = 0.01.
From Numbers to Decisions. The decision function can be compiled into an ordered decision diagram (ODD) plus probabilistic inference. Example situation: U=+ve, B=-ve, S=-ve. [ODD over the tests U, B, S with Yes/No leaves; the network and CPTs are as on the previous slide.]
Improving Reliability of Sensors. Decision rule: yes if the probability of pregnancy exceeds 90%. Currently, the test has a 27.0% false negative rate and a 10.7% false positive rate.
Same decisions (in all situations) if the new test has: false negative 10%, false positive 5%.
Different decisions (in some situations) if the new test has: false negative 5%, false positive 2.5%.
We can characterize these situations, compute their likelihood, and analyze their properties.
Adding New Sensors. Add a new test N to the network (Pregnant? P, with tests U, B, S, N); decision rule: yes if the probability exceeds 90%.
Same decisions (in all situations) if the new test has: false negative 40%, false positive 20%.
Different decisions (in some situations) if it has: false negative 20%, false positive 10%.
We can characterize these situations, compute their likelihood, and analyze their properties.
Equivalence of NB classifiers. After changing the prior of P, the classifiers are equivalent iff the prior of P in the new network N' lies in [0.684, 0.970).
Theoretical results of the algorithm.
Space complexity: the total number of nodes in the ODD is O(b^(n/2)).
Time complexity: O(n·b^(n/2)).
This improves greatly over the brute-force approach, which enumerates O(b^n) instances.