Probabilistic Reasoning


Presentation on theme: "Probabilistic Reasoning"— Presentation transcript:

1 Probabilistic Reasoning

2 Power of Cond. Independence
Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear!! Conditional independence is the most basic & robust form of knowledge about uncertain environments. © Daniel S. Weld

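3 Bayes Rule
Simple proof from the definition of conditional probability:
1. $P(E \mid H) = \dfrac{P(H \land E)}{P(H)}$ (Def. cond. prob.)
2. $P(H \mid E) = \dfrac{P(H \land E)}{P(E)}$ (Def. cond. prob.)
3. $P(E \mid H)\,P(H) = P(H \land E)$ (Mult. by P(H) in line 1)
QED: $P(H \mid E) = \dfrac{P(E \mid H)\,P(H)}{P(E)}$ (Substitute #3 in #2)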

4 Use to Compute Diagnostic Probability from Causal Probability
E.g. let M be meningitis, S be stiff neck.
P(M) = …, P(S) = 0.1, P(S|M) = 0.8
P(M|S) = P(S|M) · P(M) / P(S) = 0.8 · P(M) / 0.1 = 8 · P(M)
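The diagnostic computation can be sketched in a few lines of Python. The slide's value for P(M) is not preserved in this transcript, so the prior used below (0.0001) is only an illustrative placeholder, and `posterior` is a hypothetical helper name.

```python
def posterior(prior_h: float, likelihood_e_given_h: float, prob_e: float) -> float:
    """Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    if prob_e <= 0:
        raise ValueError("P(E) must be positive")
    return likelihood_e_given_h * prior_h / prob_e


# Values given on the slide.
P_S = 0.1          # P(stiff neck)
P_S_GIVEN_M = 0.8  # causal direction: P(stiff neck | meningitis)

# ASSUMPTION: the slide's P(M) is missing; this prior is a placeholder.
P_M = 0.0001

# Diagnostic direction: P(meningitis | stiff neck).
p_m_given_s = posterior(P_M, P_S_GIVEN_M, P_S)
print(f"P(M|S) = {p_m_given_s:.6f}")
```

Whatever the prior is, the posterior here is eight times larger, since P(S|M) / P(S) = 8.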

5 Bayes’ Rule & Cond. Independence

6 Bayes Nets
In general, a joint distribution P over a set of variables (X1 x ... x Xn) requires exponential space for representation & inference.
BNs provide a graphical representation of the conditional independence relations in P:
- usually quite compact
- requires assessment of fewer parameters, those being quite natural (e.g., causal)
- efficient (usually) inference: query answering and belief update

7 BN What do the arrows really mean?
Topology may happen to encode causal structure.
Topology is only guaranteed to encode conditional independence.

8 Independence
If X1, X2, ..., Xn are mutually independent, then
$P(X_1, X_2, \dots, X_n) = P(X_1)\,P(X_2) \cdots P(X_n)$
- The joint can be specified with n parameters, cf. the usual $2^n - 1$ parameters required (for Boolean variables).
- While extreme independence is unusual, conditional independence is common.
- BNs exploit this conditional independence.

9 An Example Bayes Net
[Figure: network over Earthquake, Burglary, Alarm, Nbr1Calls, Nbr2Calls]
P(E) = 0.002
Pr(B=t) = 0.001
Pr(A|E,B): one row for each of the four truth assignments to (E, B)
P(N1|A): A=T 0.9, A=F 0.05
P(N2|A): A=T 0.7, A=F 0.01

10 Earthquake Example (cont’d)
[Figure: network over Burglary, Alarm, Nbr1Calls, Nbr2Calls]
If I know whether Alarm is true, no other evidence influences my degree of belief in Nbr1Calls:
P(N1|N2,A,E,B) = P(N1|A)
also: P(N2|A,E,B) = P(N2|A) and P(E|B) = P(E)
By the chain rule we have
P(N1,N2,A,E,B) = P(N1|N2,A,E,B) · P(N2|A,E,B) · P(A|E,B) · P(E|B) · P(B)
= P(N1|A) · P(N2|A) · P(A|E,B) · P(E) · P(B)
Full joint requires only 10 parameters (cf. 32): 10 = 2 + 2 + 4 + 1 + 1
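The parameter count can be checked mechanically. The sketch below assumes Boolean variables and reads the parent sets off the factorization P(N1|A) · P(N2|A) · P(A|E,B) · P(E) · P(B); the dictionary name `parents` is just an illustrative choice.

```python
# Parent sets implied by the factorization on this slide.
parents = {
    "B": [],
    "E": [],
    "A": ["E", "B"],
    "N1": ["A"],
    "N2": ["A"],
}

# Each Boolean node needs one number per configuration of its parents:
# P(node = true | parent configuration). The "false" entry is 1 minus that.
bn_params = sum(2 ** len(ps) for ps in parents.values())

# The explicit joint table has one entry per assignment to all variables.
full_joint_entries = 2 ** len(parents)

print(bn_params, full_joint_entries)
```

The per-node terms are 1 (B), 1 (E), 4 (A), 2 (N1) and 2 (N2), which is the slide's 10 = 2 + 2 + 4 + 1 + 1 in a different order.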

11 BNs: Qualitative Structure
Graphical structure of BN reflects conditional independence among variables.
- Each variable X is a node in the DAG
- Edges denote direct probabilistic influence, usually interpreted causally
- Parents of X are denoted Par(X)
- X is conditionally independent of all non-descendants given its parents
- Graphical test exists for more general independence
- “Markov Blanket”

12 Given Parents, X is Independent of Non-Descendants

13 For Example
[Figure: network over Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]

14 Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X))
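The Markov blanket definition translates directly into set operations. This is a minimal sketch: the five-node alarm network (without Radio) and the names `parents` and `markov_blanket` are assumptions made for illustration. Note that Par(Childs(X)) contains X itself, so X is removed at the end.

```python
def markov_blanket(node, parents):
    """Parents, children, and the children's other parents of `node`."""
    children = {c for c, ps in parents.items() if node in ps}
    childrens_parents = {p for c in children for p in parents[c]}
    return (set(parents[node]) | children | childrens_parents) - {node}


# ASSUMPTION: structure taken from the factorization
# P(N1|A) P(N2|A) P(A|E,B) P(E) P(B) of the earthquake example.
parents = {
    "B": [],
    "E": [],
    "A": ["E", "B"],
    "N1": ["A"],
    "N2": ["A"],
}

for name in parents:
    print(name, sorted(markov_blanket(name, parents)))
```

Earthquake's blanket includes Burglary even though there is no edge between them, because they share the child Alarm.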

15 Conditional Probability Tables
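[Figure: network over Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls, with a table of Pr(B=t) and Pr(B=f) at Burglary]
Pr(A|E,B): one row for each of the four truth assignments to (E, B); parenthesized entries in slide order: (0.1), (0.8), (0.15), (0.99)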

16 Conditional Probability Tables
For complete spec. of joint dist., quantify BN.
- For each variable X, specify CPT: P(X | Par(X))
- Number of params is locally exponential in |Par(X)|
If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
$P(X_n, X_{n-1}, \dots, X_1) = P(X_n \mid X_{n-1}, \dots, X_1) \cdot P(X_{n-1} \mid X_{n-2}, \dots, X_1) \cdots P(X_2 \mid X_1) \cdot P(X_1)$
$= P(X_n \mid \mathrm{Par}(X_n)) \cdot P(X_{n-1} \mid \mathrm{Par}(X_{n-1})) \cdots P(X_1)$
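As a rough illustration of the product-of-CPTs formula, the sketch below builds joint entries for the earthquake network. P(E), P(B), P(N1|A) and P(N2|A) come from slide 9. The alarm table is an assumption: it reads slide 15's parenthesized numbers as P(A=false | E, B) for the rows (e,b), (e,¬b), (¬e,b), (¬e,¬b), which the transcript does not confirm. The helper names are hypothetical.

```python
from itertools import product

P_B = 0.001                       # Pr(B = true), slide 9
P_E = 0.002                       # P(E = true), slide 9
P_N1 = {True: 0.9, False: 0.05}   # P(N1 = true | A), slide 9
P_N2 = {True: 0.7, False: 0.01}   # P(N2 = true | A), slide 9

# ASSUMPTION: P(A = true | E, B), keyed by (e, b), taken as 1 minus the
# parenthesized values on slide 15 in an assumed row order.
P_A = {
    (True, True): 0.9,
    (True, False): 0.2,
    (False, True): 0.85,
    (False, False): 0.01,
}


def bern(p_true, value):
    """Probability of a Boolean `value` given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(n1, n2, a, e, b):
    """P(N1, N2, A, E, B) = P(N1|A) P(N2|A) P(A|E,B) P(E) P(B)."""
    return (bern(P_N1[a], n1) * bern(P_N2[a], n2)
            * bern(P_A[(e, b)], a) * bern(P_E, e) * bern(P_B, b))


# All 32 joint entries are synthesized from the 10 CPT numbers.
total = sum(joint(*values) for values in product((True, False), repeat=5))
print(total)
```

Summing the 32 synthesized entries is a quick sanity check that the CPTs define a proper distribution.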

17 Bayes Nets Representation Summary
Bayes nets compactly encode joint distributions.
Guaranteed independencies of distributions can be deduced from BN graph structure.
D-separation gives precise conditional independence guarantees from the graph alone.
A Bayes’ net’s joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution.

18 Inference in BNs
Inference: calculating some useful quantity from a joint probability distribution.
The graphical independence representation yields efficient inference schemes.
- Posterior probability: $P(Q \mid E_1 = e_1, \dots, E_k = e_k)$
- Most likely explanation: $\arg\max_q P(Q = q \mid E_1 = e_1, \dots, E_k = e_k)$
Computations are organized by network topology.

19 P(b|j,m) = P(b) P(e) P(a|b,e)P(j|a)P(m,a)
P(B | J=true, M=true) Earthquake Burglary Radio Alarm John Mary P(b|j,m) = P(b) P(e) P(a|b,e)P(j|a)P(m,a) e a © Daniel S. Weld

20 Inference by Enumeration
Given unlimited time, inference in BNs is easy.
Recipe:
- State the marginal probabilities you need
- Figure out ALL the atomic probabilities you need
- Calculate and combine them
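The recipe can be sketched on the earthquake network by computing the posterior of Burglary given that both neighbors call. The CPT numbers for E, B, N1 and N2 are from slide 9. The alarm table is an assumed reading of slide 15's parenthesized values as P(A=false | E, B) in the row order (e,b), (e,¬b), (¬e,b), (¬e,¬b). Function names are hypothetical.

```python
from itertools import product

P_B = 0.001
P_E = 0.002
P_N1 = {True: 0.9, False: 0.05}   # P(N1 = true | A)
P_N2 = {True: 0.7, False: 0.01}   # P(N2 = true | A)
# ASSUMPTION: P(A = true | E, B), keyed by (e, b); see the note above.
P_A = {(True, True): 0.9, (True, False): 0.2,
       (False, True): 0.85, (False, False): 0.01}


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(n1, n2, a, e, b):
    """One atomic probability, synthesized from the CPTs."""
    return (bern(P_N1[a], n1) * bern(P_N2[a], n2)
            * bern(P_A[(e, b)], a) * bern(P_E, e) * bern(P_B, b))


def posterior_burglary(n1, n2):
    """P(B | N1 = n1, N2 = n2) by summing atomic entries over hidden A, E."""
    unnormalized = {}
    for b in (True, False):
        unnormalized[b] = sum(
            joint(n1, n2, a, e, b)
            for a, e in product((True, False), repeat=2)
        )
    total = sum(unnormalized.values())  # equals P(n1, n2)
    return {b: p / total for b, p in unnormalized.items()}


post = posterior_burglary(True, True)
print(post)
```

Every atomic entry consistent with the evidence is computed in full before anything is summed, which is exactly the cost that grows exponentially with the number of hidden variables.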

21 In this simple method, we only need the BN to synthesize the joint entries

22 Inference by Enumeration?

23 Variable Elimination
Why is inference by enumeration so slow?
- You join up the whole joint distribution before you sum out the hidden variables
- You end up repeating a lot of work!
Idea: interleave joining and marginalizing!
- Called “Variable Elimination”
- Still NP-hard, but usually much faster than inference by enumeration
We’ll need some new notation to define VE

24 Factor 1
Joint distribution: P(X,Y)
- Entries P(x,y) for all x, y
- Sums to 1
Selected joint: P(x,Y)
- A slice of the joint distribution
- Entries P(x,y) for fixed x, all y
- Sums to P(x)

25 Factor 2
Family of conditionals: P(X | Y)
- Multiple conditionals
- Entries P(x | y) for all x, y
- Sums to |Y|
Single conditional: P(Y | x)
- Entries P(y | x) for fixed x, all y
- Sums to 1

26 Specified family: P(y | X)
- Entries P(y | x) for fixed y, but for all x
- Sums to … who knows!
In general, when we write P(Y1 … YN | X1 … XM)
- It is a “factor,” a multi-dimensional array
- Its values are all P(y1 … yN | x1 … xM)
- Any assigned X or Y is a dimension missing (selected) from the array
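The "sums to" claims for the different kinds of factors can be checked numerically. The joint table below is made up purely for illustration (it is not from the slides), as are the variable names.

```python
xs = ("x1", "x2")
ys = ("y1", "y2")

# ASSUMPTION: an arbitrary joint distribution P(X, Y) for illustration.
joint = {("x1", "y1"): 0.1, ("x1", "y2"): 0.3,
         ("x2", "y1"): 0.2, ("x2", "y2"): 0.4}

p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Selected joint P(x1, Y): a slice of the joint, sums to P(x1).
selected_joint = {y: joint[("x1", y)] for y in ys}

# Family of conditionals P(X | Y): one distribution per y, sums to |Y|.
family = {(x, y): joint[(x, y)] / p_y[y] for x in xs for y in ys}

# Single conditional P(Y | x1): a proper distribution, sums to 1.
single = {y: joint[("x1", y)] / p_x["x1"] for y in ys}

# Specified family P(y1 | X): no particular sum is guaranteed.
specified = {x: joint[(x, "y1")] / p_x[x] for x in xs}

print(sum(joint.values()), sum(selected_joint.values()),
      sum(family.values()), sum(single.values()), sum(specified.values()))
```

All five objects are just arrays of numbers indexed by the unassigned variables, which is why one "factor" type can cover them.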

27 Example: Traffic Domain
Random Variables
- R: Raining
- T: Traffic
- L: Late for class!
First query: P(L)

28 Operation 1: Join Factors
First basic operation: joining factors
Combining factors:
- Just like a database join
- Get all factors over the joining variable
- Build a new factor over the union of the variables involved
Example: Join on R
Computation for each entry: pointwise products
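A join on R might look like the following sketch. The factor representation (a tuple of variable names plus a dict keyed by value tuples), the function name `join`, the assumption that Traffic depends on Rain, and all the numbers are illustrative choices, not taken from the slides.

```python
from itertools import product

DOMAINS = {"R": ("+r", "-r"), "T": ("+t", "-t")}

# ASSUMPTION: illustrative factors P(R) and P(T | R).
P_R = (("R",), {("+r",): 0.1, ("-r",): 0.9})
P_T_GIVEN_R = (("R", "T"), {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                            ("-r", "+t"): 0.1, ("-r", "-t"): 0.9})


def join(f1, f2, domains):
    """Build a factor over the union of variables via pointwise products."""
    (vars1, table1), (vars2, table2) = f1, f2
    out_vars = tuple(dict.fromkeys(vars1 + vars2))  # ordered union
    table = {}
    for values in product(*(domains[v] for v in out_vars)):
        row = dict(zip(out_vars, values))
        key1 = tuple(row[v] for v in vars1)
        key2 = tuple(row[v] for v in vars2)
        table[values] = table1[key1] * table2[key2]
    return out_vars, table


joined_vars, joined_table = join(P_R, P_T_GIVEN_R, DOMAINS)
print(joined_vars)
for key, p in joined_table.items():
    print(key, round(p, 4))
```

Joining P(R) with P(T | R) yields the joint P(R, T), so the resulting entries sum to 1.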

29 Example: Multiple Joins

31 Operation 2: Eliminate
Second basic operation: marginalization
- Take a factor and sum out a variable
- Shrinks a factor to a smaller one
- A projection operation
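Summing out a variable can be sketched as below. The factor layout (variable names plus a dict keyed by value tuples), the name `sum_out`, and the numbers in the P(R, T) table are assumptions made for illustration.

```python
def sum_out(var, factor):
    """Marginalize `var` out of a factor, returning a smaller factor."""
    variables, table = factor
    i = variables.index(var)
    out = {}
    for key, p in table.items():
        reduced = key[:i] + key[i + 1:]          # drop var's position
        out[reduced] = out.get(reduced, 0.0) + p  # add rows that now coincide
    return variables[:i] + variables[i + 1:], out


# ASSUMPTION: an illustrative joint factor P(R, T).
P_RT = (("R", "T"), {("+r", "+t"): 0.08, ("+r", "-t"): 0.02,
                     ("-r", "+t"): 0.09, ("-r", "-t"): 0.81})

t_vars, p_t = sum_out("R", P_RT)
print(t_vars, p_t)
```

The result has one dimension fewer than the input: here the four-entry table over (R, T) shrinks to a two-entry table over T.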

32 Multiple Elimination

33 P(L) : Marginalizing Early!

34 Marginalizing Early-2

35 Evidence
If there is evidence, start with factors that select that evidence.
- With no evidence, the initial factors are the local CPTs
- When computing P(L|+r), the initial factors are the same CPTs instantiated with R = +r
We eliminate all vars other than query + evidence

36 Evidence II
Result will be a selected joint of query and evidence.
- E.g. for P(L | +r), we’d end up with the selected joint P(+r, L)
To get our answer, just normalize this.

37 General Variable Elimination
Query: P(Q | E1 = e1, …, Ek = ek)
Start with initial factors:
- Local CPTs (but instantiated by evidence)
While there are still hidden variables (not Q or evidence):
- Pick a hidden variable H
- Join all factors mentioning H
- Eliminate (sum out) H
Join all remaining factors and normalize
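The loop above can be sketched end to end on the traffic domain. Everything specific here is an assumption for illustration: the chain structure R → T → L, the CPT numbers, the factor representation, and the function names. The caller supplies the elimination order.

```python
from functools import reduce
from itertools import product

# ASSUMPTION: chain R -> T -> L with made-up CPT values.
DOMAINS = {"R": ("+r", "-r"), "T": ("+t", "-t"), "L": ("+l", "-l")}
P_R = (("R",), {("+r",): 0.1, ("-r",): 0.9})
P_T_GIVEN_R = (("R", "T"), {("+r", "+t"): 0.8, ("+r", "-t"): 0.2,
                            ("-r", "+t"): 0.1, ("-r", "-t"): 0.9})
P_L_GIVEN_T = (("T", "L"), {("+t", "+l"): 0.3, ("+t", "-l"): 0.7,
                            ("-t", "+l"): 0.1, ("-t", "-l"): 0.9})


def select(factor, evidence):
    """Keep only the rows that agree with the evidence."""
    variables, table = factor
    kept = {key: p for key, p in table.items()
            if all(evidence.get(v, val) == val
                   for v, val in zip(variables, key))}
    return variables, kept


def join(f1, f2, domains):
    """Pointwise product over the union of the two factors' variables."""
    (vars1, t1), (vars2, t2) = f1, f2
    out_vars = tuple(dict.fromkeys(vars1 + vars2))
    table = {}
    for values in product(*(domains[v] for v in out_vars)):
        row = dict(zip(out_vars, values))
        table[values] = (t1[tuple(row[v] for v in vars1)]
                         * t2[tuple(row[v] for v in vars2)])
    return out_vars, table


def sum_out(var, factor):
    """Marginalize `var` out of the factor."""
    variables, table = factor
    i = variables.index(var)
    out = {}
    for key, p in table.items():
        reduced = key[:i] + key[i + 1:]
        out[reduced] = out.get(reduced, 0.0) + p
    return variables[:i] + variables[i + 1:], out


def variable_elimination(factors, hidden_order, evidence=None):
    evidence = evidence or {}
    # Evidence variables keep only their observed value in the domain.
    domains = {v: ((evidence[v],) if v in evidence else d)
               for v, d in DOMAINS.items()}
    factors = [select(f, evidence) for f in factors]
    for h in hidden_order:
        mentioning = [f for f in factors if h in f[0]]
        if not mentioning:
            continue
        rest = [f for f in factors if h not in f[0]]
        joined = reduce(lambda a, b: join(a, b, domains), mentioning)
        factors = rest + [sum_out(h, joined)]
    # Join what is left (a selected joint of query and evidence), normalize.
    variables, table = reduce(lambda a, b: join(a, b, domains), factors)
    total = sum(table.values())
    return variables, {key: p / total for key, p in table.items()}


CPTS = [P_R, P_T_GIVEN_R, P_L_GIVEN_T]
_, p_l = variable_elimination(CPTS, hidden_order=["R", "T"])
_, p_l_given_r = variable_elimination(CPTS, hidden_order=["T"],
                                      evidence={"R": "+r"})
print(p_l)
print(p_l_given_r)
```

With evidence, the result keeps the observed variable as a fixed coordinate in each key, mirroring the "selected joint" that is normalized in the last step.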

38 Variable Elimination Bayes Rule

39 Example 2

41 Notes on VE
Each operation is simply a multiplication of factors and a summing out of a variable.
Complexity is determined by the size of the largest factor:
- linear in the number of vars, exponential in the number of variables in the largest factor
- elimination ordering greatly impacts factor size
- optimal elimination orderings: NP-hard
- heuristics, special structure (e.g., polytrees)
On tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs.
Practically, inference is much more tractable using structure of this sort.

