Representing Uncertainty


1 Representing Uncertainty

2 Aspects of Uncertainty
Suppose you have a flight at 12 noon. When should you leave for SEATAC? What are the traffic conditions? How crowded is security? Leaving 18 hours early may get you there, but … ?
© Daniel S. Weld

3 Decision Theory = Probability + Utility Theory
Min before noon    P(arrive-in-time)
20 min
30 min
45 min
60 min
120 min
1080 min
Depends on your preferences.
Utility theory: representing & reasoning about preferences
© Daniel S. Weld

4 What Is Probability?
Probability: a calculus for dealing with nondeterminism and uncertainty (cf. logic)
Probabilistic model: says how often we expect different things to occur (cf. a function)
© Daniel S. Weld

5 What Is Statistics?
Statistics 1: describing data
Statistics 2: inferring probabilistic models from data (structure and parameters)
© Daniel S. Weld

6 Why Should You Care?
The world is full of uncertainty, and logic alone is not enough: computers need to be able to handle uncertainty. Probability is a new foundation for AI (& CS!).
Massive amounts of data are around today, and statistics and CS are both about data:
Statistics lets us summarize and understand it.
Statistics is the basis for most learning.
Statistics lets data do our work for us.
© Daniel S. Weld

7 Outline
Basic notions: atomic events, probabilities, joint distribution
Inference by enumeration
Independence & conditional independence
Bayes' rule
Bayesian networks
Statistical learning
Dynamic Bayesian networks (DBNs)
Markov decision processes (MDPs)
© Daniel S. Weld

8 Logic vs. Probability
Symbol (Q, R, …) vs. random variable (Q, …)
Boolean values (T, F) vs. a domain you specify, e.g. {heads, tails} or [1, 6]
State of the world (an assignment to Q, R, …, Z) vs. atomic event (a complete specification of the world, Q … Z); atomic events are mutually exclusive and exhaustive
Prior probability (aka unconditional probability): P(Q)
Joint distribution: the probability of every atomic event
© Daniel S. Weld

9 Syntax for Propositions
© Daniel S. Weld

10 Propositions
Assume Boolean variables.
Propositions: A = true, B = false
© Daniel S. Weld

11 Why Use Probability?
E.g. P(A ∨ B) = ?  P(A) + P(B) - P(A ∧ B)
[Venn diagram: A, B, and A ∧ B within True]
© Daniel S. Weld

12 Axioms of Probability Theory
All probabilities are between 0 and 1: 0 ≤ P(A) ≤ 1
P(true) = 1, P(false) = 0
The probability of a disjunction is: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
[Venn diagram: A, B, A ∧ B within True]
© Daniel S. Weld
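As a quick worked check of the disjunction axiom, here is a toy computation (the numbers are illustrative only, not from the slides):

```python
# Toy check of the disjunction axiom: P(A or B) = P(A) + P(B) - P(A and B).
# These numbers are illustrative only.
p_a, p_b, p_a_and_b = 0.5, 0.4, 0.2
print(p_a + p_b - p_a_and_b)   # P(A or B) = 0.7
```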

13 Prior Probability
Any question can be answered by the joint distribution.
© Daniel S. Weld

14 Conditional Probability
© Daniel S. Weld

15 Conditional Probability
Def: P(A | B) = P(A ∧ B) / P(B), provided P(B) > 0
© Daniel S. Weld
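In code, the definition is a one-liner. A minimal sketch, using the dentist-example numbers from the upcoming slides (P(toothache ∧ cavity) = 0.12 and P(toothache) = 0.20 are the standard Russell & Norvig values):

```python
# P(A | B) = P(A and B) / P(B); here A = cavity, B = toothache.
p_cavity_and_toothache = 0.12   # from the standard dentist example
p_toothache = 0.20
print(p_cavity_and_toothache / p_toothache)   # P(cavity | toothache) = 0.6
```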

16 Inference by Enumeration
P(toothache) = .108 + .012 + .016 + .064 = .20, or 20%
This process is called "marginalization".
© Daniel S. Weld

17 Inference by Enumeration
P(toothachecavity = .20 + ?? .28 © Daniel S. Weld

18 Inference by Enumeration
© Daniel S. Weld

19 Problems
Worst-case time: O(d^n), where d = max arity and n = number of random variables
Space complexity is also O(d^n): the size of the joint distribution
How do we get the O(d^n) entries for the table??
The values of cavity & catch are irrelevant when computing P(toothache)
© Daniel S. Weld

20 Independence
A and B are independent iff: P(A | B) = P(A) and P(B | A) = P(B)
These two constraints are logically equivalent.
Therefore, if A and B are independent: P(A ∧ B) = P(A) · P(B)
© Daniel S. Weld
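A minimal, self-contained check of this definition on a toy joint over two fair coin flips (my own example, not from the slides):

```python
# Check the independence definition: A, B independent iff P(A and B) = P(A)P(B).
import math
from itertools import product

# Joint over two fair, independent coin flips.
joint = {(a, b): 0.25 for a, b in product([True, False], repeat=2)}

p_a  = sum(p for (a, b), p in joint.items() if a)   # P(A)
p_b  = sum(p for (a, b), p in joint.items() if b)   # P(B)
p_ab = joint[(True, True)]                          # P(A and B)

print(math.isclose(p_ab, p_a * p_b))   # True: independent by construction
```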

21 Independence
[Venn diagram: A, B, and A ∧ B within True]
© Daniel S. Weld

22 Independence
Complete independence is powerful but rare. What should we do if it doesn't hold?
© Daniel S. Weld

23 Conditional Independence
A & B are not independent, since P(A | B) < P(A).
[Venn diagram: A, B, and A ∧ B within True]
© Daniel S. Weld

24 Conditional Independence
But: A & B are made independent by C: P(A | C) = P(A | B, C)
[Venn diagram: A, B, C, with regions A ∧ B, A ∧ C, B ∧ C within True]
© Daniel S. Weld

25 Conditional Independence
Instead of 7 entries, only need 5 © Daniel S. Weld

26 Conditional Independence II
P(catch | toothache, cavity) = P(catch | cavity)
P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Why only 5 entries in the table?
© Daniel S. Weld
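A minimal sketch verifying this claim numerically on the dentist joint from the earlier sketch: once cavity is known, toothache tells us nothing more about catch.

```python
# Verify P(catch | toothache, cavity) = P(catch | cavity) on the dentist joint.
joint = {
    # (toothache, cavity, catch) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    return sum(p for world, p in joint.items() if event(*world))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda *w: event(*w) and given(*w)) / prob(given)

print(cond(lambda t, cav, c: c, lambda t, cav, c: t and cav))  # ~0.9
print(cond(lambda t, cav, c: c, lambda t, cav, c: cav))        # ~0.9 as well
```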

27 Power of Cond. Independence
Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear!! Conditional independence is the most basic & robust form of knowledge about uncertain environments. © Daniel S. Weld
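A small worked count showing the exponential-to-linear reduction (my own illustration, assuming one Boolean cause with n conditionally independent Boolean effects):

```python
# Parameter counts: full joint vs. factored model with one Boolean cause
# and n Boolean effects, each conditionally independent given the cause.
def full_joint_params(n):
    """Independent entries in the full joint over n+1 Boolean variables."""
    return 2 ** (n + 1) - 1

def factored_params(n):
    """P(cause) plus P(effect_i | cause) and P(effect_i | ~cause) per effect."""
    return 1 + 2 * n

for n in (3, 10, 30):
    print(n, full_joint_params(n), factored_params(n))
# n=30: 2,147,483,647 joint entries vs. 61 parameters -- exponential vs. linear
```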

28 Bayes Rule
P(H | E) = P(E | H) · P(H) / P(E)
Simple proof from the definition of conditional probability:
1. P(E | H) = P(H ∧ E) / P(H)   (def. cond. prob.)
2. P(H | E) = P(H ∧ E) / P(E)   (def. cond. prob.)
3. P(H ∧ E) = P(E | H) · P(H)   (mult. by P(H) in line 1)
QED: P(H | E) = P(E | H) · P(H) / P(E)   (substitute #3 in #2)
© Daniel S. Weld

29 Use to Compute Diagnostic Probability from Causal Probability
E.g. let M be meningitis, S be stiff neck.
P(M) = …, P(S) = 0.1, P(S | M) = 0.8
P(M | S) = P(S | M) · P(M) / P(S) = ?
© Daniel S. Weld
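A minimal sketch of this diagnostic computation. The slide's value for P(M) did not survive transcription, so the prior below is a hypothetical placeholder; substitute the real number to reproduce the slide's answer.

```python
# Diagnostic probability from causal probability via Bayes' rule.
def bayes(p_e_given_h, p_h, p_e):
    """P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

p_m = 1e-4   # HYPOTHETICAL prior P(meningitis), not from the slide
print(bayes(p_e_given_h=0.8, p_h=p_m, p_e=0.1))   # P(M|S) = 0.0008 with this prior
```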

30 Bayes’ Rule & Cond. Independence
© Daniel S. Weld

31 Bayes Nets
In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation & inference.
BNs provide a graphical representation of the conditional independence relations in P:
usually quite compact
requires assessment of fewer parameters, and those are quite natural (e.g., causal)
efficient (usually) inference: query answering and belief update
© Daniel S. Weld

32 An Example Bayes Net
[Network diagram: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]
Pr(B=t), Pr(B=f): values not shown
Pr(A | E, B), with Pr(A=f) in parentheses:
e, b: 0.9 (0.1)
e, ¬b: 0.2 (0.8)
¬e, b: 0.85 (0.15)
¬e, ¬b: 0.01 (0.99)
© Daniel S. Weld

33 Earthquake Example (con't)
[Network diagram: Burglary, Earthquake, Radio, Alarm, Nbr1Calls, Nbr2Calls]
If I know Alarm, no other evidence influences my degree of belief in Nbr1Calls:
P(N1 | N2, A, E, B) = P(N1 | A)
also: P(N2 | N1, A, E, B) = P(N2 | A)
and: P(E | B) = P(E)
By the chain rule we have
P(N1, N2, A, E, B) = P(N1 | N2, A, E, B) · P(N2 | A, E, B) · P(A | E, B) · P(E | B) · P(B)
= P(N1 | A) · P(N2 | A) · P(A | B, E) · P(E) · P(B)
The full joint requires only 10 parameters (cf. 32), as the sketch below illustrates.
© Daniel S. Weld
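A minimal sketch of this 10-parameter factorization. The alarm CPT uses the values from slide 32; the priors P(B), P(E) and the call CPTs are HYPOTHETICAL placeholders, since those numbers were not legible in the transcript.

```python
# Build all 32 joint entries from 10 parameters via the BN factorization.
from itertools import product

P_B, P_E = 0.01, 0.02                          # HYPOTHETICAL priors
P_A = {(True,  True):  0.9,  (True,  False): 0.2,    # P(A=t | E, B), from slide 32
       (False, True):  0.85, (False, False): 0.01}
P_N1 = {True: 0.9, False: 0.05}                # HYPOTHETICAL P(N1=t | A)
P_N2 = {True: 0.7, False: 0.01}                # HYPOTHETICAL P(N2=t | A)

def pr(p_true, x):
    """Probability of Boolean outcome x given P(x = true) = p_true."""
    return p_true if x else 1.0 - p_true

def joint(n1, n2, a, e, b):
    """P(N1,N2,A,E,B) = P(N1|A) P(N2|A) P(A|E,B) P(E) P(B)."""
    return (pr(P_N1[a], n1) * pr(P_N2[a], n2) *
            pr(P_A[(e, b)], a) * pr(P_E, e) * pr(P_B, b))

# Sanity check: the 10 parameters define all 32 joint entries, summing to 1.
print(sum(joint(*w) for w in product([True, False], repeat=5)))  # ~1.0
```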

34 BNs: Qualitative Structure
The graphical structure of a BN reflects conditional independence among variables:
Each variable X is a node in the DAG.
Edges denote direct probabilistic influence, usually interpreted causally.
The parents of X are denoted Par(X).
X is conditionally independent of all non-descendants given its parents.
A graphical test exists for more general independence ("Markov blanket", below).
© Daniel S. Weld

35 Given Parents, X is Independent of Non-Descendants
© Daniel S. Weld

36 For Example
[Network diagram: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]
© Daniel S. Weld

37 Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X)) © Daniel S. Weld

38 Conditional Probability Tables
[Network diagram: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]
Pr(B=t), Pr(B=f): values not shown
Pr(A | E, B), with Pr(A=f) in parentheses:
e, b: 0.9 (0.1)
e, ¬b: 0.2 (0.8)
¬e, b: 0.85 (0.15)
¬e, ¬b: 0.01 (0.99)
© Daniel S. Weld

39 Conditional Probability Tables
For a complete specification of the joint distribution, quantify the BN.
For each variable X, specify a CPT: P(X | Par(X)); the number of parameters is locally exponential in |Par(X)|.
If X1, X2, ..., Xn is any topological sort of the network, then we are assured (see the sketch below):
P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) · ... · P(X2 | X1) · P(X1)
= P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) · ... · P(X1)
© Daniel S. Weld
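A generic version of this topological-order product, sketched on the earthquake network (CPT numbers mix the slide-32 values with the HYPOTHETICAL placeholders from the earlier sketch):

```python
# Evaluate the joint as a product of CPT entries in topological order.
parents = {'B': [], 'E': [], 'A': ['E', 'B'], 'N1': ['A'], 'N2': ['A']}
cpt = {   # P(X = t | parent values), keyed by variable then parent tuple
    'B': {(): 0.01}, 'E': {(): 0.02},               # HYPOTHETICAL priors
    'A': {(True, True): 0.9, (True, False): 0.2,    # from slide 32
          (False, True): 0.85, (False, False): 0.01},
    'N1': {(True,): 0.9, (False,): 0.05},           # HYPOTHETICAL
    'N2': {(True,): 0.7, (False,): 0.01},           # HYPOTHETICAL
}
order = ['B', 'E', 'A', 'N1', 'N2']   # any topological sort works

def joint(world):
    """P(world) = product over X of P(X | Par(X)); world maps var -> bool."""
    p = 1.0
    for x in order:
        pv = tuple(world[u] for u in parents[x])
        p_true = cpt[x][pv]
        p *= p_true if world[x] else 1.0 - p_true
    return p

print(joint({'B': False, 'E': False, 'A': True, 'N1': True, 'N2': False}))
```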

40 Inference in BNs
The graphical independence representation yields efficient inference schemes.
We generally want to compute Pr(X), or Pr(X | E) where E is (conjunctive) evidence.
Computations are organized by network topology.
One simple algorithm: variable elimination (VE).
© Daniel S. Weld

41 P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
P(B | J=true, M=true)
[Network diagram: Earthquake, Burglary, Radio, Alarm, John, Mary]
© Daniel S. Weld
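A minimal sketch of this enumeration. The alarm CPT again reuses the slide-32 values; the priors and the call CPTs P(j | a), P(m | a) are HYPOTHETICAL placeholders.

```python
# Enumeration for P(B | j, m): nested sums over e and a, then normalize.
P_B, P_E = 0.01, 0.02                                # HYPOTHETICAL priors
P_A = {(True,  True):  0.9,  (True,  False): 0.2,    # P(A=t | E, B), slide 32
       (False, True):  0.85, (False, False): 0.01}
P_J = {True: 0.9, False: 0.05}                       # HYPOTHETICAL P(j | A)
P_M = {True: 0.7, False: 0.01}                       # HYPOTHETICAL P(m | A)

def pr(p_true, x):
    return p_true if x else 1.0 - p_true

def unnormalized(b):
    """P(b) * sum_e P(e) * sum_a P(a|b,e) P(j|a) P(m|a), with j = m = true."""
    return pr(P_B, b) * sum(
        pr(P_E, e) * sum(pr(P_A[(e, b)], a) * P_J[a] * P_M[a]
                         for a in (True, False))
        for e in (True, False))

alpha = 1.0 / (unnormalized(True) + unnormalized(False))
print(alpha * unnormalized(True))   # P(B=true | j, m)
```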

42 Structure of Computation
Dynamic Programming © Daniel S. Weld

43 Variable Elimination
A factor is a function from the values of some set of variables to a number, e.g., f(E, A, N1).
CPTs are factors; e.g., P(A | E, B) is a function of A, E, B.
VE works by eliminating all variables in turn until there is a factor with only the query variable.
To eliminate a variable:
join all factors containing that variable (like a DB join)
sum out the influence of the variable on the new factor
This exploits the product form of the joint distribution.
© Daniel S. Weld

44 Example of VE: P(N1)
P(N1) = Σ_{N2,A,B,E} P(N1, N2, A, B, E)
= Σ_{N2,A,B,E} P(N1|A) P(N2|A) P(B) P(A|B,E) P(E)
= Σ_A P(N1|A) Σ_N2 P(N2|A) Σ_B P(B) Σ_E P(A|B,E) P(E)
= Σ_A P(N1|A) Σ_N2 P(N2|A) Σ_B P(B) f1(A,B)
= Σ_A P(N1|A) Σ_N2 P(N2|A) f2(A)
= Σ_A P(N1|A) f3(A)
= f4(N1)
[Network diagram: Earthqk, Burgl, Alarm, N1, N2]
© Daniel S. Weld
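A compact sketch of VE for this query, assuming Boolean variables and the same mix of slide-32 values and HYPOTHETICAL CPT numbers as the earlier sketches. Factors are stored as tables keyed by assignment tuples; eliminating a variable joins the factors that mention it and sums it out.

```python
# Variable elimination for P(N1) on the earthquake network.
from itertools import product

def factor(vars_, fn):
    """Tabulate fn over all Boolean assignments to vars_."""
    table = {vals: fn(dict(zip(vars_, vals)))
             for vals in product([True, False], repeat=len(vars_))}
    return table, vars_

def eliminate(factors, var):
    """Join all factors mentioning var, then sum var out (one VE step)."""
    touching = [f for f in factors if var in f[1]]
    rest = [f for f in factors if var not in f[1]]
    new_vars = sorted({v for _, vs in touching for v in vs} - {var})
    def summed(assign):
        total = 0.0
        for x in (True, False):
            a = {**assign, var: x}
            term = 1.0
            for table, vs in touching:
                term *= table[tuple(a[v] for v in vs)]
            total += term
        return total
    return rest + [factor(new_vars, summed)]

pr = lambda p, x: p if x else 1.0 - p
p_a = {(True, True): 0.9, (True, False): 0.2,
       (False, True): 0.85, (False, False): 0.01}    # P(A=t | E, B), slide 32
factors = [
    factor(['B'], lambda a: pr(0.01, a['B'])),       # HYPOTHETICAL prior
    factor(['E'], lambda a: pr(0.02, a['E'])),       # HYPOTHETICAL prior
    factor(['A', 'B', 'E'], lambda a: pr(p_a[(a['E'], a['B'])], a['A'])),
    factor(['A', 'N1'], lambda a: pr({True: 0.9, False: 0.05}[a['A']], a['N1'])),
    factor(['A', 'N2'], lambda a: pr({True: 0.7, False: 0.01}[a['A']], a['N2'])),
]
for var in ['E', 'B', 'N2', 'A']:     # elimination order; N1 is the query
    factors = eliminate(factors, var)

(table, vars_), = factors             # a single factor over just N1 remains
print({n1: table[(n1,)] for n1 in (True, False)})   # P(N1); sums to ~1
```

The elimination order E, B, N2, A mirrors the slide's derivation; a poor order would create larger intermediate factors.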

45 Notes on VE
Each operation is simply a multiplication of factors and the summing out of a variable.
Complexity is determined by the size of the largest factor (e.g., in the example, 3 variables, not 5):
linear in the number of variables, exponential in the largest factor
The elimination ordering greatly impacts factor size:
finding optimal elimination orderings is NP-hard
heuristics and special structure (e.g., polytrees) help
Practically, inference is much more tractable using structure of this sort.
© Daniel S. Weld

