Representing Uncertainty


1 Representing Uncertainty

2 Aspects of Uncertainty
Suppose you have a flight at 12 noon. When should you leave for SEATAC? What are the traffic conditions? How crowded is security? Leaving 18 hours early may get you there, but … ?
© Daniel S. Weld

3 Decision Theory = Probability + Utility Theory
Min before noon    P(arrive-in-time)
20 min
30 min
45 min
60 min
120 min
1080 min
Depends on your preferences.
Utility theory: representing & reasoning about preferences
© Daniel S. Weld

4 What Is Probability?
Probability: a calculus for dealing with nondeterminism and uncertainty (cf. logic)
Probabilistic model: says how often we expect different things to occur (cf. a function)
© Daniel S. Weld

5 What Is Statistics?
Statistics 1: describing data
Statistics 2: inferring probabilistic models from data (structure and parameters)
© Daniel S. Weld

6 Why Should You Care?
The world is full of uncertainty, and logic alone is not enough: computers need to be able to handle uncertainty. Probability is a new foundation for AI (& CS!).
Massive amounts of data are around today, and statistics and CS are both about data:
Statistics lets us summarize and understand it.
Statistics is the basis for most learning.
Statistics lets data do our work for us.
© Daniel S. Weld

7 Outline
Basic notions: atomic events, probabilities, joint distribution
Inference by enumeration
Independence & conditional independence
Bayes' rule
Bayesian networks
Statistical learning
Dynamic Bayesian networks (DBNs)
Markov decision processes (MDPs)
© Daniel S. Weld

8 Logic vs. Probability
Symbol (Q, R, …) vs. random variable (Q, …)
Boolean values (T, F) vs. a domain you specify, e.g. {heads, tails} or [1, 6]
State of the world (an assignment to Q, R, …, Z) vs. atomic event (a complete specification of the world, Q … Z); atomic events are mutually exclusive and exhaustive
Prior probability (aka unconditional probability): P(Q)
Joint distribution: the probability of every atomic event
© Daniel S. Weld

9 Syntax for Propositions
© Daniel S. Weld

10 Propositions
Assume Boolean variables.
Propositions: A = true, B = false
© Daniel S. Weld

11 Why Use Probability?
E.g. P(A ∨ B) = ?  P(A) + P(B) - P(A ∧ B)
[Venn diagram: A, B, and A ∧ B within True]
© Daniel S. Weld

12 Axioms of Probability Theory
All probabilities are between 0 and 1: 0 ≤ P(A) ≤ 1
P(true) = 1, P(false) = 0
The probability of a disjunction is: P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
[Venn diagram: A, B, A ∧ B within True]
© Daniel S. Weld
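As a quick worked check of the disjunction axiom, here is a toy computation (the numbers are illustrative only, not from the slides):

```python
# Toy check of the disjunction axiom: P(A or B) = P(A) + P(B) - P(A and B).
# These numbers are illustrative only.
p_a, p_b, p_a_and_b = 0.5, 0.4, 0.2
print(p_a + p_b - p_a_and_b)   # P(A or B) = 0.7
```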

13 Prior Probability
Any question can be answered by the joint distribution.
© Daniel S. Weld

14 Conditional Probability
© Daniel S. Weld

15 Conditional Probability
Def: P(A | B) = P(A ∧ B) / P(B), provided P(B) > 0
© Daniel S. Weld
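In code, the definition is a one-liner. A minimal sketch, using the dentist-example numbers from the upcoming slides (P(toothache ∧ cavity) = 0.12 and P(toothache) = 0.20 are the standard Russell & Norvig values):

```python
# P(A | B) = P(A and B) / P(B); here A = cavity, B = toothache.
p_cavity_and_toothache = 0.12   # from the standard dentist example
p_toothache = 0.20
print(p_cavity_and_toothache / p_toothache)   # P(cavity | toothache) = 0.6
```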

16 Inference by Enumeration
P(toothache) = .108 + .012 + .016 + .064 = .20, or 20%
This process is called "marginalization".
© Daniel S. Weld

17 Inference by Enumeration
P(toothachecavity = .20 + ?? .28 © Daniel S. Weld

18 Inference by Enumeration
© Daniel S. Weld

19 Problems
Worst-case time: O(d^n), where d = max arity and n = number of random variables
Space complexity is also O(d^n): the size of the joint distribution
How do we get the O(d^n) entries for the table??
The values of cavity & catch are irrelevant when computing P(toothache)
© Daniel S. Weld

20 Independence
A and B are independent iff: P(A | B) = P(A) and P(B | A) = P(B)
These two constraints are logically equivalent.
Therefore, if A and B are independent: P(A ∧ B) = P(A) · P(B)
© Daniel S. Weld
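A minimal, self-contained check of this definition on a toy joint over two fair coin flips (my own example, not from the slides):

```python
# Check the independence definition: A, B independent iff P(A and B) = P(A)P(B).
import math
from itertools import product

# Joint over two fair, independent coin flips.
joint = {(a, b): 0.25 for a, b in product([True, False], repeat=2)}

p_a  = sum(p for (a, b), p in joint.items() if a)   # P(A)
p_b  = sum(p for (a, b), p in joint.items() if b)   # P(B)
p_ab = joint[(True, True)]                          # P(A and B)

print(math.isclose(p_ab, p_a * p_b))   # True: independent by construction
```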

21 Independence
[Venn diagram: A, B, and A ∧ B within True]
© Daniel S. Weld

22 Independence
Complete independence is powerful but rare. What should we do if it doesn't hold?
© Daniel S. Weld

23 Conditional Independence
A & B are not independent, since P(A | B) < P(A).
[Venn diagram: A, B, and A ∧ B within True]
© Daniel S. Weld

24 Conditional Independence
But: A & B are made independent by C: P(A | C) = P(A | B, C)
[Venn diagram: A, B, C, with regions A ∧ B, A ∧ C, B ∧ C within True]
© Daniel S. Weld

25 Conditional Independence
Instead of 7 entries, only need 5 © Daniel S. Weld

26 Conditional Independence II
P(catch | toothache, cavity) = P(catch | cavity)
P(catch | toothache, ¬cavity) = P(catch | ¬cavity)
Why only 5 entries in the table?
© Daniel S. Weld
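A minimal sketch verifying this claim numerically on the dentist joint from the earlier sketch: once cavity is known, toothache tells us nothing more about catch.

```python
# Verify P(catch | toothache, cavity) = P(catch | cavity) on the dentist joint.
joint = {
    # (toothache, cavity, catch) -> probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.016, (True,  False, False): 0.064,
    (False, True,  True):  0.072, (False, True,  False): 0.008,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(event):
    return sum(p for world, p in joint.items() if event(*world))

def cond(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda *w: event(*w) and given(*w)) / prob(given)

print(cond(lambda t, cav, c: c, lambda t, cav, c: t and cav))  # ~0.9
print(cond(lambda t, cav, c: c, lambda t, cav, c: cav))        # ~0.9 as well
```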

27 Power of Cond. Independence
Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear!! Conditional independence is the most basic & robust form of knowledge about uncertain environments. © Daniel S. Weld
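A small worked count showing the exponential-to-linear reduction (my own illustration, assuming one Boolean cause with n conditionally independent Boolean effects):

```python
# Parameter counts: full joint vs. factored model with one Boolean cause
# and n Boolean effects, each conditionally independent given the cause.
def full_joint_params(n):
    """Independent entries in the full joint over n+1 Boolean variables."""
    return 2 ** (n + 1) - 1

def factored_params(n):
    """P(cause) plus P(effect_i | cause) and P(effect_i | ~cause) per effect."""
    return 1 + 2 * n

for n in (3, 10, 30):
    print(n, full_joint_params(n), factored_params(n))
# n=30: 2,147,483,647 joint entries vs. 61 parameters -- exponential vs. linear
```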

28 Bayes Rule
P(H | E) = P(E | H) · P(H) / P(E)
Simple proof from the definition of conditional probability:
1. P(E | H) = P(H ∧ E) / P(H)   (def. cond. prob.)
2. P(H | E) = P(H ∧ E) / P(E)   (def. cond. prob.)
3. P(H ∧ E) = P(E | H) · P(H)   (mult. by P(H) in line 1)
QED: P(H | E) = P(E | H) · P(H) / P(E)   (substitute #3 in #2)
© Daniel S. Weld

29 Use to Compute Diagnostic Probability from Causal Probability
E.g. let M be meningitis, S be stiff neck.
P(M) = …, P(S) = 0.1, P(S | M) = 0.8
P(M | S) = P(S | M) · P(M) / P(S) = ?
© Daniel S. Weld
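A minimal sketch of this diagnostic computation. The slide's value for P(M) did not survive transcription, so the prior below is a hypothetical placeholder; substitute the real number to reproduce the slide's answer.

```python
# Diagnostic probability from causal probability via Bayes' rule.
def bayes(p_e_given_h, p_h, p_e):
    """P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

p_m = 1e-4   # HYPOTHETICAL prior P(meningitis), not from the slide
print(bayes(p_e_given_h=0.8, p_h=p_m, p_e=0.1))   # P(M|S) = 0.0008 with this prior
```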

30 Bayes’ Rule & Cond. Independence
© Daniel S. Weld

31 Bayes Nets
In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation & inference.
BNs provide a graphical representation of the conditional independence relations in P:
usually quite compact
requires assessment of fewer parameters, and those are quite natural (e.g., causal)
efficient (usually) inference: query answering and belief update
© Daniel S. Weld

32 An Example Bayes Net
[Network diagram: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]
Pr(B=t), Pr(B=f): values not shown
Pr(A | E, B), with Pr(A=f) in parentheses:
e, b: 0.9 (0.1)
e, ¬b: 0.2 (0.8)
¬e, b: 0.85 (0.15)
¬e, ¬b: 0.01 (0.99)
© Daniel S. Weld

33 Earthquake Example (con't)
[Network diagram: Burglary, Earthquake, Radio, Alarm, Nbr1Calls, Nbr2Calls]
If I know Alarm, no other evidence influences my degree of belief in Nbr1Calls:
P(N1 | N2, A, E, B) = P(N1 | A)
also: P(N2 | N1, A, E, B) = P(N2 | A)
and: P(E | B) = P(E)
By the chain rule we have
P(N1, N2, A, E, B) = P(N1 | N2, A, E, B) · P(N2 | A, E, B) · P(A | E, B) · P(E | B) · P(B)
= P(N1 | A) · P(N2 | A) · P(A | B, E) · P(E) · P(B)
The full joint requires only 10 parameters (cf. 32), as the sketch below illustrates.
© Daniel S. Weld
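A minimal sketch of this 10-parameter factorization. The alarm CPT uses the values from slide 32; the priors P(B), P(E) and the call CPTs are HYPOTHETICAL placeholders, since those numbers were not legible in the transcript.

```python
# Build all 32 joint entries from 10 parameters via the BN factorization.
from itertools import product

P_B, P_E = 0.01, 0.02                          # HYPOTHETICAL priors
P_A = {(True,  True):  0.9,  (True,  False): 0.2,    # P(A=t | E, B), from slide 32
       (False, True):  0.85, (False, False): 0.01}
P_N1 = {True: 0.9, False: 0.05}                # HYPOTHETICAL P(N1=t | A)
P_N2 = {True: 0.7, False: 0.01}                # HYPOTHETICAL P(N2=t | A)

def pr(p_true, x):
    """Probability of Boolean outcome x given P(x = true) = p_true."""
    return p_true if x else 1.0 - p_true

def joint(n1, n2, a, e, b):
    """P(N1,N2,A,E,B) = P(N1|A) P(N2|A) P(A|E,B) P(E) P(B)."""
    return (pr(P_N1[a], n1) * pr(P_N2[a], n2) *
            pr(P_A[(e, b)], a) * pr(P_E, e) * pr(P_B, b))

# Sanity check: the 10 parameters define all 32 joint entries, summing to 1.
print(sum(joint(*w) for w in product([True, False], repeat=5)))  # ~1.0
```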

34 BNs: Qualitative Structure
The graphical structure of a BN reflects conditional independence among variables:
Each variable X is a node in the DAG.
Edges denote direct probabilistic influence, usually interpreted causally.
The parents of X are denoted Par(X).
X is conditionally independent of all non-descendants given its parents.
A graphical test exists for more general independence ("Markov blanket", below).
© Daniel S. Weld

35 Given Parents, X is Independent of Non-Descendants
© Daniel S. Weld

36 For Example
[Network diagram: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]
© Daniel S. Weld

37 Given Markov Blanket, X is Independent of All Other Nodes
MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X)) © Daniel S. Weld

38 Conditional Probability Tables
[Network diagram: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls]
Pr(B=t), Pr(B=f): values not shown
Pr(A | E, B), with Pr(A=f) in parentheses:
e, b: 0.9 (0.1)
e, ¬b: 0.2 (0.8)
¬e, b: 0.85 (0.15)
¬e, ¬b: 0.01 (0.99)
© Daniel S. Weld

39 Conditional Probability Tables
For a complete specification of the joint distribution, quantify the BN.
For each variable X, specify a CPT: P(X | Par(X)); the number of parameters is locally exponential in |Par(X)|.
If X1, X2, ..., Xn is any topological sort of the network, then we are assured (see the sketch below):
P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) · ... · P(X2 | X1) · P(X1)
= P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) · ... · P(X1)
© Daniel S. Weld
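A generic version of this topological-order product, sketched on the earthquake network (CPT numbers mix the slide-32 values with the HYPOTHETICAL placeholders from the earlier sketch):

```python
# Evaluate the joint as a product of CPT entries in topological order.
parents = {'B': [], 'E': [], 'A': ['E', 'B'], 'N1': ['A'], 'N2': ['A']}
cpt = {   # P(X = t | parent values), keyed by variable then parent tuple
    'B': {(): 0.01}, 'E': {(): 0.02},               # HYPOTHETICAL priors
    'A': {(True, True): 0.9, (True, False): 0.2,    # from slide 32
          (False, True): 0.85, (False, False): 0.01},
    'N1': {(True,): 0.9, (False,): 0.05},           # HYPOTHETICAL
    'N2': {(True,): 0.7, (False,): 0.01},           # HYPOTHETICAL
}
order = ['B', 'E', 'A', 'N1', 'N2']   # any topological sort works

def joint(world):
    """P(world) = product over X of P(X | Par(X)); world maps var -> bool."""
    p = 1.0
    for x in order:
        pv = tuple(world[u] for u in parents[x])
        p_true = cpt[x][pv]
        p *= p_true if world[x] else 1.0 - p_true
    return p

print(joint({'B': False, 'E': False, 'A': True, 'N1': True, 'N2': False}))
```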

40 Inference in BNs
The graphical independence representation yields efficient inference schemes.
We generally want to compute Pr(X), or Pr(X | E) where E is (conjunctive) evidence.
Computations are organized by network topology.
One simple algorithm: variable elimination (VE).
© Daniel S. Weld

41 P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)
P(B | J=true, M=true)
[Network diagram: Earthquake, Burglary, Radio, Alarm, John, Mary]
© Daniel S. Weld
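A minimal sketch of this enumeration. The alarm CPT again reuses the slide-32 values; the priors and the call CPTs P(j | a), P(m | a) are HYPOTHETICAL placeholders.

```python
# Enumeration for P(B | j, m): nested sums over e and a, then normalize.
P_B, P_E = 0.01, 0.02                                # HYPOTHETICAL priors
P_A = {(True,  True):  0.9,  (True,  False): 0.2,    # P(A=t | E, B), slide 32
       (False, True):  0.85, (False, False): 0.01}
P_J = {True: 0.9, False: 0.05}                       # HYPOTHETICAL P(j | A)
P_M = {True: 0.7, False: 0.01}                       # HYPOTHETICAL P(m | A)

def pr(p_true, x):
    return p_true if x else 1.0 - p_true

def unnormalized(b):
    """P(b) * sum_e P(e) * sum_a P(a|b,e) P(j|a) P(m|a), with j = m = true."""
    return pr(P_B, b) * sum(
        pr(P_E, e) * sum(pr(P_A[(e, b)], a) * P_J[a] * P_M[a]
                         for a in (True, False))
        for e in (True, False))

alpha = 1.0 / (unnormalized(True) + unnormalized(False))
print(alpha * unnormalized(True))   # P(B=true | j, m)
```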

42 Structure of Computation
Dynamic Programming © Daniel S. Weld

43 Variable Elimination
A factor is a function from the values of some set of variables to a number, e.g., f(E, A, N1).
CPTs are factors; e.g., P(A | E, B) is a function of A, E, B.
VE works by eliminating all variables in turn until there is a factor with only the query variable.
To eliminate a variable:
join all factors containing that variable (like a DB join)
sum out the influence of the variable on the new factor
This exploits the product form of the joint distribution.
© Daniel S. Weld

44 Example of VE: P(N1)
P(N1) = Σ_{N2,A,B,E} P(N1, N2, A, B, E)
= Σ_{N2,A,B,E} P(N1|A) P(N2|A) P(B) P(A|B,E) P(E)
= Σ_A P(N1|A) Σ_N2 P(N2|A) Σ_B P(B) Σ_E P(A|B,E) P(E)
= Σ_A P(N1|A) Σ_N2 P(N2|A) Σ_B P(B) f1(A,B)
= Σ_A P(N1|A) Σ_N2 P(N2|A) f2(A)
= Σ_A P(N1|A) f3(A)
= f4(N1)
[Network diagram: Earthqk, Burgl, Alarm, N1, N2]
© Daniel S. Weld
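A compact sketch of VE for this query, assuming Boolean variables and the same mix of slide-32 values and HYPOTHETICAL CPT numbers as the earlier sketches. Factors are stored as tables keyed by assignment tuples; eliminating a variable joins the factors that mention it and sums it out.

```python
# Variable elimination for P(N1) on the earthquake network.
from itertools import product

def factor(vars_, fn):
    """Tabulate fn over all Boolean assignments to vars_."""
    table = {vals: fn(dict(zip(vars_, vals)))
             for vals in product([True, False], repeat=len(vars_))}
    return table, vars_

def eliminate(factors, var):
    """Join all factors mentioning var, then sum var out (one VE step)."""
    touching = [f for f in factors if var in f[1]]
    rest = [f for f in factors if var not in f[1]]
    new_vars = sorted({v for _, vs in touching for v in vs} - {var})
    def summed(assign):
        total = 0.0
        for x in (True, False):
            a = {**assign, var: x}
            term = 1.0
            for table, vs in touching:
                term *= table[tuple(a[v] for v in vs)]
            total += term
        return total
    return rest + [factor(new_vars, summed)]

pr = lambda p, x: p if x else 1.0 - p
p_a = {(True, True): 0.9, (True, False): 0.2,
       (False, True): 0.85, (False, False): 0.01}    # P(A=t | E, B), slide 32
factors = [
    factor(['B'], lambda a: pr(0.01, a['B'])),       # HYPOTHETICAL prior
    factor(['E'], lambda a: pr(0.02, a['E'])),       # HYPOTHETICAL prior
    factor(['A', 'B', 'E'], lambda a: pr(p_a[(a['E'], a['B'])], a['A'])),
    factor(['A', 'N1'], lambda a: pr({True: 0.9, False: 0.05}[a['A']], a['N1'])),
    factor(['A', 'N2'], lambda a: pr({True: 0.7, False: 0.01}[a['A']], a['N2'])),
]
for var in ['E', 'B', 'N2', 'A']:     # elimination order; N1 is the query
    factors = eliminate(factors, var)

(table, vars_), = factors             # a single factor over just N1 remains
print({n1: table[(n1,)] for n1 in (True, False)})   # P(N1); sums to ~1
```

The elimination order E, B, N2, A mirrors the slide's derivation; a poor order would create larger intermediate factors.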

45 Notes on VE
Each operation is simply a multiplication of factors and the summing out of a variable.
Complexity is determined by the size of the largest factor (e.g., in the example, 3 variables, not 5):
linear in the number of variables, exponential in the largest factor
The elimination ordering greatly impacts factor size:
finding optimal elimination orderings is NP-hard
heuristics and special structure (e.g., polytrees) help
Practically, inference is much more tractable using structure of this sort.
© Daniel S. Weld

