1 CSC384: Intro to Artificial Intelligence Reasoning under Uncertainty ● Reading. Chapter 13.

2 Uncertainty ● With First Order Logic we examined a mechanism for representing true facts and for deriving new true facts from them. ● The emphasis on truth is sensible in some domains. ● But in many domains it is not sufficient to deal only with true facts. We have to “gamble”. ● E.g., we don’t know for certain what the traffic will be like on a trip to the airport.

3 Uncertainty ● But how do we gamble rationally? ■ If we must arrive at the airport at 9pm on a weeknight we could “safely” leave for the airport ½ hour before. ● Some probability of the trip taking longer, but the probability is low. ■ If we must arrive at the airport at 4:30pm on Friday we most likely need 1 hour or more to get to the airport. ● Relatively high probability of it taking 1.5 hours.

4 Uncertainty ● To act rationally under uncertainty we must be able to evaluate how likely certain things are. ■ With FOL a fact F is only useful if it is known to be true or false. ■ But we need to be able to evaluate how likely it is that F is true. ● By weighing likelihoods of events (probabilities) we can develop mechanisms for acting rationally under uncertainty.

5 Dental Diagnosis example ● In FOL we might formulate ■ ∀P. symptom(P,toothache) → disease(P,cavity) ∨ disease(P,gumDisease) ∨ disease(P,foodStuck) ∨ … ■ When do we stop? ● Cannot list all possible causes. ● We also want to rank the possibilities. We don’t want to start drilling for a cavity before checking for more likely causes first.

6 Probability (over Finite Sets) ● Probability is a function defined over a set of events U. Often called the universe of events. ● It assigns a value Pr(e) to each event e ∈ U, in the range [0,1]. ● It assigns a value to every set of events F by summing the probabilities of the members of that set. Pr(F) = Σ e∈F Pr(e) ● Pr(U) = 1, i.e., the sum over all events is 1. ● Therefore: Pr({}) = 0 and Pr(A ∪ B) = Pr(A) + Pr(B) – Pr(A ∩ B)
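To make these definitions concrete, here is a minimal sketch in Python. The universe of weather events and their probabilities are entirely made up for illustration.

```python
# Made-up universe of atomic events with invented probabilities.
U = {"sun": 0.5, "rain": 0.3, "snow": 0.2}

def pr(F):
    """Pr(F) = sum of Pr(e) over the events e in F."""
    return sum(U[e] for e in F)

A = {"sun", "rain"}
B = {"rain", "snow"}

assert abs(pr(U.keys()) - 1.0) < 1e-9                       # Pr(U) = 1
assert pr(set()) == 0                                       # Pr({}) = 0
# Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
assert abs(pr(A | B) - (pr(A) + pr(B) - pr(A & B))) < 1e-9
```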

7 Probability in General ● Given a set U (universe), a probability function is a function defined over the subsets of U that maps each subset to the real numbers and that satisfies the Axioms of Probability 1. Pr(U) = 1 2. Pr(A) ∈ [0,1] 3. Pr(A ∪ B) = Pr(A) + Pr(B) – Pr(A ∩ B) Note: if A ∩ B = {} then Pr(A ∪ B) = Pr(A) + Pr(B)

8 Probability over Feature Vectors ● We will work with a universe consisting of a set of vectors of feature values. ● Like CSPs, we have 1. a set of variables V1, V2, …, Vn 2. a finite domain of values for each variable, Dom[V1], Dom[V2], …, Dom[Vn]. ● The universe of events U is the set of all vectors of values for the variables ⟨d1, d2, …, dn⟩ : di ∈ Dom[Vi]
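As a sketch, such a universe can be enumerated with itertools.product; the variable names and domains below are hypothetical.

```python
from itertools import product

# Hypothetical variables and finite domains.
domains = {"V1": ["a", "b"], "V2": ["c", "d"], "V3": ["e", "f", "g"]}

# U = all vectors <d1, ..., dn> with di drawn from Dom[Vi].
U = list(product(*domains.values()))
print(len(U))   # 2 * 2 * 3 = 12 atomic events
```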

9 Probability over Feature Vectors ● This event space has size Π i |Dom[Vi]|, i.e., the product of the domain sizes. ● E.g., if |Dom[Vi]| = 2 we have 2^n distinct atomic events. (Exponential!)

10 Probability over Feature Vectors ● Asserting that some subset of variables have particular values allows us to specify a useful collection of subsets of U. ● E.g. ■ {V1 = a} = set of all events where V1 = a ■ {V1 = a, V3 = d} = set of all events where V1 = a and V3 = d. ■ … ● E.g. ■ Pr({V1 = a}) = Σ x∈Dom[V3] Pr({V1 = a, V3 = x}).

11 Probability over Feature Vectors ● If we had Pr of every atomic event (full instantiation of the variables) we could compute the probability of any other set (including sets that cannot be specified by some set of variable values). ● E.g. ■ {V1 = a} = set of all events where V1 = a ● Pr({V1 = a}) = Σ x2∈Dom[V2] Σ x3∈Dom[V3] … Σ xn∈Dom[Vn] Pr(V1=a, V2=x2, V3=x3, …, Vn=xn)
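A sketch of this summation in Python, using a hypothetical uniform joint distribution over three binary variables (names, domains, and probabilities all invented):

```python
from itertools import product

# Hypothetical domains and a made-up (uniform) joint distribution.
domains = {"V1": ["a", "b"], "V2": ["c", "d"], "V3": ["e", "f"]}
names = list(domains)
atoms = list(product(*domains.values()))
joint = {atom: 1.0 / len(atoms) for atom in atoms}

def pr_event(assignment):
    """Pr of the set of events consistent with a partial assignment,
    e.g. {'V1': 'a'}, by summing over all matching atomic events."""
    return sum(p for atom, p in joint.items()
               if all(atom[names.index(v)] == val
                      for v, val in assignment.items()))

print(pr_event({"V1": "a"}))              # sums over Dom[V2] x Dom[V3]: 0.5
print(pr_event({"V1": "a", "V3": "e"}))   # 0.25
```

Note that pr_event walks every atomic event, which is exactly the exponential cost the next slide complains about.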

12 Probability over Feature Vectors ● Problem: ■ This is an exponential number of atomic probabilities to specify. ■ It requires summing up an exponential number of items. ● But for evaluating the probability of sets specified by a particular subset of variable assignments we can do much better. Improvements come from the use of probabilistic independence, especially conditional independence.

13 Conditional Probabilities ● In logic one has implication to express “conditional” statements. ■ ∀X. apple(X) → goodToEat(X) ■ This assertion only provides useful information about “apples”. ● With probabilities one has access to a different way of capturing conditional information: conditional probabilities. ● It turns out that conditional probabilities are essential for both representing and reasoning with probabilistic information.

14 Conditional Probabilities ● Say that A is a set of events such that Pr(A) > 0. ● Then one can define a conditional probability wrt the event A: Pr(B|A) = Pr(B ∩ A)/Pr(A)
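A minimal sketch of this definition (events and probabilities invented):

```python
# Made-up universe of atomic events.
U = {"e1": 0.2, "e2": 0.3, "e3": 0.4, "e4": 0.1}

def pr(F):
    return sum(U[e] for e in F)

def pr_given(B, A):
    """Pr(B|A) = Pr(B ∩ A) / Pr(A); only defined when Pr(A) > 0."""
    assert pr(A) > 0
    return pr(B & A) / pr(A)

A = {"e1", "e2"}
B = {"e2", "e3"}
print(pr_given(B, A))   # Pr({e2}) / Pr({e1, e2}) = 0.3 / 0.5 = 0.6
```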

15 Conditional Probabilities (Venn diagram: B covers about 30% of the entire space, but covers over 80% of A.)

16 Conditional Probabilities ● Conditioning on A corresponds to restricting one’s attention to the events in A. ● We now consider A to be the whole set of events (a new universe of events): Pr(A|A) = 1. ● Then we assign all other sets a probability by taking the probability mass that “lives” in A (Pr(B ∩ A)), and normalizing it to the range [0,1] by dividing by Pr(A).

17 Conditional Probabilities (Venn diagram: B’s probability in the new universe A is 0.8.)

18 Conditional Probabilities ● A conditional probability is a probability function, but now over A instead of over the entire space. ■ Pr(A|A) = 1 ■ Pr(B|A) ∈ [0,1] ■ Pr(C ∪ B|A) = Pr(C|A) + Pr(B|A) – Pr(C ∩ B|A)

19 Properties and Sets ● In First Order Logic, properties like tall(X) are interpreted as a set of individuals (the individuals that are tall). ● Similarly any set of events A can be interpreted as a property: the set of events with property A. ● When we specify big(X) ∧ tall(X) in FOL, we interpret this as the set of individuals that lie in the intersection of the sets big(X) and tall(X).

20 Properties and Sets ● Similarly, big(X) ∨ tall(X) is the set of individuals that is the union of big(X) and tall(X). ● Hence, we often write ■ A ∧ B to represent the set of events A ∩ B ■ A ∨ B to represent the set of events A ∪ B ■ ¬A to represent the set of events U – A (i.e., the complement of A wrt the universe of events U)

21 Independence ● It could be that the density of B on A is identical to its density on the entire set. ● Density: pick an element at random from the entire set. How likely is it that the picked element is in the set B? ● Alternately, the density of B on A could be much different from its density on the whole space. ● In the first case we say that B is independent of A, while in the second case B is dependent on A.

22 Independence (Diagram: the density of B on A, shown for the independent case and the dependent case.)

23 Independence Definition: A and B are independent properties if Pr(B|A) = Pr(B). A and B are dependent if Pr(B|A) ≠ Pr(B).
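The definition can be tested directly. In the sketch below the universe is contrived (all numbers invented) so that B happens to be independent of A:

```python
# Made-up universe, chosen so that Pr(B|A) = Pr(B).
U = {"e1": 0.1, "e2": 0.4, "e3": 0.1, "e4": 0.4}

def pr(F):
    return sum(U[e] for e in F)

def independent(A, B, tol=1e-9):
    """True iff Pr(B|A) = Pr(B) (up to rounding)."""
    return abs(pr(B & A) / pr(A) - pr(B)) < tol

A = {"e1", "e2"}           # Pr(A) = 0.5
B = {"e2", "e4"}           # Pr(B) = 0.8
print(independent(A, B))   # Pr(B|A) = 0.4/0.5 = 0.8 = Pr(B) -> True
```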

24 Implications for Knowledge ● Say that we have picked an element from the entire set. Then we find out that this element has property A (i.e., is a member of the set A). ■ Does this tell us anything more about how likely it is that the element also has property B? ■ If B is independent of A then we have learned nothing new about the likelihood of the element being a member of B.

25 Independence ● E.g., we have a feature vector, but we don’t know which one. We then find out that it contains the feature V1 = a. ■ I.e., we know that the vector is a member of the set {V1 = a}. ■ Does this tell us anything about whether or not V2 = a, V3 = c, …, etc.? ■ This depends on whether or not these features are independent of/dependent on V1 = a.

26 Conditional Independence ● Say we have already learned that the randomly picked element has property A. ● We want to know whether or not the element has property B: Pr(B|A) expresses the probability of this being true. ● Now we learn that the element also has property C. Does this give us more information about B-ness? Pr(B|A ∩ C) expresses the probability of this being true under the additional information.

27 Conditional Independence ● If Pr(B|A ∩ C) = Pr(B|A) then we have not gained any additional information from knowing that the element is also a member of the set C. ● In this case we say that B is conditionally independent of C given A. ● That is, once we know A, additionally knowing C is irrelevant (wrt whether or not B is true). ■ Conditional independence is independence in the conditional probability space Pr(·|A). ■ Note we could have Pr(B|C) ≠ Pr(B). But once we learn A, C becomes irrelevant.

28 Computational Impact of Independence ● We will see in more detail how independence allows us to speed up computation. But the fundamental insight is that if A and B are independent properties then Pr(A ∧ B) = Pr(B) * Pr(A) Proof: Pr(B|A) = Pr(B) [independence] Pr(A ∧ B)/Pr(A) = Pr(B) [definition] Pr(A ∧ B) = Pr(B) * Pr(A)
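Continuing the sketch from slide 23 (same invented universe, where A and B are independent), the product identity can be checked directly:

```python
# Same made-up universe as before; A and B are independent there.
U = {"e1": 0.1, "e2": 0.4, "e3": 0.1, "e4": 0.4}

def pr(F):
    return sum(U[e] for e in F)

A = {"e1", "e2"}   # Pr(A) = 0.5
B = {"e2", "e4"}   # Pr(B) = 0.8

# Pr(A ∧ B) = Pr(A) * Pr(B) when A and B are independent.
assert abs(pr(A & B) - pr(A) * pr(B)) < 1e-9   # 0.4 = 0.5 * 0.8
```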

29 Computational Impact of Independence ● This property allows us to “break up” the computation of a conjunction “Pr(A ∧ B)” into two separate computations, “Pr(A)” and “Pr(B)”. ● Depending on how we express our probabilistic knowledge, this yields great computational savings.

30 Computational Impact of Independence ● Similarly for conditional independence. Pr(B|C ∧ A) = Pr(B|A) ⇒ Pr(B ∧ C|A) = Pr(B|A) * Pr(C|A) Proof: Pr(B|C ∧ A) = Pr(B|A) [independence] Pr(B ∧ C ∧ A)/Pr(C ∧ A) = Pr(B ∧ A)/Pr(A) [defn.] Pr(B ∧ C ∧ A)/Pr(A) = (Pr(C ∧ A)/Pr(A)) * (Pr(B ∧ A)/Pr(A)) Pr(B ∧ C|A) = Pr(B|A) * Pr(C|A) [defn.]
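A sketch verifying this factorization on a joint distribution that is built, by construction, so that B and C are conditionally independent given A (all numbers invented):

```python
from itertools import product

# Made-up conditional tables; the joint is Pr(a) * Pr(b|a) * Pr(c|a),
# so B and C are conditionally independent given A by construction.
p_a = {1: 0.3, 0: 0.7}
p_b = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}   # p_b[a][b] = Pr(b|a)
p_c = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.5, 0: 0.5}}   # p_c[a][c] = Pr(c|a)

joint = {(a, b, c): p_a[a] * p_b[a][b] * p_c[a][c]
         for a, b, c in product([0, 1], repeat=3)}

def pr(pred):
    """Probability of the set of triples (a, b, c) satisfying pred."""
    return sum(p for e, p in joint.items() if pred(*e))

pA = pr(lambda a, b, c: a == 1)
lhs = pr(lambda a, b, c: a == 1 and b == 1 and c == 1) / pA   # Pr(B ∧ C|A)
rhs = ((pr(lambda a, b, c: a == 1 and b == 1) / pA)
       * (pr(lambda a, b, c: a == 1 and c == 1) / pA))        # Pr(B|A)*Pr(C|A)
assert abs(lhs - rhs) < 1e-9
```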

31 Computational Impact of Independence ● Conditional independence allows us to break up our computation into distinct parts Pr(B ∧ C|A) = Pr(B|A) * Pr(C|A) ● And it also allows us to ignore certain pieces of information Pr(B|A ∧ C) = Pr(B|A)

32 Bayes Rule ● Bayes rule is a simple mathematical fact. But it has great implications wrt how probabilities can be reasoned with. ■ Pr(Y|X) = Pr(X|Y)Pr(Y)/Pr(X) Derivation: Pr(Y|X) = Pr(Y ⋀ X)/Pr(X) = Pr(Y ⋀ X)/Pr(X) * Pr(Y)/Pr(Y) = Pr(Y ⋀ X)/Pr(Y) * Pr(Y)/Pr(X) = Pr(X|Y)Pr(Y)/Pr(X)
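As a sanity check, here is a minimal sketch that verifies Bayes rule numerically on a made-up joint distribution over two binary variables (all numbers invented):

```python
# Made-up joint distribution Pr(X=x, Y=y) over binary X and Y.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

pr_x1 = sum(p for (x, y), p in joint.items() if x == 1)   # Pr(X=1) = 0.5
pr_y1 = sum(p for (x, y), p in joint.items() if y == 1)   # Pr(Y=1) = 0.6
pr_y1_given_x1 = joint[(1, 1)] / pr_x1                    # Pr(Y=1|X=1) = 0.8
pr_x1_given_y1 = joint[(1, 1)] / pr_y1                    # Pr(X=1|Y=1) = 2/3

# Bayes rule: Pr(Y|X) = Pr(X|Y) Pr(Y) / Pr(X)
assert abs(pr_y1_given_x1 - pr_x1_given_y1 * pr_y1 / pr_x1) < 1e-9
```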

33 Bayes Rule ● Bayes rule allows us to use a supplied conditional probability in both directions. ● E.g., from treating patients with heart disease we might be able to estimate the value of Pr(high_Cholesterol | heart_disease) ● With Bayes rule we can turn this around into a predictor for heart disease Pr(heart_disease | high_Cholesterol) ● Now with a simple blood test we can determine “high_Cholesterol” and use this to help estimate the likelihood of heart disease.

34 Bayes Rule ● For this to work we have to deal with the other factors as well Pr(heart_disease | high_Cholesterol) = Pr(high_Cholesterol | heart_disease) * Pr(heart_disease)/Pr(high_Cholesterol) ● We will return to this later.
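As a worked instance of the formula above, with made-up numbers (the slides do not supply actual values, so every figure below is an assumption):

```python
# Hypothetical values; purely illustrative, not real medical estimates.
pr_chol_given_hd = 0.8   # Pr(high_Cholesterol | heart_disease), assumed
pr_hd = 0.1              # Pr(heart_disease), assumed prior
pr_chol = 0.25           # Pr(high_Cholesterol), assumed

# Bayes rule turns the diagnostic statistic around into a predictor.
pr_hd_given_chol = pr_chol_given_hd * pr_hd / pr_chol
print(pr_hd_given_chol)  # 0.8 * 0.1 / 0.25 = 0.32
```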