Computer Science CPSC 322 Lecture 26 Uncertainty and Probability (Ch. 6.1, 6.1.1, 6.1.3)


1 Computer Science CPSC 322 Lecture 26 Uncertainty and Probability (Ch. 6.1, 6.1.1, 6.1.3)

2 Where are we?
[Course map figure: Environment (Deterministic vs. Stochastic) x Problem Type (Static: Constraint Satisfaction, Query; Sequential: Planning), with representations and reasoning techniques: Vars + Constraints (Search, Arc Consistency), Logics (Search), STRIPS (Search), Belief Nets (Variable Elimination), Decision Nets (Variable Elimination), Markov Processes (Value Iteration)]
Done with deterministic environments.

3 Where are we?
[Same course map figure as above]
Second part of the course.

4 Where are we?
[Same course map figure as above]
We'll focus on Belief Nets.

5 Lecture Overview
- Intro to Reasoning Under Uncertainty
- Introduction to Probability
- Random Variables and Possible World Semantics
- Probability Distributions
- Marginalization

6 Two main sources of uncertainty (from Lecture 2)
- Sensing Uncertainty: the agent cannot fully observe a state of interest. For example:
  - Right now, how many people are in this building?
  - What disease does this patient have?
  - Where is the soccer player behind me?
- Effect Uncertainty: the agent cannot be certain about the effects of its actions. For example:
  - If I work hard, will I get an A?
  - Will this drug work for this patient?
  - Where will the ball go when I kick it?

7 Motivation for uncertainty
- To act in the real world, we almost always have to handle uncertainty (both effect and sensing uncertainty).
- Deterministic domains are an abstraction; sometimes this abstraction enables more powerful inference. From now on we drop this abstraction.
- AI's main focus shifted from logic to probability in the 1980s:
  - The language of probability is very expressive and general.
  - New representations enable efficient reasoning. We will see some of these, in particular Bayesian networks.
- Reasoning under uncertainty is part of the 'new' AI. This is not a dichotomy: the framework for probability is logical!
- New frontier: combining logic and probability.

8 Interesting article about AI and uncertainty
"The machine age", by Peter Norvig (head of research at Google), New York Post, 12 February 2011.
http://www.nypost.com/f/print/news/opinion/opedcolumnists/the_machine_age_tM7xPAv4pI4JslK0M1JtxI
- "The things we thought were hard turned out to be easier." Playing grandmaster-level chess, or proving theorems in integral calculus.
- "Tasks that we at first thought were easy turned out to be hard." A toddler (or a dog) can distinguish hundreds of objects (ball, bottle, blanket, mother, ...) just by glancing at them; it is very difficult for computer vision to perform at this level.
- "Dealing with uncertainty turned out to be more important than thinking with logical precision." Reasoning under uncertainty (and lots of data) is key to progress.

9 Probability as a measure of uncertainty/ignorance
- Probability measures an agent's degree of belief in the truth of propositions about states of the world.
- It does not measure how true a proposition is. Propositions are true or false; we simply may not know which.
- Example: I roll a fair die. What is 'the' (my) probability that the result is a '6'?

11 Probability as a measure of uncertainty/ignorance
- Probability measures an agent's degree of belief in the truth of propositions about states of the world.
- It does not measure how true a proposition is. Propositions are true or false; we simply may not know which.
- Example: I roll a fair die. What is 'the' (my) probability that the result is a '6'? It is 1/6 ≈ 16.7%.
- I now look at the die. What is 'the' (my) probability now? My probability is now ___. Your probability (you have not looked at the die) is ___.
- What if I tell some of you the result is even? Their probability becomes ___.

12 Probability as a measure of uncertainty/ignorance
- Probability measures an agent's degree of belief in the truth of propositions about states of the world.
- It does not measure how true a proposition is. Propositions are true or false; we simply may not know which.
- Example: I roll a fair die. What is 'the' (my) probability that the result is a '6'? It is 1/6 ≈ 16.7%.
- I now look at the die. What is 'the' (my) probability now? My probability is now either 1 or 0, depending on what I observed. Your probability has not changed: 1/6 ≈ 16.7%.
- What if I tell some of you the result is even? Their probability increases to 1/3 ≈ 33.3%, if they believe me.
- Different agents can have different degrees of belief in (probabilities for) a proposition, based on the evidence they have.

13 Probability as a measure of uncertainty/ignorance
- Probability measures an agent's degree of belief in the truth of propositions about states of the world.
- Belief in a proposition f can be measured as a number between 0 and 1: this is the probability of f.
  E.g., P("roll of fair die came out as a 6") = 1/6 ≈ 16.7% = 0.167.
- Using probabilities between 0 and 1 is purely a convention.
- P(f) = 0 means that f is believed to be:
  A. Probably true  B. Probably false  C. Definitely false  D. Definitely true

14 Probability as a measure of uncertainty/ignorance
- Probability measures an agent's degree of belief in the truth of propositions about states of the world.
- Belief in a proposition f can be measured as a number between 0 and 1: this is the probability of f.
  E.g., P("roll of fair die came out as a 6") = 1/6 ≈ 16.7% = 0.167.
- Using probabilities between 0 and 1 is purely a convention.
- P(f) = 0 means that f is believed to be definitely false: the probability of f being true is zero. Likewise, P(f) = 1 means f is believed to be definitely true.

15 Lecture Overview
- Intro to Reasoning Under Uncertainty
- Introduction to Probability
- Random Variables and Possible World Semantics
- Probability Distributions
- Marginalization

16 Probability Theory and Random Variables
- Probability theory: a system of logical axioms and formal operations for sound reasoning under uncertainty.
- Basic element: random variable X. X is a variable like the ones we have seen in CSP/Planning/Logic, but the agent can be uncertain about the value of X.
- As usual, the domain of a random variable X, written dom(X), is the set of values X can take.
- Types of variables:
  - Boolean: e.g., Cancer (does the patient have cancer or not?)
  - Categorical: e.g., CancerType could be one of {breastCancer, lungCancer, skinMelanomas}
  - Numeric: e.g., Temperature (integer or real)
- We will focus on Boolean and categorical variables.
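Since variables and their domains recur throughout the lecture, here is a tiny Python sketch (ours, not from the slides; all names are assumptions) of one way to represent them:

```python
# Sketch (assumed representation): a random variable is just a name plus
# the finite domain of values it can take.
domains = {
    "Cancer": [True, False],                                        # Boolean
    "CancerType": ["breastCancer", "lungCancer", "skinMelanomas"],  # categorical
    "Temperature": ["hot", "mild", "cold"],  # used later in this lecture
}

def dom(X):
    """dom(X): the set of values random variable X can take."""
    return domains[X]

print(dom("CancerType"))  # ['breastCancer', 'lungCancer', 'skinMelanomas']
```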

17 Random Variables (cont'd)
- A tuple of random variables ⟨X1, …, Xn⟩ is a complex random variable with domain dom(X1) × dom(X2) × … × dom(Xn).
- Assignments to its components can be written jointly, e.g., Cavity = T; Weather = sunny.

18 Possible Worlds
E.g., if we model only two Boolean variables, Cavity and Toothache, there are 4 distinct possible worlds:

  w1: Cavity = T ∧ Toothache = T
  w2: Cavity = T ∧ Toothache = F
  w3: Cavity = F ∧ Toothache = T
  w4: Cavity = F ∧ Toothache = F

- A possible world specifies an assignment to each random variable.
- Possible worlds are mutually exclusive and exhaustive.
- w ⊨ f means that proposition f is true in world w.
- A probability measure µ(w) over possible worlds w is a nonnegative real number such that µ(w) sums to 1 over all possible worlds w.
- Why does this make sense?

19 Possible Worlds
E.g., if we model only two Boolean variables, Cavity and Toothache, there are 4 distinct possible worlds:

  w1: Cavity = T ∧ Toothache = T
  w2: Cavity = T ∧ Toothache = F
  w3: Cavity = F ∧ Toothache = T
  w4: Cavity = F ∧ Toothache = F

- A possible world specifies an assignment to each random variable.
- Possible worlds are mutually exclusive and exhaustive.
- w ⊨ f means that proposition f is true in world w.
- A probability measure µ(w) over possible worlds w is a nonnegative real number such that µ(w) sums to 1 over all possible worlds w.
- The probability of proposition f is defined by P(f) = Σ_{w ⊨ f} µ(w), i.e., the sum of the probabilities of the worlds w in which f is true.
- Why does this make sense? Because for sure we are in one of these worlds!

20 Possible Worlds Semantics
Example: weather in Vancouver
- One binary variable: Weather, with domain {sunny, cloudy}.
- Possible worlds: w1: Weather = sunny; w2: Weather = cloudy.
- If P(Weather = sunny) = 0.4, what is P(Weather = cloudy)?
Recall:
- w ⊨ f means that proposition f is true in world w.
- A probability measure µ(w) over possible worlds w is a nonnegative real number such that µ(w) sums to 1 over all possible worlds w.
- The probability of proposition f is defined by P(f) = Σ_{w ⊨ f} µ(w).

21 Possible Worlds Semantics
Example: weather in Vancouver
- One binary variable: Weather, with domain {sunny, cloudy}.
- Possible worlds: w1: Weather = sunny; w2: Weather = cloudy.
- If P(Weather = sunny) = 0.4, what is P(Weather = cloudy)?
  - P(Weather = sunny) = 0.4 means that µ(w1) is 0.4.
  - µ(w1) and µ(w2) have to sum to 1 (those are the only two possible worlds).
  - So µ(w2) has to be 0.6, and thus P(Weather = cloudy) = 0.6.

22 One more example
Now we have an additional variable, Temperature, with domain {hot, mild, cold}. There are now 6 possible worlds:

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         ?

What's the probability of it being cloudy and cold?
  A. 0.1  B. 0.2  C. 0.3  D. 1  E. Not enough info

23 One more example
Now we have an additional variable, Temperature, with domain {hot, mild, cold}. There are now 6 possible worlds:

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         ?

What's the probability of it being cloudy and cold?
0.10 + 0.20 + 0.10 + 0.05 + 0.35 = 0.8, and µ(w) has to sum to 1 over all possible worlds, so it is 0.2.

24 One more example
Now we have an additional variable, Temperature, with domain {hot, mild, cold}. There are now 6 possible worlds:

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

What's the probability of it being cloudy or cold?
  A. 1  B. 0.6  C. 0.3  D. 0.7
Remember: the probability of proposition f is defined by P(f) = Σ_{w ⊨ f} µ(w), the sum of the probabilities of the worlds w in which f is true.

25 One more example
Now we have an additional variable, Temperature, with domain {hot, mild, cold}. There are now 6 possible worlds:

  w    Weather  Temperature  µ(w)
  w1   sunny    hot          0.10
  w2   sunny    mild         0.20
  w3   sunny    cold         0.10
  w4   cloudy   hot          0.05
  w5   cloudy   mild         0.35
  w6   cloudy   cold         0.20

What's the probability of it being cloudy or cold?
µ(w3) + µ(w4) + µ(w5) + µ(w6) = 0.10 + 0.05 + 0.35 + 0.20 = 0.7
Remember: the probability of proposition f is defined by P(f) = Σ_{w ⊨ f} µ(w), the sum of the probabilities of the worlds w in which f is true.
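To make the possible-world semantics concrete, here is a small Python sketch (ours, not from the lecture) that stores the table above as µ(w) and computes P(f) = Σ_{w ⊨ f} µ(w) by summing over the worlds where a proposition holds:

```python
# Sketch: P(f) = sum of mu(w) over the worlds w in which proposition f is true,
# using the Weather/Temperature table from the slide above.
mu = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

def P(f):
    """Probability of proposition f, given as a predicate over worlds."""
    return sum(p for w, p in mu.items() if f(w))

print(P(lambda w: w[0] == "cloudy" and w[1] == "cold"))  # 0.2
print(P(lambda w: w[0] == "cloudy" or w[1] == "cold"))   # ≈ 0.7 (floating point)
```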

26 Probability Distributions
Consider the case where possible worlds are simply assignments to one random variable.

Definition (probability distribution): a probability distribution P on a random variable X is a function dom(X) → [0,1] such that Σ_{x ∈ dom(X)} P(X = x) = 1.

When dom(X) is infinite we need a probability density function. We will focus on the finite case.
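As a quick sanity check, the finite case of this definition can be validated directly in code. A minimal sketch (ours; the function name and the example values, taken from the Temperature marginal later in this lecture, are assumptions):

```python
import math

def is_distribution(P):
    """Check P: dom(X) -> [0,1] with values summing to 1 (finite case)."""
    return (all(0.0 <= p <= 1.0 for p in P.values())
            and math.isclose(sum(P.values()), 1.0))

print(is_distribution({"hot": 0.15, "mild": 0.55, "cold": 0.30}))  # True
```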

27 Joint Distribution
- Joint distribution over random variables X1, …, Xn: a probability distribution over the joint random variable with domain dom(X1) × … × dom(Xn) (the Cartesian product).
- Think of a joint distribution over n variables as the n-dimensional table of the corresponding possible worlds.
- Each row corresponds to an assignment X1 = x1, …, Xn = xn and its probability P(X1 = x1, …, Xn = xn). We can also write P(X1 = x1 ∧ … ∧ Xn = xn).
- The sum of probabilities across the whole table is 1.
E.g., the {Weather, Temperature} example from before:

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20
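The same idea in code, as a hedged sketch (ours; variable names are assumptions): the joint's rows are exactly the Cartesian product of the domains, and the whole table sums to 1.

```python
import math
from itertools import product

dom_weather = ["sunny", "cloudy"]
dom_temperature = ["hot", "mild", "cold"]

joint = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

# One row per element of dom(Weather) x dom(Temperature) ...
assert set(joint) == set(product(dom_weather, dom_temperature))
# ... and the probabilities across the whole table sum to 1.
print(math.isclose(sum(joint.values()), 1.0))  # True
```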

28 Lecture Overview
- Intro to Reasoning Under Uncertainty
- Introduction to Probability
- Random Variables and Possible World Semantics
- Probability Distributions
- Marginalization

29 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

We also write this as P(X) = Σ_{z ∈ dom(Z)} P(X, Z = z).
This is simply an application of the definition of a probability measure!
Remember: the probability of proposition f is defined by P(f) = Σ_{w ⊨ f} µ(w), the sum of the probabilities of the worlds w in which f is true.

30 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

We also write this as P(X) = Σ_{z ∈ dom(Z)} P(X, Z = z). This corresponds to summing out a dimension in the table, e.g., marginalization over Weather:

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

  Temperature  µ(w)
  hot          ?
  mild         ?
  cold         ?

Do the probabilities in the new table still sum to 1?
  A. False  B. True  C. It depends

31 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

We also write this as P(X) = Σ_{z ∈ dom(Z)} P(X, Z = z). This corresponds to summing out a dimension in the table, e.g., marginalization over Weather.

Do the probabilities in the new table still sum to 1?
  B. True, since it's a probability distribution!

32 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

This corresponds to summing out a dimension in the table.

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

  Temperature  µ(w)
  hot          ?
  mild         ?
  cold         ?

How do we compute P(T = hot)?

33 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

This corresponds to summing out a dimension in the table.

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

P(Temperature = hot) = P(Weather = sunny, Temperature = hot) + P(Weather = cloudy, Temperature = hot) = ?

34 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

This corresponds to summing out a dimension in the table.

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

  Temperature  µ(w)
  hot          0.15

P(Temperature = hot) = P(Weather = sunny, Temperature = hot) + P(Weather = cloudy, Temperature = hot) = 0.10 + 0.05 = 0.15
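The computation just shown is mechanical, so here is a short Python sketch (ours, not from the lecture) that sums out Weather for every temperature at once:

```python
# Sketch: marginalize Weather out of the joint to get P(Temperature).
from collections import defaultdict

joint = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

p_temperature = defaultdict(float)
for (weather, temperature), p in joint.items():
    p_temperature[temperature] += p  # marginalization: sum out Weather

print(dict(p_temperature))
# {'hot': 0.15, 'mild': 0.55, 'cold': 0.30} (up to floating-point rounding)
```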

35 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

This corresponds to summing out a dimension in the table.

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

  Temperature  µ(w)
  hot          0.15
  mild         ?
  cold         ?

What is P(Temperature = mild)?
  A. 0.20  B. 0.35  C. 0.85  D. 0.55

36 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

This corresponds to summing out a dimension in the table.

P(Temperature = mild) = P(Weather = sunny, Temperature = mild) + P(Weather = cloudy, Temperature = mild) = 0.20 + 0.35 = 0.55

37 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

This corresponds to summing out a dimension in the table.

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

  Temperature  µ(w)
  hot          0.15
  mild         0.55
  cold         ?

38 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

This corresponds to summing out a dimension in the table.

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

  Temperature  µ(w)
  hot          0.15
  mild         0.55
  cold         0.30

The new table still sums to 1. It must, since it's a probability distribution!
Alternative way to compute the last entry: probabilities have to sum to 1.

39 Marginalization
Given the joint distribution, we can compute distributions over smaller sets of variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

We also write this as P(X) = Σ_{z ∈ dom(Z)} P(X, Z = z). You can marginalize over any of the variables, e.g., marginalization over Temperature:

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

  Weather  µ(w)
  sunny    ?
  cloudy   ?

What is P(Weather = sunny)?
  A. 0.15  B. 0.30  C. 0.40  D. 0.60

40 Marginalization
Given the joint distribution, we can compute distributions over smaller sets of variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

You can marginalize over any of the variables, e.g., marginalization over Temperature:

  Weather  µ(w)
  sunny    0.40
  cloudy   ?

P(Weather = sunny) = P(Weather = sunny, Temperature = hot) + P(Weather = sunny, Temperature = mild) + P(Weather = sunny, Temperature = cold) = 0.10 + 0.20 + 0.10 = 0.40

41 Marginalization
Given the joint distribution, we can compute distributions over smaller sets of variables through marginalization:

  P(X = x) = Σ_{z ∈ dom(Z)} P(X = x, Z = z)    (marginalization over Z)

You can marginalize over any of the variables, e.g., marginalization over Temperature:

  Weather  µ(w)
  sunny    0.40
  cloudy   0.60

42 Marginalization
We can also marginalize over more than one variable at once:

  P(X = x) = Σ_{z1 ∈ dom(Z1)} … Σ_{zn ∈ dom(Zn)} P(X = x, Z1 = z1, …, Zn = zn)

E.g., go from P(Wind, Weather, Temperature) to P(Weather), i.e., marginalization over Temperature and Wind:

  Wind  Weather  Temperature  µ(w)
  yes   sunny    hot          0.04
  yes   sunny    mild         0.09
  yes   sunny    cold         0.07
  yes   cloudy   hot          0.01
  yes   cloudy   mild         0.10
  yes   cloudy   cold         0.12
  no    sunny    hot          0.06
  no    sunny    mild         0.11
  no    sunny    cold         0.03
  no    cloudy   hot          0.04
  no    cloudy   mild         0.25
  no    cloudy   cold         0.08

  Weather  µ(w)
  sunny    ?
  cloudy   ?

43 Marginalization
We can also marginalize over more than one variable at once:

  P(X = x) = Σ_{z1 ∈ dom(Z1)} … Σ_{zn ∈ dom(Zn)} P(X = x, Z1 = z1, …, Zn = zn)

E.g., go from P(Wind, Weather, Temperature) to P(Weather), i.e., marginalization over Temperature and Wind. Using the table above, what is P(Weather = sunny)?
  A. 0.15  B. 0.20  C. 0.33  D. 0.40

44 Marginalization
We can also marginalize over more than one variable at once:

  P(X = x) = Σ_{z1 ∈ dom(Z1)} … Σ_{zn ∈ dom(Zn)} P(X = x, Z1 = z1, …, Zn = zn)

Marginalization over Temperature and Wind:
P(Weather = sunny) = 0.04 + 0.09 + 0.07 + 0.06 + 0.11 + 0.03 = 0.40

  Weather  µ(w)
  sunny    0.40
  cloudy   ?
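In code, summing out several variables looks the same as summing out one. A sketch (ours, not from the lecture), using the three-variable table from slide 42:

```python
# Sketch: marginalize out both Wind and Temperature to get P(Weather).
from collections import defaultdict

joint3 = {  # (Wind, Weather, Temperature) -> mu(w), values from slide 42
    ("yes", "sunny", "hot"): 0.04, ("yes", "sunny", "mild"): 0.09,
    ("yes", "sunny", "cold"): 0.07, ("yes", "cloudy", "hot"): 0.01,
    ("yes", "cloudy", "mild"): 0.10, ("yes", "cloudy", "cold"): 0.12,
    ("no", "sunny", "hot"): 0.06, ("no", "sunny", "mild"): 0.11,
    ("no", "sunny", "cold"): 0.03, ("no", "cloudy", "hot"): 0.04,
    ("no", "cloudy", "mild"): 0.25, ("no", "cloudy", "cold"): 0.08,
}

p_weather = defaultdict(float)
for (wind, weather, temperature), p in joint3.items():
    p_weather[weather] += p  # sum out Wind and Temperature in one pass

print(dict(p_weather))  # {'sunny': 0.40, 'cloudy': 0.60} (up to rounding)
```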

45 Marginalization
We can also get marginals for more than one variable, e.g., P(Weather, Temperature) from P(Wind, Weather, Temperature):

  P(X = x, Y = y) = Σ_{z1 ∈ dom(Z1)} … Σ_{zn ∈ dom(Zn)} P(X = x, Y = y, Z1 = z1, …, Zn = zn)

  Wind  Weather  Temperature  µ(w)
  yes   sunny    hot          0.04
  yes   sunny    mild         0.09
  yes   sunny    cold         0.07
  yes   cloudy   hot          0.01
  yes   cloudy   mild         0.10
  yes   cloudy   cold         0.12
  no    sunny    hot          0.06
  no    sunny    mild         0.11
  no    sunny    cold         0.03
  no    cloudy   hot          0.04
  no    cloudy   mild         0.25
  no    cloudy   cold         0.08

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         ?
  sunny    cold         ?
  cloudy   hot          ?
  cloudy   mild         ?
  cloudy   cold         ?

The probability of proposition f is P(f) = Σ_{w ⊨ f} µ(w): the sum of the probabilities of the worlds w in which f is true.
Still simply an application of the definition of a probability measure.
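Keeping two variables means grouping by the pair while summing out the rest. A sketch (ours; joint3 is repeated from the previous sketch so this block is self-contained):

```python
# Sketch: sum out only Wind to get P(Weather, Temperature).
from collections import defaultdict

joint3 = {  # (Wind, Weather, Temperature) -> mu(w), values from slide 42
    ("yes", "sunny", "hot"): 0.04, ("yes", "sunny", "mild"): 0.09,
    ("yes", "sunny", "cold"): 0.07, ("yes", "cloudy", "hot"): 0.01,
    ("yes", "cloudy", "mild"): 0.10, ("yes", "cloudy", "cold"): 0.12,
    ("no", "sunny", "hot"): 0.06, ("no", "sunny", "mild"): 0.11,
    ("no", "sunny", "cold"): 0.03, ("no", "cloudy", "hot"): 0.04,
    ("no", "cloudy", "mild"): 0.25, ("no", "cloudy", "cold"): 0.08,
}

p_weather_temperature = defaultdict(float)
for (wind, weather, temperature), p in joint3.items():
    p_weather_temperature[(weather, temperature)] += p  # sum out Wind only

print(p_weather_temperature[("sunny", "hot")])  # 0.04 + 0.06 = 0.10
```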

46 Learning Goals for Probability so far
- Define and give examples of random variables, their domains, and probability distributions.
- Calculate the probability of a proposition f given µ(w) for the set of possible worlds.
- Define a joint probability distribution (JPD).
- Marginalize over specific variables to compute distributions over any subset of the variables.
Next time: conditional probabilities.

