
1 CS 416 Artificial Intelligence: Lecture 15, Uncertainty (Chapter 14)

2 Late start on Thursday: CS Annual Feedback Forum, Thursday, March 23, 4:30 – 5:25, MEC 205 (pizza). The AI class will start at 5:30 on Thursday.

3 Conditional probability: the probability of a given that all we know is b, written P(a | b). It can be rewritten in terms of unconditional probabilities: P(a | b) = P(a ∧ b) / P(b).

4 Marginalization and conditioning: a distribution over Y can be obtained by summing out all the other variables from any joint distribution containing Y: P(Y) = Σ_z P(Y, z). Conditioning works the same way: P(X | e) = α Σ_y P(X, e, y), where X is the query event, e is the evidence, and y ranges over all the other variables. We need the full joint distribution to sum this up.
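
To make this concrete, here is a minimal Python sketch (not from the slides) of summing out and conditioning, using the textbook's familiar toothache/cavity/catch table as the full joint:

    # Full joint over (Toothache, Cavity, Catch); the textbook's standard
    # example numbers, which sum to 1.
    joint = {
        (True,  True,  True):  0.108, (True,  True,  False): 0.012,
        (True,  False, True):  0.016, (True,  False, False): 0.064,
        (False, True,  True):  0.072, (False, True,  False): 0.008,
        (False, False, True):  0.144, (False, False, False): 0.576,
    }

    def p_cavity(value):
        # Summing out: add joint entries over Toothache and Catch
        return sum(p for (t, c, k), p in joint.items() if c == value)

    def p_cavity_given_toothache(value, tooth=True):
        # Conditioning: alpha * sum over the remaining hidden variable (Catch)
        numer = sum(p for (t, c, k), p in joint.items()
                    if c == value and t == tooth)
        denom = sum(p for (t, c, k), p in joint.items() if t == tooth)
        return numer / denom   # dividing by denom plays the role of alpha

    print(p_cavity(True))                   # 0.2
    print(p_cavity_given_toothache(True))   # 0.6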

5 Bayes Network: a Bayes network captures the full joint distribution. For comparison, the full joint can be recovered as a product of each node's conditional probability table: P(x1, …, xn) = ∏i P(xi | parents(Xi)).
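
As an illustration (a sketch, not the lecture's own code), the burglary network from this chapter reconstructs any full-joint entry as a product of CPT rows; the numbers below are the textbook's example values:

    # CPTs for the burglary network (textbook example values)
    P_B = 0.001                                  # P(Burglary = true)
    P_E = 0.002                                  # P(Earthquake = true)
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
    P_J = {True: 0.90, False: 0.05}              # P(JohnCalls = true | Alarm)
    P_M = {True: 0.70, False: 0.01}              # P(MaryCalls = true | Alarm)

    def joint(b, e, a, j, m):
        # One full-joint entry as the product of each node's CPT row
        p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
        p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p *= P_J[a] if j else 1 - P_J[a]
        p *= P_M[a] if m else 1 - P_M[a]
        return p

    # P(j, m, a, ¬b, ¬e) = 0.999 * 0.998 * 0.001 * 0.9 * 0.7 ≈ 0.000628
    print(joint(False, False, True, True, True))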

6 Example: P(B | JohnCalls = true, MaryCalls = true)

7 Example: P(B | JohnCalls = true, MaryCalls = true), the old way, by enumerating the full joint: P(B | j, m) = α Σ_e Σ_a P(B, e, a, j, m) = α Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a).

8 Example: evaluating this expression requires a depth-first tree traversal.

9 Example: O(2^n) time complexity; repeated computations are wasted.
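
A brute-force enumeration sketch (hypothetical code, reusing joint() from the sketch above) makes the exponential cost visible: every combination of the hidden variables is enumerated, and subexpressions are recomputed:

    from itertools import product

    def ask_burglary(j, m):
        # P(Burglary | JohnCalls = j, MaryCalls = m) by summing the hidden
        # variables (Earthquake, Alarm) out of the full joint, then normalizing
        dist = [sum(joint(b, e, a, j, m)
                    for e, a in product((True, False), repeat=2))
                for b in (True, False)]
        alpha = 1 / sum(dist)
        return [alpha * p for p in dist]

    print(ask_burglary(True, True))   # ≈ [0.284, 0.716], as in the textbook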

10 Example: complexity of a Bayes net. A Bayes net reduces the space complexity of the full-joint distribution, but it does not reduce the time complexity for the general case.

11 Time complexity: note the repeated subexpressions. Dynamic programming can reuse them instead of recomputing them.

12 Fibonacci sequence example

    def fib(n):
        # Naive recursion: the same subproblems are recomputed many times
        if n == 0 or n == 1:
            return n
        return fib(n - 1) + fib(n - 2)

Expanding fib(5) shows the repeated work:
1. fib(5)
2. fib(4) + fib(3)
3. (fib(3) + fib(2)) + (fib(2) + fib(1))
4. ((fib(2) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0)) + fib(1))
5. (((fib(1) + fib(0)) + fib(1)) + (fib(1) + fib(0))) + ((fib(1) + fib(0)) + fib(1))

13 Dynamic Programming: memoization

    m = {0: 0, 1: 1}   # seed the base cases so the recursion bottoms out

    def fib(n):
        if n not in m:
            m[n] = fib(n - 1) + fib(n - 2)   # compute once, cache, reuse
        return m[n]

Each fib(k) is now computed only once, so the running time drops from exponential to linear.
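
In Python, the same memoization pattern is available off the shelf via functools.lru_cache; a minimal equivalent sketch:

    from functools import lru_cache

    @lru_cache(maxsize=None)      # cache every fib(n) after its first computation
    def fib(n):
        if n == 0 or n == 1:
            return n
        return fib(n - 1) + fib(n - 2)

    print(fib(80))                # 23416728348467685, no exponential blow-up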

14 Approximate inference: it’s expensive to work with the full joint distribution, whether as a table or as a Bayesian network. Is approximation good enough? Monte Carlo.

15 Use samples to approximate posterior probabilities. Simulated annealing used Monte Carlo arguments to justify why random guesses, and sometimes going uphill, can lead to optimality. More samples = better approximation. How many are needed? Where should you take the samples?
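
A tiny illustrative sketch (made-up probability, not from the slides) of why more samples give a better approximation: the Monte Carlo error shrinks roughly like 1/√n:

    import random

    def mc_estimate(p_true, n):
        # Fraction of n Bernoulli(p_true) samples that come up true
        return sum(random.random() < p_true for _ in range(n)) / n

    random.seed(0)
    for n in (100, 10_000, 1_000_000):
        print(n, mc_estimate(0.3, n))   # error shrinks roughly like 1/sqrt(n)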

16 An example: P(WetGrass). Computing it requires the full-joint distribution; the full joint is O(2^n), and even unlikely events are tabulated in it.

17 Prior sampling: an ability to model the prior probabilities of a set of random variables. Imagine generating 100 of these samples.

18 Prior sampling: define S_PS(x1, x2, …, xn) = the probability that the event (x1, x2, …, xn) is generated by the network. Since each variable is sampled given its parents' sampled values, S_PS(x1, …, xn) = ∏i P(xi | parents(Xi)) = P(x1, …, xn).
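
A minimal PRIOR-SAMPLE sketch, assuming the textbook's cloudy/sprinkler/rain/wet-grass network and its example CPT values:

    import random

    # CPTs for the sprinkler network (textbook example values)
    P_C = 0.5                                             # P(Cloudy = true)
    P_S = {True: 0.10, False: 0.50}                       # P(Sprinkler | Cloudy)
    P_R = {True: 0.80, False: 0.20}                       # P(Rain | Cloudy)
    P_W = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.00}     # P(WetGrass | S, R)

    def prior_sample():
        # Sample each variable in topological order given its sampled parents
        c = random.random() < P_C
        s = random.random() < P_S[c]
        r = random.random() < P_R[c]
        w = random.random() < P_W[(s, r)]
        return c, s, r, w

    samples = [prior_sample() for _ in range(100)]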

19 Approximating the true distribution: with enough samples, the sample frequencies converge to the true joint probabilities, so the estimate is consistent.

20 Rejection sampling: compute P(X | e)
Use PriorSample (S_PS) to create N samples
Inspect each sample for the truth of e
From those samples consistent with e, tabulate P(X | e):
– Keep track of the X values
– Normalize by the number of consistent samples
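
A rejection-sampling sketch under the same assumptions (it reuses prior_sample() from the sketch above):

    def rejection_rain_given_sprinkler(n=10_000):
        # Estimate P(Rain | Sprinkler = true): discard inconsistent samples
        counts = {True: 0, False: 0}
        for _ in range(n):
            c, s, r, w = prior_sample()
            if s:                       # keep only samples where evidence holds
                counts[r] += 1
        total = counts[True] + counts[False]
        return {v: counts[v] / total for v in counts}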

21 Example: P(Rain | Sprinkler = true)
Use the Bayes net to generate 100 samples:
– Suppose 73 have Sprinkler = false
– Suppose 27 have Sprinkler = true: 8 have Rain = true and 19 have Rain = false
P(Rain | Sprinkler = true) = Normalize(⟨8, 19⟩) = ⟨8/27, 19/27⟩ ≈ ⟨0.296, 0.704⟩

22 Problems with rejection sampling: the standard deviation of the error in each probability is proportional to 1/√n, where n is the number of samples consistent with the evidence. As problems become complex, the number of samples consistent with the evidence becomes small.

23 Likelihood weighting: we only want to generate samples that are consistent with the evidence e. We’ll still sample the Bayesian net, but we won’t let every random variable be sampled; some (the evidence variables) will be forced to produce a specific output.

24 Example – likelihood weighting: P(Rain | Sprinkler = true, WetGrass = true)

25 Example – likelihood weighting: P(Rain | Sprinkler = true, WetGrass = true). First, the sample's weight w is set to 1.0.

26 Example – likelihood weighting: keep track of (T, T, T, T) with weight 0.099. Notice that the weight is reduced according to how likely each evidence variable's output is given its parents: here w = 1.0 × P(Sprinkler = true | Cloudy = true) × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.1 × 0.99 = 0.099. So the final probability is a function of what comes from sampling the free variables while constraining the evidence variables.
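
A likelihood-weighting sketch for slides 23–26, again assuming the sprinkler-network CPTs defined in the prior-sampling sketch; evidence variables are fixed and their likelihoods multiplied into the sample weight:

    def weighted_sample(s=True, g=True):
        # One likelihood-weighted sample: evidence (Sprinkler, WetGrass) is
        # fixed, and w accumulates the likelihood of each forced value
        w = 1.0
        c = random.random() < P_C                    # nonevidence: sample it
        w *= P_S[c] if s else 1 - P_S[c]             # evidence Sprinkler: weight it
        r = random.random() < P_R[c]                 # nonevidence: sample it
        w *= P_W[(s, r)] if g else 1 - P_W[(s, r)]   # evidence WetGrass: weight it
        return (c, s, r, g), w

    def likelihood_weighting(n=10_000):
        # P(Rain | Sprinkler = true, WetGrass = true) from weighted samples
        totals = {True: 0.0, False: 0.0}
        for _ in range(n):
            (c, s, r, g), w = weighted_sample()
            totals[r] += w
        z = totals[True] + totals[False]
        return {v: totals[v] / z for v in totals}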

27 Comparing techniques:
Likelihood weighting uses all samples
– More efficient than rejection sampling
Less effective if there are lots of evidence variables (small weights)
Less effective if evidence is late in the variable ordering (samples are generated without the early influence of the evidence)

28 Markov Chain Monte Carlo (MCMC)
Imagine being in a current state
– An assignment to all the random variables
The next state is selected according to a random sample of one of the nonevidence variables, Xi
– Conditioned on the current values of the variables in the current state
MCMC wanders around state space, flipping one variable at a time while keeping the evidence variables fixed

29 Example - MCMC: solve P(Rain | Sprinkler = true, WetGrass = true)
Fix Sprinkler and WetGrass to true
Initialize the “state” to [Cloudy = T, Sprinkler = T, Rain = F, WetGrass = T]
Sample Cloudy from P(Cloudy | Sprinkler = T, Rain = F)
– We want to “flip the Cloudy bit” subject to the conditional probabilities of its parents, children, and children’s parents (its Markov blanket)
– Cloudy becomes false

30 Example - MCMC: solve P(Rain | Sprinkler = true, WetGrass = true)
Fix Sprinkler and WetGrass to true
The “state” is [Cloudy = F, Sprinkler = T, Rain = F, WetGrass = T]
Sample Rain from P(Rain | Cloudy = F, Sprinkler = T, WetGrass = T)
– Rain becomes true
The “state” is now [Cloudy = F, Sprinkler = T, Rain = T, WetGrass = T]
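
A Gibbs-sampling (MCMC) sketch of the same query, reusing the sprinkler CPTs and import from the prior-sampling sketch; each step resamples one nonevidence variable conditioned on its Markov blanket:

    def gibbs_rain(n=100_000, s=True, g=True):
        # Evidence fixed; resample Cloudy then Rain from their Markov-blanket
        # conditionals, counting how often Rain is true along the way
        c, r = True, False                 # arbitrary initial nonevidence values
        count = 0
        for _ in range(n):
            # P(Cloudy | s, r) ∝ P(Cloudy) P(s | Cloudy) P(r | Cloudy)
            pt = P_C * (P_S[True] if s else 1 - P_S[True]) \
                     * (P_R[True] if r else 1 - P_R[True])
            pf = (1 - P_C) * (P_S[False] if s else 1 - P_S[False]) \
                           * (P_R[False] if r else 1 - P_R[False])
            c = random.random() < pt / (pt + pf)
            # P(Rain | c, s, g) ∝ P(Rain | c) P(g | s, Rain)
            pt = P_R[c] * (P_W[(s, True)] if g else 1 - P_W[(s, True)])
            pf = (1 - P_R[c]) * (P_W[(s, False)] if g else 1 - P_W[(s, False)])
            r = random.random() < pt / (pt + pf)
            count += r
        return count / n                   # estimate of P(Rain = true | s, g)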

31 Nice method of operation
The sampling process settles into a “dynamic equilibrium” in which the long-run fraction of time spent in each state is exactly proportional to its posterior probability
Let q(x → x′) = the probability of transitioning from state x to state x′
A Markov chain is a sequence of state transitions according to the q( ) functions
π_t(x) measures the probability of being in state x after t steps

32 Markov chains: π_{t+1}(x′) is the probability of being in x′ after t + 1 steps: π_{t+1}(x′) = Σ_x π_t(x) q(x → x′). If π_t = π_{t+1}, we have reached a stationary distribution.
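
An illustrative two-state chain (hypothetical transition numbers, not from the slides) showing π_t converging to a stationary distribution:

    # Hypothetical two-state transition model q(x -> x')
    q = {("A", "A"): 0.9, ("A", "B"): 0.1,
         ("B", "A"): 0.5, ("B", "B"): 0.5}
    pi = {"A": 1.0, "B": 0.0}              # pi_0: start in state A
    for t in range(50):
        # pi_{t+1}(x') = sum_x pi_t(x) q(x -> x')
        pi = {x2: sum(pi[x1] * q[(x1, x2)] for x1 in "AB") for x2 in "AB"}
    print(pi)   # ≈ {'A': 0.833, 'B': 0.167}: the stationary distribution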

