Slide 1: Probabilistic Reasoning
ECE457 Applied Artificial Intelligence, Spring 2007, Lecture #9

Slide 2: Outline
- Bayesian networks
- D-separation and independence
- Inference
Reading: Russell & Norvig, sections 14.1 to 14.4

Slide 3: Recall the Story from FOL
- Anyone passing their 457 exam and winning the lottery is happy.
- Anyone who studies or is lucky can pass all their exams.
- Bob did not study but is lucky.
- Anyone who's lucky can win the lottery.
Is Bob happy?

Slide 4: Add Probabilities
- Anyone passing their 457 exam and winning the lottery has a 99% chance of being happy. Anyone only passing their 457 exam has an 80% chance of being happy, someone only winning the lottery has a 60% chance, and someone who does neither has a 20% chance.
- Anyone who studies has a 90% chance of passing their exams. Anyone who's lucky has a 50% chance of passing their exams. Anyone who's both lucky and who studied has a 99% chance of passing, but someone who didn't study and is unlucky has a 1% chance of passing.
- There's a 20% chance that Bob studied, but a 75% chance that he'll be lucky.
- Anyone who's lucky has a 40% chance of winning the lottery, while an unlucky person only has a 1% chance of winning.
What's the probability of Bob being happy?

Slide 5: Probabilities in the Story
Examples of probabilities in the story:
- P(Lucky) = 0.75
- P(Study) = 0.2
- P(PassExam|Study) = 0.9
- P(PassExam|Lucky) = 0.5
- P(Win|Lucky) = 0.4
- P(Happy|PassExam,Win) = 0.99
Some variables directly affect others! Can we build a graphical representation of the dependencies and conditional independencies between variables?

Slide 6: Bayesian Network
- Also called a belief network
- Directed acyclic graph: nodes represent variables, edges represent conditional relationships
- Concise representation of any full joint probability distribution
(Figure: network over the variables Lucky, Study, Win, PassExam, Happy)

Slide 7: Bayesian Network
- Nodes with no parents have prior probabilities
- Nodes with parents have conditional probability tables, covering all truth-value combinations of their parents
(Figure: the example network)

Slide 8: Bayesian Network
Priors:
  P(S) = 0.2    P(L) = 0.75

P(W|L):
  L = F: P(W|¬L) = 0.01
  L = T: P(W|L) = 0.4

P(E|L,S):
  L = F, S = F: P(E|¬L,¬S) = 0.01
  L = T, S = F: P(E|L,¬S) = 0.5
  L = F, S = T: P(E|¬L,S) = 0.9
  L = T, S = T: P(E|L,S) = 0.99

P(H|W,E) and P(¬H|W,E):
  W = F, E = F: P(H|¬W,¬E) = 0.2    P(¬H|¬W,¬E) = 0.8
  W = T, E = F: P(H|W,¬E) = 0.6     P(¬H|W,¬E) = 0.4
  W = F, E = T: P(H|¬W,E) = 0.8     P(¬H|¬W,E) = 0.2
  W = T, E = T: P(H|W,E) = 0.99     P(¬H|W,E) = 0.01
(Figure: the example network annotated with these tables)

Slide 9: Bayesian Network
(Figure: a larger example Bayesian network with nodes labelled a through z)

Slide 10: Chain Rule
Recall the chain rule:
  P(A,B) = P(A|B)P(B)
  P(A,B,C) = P(A|B,C)P(B,C)
  P(A,B,C) = P(A|B,C)P(B|C)P(C)
  P(A1,A2,…,An) = P(A1|A2,…,An) P(A2|A3,…,An) … P(An-1|An) P(An)
  P(A1,A2,…,An) = ∏i=1..n P(Ai|Ai+1,…,An)

Slide 11: Chain Rule
- If we know the value of a node's parents, we don't care about more distant ancestors; their influence is included through the parents
- A node is conditionally independent of its predecessors given its parents
- More generally, a node is conditionally independent of its non-descendants given its parents
- Updated chain rule:
  P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))

Slide 12: Chain Rule Example
Probability that Bob is happy because he won the lottery and passed his exam, because he's lucky but did not study:
  P(H,W,E,L,¬S) = P(H|W,E) * P(W|L) * P(E|L,¬S) * P(L) * P(¬S)
  P(H,W,E,L,¬S) = 0.99 * 0.4 * 0.5 * 0.75 * 0.8
  P(H,W,E,L,¬S) = 0.1188 ≈ 0.12
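To make this factorization concrete, here is a small Python sketch (not from the lecture; the dictionary encoding and variable names are mine) that stores the slide-8 CPTs and multiplies the chain-rule factors for this assignment:

```python
# CPTs of the example network from slide 8, in an assumed encoding:
# priors as floats, conditional tables as dicts keyed by parent truth values.
P_L = 0.75
P_S = 0.2
P_W = {True: 0.4, False: 0.01}                  # P(Win | Lucky)
P_E = {(True, True): 0.99, (True, False): 0.5,  # P(PassExam | Lucky, Study)
       (False, True): 0.9, (False, False): 0.01}
P_H = {(True, True): 0.99, (True, False): 0.6,  # P(Happy | Win, PassExam)
       (False, True): 0.8, (False, False): 0.2}

# Joint probability of H, W, E, L, not-S via the Bayesian-network chain rule:
# P(H,W,E,L,~S) = P(H|W,E) P(W|L) P(E|L,~S) P(L) P(~S)
joint = P_H[(True, True)] * P_W[True] * P_E[(True, False)] * P_L * (1 - P_S)
print(round(joint, 4))  # 0.1188
```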

Slide 13: Constructing Bayesian Nets
- Build from the top down
- Start with root nodes
- Add children
- Go down to the leaves
(Figure: the example network)

Slide 14: Constructing Bayesian Nets
- What happens if we build with the wrong order?
- The network becomes needlessly complicated
- Node ordering is important!
(Figure: the same variables built in a different order)

Slide 15: Connections
- We can understand dependence in a network by considering how evidence is transmitted through it
- Information entered at one node propagates to descendants and ancestors through connected nodes
- Provided no node in the path already has evidence (in which case we would stop the propagation)

Slide 16: Serial Connection
- Study and Happy are dependent
- Study and Happy are independent given PassExam (considering only the serial path Study → PassExam → Happy)
- Intuitively, the only way Study can affect Happy along this path is through PassExam
(Figure: the example network)

Slide 17: Converging Connection
- Lucky and Study are independent
- Lucky and Study are dependent given PassExam
- Intuitively, Lucky can be used to explain away Study
(Figure: the example network)

Slide 18: Diverging Connection
- Win and PassExam are dependent
- Win and PassExam are independent given Lucky
- Intuitively, Lucky can explain both Win and PassExam; Win and PassExam can affect each other by changing the belief in Lucky
(Figure: the example network)

Slide 19: D-Separation
- Determines whether two variables are independent given some other variables
- X is independent of Y given Z if X and Y are d-separated given Z
- X is d-separated from Y given Z if, on every (undirected) path between X and Y, there is a node N that blocks the path:
  - the connection at N is serial or diverging and N is in the evidence Z, or
  - the connection at N is converging and neither N nor any of its descendants is in the evidence Z

Slide 20: D-Separation
(Figure: blocking rules for each connection type. Serial and diverging connections: node Z blocks the path if it is in the evidence. Converging connections: node Z blocks the path if neither it nor any of its descendants is in the evidence.)

Slide 21: D-Separation
- Can be computed in linear time using a depth-first-search algorithm
- Gives a fast way to decide whether two nodes are independent
- Allows us to infer whether learning the value of a variable might give us information about another variable, given what we already know
- All d-separated variables are independent, but not all independent variables are d-separated
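For illustration, here is a short Python sketch of a d-separation test (not from the lecture); it uses the equivalent ancestral moral graph criterion rather than the linear-time DFS mentioned above, and the encoding of the network and the function names are my own:

```python
def ancestors(nodes, parents):
    """All ancestors of the given nodes in the DAG described by `parents`."""
    found, frontier = set(), list(nodes)
    while frontier:
        for p in parents.get(frontier.pop(), []):
            if p not in found:
                found.add(p)
                frontier.append(p)
    return found

def d_separated(x, y, evidence, parents):
    """True if x and y are d-separated given the evidence nodes."""
    # 1. Keep only x, y, the evidence, and their ancestors.
    relevant = {x, y} | set(evidence)
    relevant |= ancestors(relevant, parents)
    # 2. Moralize: connect each node to its parents and parents of a common
    #    child to each other, then drop edge directions.
    adj = {n: set() for n in relevant}
    for child in relevant:
        ps = [p for p in parents.get(child, []) if p in relevant]
        for p in ps:
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Remove evidence nodes; x and y are d-separated iff no path remains.
    blocked = set(evidence)
    frontier, seen = [x], {x}
    while frontier:
        n = frontier.pop()
        if n == y:
            return False
        for m in adj[n]:
            if m not in seen and m not in blocked:
                seen.add(m)
                frontier.append(m)
    return True

# The lecture's example network: each node mapped to its parents.
parents = {"Win": ["Lucky"], "PassExam": ["Lucky", "Study"],
           "Happy": ["Win", "PassExam"]}

print(d_separated("Lucky", "Study", [], parents))            # True  (slide 17)
print(d_separated("Lucky", "Study", ["PassExam"], parents))  # False (slide 17)
print(d_separated("Win", "PassExam", ["Lucky"], parents))    # True  (slide 18)
```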

Slide 22: D-Separation Exercise
- If we observe a value for node g, what other nodes are updated? Nodes f, h and i
- If we observe a value for node a, what other nodes are updated? Nodes b, c, d, e, f
(Figure: an exercise network with nodes a through j)

Slide 23: D-Separation Exercise
- Given an observation of c, are nodes a and f independent? Yes
- Given an observation of i, are nodes g and j independent? No
(Figure: the same exercise network)

Slide 24: Other Independence Criteria
- A node is conditionally independent of its non-descendants given its parents
- Recall this from the updated chain rule
(Figure: the large example network from slide 9, illustrating this criterion)

Slide 25: Other Independence Criteria
- A node is conditionally independent of all others in the network given its parents, children, and children's parents
- This set of nodes is the node's Markov blanket
(Figure: the large example network from slide 9, illustrating a Markov blanket)

Slide 26: Inference in Bayesian Networks
- Goal: compute the posterior probability of a query variable given an observed event, using
  P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai))
- Observed evidence variables E = E1,…,Em
- Query variable X
- The remaining nonevidence (hidden) variables Y = Y1,…,Yl
- The belief network covers X ∪ E ∪ Y

Slide 27: Inference in Bayesian Networks
We want P(X|E).
- Recall Bayes' theorem: P(A|B) = P(A,B) / P(B), so
  P(X|E) = α P(X,E)
- Recall marginalization: P(Ai) = Σj P(Ai,Bj), so
  P(X|E) = α ΣY P(X,E,Y)
- Recall the chain rule: P(A1,A2,…,An) = ∏i=1..n P(Ai|parents(Ai)), so
  P(X|E) = α ΣY ∏A∈X∪E∪Y P(A|parents(A))
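Below is a small Python sketch (my own, not part of the lecture) that applies this enumeration formula to the query P(Win | Happy = true) worked out in Example #2 below, using the CPT values from slide 28:

```python
from itertools import product

# CPTs of the lecture's example network (values from slide 28), my own encoding.
P_L, P_S = 0.75, 0.2
P_W = {True: 0.4, False: 0.01}                  # P(W | L)
P_E = {(True, True): 0.99, (True, False): 0.5,  # P(E | L, S)
       (False, True): 0.9, (False, False): 0.01}
P_H = {(True, True): 0.99, (True, False): 0.6,  # P(H | W, E)
       (False, True): 0.8, (False, False): 0.2}

def pr(p, value):
    """Probability that a Boolean variable takes `value`, given P(var = True) = p."""
    return p if value else 1 - p

def joint(l, s, e, w, h):
    """Full joint probability, factored by the network's chain rule."""
    return (pr(P_L, l) * pr(P_S, s) * pr(P_E[(l, s)], e)
            * pr(P_W[l], w) * pr(P_H[(w, e)], h))

# P(W | H=true) = alpha * sum over the hidden variables L, S, E of the joint.
unnormalized = {}
for w in (True, False):
    unnormalized[w] = sum(joint(l, s, e, w, True)
                          for l, s, e in product((True, False), repeat=3))

alpha = 1 / sum(unnormalized.values())
print(round(alpha * unnormalized[True], 3))   # P(W|H)  -> 0.433
print(round(alpha * unnormalized[False], 3))  # P(~W|H) -> 0.567
```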

Slide 28: Inference Example
Network and probability tables:
  P(L) = 0.75    P(S) = 0.2
  P(W|L):   L=F: 0.01    L=T: 0.4
  P(E|L,S): L=F,S=F: 0.01    L=T,S=F: 0.5    L=F,S=T: 0.9    L=T,S=T: 0.99
  P(H|W,E): W=F,E=F: 0.2     W=T,E=F: 0.6    W=F,E=T: 0.8    W=T,E=T: 0.99
(Figure: the example network)

Slide 29: Inference Example #1
With only the information from the network (and no observations), what's the probability that Bob won the lottery?
  P(W) = Σl P(W,l)
  P(W) = Σl P(W|l)P(l)
  P(W) = P(W|L)P(L) + P(W|¬L)P(¬L)
  P(W) = 0.4 * 0.75 + 0.01 * 0.25
  P(W) = 0.3025
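A quick numeric check of this marginalization in Python (a sketch using the slide-28 values; variable names are mine):

```python
# P(W) = P(W|L)P(L) + P(W|~L)P(~L), with the CPT values from slide 28.
p_lucky = 0.75
p_win_given_lucky = 0.4
p_win_given_unlucky = 0.01

p_win = p_win_given_lucky * p_lucky + p_win_given_unlucky * (1 - p_lucky)
print(p_win)  # 0.3025
```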

Slide 30: Inference Example #2
Given that we know that Bob is happy, what's the probability that Bob won the lottery?
- From the network, we know P(h,e,w,s,l) = P(l)P(s)P(e|l,s)P(w|l)P(h|w,e)
- We want to find P(W|H) = α Σl Σs Σe P(l)P(s)P(e|l,s)P(W|l)P(H|W,e)
- P(¬W|H) is also needed, to normalize

Slide 31: Inference Example #2
  l  s  e | P(s)  P(l)  P(e|l,s)  P(W|l)  P(H|W,e) | product
  F  F  F | 0.8   0.25  0.99      0.01    0.6      | 0.00119
  T  F  F | 0.8   0.75  0.5       0.4     0.6      | 0.072
  F  T  F | 0.2   0.25  0.1       0.01    0.6      | 0.00003
  T  T  F | 0.2   0.75  0.01      0.4     0.6      | 0.00036
  F  F  T | 0.8   0.25  0.01      0.01    0.99     | 0.00002
  T  F  T | 0.8   0.75  0.5       0.4     0.99     | 0.1188
  F  T  T | 0.2   0.25  0.9       0.01    0.99     | 0.00045
  T  T  T | 0.2   0.75  0.99      0.4     0.99     | 0.05881
  P(W|H) = α * (sum of products) = α * 0.2516

Slide 32: Inference Example #2
  l  s  e | P(s)  P(l)  P(e|l,s)  P(¬W|l)  P(H|¬W,e) | product
  F  F  F | 0.8   0.25  0.99      0.99     0.2       | 0.03920
  T  F  F | 0.8   0.75  0.5       0.6      0.2       | 0.036
  F  T  F | 0.2   0.25  0.1       0.99     0.2       | 0.00099
  T  T  F | 0.2   0.75  0.01      0.6      0.2       | 0.00018
  F  F  T | 0.8   0.25  0.01      0.99     0.8       | 0.00158
  T  F  T | 0.8   0.75  0.5       0.6      0.8       | 0.144
  F  T  T | 0.2   0.25  0.9       0.99     0.8       | 0.03564
  T  T  T | 0.2   0.75  0.99      0.6      0.8       | 0.07128
  P(¬W|H) = α * (sum of products) = α * 0.3289

Slide 33: Inference Example #2
  P(W|H) = α * 0.2516 ≈ 0.433
  P(¬W|H) = α * 0.3289 ≈ 0.567
- Note that P(¬W|H) > P(W|H), because P(¬W|¬L) is much larger than P(W|¬L)
- The probability of Bob having won the lottery has increased by 13.1% (from 0.3025 to about 0.433) thanks to our knowledge that he is happy!

Slide 34: Expert Systems
- Bayesian networks are used to implement expert systems
- Diagnostic systems that contain subject-specific knowledge
- Knowledge (nodes, relationships, probabilities) is typically provided by human experts
- The system observes evidence by asking questions to the user, then infers the most likely conclusion

Slide 35: Pathfinder
- Expert system for medical diagnosis of lymph-node diseases
- Very large Bayesian network
  - Over 60 diseases
  - Over 100 features of lymph nodes
  - Over 30 features for clinical information
- A lot of work from medical experts
  - 8 hours to define features and diseases
  - 35 hours to build the network topology
  - 40 hours to assess the probabilities

Slide 36: Pathfinder
- One node for each disease
- Assumes the diseases are mutually exclusive and exhaustive
- Large domain, hard to handle: several small networks for individual diagnostic tasks were built, then combined into a single large network

Slide 37: Pathfinder
- Testing the network: 53 test cases (real diagnoses)
- Diagnostic accuracy as good as a medical expert

Slide 38: Assumptions
- Learning agent
- Environment:
  - Fully observable / Partially observable
  - Deterministic / Strategic / Stochastic
  - Sequential
  - Static / Semi-dynamic
  - Discrete / Continuous
  - Single agent / Multi-agent

Slide 39: Assumptions Updated
We can handle a new combination!
- Fully observable & Deterministic: no uncertainty (map of Romania)
- Fully observable & Stochastic: games of chance (Monopoly, Backgammon)
- Partially observable & Deterministic: logic (Wumpus World)
- Partially observable & Stochastic