
Introduction to Bayesian Belief Nets Russ Greiner Dep’t of Computing Science Alberta Ingenuity Centre for Machine Learning University of Alberta


3 Motivation Gates says [LATimes, 28/Oct/96]: Microsoft's competitive advantage is its expertise in "Bayesian networks". Current products: Microsoft Pregnancy and Child Care (MSN), Answer Wizard (Office 95, Office 2000), Print Troubleshooter, Excel Workbook Troubleshooter, Office 95 Setup, Media Troubleshooter, Windows NT 4.0 Video Troubleshooter, Word Mail Merge Troubleshooter.

4 Motivation (II) US Army: SAIP (battalion detection from SAR, IR, …; Gulf War). NASA: Vista (DSS for Space Shuttle). GE: Gems (real-time monitoring of utility generators). Intel: inferring possible processing problems from end-of-line tests on semiconductor chips. KIC: medical (sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, home-based health evaluations); DSS for capital equipment (locomotives, gas-turbine engines, office equipment).

5 Motivation (III) Lymph-node pathology diagnosis, manufacturing control, software diagnosis, information retrieval. Types of tasks: classification/regression, sensor fusion, prediction/forecasting.

6 Outline Existing uses of Belief Nets (BNs); How to reason with BNs; Specific examples of BNs; Contrast with Rules, Neural Nets, …; Possible applications of BNs; Challenges: how to reason efficiently, how to learn BNs.

7 [Consultation cartoon] Symptoms (chief complaint, history, …) and Signs (physical exam, test results, …) → Diagnosis → Plan (treatment, …).

8 Objectives: Decision Support System Determine which tests to perform and which repair to suggest, based on costs, sensitivity/specificity, … Use all sources of information: symbolic (discrete observations, history, …) and signal (from sensors). Handle partial information. Adapt to track the fault distribution.

9 Underlying Task Situation: given observations {O1=v1, …, Ok=vk} (symptoms, history, test results, …), what is the best DIAGNOSIS Dxi for the patient? Approach 1: use a set of rules of the form obs1 & … & obsm → Dxi, e.g. "If Temp > 100 & BP = High & Cough = Yes → DiseaseX". But such rules are seldom completely certain, and a rule is needed for each situation: for each diagnosis Dxr, for each set of possible values vj for Oj, for each subset of observations {Ox1, Ox2, …} ⊆ {Oj}. The rule above can't be used if only Temp and BP are known.

10 Underlying Task, II Situation: given observations {O1=v1, …, Ok=vk} (symptoms, history, test results, …), what is the best DIAGNOSIS Dxi for the patient? Approach 2: compute the probability of Dxi given the observations {obsj}: P(Dx = u | O1 = v1, …, Ok = vk). Challenge: how to express the probabilities?

11 How to Deal with Probabilities Sufficient: the "atomic events" P(Dx = u, O1 = v1, …, ON = vN) for all 2^(1+N) combinations of values u ∈ {T, F}, vj ∈ {T, F}. E.g.: P(Dx=T, O1=T, O2=T, …, ON=T) = 0.03; P(Dx=T, O1=T, O2=T, …, ON=F) = 0.4; …; P(Dx=T, O1=F, O2=F, …, ON=T) = 0; …; P(Dx=F, O1=F, O2=F, …, ON=F) = 0.01. Then marginalize: P(Dx = u, O1 = v1, …, Ok = vk) = Σ_{v_{k+1}, …, v_N} P(Dx = u, O1 = v1, …, Ok = vk, …, ON = vN); and conditionalize: P(Dx = u | O1 = v1, …, Ok = vk) = P(Dx = u, O1 = v1, …, Ok = vk) / P(O1 = v1, …, Ok = vk). But even with a binary Dx and 20 binary observations, this requires > 2,097,000 numbers!
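For concreteness, a minimal sketch of the marginalize/conditionalize recipe in Python, over a toy three-variable joint table whose numbers are invented for illustration:

```python
import itertools

# Toy joint distribution over Dx and two binary observations O1, O2,
# stored as a dict mapping full assignments to probabilities.
# The numbers here are made up for illustration only.
joint = {}
for dx, o1, o2 in itertools.product([True, False], repeat=3):
    joint[(dx, o1, o2)] = 1.0 / 8           # start uniform ...
joint[(True, True, True)] = 0.2             # ... then skew a few entries
joint[(False, True, True)] = 0.05
total = sum(joint.values())
joint = {k: v / total for k, v in joint.items()}   # renormalize

def prob(dx=None, o1=None, o2=None):
    """Marginal probability of the specified values; None means 'sum out'."""
    return sum(p for (d, a, b), p in joint.items()
               if (dx is None or d == dx)
               and (o1 is None or a == o1)
               and (o2 is None or b == o2))

# Conditionalize: P(Dx=T | O1=T, O2=F) = P(Dx=T, O1=T, O2=F) / P(O1=T, O2=F)
posterior = prob(dx=True, o1=True, o2=False) / prob(o1=True, o2=False)
print(posterior)
```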

12 Problems with "Atomic Events" Too many numbers, O(2^N): hard to store; hard to use (must add 2^r values to marginalize out r variables); hard to learn (takes O(2^N) samples to learn 2^N parameters). The representation is not intuitive: should make "connections" explicit and use "local information", e.g. P(Jaundice | Hepatitis), P(LightDim | BadBattery), … Include only the necessary "connections" ⇒ Belief Nets.

13 [Motivating case] Hepatitis? E.g., a patient who is not jaundiced but has a positive blood test: is it hepatitis? Variables: Jaundice, BloodTest.

14 Hepatitis Example (Boolean) Variables: H = Hepatitis, J = Jaundice, B = (positive) Blood test. Want P(H=1 | J=0, B=1), …, P(H=1 | B=1, J=1), P(H=1 | B=0, J=0), … Option 1: store the full joint table P(J, B, H), then marginalize/conditionalize to get P(H=1 | J=0, B=1), …

15 Encoding Causal Links Simple Belief Net over H, B, J, with CPTables for P(H), P(B=b | H=h), and P(J=j | H=h, B=b). Node ~ variable; link ~ "causal dependency"; "CPTable" ~ P(child | parents).

16 Encoding Causal Links (H → B, H → J) Observe that P(J | H, B=0) = P(J | H, B=1) for all J, H, i.e. P(J | H, B) = P(J | H): J is INDEPENDENT of B once we know H, so the B → J arc is not needed. CPTables: P(H=1) = 0.05, P(B=1 | H=h), P(J=1 | h, b).

17–18 Encoding Causal Links (continued) With the B → J arc removed, the CPTables reduce to P(H=1) = 0.05, P(B=1 | H=h), and P(J=1 | h): since P(J | H, B=0) = P(J | H, B=1), i.e. P(J | H, B) = P(J | H), J is independent of B once H is known.

19 Sufficient Belief Net (H → B, H → J) Requires only: P(H=1) = 0.05, P(B=1 | H=h), and P(J=1 | H=h); only 5 parameters, not 7. Hence: P(H=1 | J=0, B=1) = P(H=1) P(J=0 | H=1) P(B=1 | H=1) / P(J=0, B=1), using P(B=1 | J=0, H=1) = P(B=1 | H=1), where P(J=0, B=1) = Σ_h P(H=h) P(J=0 | H=h) P(B=1 | H=h).
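For concreteness, a minimal sketch of this computation in Python; P(H=1) = 0.05 comes from the slide, while the other CPT entries are invented placeholders:

```python
# A minimal sketch of the 3-node hepatitis net (H -> B, H -> J).
# P(H=1) = 0.05 comes from the slide; the other CPT entries below are
# made-up placeholders for illustration.
p_H1 = 0.05
p_B1_given_H = {1: 0.95, 0: 0.02}   # P(B=1 | H=h)  (assumed values)
p_J1_given_H = {1: 0.80, 0: 0.10}   # P(J=1 | H=h)  (assumed values)

def joint(h, j, b):
    """P(H=h, J=j, B=b) from the factored form P(H) P(J|H) P(B|H)."""
    ph = p_H1 if h == 1 else 1 - p_H1
    pj = p_J1_given_H[h] if j == 1 else 1 - p_J1_given_H[h]
    pb = p_B1_given_H[h] if b == 1 else 1 - p_B1_given_H[h]
    return ph * pj * pb

# P(H=1 | J=0, B=1) = P(H=1, J=0, B=1) / sum_h P(H=h, J=0, B=1)
posterior = joint(1, 0, 1) / (joint(0, 0, 1) + joint(1, 0, 1))
print(round(posterior, 4))
```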

20 "Factoring" B does depend on J: if J=1, then it is likely that H=1, and hence that B=1. But only THROUGH H: if we know H=1, then B=1 is likely regardless of whether J=1 or J=0, so P(B=1 | J=0, H=1) = P(B=1 | H=1). N.b., B and J ARE correlated a priori, P(B | J) ≠ P(B); GIVEN H, they become uncorrelated: P(B | J, H) = P(B | H).

21 Factored Distribution Symptoms are independent, given the Disease. E.g., ReadingAbility and ShoeSize are dependent, P(ReadAbility | ShoeSize) ≠ P(ReadAbility), but become independent given Age: P(ReadAbility | ShoeSize, Age) = P(ReadAbility | Age). Likewise for H = Hepatitis, J = Jaundice, B = (positive) blood test: P(B | J) ≠ P(B), but P(B | J, H) = P(B | H).

22 "Naïve Bayes" Classification task: given {O1 = v1, …, On = vn}, find the hi that maximizes P(H = hi | O1 = v1, …, On = vn). Given: P(H = hi) and P(Oj = vj | H = hi), with the observations independent given H, i.e. P(Oj | H, Ok, …) = P(Oj | H). Find argmax_{hi} P(H = hi) Πj P(Oj = vj | H = hi). (Structure: H → O1, O2, …, On.)

23 Naïve Bayes (con't) The denominator is a normalizing term (no need to compute it, as it is the same for all hi). Easy to use for classification. Can be used even if some vj's are not specified. If there are k Dx's and n Oi's, it requires only k priors and n·k pairwise conditionals, not 2^(n+k), so it is relatively easy to learn (e.g. 1 + 2n = 61 parameters instead of 2^(n+1) − 1 = 2,147,483,647 for n = 30).
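A minimal Naïve Bayes sketch of this classification rule; the classes, observation names, and CPT numbers below are invented for illustration:

```python
# Minimal Naive Bayes sketch: pick argmax_h P(H=h) * prod_j P(O_j=v_j | H=h).
# The two-class, three-observation CPTs below are made-up numbers.
priors = {"flu": 0.1, "healthy": 0.9}
cond = {                        # P(O_j = 1 | H = h); P(O_j = 0 | h) = 1 - value
    "flu":     {"fever": 0.8,  "cough": 0.7, "rash": 0.1},
    "healthy": {"fever": 0.05, "cough": 0.2, "rash": 0.05},
}

def classify(observed):
    """observed: dict of O_j -> 0/1; unspecified observations are simply skipped."""
    scores = {}
    for h, prior in priors.items():
        score = prior
        for obs, val in observed.items():
            p1 = cond[h][obs]
            score *= p1 if val == 1 else (1 - p1)
        scores[h] = score        # unnormalized; the normalizer is the same for all h
    return max(scores, key=scores.get)

print(classify({"fever": 1, "cough": 1}))   # 'rash' not specified -> still works
```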

24 Bigger Networks Intuition: show CAUSAL connections. GeneticPH CAUSES Hepatitis; Hepatitis CAUSES Jaundice. If GeneticPH, then expect Jaundice: GeneticPH → Hepatitis → Jaundice. But only via Hepatitis: GeneticPH without Hepatitis does not make Jaundice more likely. P(J | D) ≠ P(J), but P(J | D, H) = P(J | H) (D = GeneticPH). Nodes: GeneticPH, LiverTrauma, Hepatitis, Jaundice, Bloodtest; CPTables shown include P(I=1) = 0.20, P(H=1) = 0.32, P(H=1 | d, i), P(J=1 | h), P(B=1 | h).

25 Belief Nets DAG structure. Each node ≡ a variable v; v depends (only) on its parents, plus a conditional probability P(vi | parenti = ⟨0,1,…⟩). v is INDEPENDENT of its non-descendants, given assignments to its parents. E.g., given H = 1: D has no influence on J; J has no influence on B; etc. (Structure: D → H, I → H, H → J, H → B.)
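Written out for the five-node net on this slide, the parent-based independencies are equivalent to the usual factored joint (a standard identity):

```latex
P(D, I, H, J, B) \;=\; P(D)\, P(I)\, P(H \mid D, I)\, P(J \mid H)\, P(B \mid H)
```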

26 Less Trivial Situations N.b., obs1 is not always independent of obs2 given H. E.g., FamilyHistoryDepression (FHD) 'causes' MotherSuicide (MS) and Depression (D); MotherSuicide causes Depression (with or without FHD). Here P(D | MS, FHD) ≠ P(D | FHD)! This can still be done with a Belief Network, but one needs to specify P(FHD) (1 number), P(MS | FHD) (2 numbers), and P(D | MS, FHD) (4 numbers).

27 Example: Car Diagnosis

28 MammoNet

29 ALARM: A Logical Alarm Reduction Mechanism (8 diagnoses, 16 findings, …)

30 Troop Detection

31 ARCO1: Forecasting Oil Prices

32 ARCO1: Forecasting Oil Prices

33 Forecasting Potato Production

34 Warning System

35 Extensions Find the best values (posterior distribution) for SEVERAL (> 1) "output" variables. Partial specification of "input" values: only a subset of the variables, or only a "distribution" over each input variable. General variables: discrete with domain > 2; continuous (Gaussian: x = Σi bi yi for parents {Yi}). Decision theory → Decision Nets (Influence Diagrams): making decisions, not just assigning probabilities. Storing P(v | p1, p2, …, pk): general "CP tables" are O(2^k); alternatives include Noisy-Or, Noisy-And, Noisy-Max, and "decision trees".
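As an illustration of the Noisy-Or alternative mentioned above (a generic sketch, not code from the talk): each active parent independently fails to cause the child with its own probability, so k parameters replace a 2^k table:

```python
# Generic Noisy-Or sketch: each active parent independently "fails" to cause
# the child with probability q_i, so only k parameters are needed instead of
# a full 2^k CP table. Parameter values here are illustrative only.
def noisy_or(q, parent_values, leak=0.0):
    """P(child=1 | parents) under a noisy-or model.
    q[i]  -- probability that parent i alone FAILS to turn the child on
    leak  -- probability the child turns on even with no active parent
    """
    p_all_fail = (1.0 - leak)
    for qi, on in zip(q, parent_values):
        if on:
            p_all_fail *= qi
    return 1.0 - p_all_fail

# Three causes with individual failure probabilities 0.3, 0.5, 0.9:
print(noisy_or([0.3, 0.5, 0.9], [1, 1, 0]))   # P(child=1 | first two parents on)
```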

36 Outline Existing uses of Belief Nets (BNs); How to reason with BNs; Specific examples of BNs; Contrast with Rules, Neural Nets, …; Possible applications of BNs; Challenges: how to reason efficiently, how to learn BNs.

37 Belief Nets vs Rules Both have "locality": specific clusters (rules / connected nodes). Often the same nodes (representing propositions), but BN: Cause → Effect ("Hep → Jaundice", P(J | H)); Rule: Effect → Cause ("Jaundice → Hep"). Why? It is easier for people to reason CAUSALLY, even if the use is DIAGNOSTIC. BNs provide an OPTIMAL way to deal with uncertainty, vagueness (variable not given, or only a distribution), and error: signals meeting symbols. BNs permit different "directions" of inference.

38 Belief Nets vs Neural Nets Both have "graph structure", but: BN nodes have SEMANTICS and the combination rules are sound probability; NN nodes are arbitrary and the combination rules are arbitrary. So it is harder to initialize a NN and to explain a NN (but perhaps easier to learn a NN from examples only?). BNs can also deal with partial information and different "directions" of inference.

39 Belief Nets vs Markov Nets Each uses "graph structure" to FACTOR a distribution: explicitly specify dependencies, implicitly independencies. But there are subtle differences: BNs capture "causality" and "hierarchies"; MNs capture "temporality". Technical: BNs use DIRECTED arcs, which allow "induced dependencies", e.g. in the v-structure A → C ← B: I(A, {}, B) "A independent of B, given {}", but ¬I(A, C, B) "A dependent on B, given C". MNs use UNDIRECTED arcs, which allow other independencies, e.g. in the square with edges A–B, A–C, B–D, C–D: I(A, BC, D) "A independent of D, given B, C"; I(B, AD, C) "B independent of C, given A, D".

40 Uses of Belief Nets #1 Medical Diagnosis: "assist/critique" the MD: identify diseases not ruled out; specify additional tests to perform; suggest appropriate/cost-effective treatments; react to the MD's proposed treatment. Decision Support: find/repair faults in complex machines [device, manufacturing plant, …], based on sensors, recorded info, history, … Preventive Maintenance: anticipate problems in complex machines [device, manufacturing plant, …], based on sensors, statistics, recorded info, device history, …

41 Uses (con't) Logistics Support: stock warehouses appropriately, based on (estimated) frequency of needs, costs, … Diagnose Software: find the most probable bugs, given program behavior, core dump, source code, … Part Inspection/Classification: based on multiple sensors, background, model of production, … Information Retrieval: combine information from various sources, based on info from various "agents", … General: partial info, sensor fusion; classification, interpretation, prediction, …

42 Challenge #1: Computational Efficiency For a given BN, the general problem is: given O1 = v1, …, On = vn, compute P(H | O1 = v1, …, On = vn). If the BN is a "polytree", there is an efficient algorithm. If the BN is a general DAG (> 1 path from X to Y), it is NP-hard in theory and slow in practice. Tricks: get an approximate answer (quickly); use an abstraction of the BN; use an "abstraction" of the query (a range).

43 #2a: Obtaining an Accurate BN A BN encodes a distribution over n variables with not O(2^n) values, but "only" Σi 2^{k_i} (node ni binary, with ki parents). Still lots of values! Qualitative information (structure: "what depends on what?"): easy for people (background knowledge), but NP-hard to learn from samples. Quantitative information (the actual CP-tables): easy to learn, given lots of examples, but people have a hard time. So: knowledge acquisition from human experts, plus a simple learning algorithm.
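To illustrate why the quantitative part is "easy to learn, given lots of examples": with the structure fixed, each CP-table entry is just a conditional frequency. A minimal sketch with an invented toy dataset:

```python
from collections import Counter

# Toy dataset of (H, J, B) samples; structure assumed known: H -> J, H -> B.
data = [(1, 1, 1), (1, 0, 1), (0, 0, 0), (0, 0, 1), (0, 0, 0), (1, 1, 1)]

def estimate_cpt(data, child_idx, parent_idx):
    """Maximum-likelihood estimate of P(child=1 | parent=p) by counting."""
    counts, ones = Counter(), Counter()
    for row in data:
        p = row[parent_idx]
        counts[p] += 1
        ones[p] += row[child_idx]
    return {p: ones[p] / counts[p] for p in counts}

print(estimate_cpt(data, child_idx=1, parent_idx=0))   # P(J=1 | H=h)
print(estimate_cpt(data, child_idx=2, parent_idx=0))   # P(B=1 | H=h)
```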

44 Notes on Learning Mixed sources: a person provides the structure; an algorithm fills in the numbers. Just a learning algorithm: there are algorithms that learn both structure and values from samples. Just a human expert: people produce the CP-tables as well as the structure; relatively few values are really required (especially with NoisyOr, NoisyAnd, NaiveBayes, …), and the actual values are not that important (sensitivity studies).

45 My Current Work Learning Belief Nets: model selection (challenging the myth that MDL is the appropriate criterion); learning the "performance system", not the model. Validating Belief Nets: "error bars" around answers. Adaptive user interfaces. Efficient vision systems. Foundations of learnability. Learning active classifiers. Sequential learners. Condition-based maintenance, bio-signal interpretation, …

46 #2b: Maintaining an Accurate BN The world changes. The information in a BN may be perfect at time t, sub-optimal at time t + 20, and worthless later. Need to MAINTAIN a BN over time: using an on-going human consultant, or an adaptive BN (Dirichlet distributions over the variables, priors over BNs).

47 Conclusions Belief Nets are PROVEN TECHNOLOGY: medical diagnosis, DSS for complex machines, forecasting, modeling, information retrieval, … They provide an effective way to represent complicated, inter-related events and to reason about such situations: diagnosis, explanation, value of information; explaining conclusions; mixing symbolic and numeric observations. Challenges: efficient ways to use BNs; ways to create BNs; ways to maintain BNs; reasoning about time.

48 Extra Slides AI Seminar: Friday, noon, CSC3-33. Free PIZZA! References; Crusher Controller; Formal Framework; Decision Nets; Developing the Model; Why Reasoning is Hard; Learning Accurate Belief Nets.

49 References Overview textbooks: Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988. Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall (see esp. Ch. 14, 15, 19). General info re Bayes nets: Proceedings of the Association for Uncertainty in AI. Learning: David Heckerman, A Tutorial on Learning with Bayesian Networks, 1995. Software (general): JavaBayes, http://www.cs.cmu.edu/~fgcozman/Research/JavaBaye ; Norsys, http://www.norsys.com/

50 Decision Net: Test/Buy a Car

51 Utility: Decision Nets Given a cost function c(action, state) ∈ R: C_p(a) = E_s[c(a, s)] = Σ_{s ∈ S} p(s | obs) · c(a, s). The best (immediate) action is a* = argmin_{a ∈ A} { C_p(a) }. A Decision Net is like a Belief Net, but with 3 types of nodes: chance (as in a Belief Net), action (repair, sensing), and cost/utility; links express "dependency". Given observations obs, it computes the best action a*. Sequences of actions lead to MDPs, POMDPs, … [Go Back]
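A minimal sketch of this expected-cost rule; the states, actions, and cost numbers are invented for illustration:

```python
# Minimal sketch of choosing the best immediate action by expected cost:
#   C_p(a) = sum_s p(s | obs) * c(a, s),   a* = argmin_a C_p(a).
# The states, actions, and numbers below are invented for illustration.
p_state_given_obs = {"ok": 0.7, "worn": 0.2, "broken": 0.1}    # p(s | obs)
cost = {                                                        # c(a, s)
    "continue":        {"ok": 0.0,  "worn": 5.0, "broken": 100.0},
    "stop_and_repair": {"ok": 20.0, "worn": 8.0, "broken": 25.0},
}

def expected_cost(action):
    return sum(p * cost[action][s] for s, p in p_state_given_obs.items())

best_action = min(cost, key=expected_cost)
print(best_action, expected_cost(best_action))
```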

52 Decision Net: Drill for Oil? [Go Back]

53 Formal Framework Always true (chain rule): P(x1, …, xn) = P(x1) P(x2 | x1) P(x3 | x2, x1) … P(xn | xn−1, …, x1). Given independencies, P(xk | x1, …, xk−1) = P(xk | pa_k) for some pa_k ⊆ {x1, …, xk−1}. Hence P(x1, …, xn) = Πk P(xk | pa_k), so just connect each y ∈ pa_i to xi, giving a DAG structure. Notes: the size of the BN is Σi 2^{|pa_i|}, so it is better to use small pa_i; pa_i = {x1, …, xi−1} is never incorrect, but seldom minimal (so hard to store, learn, and reason with); the order of the variables can make a HUGE difference: one ordering may give |pa_i| ≤ 1 while another gives |pa_i| = i − 1. [Go Back]
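A small worked instance of the ordering remark, for the net Z → A, Z → B (the same structure used on the "Why Reasoning is Hard" slide):

```latex
% Ordering (Z, A, B): every parent set stays small.
P(Z, A, B) = P(Z)\, P(A \mid Z)\, P(B \mid Z), \qquad |pa_A| = |pa_B| = 1,\ |pa_Z| = 0
% Ordering (A, B, Z): A and B are marginally dependent, and Z needs both as parents.
P(A, B, Z) = P(A)\, P(B \mid A)\, P(Z \mid A, B), \qquad |pa_B| = 1,\ |pa_Z| = 2
```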

54 Developing the Model Sources of information: (human) expert(s), data from earlier runs, a simulator. Typical process: 1. Develop/refine initial prototype. 2. Test prototype ↦ accurate system. 3. Deploy system. 4. Update/maintain system.

55 Develop/Refine Prototype Requires an expert; useful to have data. Initial interview(s), to establish "what relates to what"; expert time: ≈ ½ day. Iterative process (gradual refinement), to refine qualitative connections and establish correct operation: the expert describes "good performance", the KE implements the expert's claims, the KE tests on examples (real data or expert) and reports back to the expert; expert time: ≈ 1–2 hours/week for ?? weeks (depends on the complexity of the device and the accuracy of the model). [Go Back]

56 Why Reasoning is Hard BN reasoning may look easy: just "propagate" information from node to node. Challenge: in the net Z → A, Z → B, (A, B) → C, with P(Z=t) = 0.5, A = Z, B = ¬Z, and C = A ∧ B (CPTs: P(A=t|Z=t)=1.0, P(A=t|Z=f)=0.0; P(B=t|Z=t)=0.0, P(B=t|Z=f)=1.0; P(C=t|a,b)=1.0 only for a=b=t), what is P(C=t)? Here P(A=t) = P(B=t) = ½, so… P(C=t) = P(A=t, B=t) = P(A=t) · P(B=t) = ½ · ½ = ¼? Wrong: P(C=t) = 0! Need to maintain dependencies: P(A=t, B=t) = P(A=t) · P(B=t | A=t) = ½ · 0 = 0. [Go Back]
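The same example as a short Python check: enumerating the joint keeps the A-B dependency, while multiplying the marginals gives the wrong answer of 0.25:

```python
# Sketch of the P(C=t) pitfall: A = Z, B = not Z, C = A and B, P(Z=t) = 0.5.
# Enumerating the joint keeps the A-B dependency; multiplying marginals loses it.
def p_joint(z, a, b, c):
    pz = 0.5
    pa = 1.0 if a == z else 0.0            # P(a | z): A = Z
    pb = 1.0 if b != z else 0.0            # P(b | z): B = not Z
    pc = 1.0 if c == (a and b) else 0.0    # P(c | a, b): C = A and B
    return pz * pa * pb * pc

vals = [True, False]
p_C = sum(p_joint(z, a, b, True) for z in vals for a in vals for b in vals)
p_A = sum(p_joint(z, True, b, c) for z in vals for b in vals for c in vals)
p_B = sum(p_joint(z, a, True, c) for z in vals for a in vals for c in vals)

print(p_C)          # correct answer: 0.0
print(p_A * p_B)    # naive "independent" answer: 0.25
```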

57 Crusher Controller Given observations (history, sensor readings, schedule, …), specify the best action for the crusher ("stop immediately", "increase roller speed by …"), where best == minimize expected cost. Initially: just a recommendation to the human operator. Later: directly implement (some) actions. Possibly request the values of other sensors?

58 Approach 1. For each state s ("good flow", "tooth about to enter", …) and each action a ("stop immediately", "change p7 += 0.32", …), determine the utility of performing a in s (cost of lost production if stopped; cost of reduced production efficiency if continuing; …). 2. Use observations to estimate a distribution over the current states, and infer the EXPECTED UTILITY of each action based on that distribution. 3. Return the action with the highest expected utility.

59 Details Inputs: sensor readings (history): camera, microphone, power draw; parameter settings; log files, maintenance records; schedule (maintenance, anticipated load, …). Outputs: continue as is; adjust parameters (GapSize, ApronFeederSpeed, 1J_ConveyorSpeed); shut down immediately; stop adding new material; tell the operator to look. State: "CrusherEnvironment": #UncrushableThingsNowInCrusher, #TeethMissing, NextUncrushableEntry; control parameters.

60 Benefits Increase crusher effectiveness: find the best settings for parameters to maximize production of well-sized chunks. Reduce down time: know when maintenance/repair is critical. Reduce damage to the crusher. A usable model of the crusher: easy to modify when needed; useful for training and for design of the next generation; a prototype for the design of {control, diagnostician} for other machines. [Go Back]

61 My Background PhD, Stanford (Computer Science): representational issues, analogical inference; … everything in logic. PostDoc at UofToronto (CS): foundations of learnability, logical inference, DB, control theory, …; … everything in logic. Industrial research (Siemens Corporate Research): need to solve REAL problems; theory revision, navigational systems, …; … logic is not the be-all-and-end-all! Prof at UofAlberta (CS): industrial problems (Siemens, BioTools, Syncrude); foundations of learnability, probabilistic inference, …

62 Less Trivial Situations (repeat of slide 26)

