Introduction to Bayesian Belief Nets


Introduction to Bayesian Belief Nets
Russ Greiner, Dep't of Computing Science
Alberta Ingenuity Centre for Machine Learning, University of Alberta
http://www.cs.ualberta.ca/~greiner/bn.html

Motivation
Gates says [LATimes, 28/Oct/96]: Microsoft's competitive advantage is its expertise in "Bayesian networks".
Current Products:
- Microsoft Pregnancy and Child Care (MSN)
- Answer Wizard (Office 95, Office 2000)
- Print Troubleshooter
- Excel Workbook Troubleshooter
- Office 95 Setup Media Troubleshooter
- Windows NT 4.0 Video Troubleshooter
- Word Mail Merge Troubleshooter

Motivation (II)
- US Army: SAIP (Battalion Detection from SAR, IR, ... Gulf War)
- NASA: Vista (DSS for Space Shuttle)
- GE: Gems (real-time monitor for utility generators)
- Intel: infer possible processing problems from end-of-line tests on semiconductor chips
- KIC, medical: sleep disorders, pathology, trauma care, hand and wrist evaluations, dermatology, home-based health evaluations
- DSS for capital equipment: locomotives, gas-turbine engines, office equipment

Motivation (III)
- Lymph-node pathology diagnosis
- Manufacturing control
- Software diagnosis
- Information retrieval
Types of tasks:
- Classification / Regression
- Sensor Fusion
- Prediction / Forecasting

Outline
- Existing uses of Belief Nets (BNs)
- How to reason with BNs
- Specific Examples of BNs
- Contrast with Rules, Neural Nets, …
- Possible applications of BNs
- Challenges:
  - How to reason efficiently
  - How to learn BNs

[Figure: the diagnostic setting. The patient reports "blah blah ouch …"; the physician gathers
- Symptoms: chief complaint, history, …
- Signs: physical exam, test results, …
and produces a Diagnosis and a Plan: treatment, …]

Objectives: Decision Support System
- Determine which tests to perform and which repair to suggest, based on costs, sensitivity/specificity, …
- Use all sources of information: symbolic (discrete observations, history, …) and signal (from sensors)
- Handle partial information
- Adapt to track fault distribution

Underlying Task
Situation: Given observations {O1=v1, …, Ok=vk} (symptoms, history, test results, …), what is the best DIAGNOSIS Dxi for the patient?
Approach 1: Use a set of rules of the form obs1 & … & obsm => Dxi, e.g.
  If Temp > 100 & BP = High & Cough = Yes => DiseaseX
but…
- Seldom completely certain
- Need a rule for each situation:
  - for each diagnosis Dxr
  - for each set of possible values vj for Oj
  - for each subset of observations {Ox1, Ox2, …} of {Oj}
    (can't use the rule above if we only know Temp and BP)

Underlying Task, II
Situation: Given observations {O1=v1, …, Ok=vk} (symptoms, history, test results, …), what is the best DIAGNOSIS Dxi for the patient?
Approach 2: Compute probabilities of Dxi given the observations {obsj}:
  P( Dx = u | O1 = v1, …, Ok = vk )
Challenge: How to express the probabilities?

How to deal with Probabilities
Sufficient: the "atomic events": for all 2^(1+N) assignments u in {T, F}, vj in {T, F}, store
  P( Dx = u, O1 = v1, …, Ok = vk, …, ON = vN )
e.g.
  P( Dx=T, O1=T, O2=T, …, ON=T ) = 0.03
  P( Dx=T, O1=T, O2=T, …, ON=F ) = 0.4
  …
  P( Dx=T, O1=F, O2=F, …, ON=T ) = 0
  P( Dx=F, O1=F, O2=F, …, ON=F ) = 0.01
Then:
  Marginalize:    P( Dx = u, O1 = v1, …, Ok = vk ) = Σ_{v(k+1), …, vN} P( Dx = u, O1 = v1, …, Ok = vk, …, ON = vN )
  Conditionalize: P( Dx = u | O1 = v1, …, Ok = vk ) = P( Dx = u, O1 = v1, …, Ok = vk ) / P( O1 = v1, …, Ok = vk )
But… even with a binary Dx and only 20 binary observations, that is 2^21 > 2,097,000 numbers!
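
As a concrete illustration of the marginalize/conditionalize recipe, here is a minimal Python sketch that stores the full atomic-event table explicitly and answers a diagnostic query; the table contents and variable count are placeholders, not values from the slides:

```python
# Minimal sketch: the full joint as an explicit table, queried by
# marginalizing out the unobserved O's and then conditionalizing.
from itertools import product

N = 4  # four binary observations O1..O4 plus a binary Dx

# joint[(dx, o1, ..., o4)] = P(Dx=dx, O1=o1, ..., O4=o4)
# A uniform table keeps the sketch runnable; a real one has 2**(N+1) entries.
joint = {a: 1.0 / 2 ** (N + 1) for a in product([0, 1], repeat=N + 1)}

def prob_dx_given(obs):
    """P(Dx=1 | obs), where obs maps observation index (1-based) to 0/1."""
    num = den = 0.0
    for assign, p in joint.items():
        dx, os_ = assign[0], assign[1:]
        if all(os_[i - 1] == v for i, v in obs.items()):  # consistent with evidence
            den += p                    # marginalize out the unobserved O's
            if dx == 1:
                num += p
    return num / den                    # conditionalize

print(prob_dx_given({1: 1, 3: 0}))      # P(Dx=1 | O1=1, O3=0)
```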

Problems with "Atomic Events"
- Representation is not intuitive
  => Should make "connections" explicit; use "local" information such as P(Jaundice | Hepatitis), P(LightDim | BadBattery), …
- Too many numbers: O(2^N)
  - Hard to store
  - Hard to use [must add 2^r values to marginalize out r variables]
  - Hard to learn [takes O(2^N) samples to learn 2^N parameters]
  => Include only the necessary "connections": Belief Nets!

[Figure: Hepatitis? Does this patient have Hepatitis, given that he is not Jaundiced but has a positive BloodTest?]

Hepatitis Example
(Boolean) Variables:
  H  Hepatitis
  J  Jaundice
  B  (positive) Blood test
Want P( H=1 | J=0, B=1 ), …, P( H=1 | B=1, J=1 ), P( H=1 | B=0, J=0 ), …
Option 1: store the full joint:

   J  B  H | P(J, B, H)
   0  0  0 |  0.03395
   0  0  1 |  0.0095
   0  1  0 |  0.0003
   0  1  1 |  0.1805
   1  0  0 |  0.01455
   1  0  1 |  0.038
   1  1  0 |  0.00045
   1  1  1 |  0.722

…then marginalize/conditionalize to get P( H=1 | J=0, B=1 ), …
Alternatively…
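
For the record, here is the "Option 1" computation carried out on the table above in Python (the two rows whose labels were garbled in the transcript are read as (1,0,1) and (1,1,0), following the binary counting order of the other rows):

```python
# Worked version of Option 1, using the joint table as printed above,
# keyed by (J, B, H).
joint = {
    (0, 0, 0): 0.03395, (0, 0, 1): 0.0095,
    (0, 1, 0): 0.0003,  (0, 1, 1): 0.1805,
    (1, 0, 0): 0.01455, (1, 0, 1): 0.038,
    (1, 1, 0): 0.00045, (1, 1, 1): 0.722,
}
# P(H=1 | J=0, B=1) = P(J=0, B=1, H=1) / sum_h P(J=0, B=1, H=h)
num = joint[(0, 1, 1)]
den = joint[(0, 1, 0)] + joint[(0, 1, 1)]
print(num / den)   # ~0.998
```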

Encoding Causal Links
Simple Belief Net:  H → B,  H → J,  B → J

   P(H=0)  P(H=1)
    0.95    0.05

   h | P(B=0 | H=h)  P(B=1 | H=h)
   1 |    0.05          0.95
   0 |    0.97          0.03

   h  b | P(J=0 | h,b)  P(J=1 | h,b)
   1  * |    0.2           0.8
   0  * |    0.7           0.3
   (the same values for b=0 and b=1)

- Node ~ Variable
- Link ~ "Causal dependency"
- "CPTable" ~ P(child | parents)

Encoding Causal Links
In the tables above, P(J | H, B=0) = P(J | H, B=1) for all values of J and H!
  =>  P( J | H, B ) = P( J | H )
J is INDEPENDENT of B, once we know H, so we don't need the B → J arc.
The net simplifies to H → B, H → J, with

   h | P(J=1 | h)
   1 |    0.8
   0 |    0.3

Sufficient Belief Net
The net H → B, H → J requires only:
- P(H=1) = 0.05
- P(J=1 | H=1) = 0.8 and P(J=1 | H=0) = 0.3
- P(B=1 | H=1) = 0.95 and P(B=1 | H=0) = 0.03
(Only 5 parameters, not 7.)
Hence, using P(B=1 | J=0, H=1) = P(B=1 | H=1):
  P(H=1 | J=0, B=1) = P(H=1) P(J=0 | H=1) P(B=1 | H=1) / P(J=0, B=1)
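
A sketch of the same query answered from just these five parameters (Python; the arithmetic follows the Bayes-rule expression above):

```python
# Posterior on H from the five BN parameters on this slide.
pH = 0.05                      # P(H=1)
pB = {1: 0.95, 0: 0.03}        # P(B=1 | H=h)
pJ = {1: 0.8, 0: 0.3}          # P(J=1 | H=h)

def posterior_H(j, b):
    """P(H=1 | J=j, B=b) via Bayes rule on the factored joint."""
    def lik(h):
        prior = pH if h else 1 - pH
        return prior * (pJ[h] if j else 1 - pJ[h]) * (pB[h] if b else 1 - pB[h])
    return lik(1) / (lik(0) + lik(1))

print(posterior_H(j=0, b=1))   # ~0.32 with these CPTs
```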

"Factoring"
B does depend on J: if J=1, then it is likely that H=1, and hence that B=1.
But ONLY THROUGH H: if we know H=1, then B=1 is likely, and it doesn't matter whether J=1 or J=0!
  =>  P(B=1 | J=0, H=1) = P(B=1 | H=1)
N.b., B and J ARE correlated a priori: P( B | J ) ≠ P( B ).
GIVEN H, they become uncorrelated: P( B | J, H ) = P( B | H ).

Factored Distribution
Symptoms are independent, given the Disease:
  H  Hepatitis,  J  Jaundice,  B  (positive) Blood test
  P( B | J ) ≠ P( B )   but   P( B | J, H ) = P( B | H )
Similarly, ReadingAbility and ShoeSize are dependent,
  P( ReadAbility | ShoeSize ) ≠ P( ReadAbility )
but become independent, given Age:
  P( ReadAbility | ShoeSize, Age ) = P( ReadAbility | Age )
  (Age → ShoeSize, Age → Reading)

"Naïve Bayes" Classification
Structure: H → O1, H → O2, …, H → On.
Task: Given { O1 = v1, …, On = vn }, find the hi that maximizes P( H = hi | O1 = v1, …, On = vn ).
Given: the priors P( H = hi ) and the conditionals P( Oj = vj | H = hi ).
Independence assumption: P( Oj | H, Ok, … ) = P( Oj | H ).
Answer: argmax_{hi} P( H = hi ) Π_j P( Oj = vj | H = hi )

Naïve Bayes (con't)
P( H = hi | O1 = v1, …, On = vn ) ∝ P( H = hi ) Π_j P( Oj = vj | H = hi )
(No need to compute the normalizing term, as it is the same for all hi.)
- Easy to use for Classification
- Can use even if some vj's are not specified
- If k Dx's and n Oi's, requires only k priors and n·k pairwise conditionals
  (not 2^(n+k) … relatively easy to learn)

   n  | 1 + 2n | 2^(n+1) − 1
   10 |   21   |        2,047
   30 |   61   | 2,147,483,647
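
The classification rule is short enough to write out. A minimal sketch with made-up priors and conditionals (none of these numbers come from the deck), using logs to keep products of many small conditionals stable:

```python
# Minimal naive Bayes classifier over binary observations.
import math

priors = {"hep": 0.05, "flu": 0.10, "none": 0.85}   # P(H = h_i), illustrative
cond = {                                             # P(O_j = 1 | H = h_i)
    "hep":  [0.80, 0.95, 0.10],
    "flu":  [0.30, 0.05, 0.90],
    "none": [0.05, 0.02, 0.10],
}

def classify(obs):
    """argmax_h P(h) * prod_j P(O_j = v_j | h); obs maps index j -> 0/1.
    Unspecified O_j are simply skipped (no need for complete evidence)."""
    best, best_score = None, -math.inf
    for h, p in priors.items():
        score = math.log(p)              # logs avoid underflow with many O's
        for j, v in obs.items():
            pj = cond[h][j]
            score += math.log(pj if v else 1 - pj)
        if score > best_score:
            best, best_score = h, score
    return best

print(classify({0: 1, 1: 1}))            # O1=1, O2=1; O3 unspecified
```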

Bigger Networks
Structure: LiverTrauma (D) → Hepatitis (H) ← GeneticPH (I);  H → Jaundice (J);  H → Bloodtest (B)

   P(D=1) = 0.32     P(I=1) = 0.20

   d  i | P(H=1 | d,i)       h | P(J=1 | h)       h | P(B=1 | h)
   1  1 |    0.82            1 |    0.8           1 |    0.98
   1  0 |    0.10            0 |    0.3           0 |    0.01
   0  1 |    0.45
   0  0 |    0.04

Intuition: show CAUSAL connections: GeneticPH CAUSES Hepatitis; Hepatitis CAUSES Jaundice.
If GeneticPH, then expect Jaundice: GeneticPH → Hepatitis → Jaundice.
But only via Hepatitis: GeneticPH without Hepatitis does not make Jaundice more likely.
  P( J | D ) ≠ P( J )   but   P( J | D, H ) = P( J | H )

Belief Nets
- DAG structure
- Each node ~ a variable v, with a conditional probability table P( vi | parentsi )
- v depends (only) on its parents: v is INDEPENDENT of its non-descendants, given assignments to its parents
Given H = 1 (in the net D → H ← I, H → J, H → B):
- D has no influence on J
- J has no influence on B
- etc.
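
Putting the last few slides together: a brute-force Python sketch of inference in this five-node net, using the CPTs from the "Bigger Networks" slide (row order as reconstructed there). The factored joint is the product of the five local tables, and any query is a ratio of sums over it:

```python
# P(D,I,H,J,B) = P(D) P(I) P(H|D,I) P(J|H) P(B|H), queried by enumeration
# (fine for 5 binary nodes; real BN engines avoid this exponential sum).
from itertools import product

pD, pI = 0.32, 0.20                                             # priors
pH = {(1, 1): 0.82, (1, 0): 0.10, (0, 1): 0.45, (0, 0): 0.04}   # P(H=1|d,i)
pJ = {1: 0.8, 0: 0.3}                                           # P(J=1|h)
pB = {1: 0.98, 0: 0.01}                                         # P(B=1|h)

def joint(d, i, h, j, b):
    def bern(p, x): return p if x else 1 - p
    return (bern(pD, d) * bern(pI, i) * bern(pH[(d, i)], h)
            * bern(pJ[h], j) * bern(pB[h], b))

# P(H=1 | J=1, B=1): sum the joint over the unobserved D and I
num = sum(joint(d, i, 1, 1, 1) for d, i in product([0, 1], repeat=2))
den = sum(joint(d, i, h, 1, 1) for d, i, h in product([0, 1], repeat=3))
print(num / den)
```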

Less Trivial Situations
N.b., obs1 is not always independent of obs2 given H.
E.g., FamilyHistoryDepression (FHD) 'causes' MotherSuicide (MS) and Depression (D); MotherSuicide causes Depression (with or without FHD):
  FHD → MS,  FHD → D,  MS → D

   P(FHD=1) = 0.001

   f | P(MS=1 | FHD=f)       f  m | P(D=1 | FHD=f, MS=m)
   1 |      0.10             1  1 |        0.97
   0 |      0.03             1  0 |        0.90
                             0  1 |        0.08
                             0  0 |        0.04

Here, P( D | MS, FHD ) ≠ P( D | FHD )!
This can still be done using a Belief Network, but we need to specify:
  P( FHD )           1 value
  P( MS | FHD )      2 values
  P( D | MS, FHD )   4 values
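
A quick worked check with these numbers (using the row order as reconstructed above): P(D=1 | FHD=1) = 0.10 · 0.97 + 0.90 · 0.90 = 0.907, while P(D=1 | FHD=1, MS=1) = 0.97. So even after conditioning on FHD, learning MS changes the answer, exactly as the slide claims.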

Example: Car Diagnosis

MammoNet

ALARM A Logical Alarm Reduction Mechanism 8 diagnoses, 16 findings, …

Troop Detection

ARCO1: Forecasting Oil Prices

ARCO1: Forecasting Oil Prices

Forecasting Potato Production

Warning System

Extensions
- Find best values (posterior distribution) for SEVERAL (> 1) "output" variables
- Partial specification of "input" values:
  - only a subset of the variables
  - only a "distribution" over each input variable
- General variables:
  - discrete, but domain size > 2
  - continuous (Gaussian: x = Σi bi yi for parents {Yi})
- Decision Theory => Decision Nets (Influence Diagrams): making decisions, not just assigning probabilities
- Storing P( v | p1, p2, …, pk ):
  - general "CP tables": O(2^k)
  - Noisy-Or, Noisy-And, Noisy-Max (see the sketch below)
  - "Decision Trees"
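
To make the parameter saving concrete, here is a minimal sketch of the noisy-OR combination rule named above: one "link strength" per parent replaces a full 2^k-row CPT (the q values are illustrative):

```python
# Noisy-OR: P(v=1 | parents) = 1 - prod over active parents of (1 - q_i),
# so k numbers stand in for a 2**k-row table.
def noisy_or(q, parent_values):
    prob_all_inhibited = 1.0
    for qi, pi in zip(q, parent_values):
        if pi:                            # only active (=1) parents contribute
            prob_all_inhibited *= 1 - qi
    return 1 - prob_all_inhibited

q = [0.9, 0.6, 0.3]                       # per-parent causal strengths
print(noisy_or(q, [1, 0, 1]))             # 1 - (0.1)(0.7) = 0.93
```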

Outline
- Existing uses of Belief Nets (BNs)
- How to reason with BNs
- Specific Examples of BNs
- Contrast with Rules, Neural Nets, …   (next)
- Possible applications of BNs
- Challenges:
  - How to reason efficiently
  - How to learn BNs

Belief Nets vs Rules
Both have "locality": specific clusters (rules / connected nodes), often over the same nodes (representing propositions). But:
- BN: Cause → Effect, "Hep → Jaundice", P( J | H )
- Rule: Effect → Cause, "Jaundice → Hep"
WHY? It is easier for people to reason CAUSALLY, even when the use is DIAGNOSTIC.
- BNs provide an OPTIMAL way to deal with uncertainty, vagueness (variable not given, or only a distribution), and error … signals meeting symbols …
- BNs permit different "directions" of inference

Belief Nets vs Neural Nets
Both have "graph structure". But:
- BN: nodes have SEMANTICS; combination rules: sound probability
- NN: nodes are arbitrary; combination rules: arbitrary
So it is harder to initialize a NN, and harder to explain a NN (but perhaps easier to learn a NN from examples only?).
BNs can also deal with partial information, and with different "directions" of inference.

Belief Nets vs Markov Nets
Each uses a "graph structure" to FACTOR a distribution: explicitly specify dependencies, implicitly independencies. But there are subtle differences:
- BNs capture "causality", "hierarchies"
- MNs capture "temporality"
Technical: BNs use DIRECTED arcs, which allow "induced dependencies", e.g. in A → C ← B:
  I( A, {}, B )    "A independent of B, given {}"
  ¬I( A, C, B )    "A dependent on B, given C"
MNs use UNDIRECTED arcs, which allow other independencies, e.g. in the 4-cycle A−B−D−C−A:
  I( A, {B, C}, D )    "A independent of D, given B, C"
  I( B, {A, D}, C )    "B independent of C, given A, D"

Uses of Belief Nets #1
- Medical Diagnosis: "assist/critique" the MD
  - identify diseases not ruled out
  - specify additional tests to perform
  - suggest appropriate/cost-effective treatments
  - react to the MD's proposed treatment
- Decision Support: find/repair faults in complex machines [device, or manufacturing plant, or …], based on sensors, recorded info, history, …
- Preventative Maintenance: anticipate problems in complex machines [device, or manufacturing plant, or …], based on sensors, statistics, recorded info, device history, …

Uses (con't)
- Logistics Support: stock warehouses appropriately, based on (estimated) frequency of needs, costs, …
- Diagnose Software: find the most probable bugs, given program behavior, core dump, source code, …
- Part Inspection/Classification: based on multiple sensors, background, model of production, …
- Information Retrieval: combine information from various sources, based on info from various "agents", …
- In general, partial info and sensor fusion: classification, interpretation, prediction, …

Challenge #1: Computational Efficiency
General problem, for a given BN: given O1 = v1, …, On = vn, compute P( H | O1 = v1, …, On = vn ).
+ If the BN is a "polytree", there is an efficient algorithm.
- If the BN is a general DAG (> 1 path from X to Y), it is
  - NP-hard in theory
  - slow in practice
Tricks:
+ Get an approximate answer (quickly)
+ Use an abstraction of the BN
+ Use an "abstraction" of the query (range)

#2a: Obtaining an Accurate BN
A BN encodes a distribution over n variables: not O(2^n) values, but "only" Σi 2^(ki) (node ni binary, with ki parents). Still lots of values!
- Qualitative information, the structure: "what depends on what?"
  - Easy for people (background knowledge)
  - But NP-hard to learn from samples…
- Quantitative information, the actual CP-tables:
  - Easy to learn, given lots of examples
  - But people have a hard time…
=> Combine knowledge acquisition from human experts with a simple learning algorithm.

Notes on Learning
- Mixed sources: person provides the structure; algorithm fills in the numbers (see the sketch below).
- Just a learning algorithm: there exist algorithms that learn both structure and values from samples.
- Just a human expert: people produce the CP-tables as well as the structure.
Relatively few values are really required, especially with NoisyOr, NoisyAnd, NaiveBayes, …
The actual values are not that important … sensitivity studies.
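
A sketch of the "algorithm fills in the numbers" step for the simplest case: complete data and a fixed two-node structure H → J, where each CPT entry is just a conditional frequency (the sample data is made up):

```python
# Maximum-likelihood CPT estimation by counting, for the net H -> J.
samples = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (0, 0), (1, 1)]

n_h1 = sum(1 for h, _ in samples if h == 1)
p_h1 = n_h1 / len(samples)                                    # P(H=1)
p_j1_given_h1 = sum(1 for h, j in samples if h == 1 and j == 1) / n_h1
p_j1_given_h0 = (sum(1 for h, j in samples if h == 0 and j == 1)
                 / (len(samples) - n_h1))

print(p_h1, p_j1_given_h1, p_j1_given_h0)
```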

My Current Work
- Learning Belief Nets:
  - Model selection: challenging the myth that MDL is the appropriate criterion
  - Learning a "performance system", not a model
- Validating Belief Nets: "error bars" around answers
- Adaptive User Interfaces
- Efficient Vision Systems
- Foundations of Learnability:
  - Learning active classifiers
  - Sequential learners
- Condition-based maintenance, bio-signal interpretation, …

#2b: Maintaining an Accurate BN
The world changes. The information in BN* may be
- perfect at time t
- sub-optimal at time t + 20
- worthless at time t + 200
Need to MAINTAIN a BN over time:
- using an on-going human consultant
- Adaptive BN: Dirichlet distributions (over variables), priors over BNs
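
One common way to realize the "Adaptive BN" item is to keep Dirichlet (here Beta, for a binary variable) pseudo-counts behind each CPT entry, so new evidence just increments counts; a minimal sketch, with illustrative numbers:

```python
# One CPT entry maintained as Beta pseudo-counts: the current estimate is
# the count ratio, and each new observation is a one-line update.
class BetaEntry:
    def __init__(self, alpha=1.0, beta=1.0):    # prior pseudo-counts
        self.alpha, self.beta = alpha, beta

    def update(self, observed_true):
        # (to track a drifting world, alpha and beta could also be
        #  decayed before each update, discounting old evidence)
        if observed_true:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self):                              # current estimate of the entry
        return self.alpha / (self.alpha + self.beta)

entry = BetaEntry(alpha=8, beta=2)               # encodes ~0.8 with weight 10
for obs in [True, False, True, True]:
    entry.update(obs)
print(entry.mean)
```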

Conclusions
Belief Nets are PROVEN TECHNOLOGY:
- Medical diagnosis
- DSS for complex machines
- Forecasting, modeling, information retrieval, …
They provide an effective way to:
- Represent complicated, inter-related events
- Reason about such situations: diagnosis, explanation, value of information
- Explain conclusions
- Mix symbolic and numeric observations
Challenges:
- Efficient ways to use BNs
- Ways to create BNs
- Ways to maintain BNs
- Reasoning about time

Extra Slides
- AI Seminar: Friday, noon, CSC3-33. Free PIZZA! http://www.cs.ualberta.ca/~ai/ai-seminar.html
- References: http://www.cs.ualberta.ca/~greiner/bn.html
- Crusher Controller
- Formal Framework
- Decision Nets
- Developing the Model
- Why Reasoning is Hard
- Learning Accurate Belief Nets

References
http://www.cs.ualberta.ca/~greiner/bn.html
Overview textbooks:
- Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
- Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, 1995. (See esp. Ch 14, 15, 19.)
General info re BayesNets:
- http://www.afit.af.mil:80/Schools/EN/ENG/LABS/AI/BayesianNetworks
- Proceedings: http://www.sis.pitt.edu/~dsl/uai.html
- Assoc. for Uncertainty in AI: http://www.auai.org/
Learning:
- David Heckerman, A tutorial on learning with Bayesian networks, 1995, http://www.research.microsoft.com/research/dtg/heckerma/TR-95-06.htm
Software:
- General: http://bayes.stat.washington.edu/almond/belief.html
- JavaBayes: http://www.cs.cmu.edu/~fgcozman/Research/JavaBaye
- Norsys: http://www.norsys.com/

Decision Net: Test/Buy a Car

Utility: Decision Nets
Given a cost function c( action, state ) ∈ R, the expected cost of action a is
  Cp(a) = E_s[ c(a, s) ] = Σ_{s ∈ S} p( s | obs ) · c(a, s)
and the best (immediate) action is
  a* = argmin_{a ∈ A} Cp(a)
A Decision Net is like a Belief Net, but with 3 types of nodes:
- chance (like a Belief Net node)
- action: repair, sensing
- cost/utility
and links for "dependency". Given observations obs, it computes the best action a*.
Sequence of actions: MDPs, POMDPs, …
Go Back
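
A direct transcription of the argmin rule above into Python; the states, posterior, and cost table are illustrative placeholders:

```python
# Pick a* = argmin_a E_s[c(a, s)] under the posterior p(s | obs).
p_state = {"ok": 0.7, "worn": 0.25, "broken": 0.05}    # p(s | obs)
cost = {                                                # c(action, state)
    "continue": {"ok": 0.0,  "worn": 5.0, "broken": 100.0},
    "repair":   {"ok": 10.0, "worn": 8.0, "broken": 12.0},
}

def best_action():
    def expected_cost(a):
        return sum(p * cost[a][s] for s, p in p_state.items())
    return min(cost, key=expected_cost)

print(best_action())   # "continue" with these numbers (6.25 vs 9.6)
```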

Decision Net: Drill for Oil? Go Back

Formal Framework
Always true (chain rule):
  P(x1, …, xn) = P(x1) P(x2 | x1) P(x3 | x2, x1) … P(xn | xn−1, …, x1)
Given the independencies,
  P(xk | x1, …, xk−1) = P(xk | pak)   for some pak ⊆ {x1, …, xk−1}
Hence
  P(x1, …, xn) = Πk P(xk | pak)
So just connect each y ∈ pai to xi => DAG structure.
Notes:
- The size of the BN is Σi 2^|pai|, so it is better to use small pai.
- pai = {x1, …, xi−1} is never incorrect … but seldom minimal (and then hard to store, learn, reason with, …).
- The order of the variables can make a HUGE difference: one ordering can give |pai| = 1, another |pai| = i − 1.
Go Back

Developing the Model
Sources of information:
+ (Human) expert(s)
+ Data from earlier runs
+ Simulator
Typical process:
1. Develop / refine initial prototype
2. Test prototype ↦ accurate system
3. Deploy system
4. Update / maintain system

Develop/Refine Prototype
Requires an expert; useful to have data.
Initial interview(s): to establish "what relates to what". Expert time: ≈ ½ day.
Iterative process (gradual refinement), to refine the qualitative connections and establish correct operation:
- Expert presents "good performance"
- KE implements the expert's claims
- KE tests on examples (real data or expert), and reports back to the expert
Expert time: ≈ 1 – 2 hours / week for ?? weeks (depends on the complexity of the device and the accuracy of the model).
Go Back

Why Reasoning is Hard
BN reasoning may look easy: just "propagate" information from node to node. Consider the net Z → A, Z → B, with A, B → C:

   P(Z=t) = 0.5
   P(A=t | Z=t) = 1.0,  P(A=t | Z=f) = 0.0    (A = Z)
   P(B=t | Z=t) = 0.0,  P(B=t | Z=f) = 1.0    (B = ¬Z)
   P(C=t | a, b) = 1.0 if a = b = t, else 0.0  (C = A ∧ B)

Challenge: what is P(C=t)?
A = Z = ¬B, so P(A=t) = P(B=f) = ½. So…?
  P(C=t) = P(A=t, B=t) = P(A=t) · P(B=t) = ½ · ½ = ¼
Wrong: P(C=t) = 0! Need to maintain dependencies:
  P(A=t, B=t) = P(A=t) · P(B=t | A=t) = ½ · 0 = 0
Go Back
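
The same pitfall, checked by brute force: summing the actual joint (which maintains the dependency through Z) gives P(C=t) = 0, not the ¼ that naive per-node propagation suggests. A minimal sketch:

```python
# Exact P(C=t) for the Z -> A, Z -> B, (A,B) -> C net above, by enumeration.
from itertools import product

pZ = 0.5
pA = {True: 1.0, False: 0.0}                   # A = Z
pB = {True: 0.0, False: 1.0}                   # B = not Z
pC = lambda a, b: 1.0 if (a and b) else 0.0    # C = A and B

p_c = 0.0
for z, a, b, c in product([False, True], repeat=4):
    p = ((pZ if z else 1 - pZ)
         * (pA[z] if a else 1 - pA[z])
         * (pB[z] if b else 1 - pB[z])
         * (pC(a, b) if c else 1 - pC(a, b)))
    if c:
        p_c += p
print(p_c)   # 0.0, not 0.25: A and B are perfectly anti-correlated given Z
```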

Crusher Controller
Given observations (history, sensor readings, schedule, …), specify the best action for the crusher ("stop immediately", "increase roller speed by …"), where best == minimize expected cost.
Initially: just a recommendation to the human operator.
Later: directly implement (some) actions. ?Request values of other sensors?

Approach
- For each state s ("good flow", "tooth about to enter", …) and each action a ("stop immediately", "change p7 += 0.32", …), determine the utility of performing a in s (cost of lost production if stopped; … of reduced production efficiency if continuing; …)
- Use observations to estimate the (distribution over) current states
- Infer the EXPECTED UTILITY of each action, based on that distribution
- Return the action with the highest expected utility

Details
Inputs:
- Sensor readings (history): camera, microphone, power draw
- Parameter settings
- Log files, maintenance records
- Schedule (maintenance, anticipated load, …)
Outputs:
- Continue as is
- Adjust parameters: GapSize, ApronFeederSpeed, 1J_ConveyorSpeed
- Shut down immediately
- Stop adding new material
- Tell operator to look
State, "CrusherEnvironment":
- #UncrushableThingsNowInCrusher
- #TeethMissing
- NextUncrushableEntry
- Control parameters

Benefits
- Increase crusher effectiveness: find the best settings for parameters, to maximize production of well-sized chunks
- Reduce down time: know when maintenance/repair is critical
- Reduce damage to crusher
- Usable model of crusher: easy to modify when needed; training; design of next generation
- Prototype for the design of {control, diagnostician} of other machines
Go Back

My Background
- PhD, Stanford (Computer Science): representational issues, analogical inference … everything in logic
- PostDoc at UofToronto (CS): foundations of learnability, logical inference, DB, control theory, …
- Industrial research (Siemens Corporate Research): need to solve REAL problems; theory revision, navigational systems, … logic is not the be-all-and-end-all!
- Prof at UofAlberta (CS): industrial problems (Siemens, BioTools, Syncrude); foundations of learnability, probabilistic inference, …
