
1. Bayesian Decision Theory

Instructor: Kathryn Blackmond Laskey
George Mason University, Department of Systems Engineering and Operations Research
©Kathryn Blackmond Laskey, June 2002

2. This workshop is dedicated to the memory of journalist Danny Pearl, brutally murdered in Pakistan in February 2002, and to the pioneering research of his father, Judea Pearl. Judea Pearl's research has the potential to create unprecedented advances in our ability to anticipate and prevent future terrorist incidents. A portion of any compensation derived from this workshop will be donated to the foundation honoring Danny Pearl:

Pearl Family Foundation, c/o Wall Street Journal, P.O. Box 300, Princeton, New Jersey 08543

3. Software for Bayesian Networks and Decision Graphs

A number of COTS software tools are now available for Bayesian networks and decision graphs.
The CD supplied with this seminar contains a set of examples constructed in the Netica® package. A limited-functionality version of Netica® is included on the CD, and all the examples can be run directly from the CD. To purchase a full-functionality version of Netica®, visit the Norsys Software web site.
Examples on the following pages are annotated with the name of the Netica® file or files containing a Bayesian network model for the example.

4. Agenda

1. Introduction
2. Basics of decision theory and graphical models
3. Knowledge engineering and model development
4. Multi-entity Bayesian networks and decision graphs
5. Inference using Bayes Rule
6. Combining expert knowledge and data
7. Conclusion

5. Models and Representations

Engineers, scientists, and policy analysts construct models to represent systems:
– We use the models to answer questions about the system
– Goal: build "good enough" models
» "Good enough" depends on the purpose for which the model is used
» Simplifications and inaccuracies don't matter if they don't materially affect results
Representations are approximations:
– Restricted set of variables
– Unrealistic simplifications
– Untested assumptions
Models are constructed from:
– Past data on the system or related systems
– Judgment of subject matter experts
– Judgment of experienced model builders

6. [Diagram: the Real World and its Representation, linked by Observations and Actions]

7. [Graphic-only slide; no text content]

8. The Fusion Challenge

Fusion is the process of incorporating information from different sources into a single "fused" representation.
Why fusion is difficult:
– Vast quantities of sensor information
– Real-time processing requirements
– Restrictions on weight and communication bandwidth
– Need to integrate physical and geometrical models with qualitative knowledge
– Noisy, unreliable, ambiguous data
– Active attempts at deception
– Requirement for robustness to new or little-known threat types and/or behavior patterns
Why fusion is important:
– Features that are meaningless in isolation are definitive in combination
"Data, data everywhere, and not the time to think…"

9. The Systems Engineering Challenge

Systems engineering is the process of transforming a user need into an operational system that meets that need.
Why systems engineering is difficult:
– Increasing complexity of systems
– Moving from stovepipes to systems of systems
– Constant improvement cycle
– Need to predict performance of integrated systems made up of legacy, COTS, and newly developed subsystems
– Need to predict performance of new and untested technologies
– Need to perform trade studies among large numbers of options with varying degrees of technology maturity and performance uncertainty
– Increased reliance on models and simulations with varying degrees of fidelity and resolution
– Increasing cost and schedule pressure
Why systems engineering is important:
– An effective and efficient systems engineering process produces quality systems in a cost-effective manner
– Poor systems engineering leads to spectacular performance debacles and cost overruns

10. Paradigm Shift in Computing

Old paradigm: algorithms running on Turing machines
– Deterministic steps transform inputs into outputs
– Result is either right or wrong
– Semantics based on Boolean logic
New paradigm: economy of software agents executing on a physical symbol system
– Agents make decisions (deterministic or stochastic) to achieve objectives
– "Program" is replaced by a dynamic system in which solution quality improves over time
– Semantics based on decision theory / game theory / stochastic processes
Hardware realizations of physical symbol systems:
– Physical systems minimize action
– Decision-theoretic systems maximize utility / minimize loss
– Hardware realization of a physical symbol system maps action to utility
– Programming languages are replaced by specification / interaction languages
– Software designer specifies goals, rewards, and information flows
– Unified theory spans sub-symbolic to cognitive levels
The old paradigm is a limiting case of the new paradigm.

11. Requirements for New-Paradigm Logic

– Embrace uncertainty
– Perform plausible reasoning
– Learn from experience
– Incorporate observation, historical data, and expert knowledge
– Explore multiple alternatives
– Replace rote procedure with focus on attaining objectives
– Trade off multiple objectives

12. Agenda

1. Introduction
2. Basics of decision theory and graphical models
3. Knowledge engineering and model development
4. Multi-entity Bayesian networks and decision graphs
5. Inference and optimization algorithms
6. Combining expert knowledge and data
7. Conclusion

13. The GOOD-D Process

Bayesian decision theory is based on the GOOD-D process. Good decision makers:
– Think carefully about their Goals
– Explore multiple Options
– Predict the possible Outcomes of the decision and the likelihood of each outcome under each of the options they are considering
– Weigh uncertainties and trade off different goals to Decide which option best serves their goals
– Do it! Implement a sound and effective plan to carry through on the decision and monitor success
Computational models to support the GOOD-D process should be grounded in new-paradigm theory.

14. Decision Graph

Utility variables represent the decision maker's goals.
Action variables represent options available to the decision maker.
World state variables represent outcomes of the decision and other relevant aspects of world state.
Arrows represent influences:
– Cause-and-effect relationships between actions and world state variables
– Statistical associations among world state variables
– How utility for the decision maker depends on the state of the world
Example (diagram nodes: Disease, Action, Outcome, Utility):
– Probabilities: P(Has disease) = .6
– Expected utilities: Treat: .6 × 90 + .4 × 90 = 90; Don't treat: .6 × 0 + .4 × 100 = 40
– Best action: treat the patient
Decision graphs are also called influence diagrams.
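
A decision like this can be evaluated in a few lines of Python. The sketch below is illustrative only: the utility table assumes treatment yields 90 whether or not the disease is present, which is the reading consistent with the numbers on the slide, and the code layout is not Netica® output.

    # Expected-utility calculation for the treat / don't-treat example above.
    p_disease = 0.6

    # utility[action][state] (treatment assumed to yield 90 in both states)
    utility = {
        "treat":      {"disease": 90, "no_disease": 90},
        "dont_treat": {"disease": 0,  "no_disease": 100},
    }

    def expected_utility(action: str) -> float:
        u = utility[action]
        return p_disease * u["disease"] + (1 - p_disease) * u["no_disease"]

    for action in utility:
        print(f"EU({action}) = {expected_utility(action):.1f}")

    best = max(utility, key=expected_utility)
    print("Best action:", best)   # -> treat (EU 90 vs. 40)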

15. Decision Theory: Criticisms and Rebuttals

Argument: Bayesians insert subjectivity into analyses.
Rebuttal:
– Every analysis depends on assumptions
– Bayesians make subjective assumptions explicit in a way that leaves them open for debate and analysis
– Bayesian theory provides a principled, theoretically justifiable way to integrate informed expert judgment with scientific theory and statistical data
Argument: You just can't put a numerical value on some things, such as human lives.
Rebuttal:
– Every time you get into your car you are implicitly putting a value on your own and others' lives
– Every time an airline allows a person to pass through a metal detector, a value is being placed on human lives
– Bayesian decision theory provides a scientifically sound way to make these tradeoffs explicit and open to public debate
– Bayesian decision theory provides a language for public discourse, improving society's ability to make sound policy judgments
Argument: If anybody can have any prior they want, how can we justify our model output?
Rebuttal:
– Bayesian decision theory allows reasonable people to disagree
– When there is disagreement, the political process must operate to determine policy
– Decision-theoretic models can inject sound science into the political debate
– With enough data, non-dogmatic Bayesians will come to agree

16. Decision Graph: An Example

Maria is visiting a friend when she suddenly begins sneezing. "Oh dear, I'm getting a cold," she thinks. "I had better not visit Grandma." Then she notices scratches on the furniture. She sighs in relief. "I'm not getting a cold! It's only my cat allergy acting up!"
1. Plausible inference
2. The evidence for cat allergy "explains away" sneezing, and cold is no longer needed as an explanation
Netica® file: Cat_DG.dne

17. What is a Decision Graph?

Both a knowledge representation and a computational architecture:
– Represents knowledge about entities and their interactions
– Modular elements with defined interconnections
– Computation can exploit loosely coupled structure for efficiency
Used for inference and/or decision problems:
– Infer likely values of some variables from other variables
– Choose the policy that best achieves the decision maker's objectives

18. Types of Influence

Links into chance variables from other chance variables are called relevance links. A relevance link indicates that the value of one variable is relevant to the probability distribution of the other variable.
Links from decision variables into chance variables are called influence links. An influence link means that the decision affects, or influences, the outcome of the chance variable.
Links into decision variables are called information links. An information link means that the quantity will be known at the time the decision is made.
– Decision variables are ordered in time
– In standard influence diagrams, a decision variable and all its information predecessors are (implicit) information predecessors to all future decision variables ("no forgetting")
Links from chance or decision variables into value variables are functional links. Value variables may not be parents of decision or chance variables. If there is more than one terminal value variable, the total value is the sum of all terminal value variables.
A Bayesian network is a decision graph having only world state variables.
Notation for value variables varies: some packages use rounded boxes, others diamonds, others hexagons.

19. Specifying a BN

Simplicity:
– 31 probabilities are required to specify a general distribution over 5 binary variables
– 8 probabilities are needed to specify this model
Scalability:
– Over 10^30 probabilities (4^50 − 1) are required to specify a general distribution over 50 variables with 4 values per variable
– About 9,000 probabilities are required to specify a BN for 50 variables with 4 values per variable, 3 parents per variable, and no local structure
Tractability:
– General probabilistic inference is exponential in the number of variables
– Inference in a singly-connected BN is linear in the number of variables
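
These counts follow directly from the chain rule, and the short Python check below reproduces them. The 3-parents figure assumes every one of the 50 variables has exactly 3 parents, which is the reading the slide suggests:

    # Parameter counts: general joint distributions vs. Bayesian networks.

    # General joint over n binary variables: 2**n - 1 free probabilities.
    print(2**5 - 1)                      # 31, for 5 binary variables

    # General joint over 50 variables with 4 values each: 4**50 - 1.
    print(f"{4**50 - 1:.2e}")            # ~1.27e+30

    # BN with 50 variables, 4 values each, 3 parents per variable:
    # each CPT needs (4 - 1) free probabilities per parent configuration,
    # and there are 4**3 = 64 parent configurations per variable.
    n_vars, n_vals, n_parents = 50, 4, 3
    per_var = (n_vals - 1) * n_vals**n_parents   # 192
    print(n_vars * per_var)              # 9600, i.e. "about 9000"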

20. Directed Graphs Represent Dependence

Graphs with nodes and edges provide a powerful tool for visualizing dependencies between variables:
– Variables are represented by nodes
– Direct dependencies between variables are represented by edges connecting nodes
– Absence of an edge between two variables means no direct dependency
Example: rain or a sprinkler could cause the pavement in front of the house to be wet. This could cause a passerby to slip and fall, and could also ruin my new shoes.
– Rain and Sprinkler are independent, but are dependent conditional on Pavement
– Fall and Shoes are dependent (if a fall is more likely, so are ruined shoes) but are independent given Pavement
– Sprinkler and Fall are dependent, but are independent given Pavement
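
These independence claims can be checked numerically by brute-force enumeration of a joint distribution that factors according to the graph. In the Python sketch below, only the factorization (Rain and Sprinkler into Pavement; Pavement into Fall and Shoes) comes from the slide; the probability numbers are assumptions for illustration:

    from itertools import product

    # Illustrative CPTs (numbers are assumptions, not from the workshop).
    p_rain = {True: 0.2, False: 0.8}
    p_sprink = {True: 0.3, False: 0.7}
    p_wet = {  # P(Pavement wet | Rain, Sprinkler)
        (True, True): 0.99, (True, False): 0.9,
        (False, True): 0.8, (False, False): 0.05,
    }
    p_fall = {True: 0.3, False: 0.02}    # P(Fall | Wet)
    p_shoes = {True: 0.6, False: 0.01}   # P(Shoes ruined | Wet)

    def joint(r, s, w, f, sh):
        pw = p_wet[(r, s)] if w else 1 - p_wet[(r, s)]
        pf = p_fall[w] if f else 1 - p_fall[w]
        psh = p_shoes[w] if sh else 1 - p_shoes[w]
        return p_rain[r] * p_sprink[s] * pw * pf * psh

    def prob(**fixed):
        """Marginal probability of an assignment to a subset of variables."""
        names = ["r", "s", "w", "f", "sh"]
        total = 0.0
        for vals in product([True, False], repeat=5):
            assign = dict(zip(names, vals))
            if all(assign[k] == v for k, v in fixed.items()):
                total += joint(**assign)
        return total

    # Fall and Shoes are dependent marginally ...
    print(prob(f=True, sh=True), prob(f=True) * prob(sh=True))
    # ... but independent given Pavement:
    pw = prob(w=True)
    lhs = prob(f=True, sh=True, w=True) / pw
    rhs = (prob(f=True, w=True) / pw) * (prob(sh=True, w=True) / pw)
    print(lhs, rhs)   # equal, up to floating point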

21. Causation and Explaining Away

Alternate causes of an event are often (approximately) independent given some generally understood context. They become dependent when a common effect is observed: knowing that one cause is true "explains away" other potential causes.
Learning about an effect that could be caused by either of two variables introduces an informational dependence between them.
Informational dependence is different from causal dependence:
– We can change an effect by intervening to change its cause
» We can make the car start if the battery is dead by putting in a new battery
– We cannot change the state of a variable by changing one on which it depends only informationally
» If the car won't start and we learn that the solenoid is bad, we infer that the battery is probably OK
» But we cannot make the battery OK by destroying the solenoid

22. Exercise: Multi-Sensor Fusion

Several blind men are congregating around an elephant, arguing loudly over what they are feeling. Some of them, feeling the legs, shout: "It is a tree!" Others, feeling the tail, shout: "It is a rope!" Still others, feeling the ears, shout: "It is a blanket!" Others, feeling the trunk, shout: "It is a snake!"
Use a Bayesian network to show that a fusion system integrating information from all these sources, each looking at a different aspect of the elephant, can reach a conclusion that none of them is able to reach on the basis of his own information.
Netica® file: BlindManAndElephant.dne

23. Exercise: Develop a Bayesian Network

The temperature warning light on a piece of equipment is designed to come on when the engine overheats. The light will also flash if the temperature sensor is broken. If the engine has overheated, it may be because a belt is broken or because the temperature in the engine room is too high. A high temperature in the engine room might cause the sensor to malfunction, and may itself be caused by a broken air conditioner. Either a broken belt or a broken air conditioner could be caused by a company history of poor equipment maintenance. An overheated engine might cause product defects.
– Draw a Bayesian network representing the uncertainties faced by a technician in diagnosing the reason the light is on.
– Use a Bayesian network software package to enter probabilities for this network. Use your judgment to choose reasonable probability distributions.
– Use your Bayesian network to answer the following questions:
» What is the probability that the engine has overheated?
» What is the probability that the engine has overheated given that the warning light is on?
» What is the probability that the engine has overheated given that the warning light is on and the air conditioner is broken?
» What is the probability that there are product defects?
» What is the probability that there are product defects given that the light is on, the air conditioner is OK, and the maintenance history is poor?
Netica® file: Eqpt_Diagnosis.dne

24. Example: Develop a Decision Graph

If the belt is replaced, it is unlikely to break regardless of maintenance practice. Replacing the belt has a cost, and so do product defects.
– Add a decision variable and value variables to your graph
– Use your model to analyze the decision of whether to replace the belt
– Use your model to analyze the decision of whether to replace the belt when the warning light comes on
Netica® file: Eqpt_Diagnosis_DG.dne

25. Action, Causation, and Information

Actions have two aspects:
– Actions can change the world (intervening actions)
» Example: the decision to change the belt
– Actions can provide information about the state of the world (non-intervening actions)
» Example: the decision to check the warning light
Intervening action:
– Normal time evolution of the variable is "interrupted"
– Agent sets the variable to the desired value
– Variables causally downstream are affected by the change
– Variables causally upstream are unaffected
Non-intervening action:
– Variable is an information predecessor to future actions
– No other changes
Every action has aspects of both:
– When we take an action, our knowledge is updated to include the fact that we took it (but we may forget actions we took)
– Observing the world changes it (we often do not model the impact of actions whose impact is negligible)

26. Example: Value of Information

You are deciding whether to go on a picnic (D=g) or stay home (D=h). Your utility depends on whether it rains (W=r) or shines (W=s):
– u(g,r) = 0 (you go and get wet)
– u(g,s) = 10 (you go and have fun)
– u(h,r) = 3 (you stay dry indoors)
– u(h,s) = 4 (you stay home and it's nice)
You can call for a weather forecast before you make your decision. The forecast is for rain (r*) or shine (s*):
– P(F=r* | W=r) = .9
– P(F=s* | W=s) = .8
Questions:
– What is the optimal decision if the probability of rain is 0.1? What if the probability is 0.8?
– Is it to your benefit to call for a weather forecast before making your decision?
[Diagram nodes: Call?, Forecast, Weather, Go?, Utility]
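
A quick way to check the answers is to compute expected utilities directly. The Python sketch below evaluates the forecast-following policy from the next slide against the two fixed policies; every number in it comes from the slide above:

    # Value-of-information calculation for the picnic example.
    u = {("g", "r"): 0, ("g", "s"): 10, ("h", "r"): 3, ("h", "s"): 4}
    p_forecast = {("r*", "r"): 0.9, ("s*", "r"): 0.1,
                  ("r*", "s"): 0.2, ("s*", "s"): 0.8}

    def eu_fixed(action, p_rain):
        return p_rain * u[(action, "r")] + (1 - p_rain) * u[(action, "s")]

    def eu_forecast(p_rain):
        """Go if the forecast says shine, stay home if it says rain."""
        total = 0.0
        for w, pw in (("r", p_rain), ("s", 1 - p_rain)):
            for f in ("r*", "s*"):
                action = "h" if f == "r*" else "g"
                total += pw * p_forecast[(f, w)] * u[(action, w)]
        return total

    for p_rain in (0.1, 0.8):
        best_no_info = max(eu_fixed("g", p_rain), eu_fixed("h", p_rain))
        voi = eu_forecast(p_rain) - best_no_info
        print(f"P(rain)={p_rain}: EU(go)={eu_fixed('g', p_rain):.2f}, "
              f"EU(home)={eu_fixed('h', p_rain):.2f}, "
              f"EU(forecast)={eu_forecast(p_rain):.2f}, VOI={voi:.2f}")

A negative difference here simply means the forecast-following policy is worse than ignoring the forecast: at P(rain) = 0.1 you should go regardless, so the call has no value, while at P(rain) = 0.8 the call is worth about 0.72 utility units.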

27 George Mason University BDT Workshop GMU Department of Systems Engineering and Operations Research ©Kathryn Blackmond LaskeyJune 2002 (costless) Low-cost information is worth collecting for some P(rain) High-cost information is never worth collecting VOI as Function of Prior Probability This graph shows the expected utility of three policies as a function of the prior probability of rain: –g: go no matter what –h: stay home no matter what –f: go only if the forecast says no rain When line for f is above other lines information is valuable Difference is called value of information Summary: –Collecting information is useful if it might change your decision –The difference between your expected utility with and without the information is called the expected value of information –Costless information has positive value for 4/13

28. Variable-Level Independence

Independence assumptions are an important tool for simplifying knowledge elicitation and inference:
– We have seen how Bayesian networks use independence to reduce the number of parameters required to specify a probability distribution
– Independence assumptions also reduce the computational complexity of inference
Bayesian networks are not expressive enough to encode all the independence assumptions that can be exploited to simplify knowledge elicitation and reduce the computation required for inference.
Additional types of independence that can simplify knowledge elicitation and inference:
– Independence of causal influence (ICI)
» The mechanism by which one parent variable causes the child variable is independent of the values of other parent variables and of the mechanisms by which they cause the child
– Context-specific independence (CSI)
» Variables may be independent of other variables in some contexts but not others
– Local expressions
» The probability distribution of a variable given its parents can be specified as any parameterized function of the variable's parents
» For many local expressions, the probability distribution of the child depends on the parents only through a sufficient statistic

29. Example of Noisy-OR

Sneezing can be caused by an allergy (A), a cold (C), or dust (D) in the air.
The noisy-OR model:
– Each cause may or may not "trigger" the effect
» Only causes that are true can trigger the effect
» Causes operate "noisily": if true, they may or may not trigger the effect
– There is a "trigger probability" associated with each cause
» Allergy triggers sneezing with probability pA = .6
» Cold triggers sneezing with probability pC = .9
» Dust triggers sneezing with probability pD = .3
– Basic assumptions of the noisy-OR model:
» The effect occurs if one or more of its causes has triggered it
» Whether one cause has triggered the effect is independent of whether another cause has triggered the effect
Noisy-OR reduces 8 probability judgments to 3 (or 4 with a "leak probability").
If there were 10 parents, it would reduce 1,024 probability judgments to 10 (or 11 with a "leak probability").
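
Under these assumptions the full conditional table can be generated from the three trigger probabilities, since the probability of no sneezing is the product of the active causes' failure probabilities. A minimal Python sketch using the slide's numbers (no leak probability):

    from itertools import product

    # Noisy-OR: P(effect | parents) = 1 - prod(1 - p_i) over parents that are true.
    trigger = {"allergy": 0.6, "cold": 0.9, "dust": 0.3}

    def p_sneeze(**active):
        """active maps each cause name to True/False."""
        p_none = 1.0
        for cause, on in active.items():
            if on:
                p_none *= 1 - trigger[cause]   # this cause fails to trigger
        return 1 - p_none

    # Reconstruct all 8 rows of the CPT from just 3 numbers:
    for a, c, d in product([False, True], repeat=3):
        p = p_sneeze(allergy=a, cold=c, dust=d)
        print(f"A={a!s:5} C={c!s:5} D={d!s:5}  P(sneeze)={p:.3f}")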

30. Context-Specific Independence

The activity of a military unit depends on the weather (wind and precipitation) and the time of day.
Context-specific independence reduces the number of distributions for "Activity" from 8 to 4.
When there are many parents, context-specific independence can provide orders-of-magnitude savings in elicitation.
Example due to Suzanne Mahoney.
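
One common way to encode CSI is a tree-structured conditional table: contexts in which some parents are irrelevant share a single distribution. The partition in the Python sketch below is an assumption for illustration (the slide does not say which contexts collapse), but it shows how 8 parent configurations can map to only 4 elicited distributions:

    # Tree-structured CPT for Activity given Wind, Precipitation, TimeOfDay.
    # Hypothetical partition: at night the unit holds position regardless of
    # weather; by day, activity depends on precipitation, and wind matters
    # only when it is dry.

    def activity_dist(windy: bool, precip: bool, night: bool) -> dict:
        if night:                        # 4 configurations share one distribution
            return {"hold": 0.8, "move": 0.1, "train": 0.1}
        if precip:                       # 2 daytime configurations share this one
            return {"hold": 0.5, "move": 0.3, "train": 0.2}
        if windy:                        # daytime, dry, windy
            return {"hold": 0.3, "move": 0.4, "train": 0.3}
        return {"hold": 0.1, "move": 0.4, "train": 0.5}   # daytime, dry, calm

    # 8 parent configurations, but only 4 distinct distributions to elicit.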

31. Agenda

1. Introduction
2. Basics of decision theory and graphical models
3. Knowledge engineering and model development
4. Multi-entity Bayesian networks and decision graphs
5. Inference using Bayes Rule
6. Combining expert knowledge and data
7. Conclusion

32. Importance of a Sound Knowledge Engineering Process

Bayesian networks are increasing in popularity, and applications are growing more complex. A formal, repeatable process for knowledge engineering is becoming more important:
– Early work on elicitation of probability models (1970s) focused on eliciting single probabilities or univariate probability distributions
– Early work on BNs tended to assume that structure elicitation was relatively straightforward
– As models become more complex, the KE process must be managed
Knowledge elicitation for large Bayesian networks is a problem in systems engineering.

33. Systems Engineering and the System Lifecycle

Systems engineering is the technical and managerial process by which a user need is translated into an operational system.
The system life cycle evolves through predictable phases.
[Diagrams: Waterfall Lifecycle Model and Spiral Lifecycle Model, ending in Retire]

34. Spiral Model of BN Engineering

Goal of knowledge engineering:
– Discovery and construction of an appropriate model
– Not extraction of a pre-existing model
A spiral model is necessary for systems in which requirements are discovered as development progresses.
Spiral KE:
– Construct a series of prototype models
– Explore the behavior of each prototype on sample problems
– Evaluate prototypes and restructure as necessary
KE changes both expert and elicitor:
– Understanding of expert and elicitor deepens as KE proceeds
– Improves communication between elicitor and expert

35. Applying Spiral BN Engineering

Begin with a small subproblem:
– Self-contained
– Can be completed in a short time
– Interesting in its own right
– Reasonably representative of the global problem
Build and test a model for the subproblem:
– Look for common structures and processes that will recur
– Think about more efficient ways to structure KE
– Develop and document conventions (a "style guide") to be followed as models are expanded
Iteratively expand to more complex problems.

36. Model Construction Process

What are the variables?
– World state variables represent possible outcomes
– Decision variables represent available options
– Utility variables represent attributes of value
What are the possible values of the world state and decision variables?
What is the graph structure?
What is the structure of the local distributions for world state variables?
What are the local probability distributions for world state variables?
What are the local utility functions for the utility variables?

37. The Participants

Naive view:
– Put problem experts and modeling experts in a room together and magic will happen
Realistic view:
– Pure "problem experts" and pure "modeling experts" will talk past each other
– Modeling experts must learn about the problem, and problem experts must learn what models can do
– This process can be time-consuming and frustrating
– The team will be more productive if both sides expect and tolerate this process
Training:
– The most productive way of training modelers and problem experts is to construct very simple models of stylized domain problems
– The goal is understanding, NOT realism or accuracy!
– Beware: the training phase can seem pointless and frustrating
– It is important to get expert buy-in

38. Selecting a Subproblem

Initial model or expansion of an existing model.
Characteristics:
– Manageable size
– Interesting in its own right
– Path to expansion
– Risk mitigation
How to restrict:
– Focus or target variables: variables of direct interest to the decision maker
» Restrict to a subset of variables of interest
» Restrict to a subset of values
– Evidence variables: variables for which information will be available, used to draw inferences about the focus variables
» Restrict to a subset of evidence sources
– Context variables: variables that will be assumed known and set to definite values
» Restrict to a subset of contextual conditions (sensing conditions, background causal conditions, assignment of objects to sensors, number of objects)

39. Managing Knowledge Acquisition

Record the rationale for modeling decisions.
Develop a "style guide" to maintain consistency across multiple subproblems:
– Naming conventions
– Variable definitions
– Modeling conventions
Enforce configuration management:
– History of model versions
– Protocols for making and logging changes to the current model
– Rationale for changes
Develop a protocol for testing models:
– Record of test results traced to model changes and rationale

40. Model Agility

Requirement: rapid adaptation of the model to a new situation.
Support for model agility:
– Libraries of reusable model fragments
– Documentation of stable and changeable aspects of the model
– Development of data sources for inputs to changeable model components
» Protocols for data collection and maintenance
» Protocols for importing data into the knowledge base
– Automated support for propagating the impact of changes

41. Model Evaluation

Model walk-through:
– Present the completed model to "fresh" experts and/or modelers
– Evaluate all components of the model
Sensitivity analysis:
– Measure the effect of one variable on another
– Compare with expert intuition to evaluate the model
– Evaluate whether additional modeling is needed
Case-based evaluation:
– Run the model on a set of test cases
» Cases to check local model fragments (component testing)
» Cases to test the behavior of the global model (whole-model testing)
– Compare with expert judgment or "ground truth"
– Important issue: selection of test cases

42. Agenda

1. Introduction
2. Basics of decision theory and graphical models
3. Knowledge engineering and model development
4. Multi-entity Bayesian networks and decision graphs
5. Inference using Bayes Rule
6. Combining expert knowledge and data
7. Conclusion

43. Limits of Standard BNs / DGs

"One size fits all" model:
– All problem instances involve:
» The same set of variables
» The same states for variables
» The same relationships between variables
– Only the "evidence" (instantiated variables) varies from instance to instance
– All potentially relevant explanations are explicitly represented
In complex, open-world problems:
– The number of actors, and their relationships to each other, is not fixed in advance
– The attribution of evidence to actors may not be known in advance
– The situation evolves in time
– We need to represent only the most important explanations

44. Maria's Continuing Saga…

Variation 1:
– Tran is sneezing and saw scratches
– Tran was recently exposed to a cold and probably is not allergy-prone
Variation 2:
– Tran saw scratches
– Maria did not see scratches
– Tran is in the room with Maria
Variation 3:
– Tran and Maria both are sneezing, are allergy-prone, and saw scratches
– Tran and Maria are a continent apart

45. Variation 1

Add background variables to specialize the model to different individuals.
Still a "one size fits all" model.
Netica® file: Cat_SSN_Variation1.dne

46. Variation 2

The decision graph has replicated sub-parts and different kinds of entities (cats, people).
[Diagram nodes include Sneezing(Maria), Sneezing(Tran), AllergicReaction(…), ColdStatus(…), ExposedToCold(…), AllergyProne(…), SeesScratches(…), Near(Maria,Cat1), Near(Tran,Cat1), Near(Maria,Tran), Loc(…), Health(Grandmother1), Pleasure(Maria,Grandmother1), and the decision Visit(Maria,Grandmother1) with options Go / StayHome]
Netica® file: Cat_SSN_Variation2.dne

47. Variation 3 Done Wrong

The Variation 2 model gets the wrong answer if Maria and Tran are not near each other and both are near cats! We need to be able to hypothesize additional cats if and when necessary.
[Diagram: the same network as Variation 2, with only a single cat entity, Cat1]
Netica® file: Cat_SSN_Variation2.dne

48. Variation 3 Done Right (…but what a mess!)

This model gets the "right answer" on all the variations.
Netica® file: Cat_SSN_Variation3.dne

49. The Solution

Specify the model in pieces and let the computer compose them.
[Diagram fragments: Cats & Allergies, Sneezing, Colds & Time, Spatial, Hypothesis Management, Value]

50. Multi-Entity Bayesian Networks and Decision Graphs

Represent knowledge as model fragments:
– Implicitly represents a complete and consistent model of the domain and anticipated situations
– No a priori bound on the number of entities, relevant relationships, or observations
Compose fragments dynamically into a situation-specific network (SSN):
– A situation is a snapshot of the world at an instant of time
– A situation-specific network is an ordinary, finite Bayesian network or decision graph constructed from the MEDG knowledge base using network construction operators
Use the SSN to compute the response to a query:
– General-purpose algorithm
– Approximates the "correct answer" encoded by the knowledge base
– Models with special structure can be solved with special-case algorithms
Use expert-guided Bayesian learning to update knowledge patterns over time.

51. Model Construction

Simpler models give the same results as a more complex model on problems for which they are adequate. We want to construct a "good enough" model for our situation. The model constructor builds a situation-specific DG from a knowledge base that implicitly encodes an infinite-dimensional DG.

52. Functional Architecture for a Model Construction System

[Diagram: a BN/DG Fragment KB and suggestors feed a Model Workspace that receives streaming evidence; the system retrieves model fragments, matches variables, attaches evidence to variables, combines fragments into a situation-specific model, updates inferences and decisions, and issues alert messages]

53. MEDGs and Object-Oriented Representation

Entities are represented as objects.
The probability part of a MEDG expresses uncertainty about entities:
– Attribute uncertainty
– Existence uncertainty (false alarms)
– Type and subtype uncertainty (discrimination within a type hierarchy)
– Reference uncertainty (association)
The value and action parts of a MEDG represent the objectives and plans of a software agent.
[Diagram: inheritance (PhysicalObject to Mammal to Human and Cat), association (Near relates two PhysicalObjects), and composition (PhysicalObject has Location and Time)]

54. MEBN / MEDG Value Added

[Chart: modeling effort vs. problem complexity (number of entities, number of interactions, spatio-temporal variables, …), comparing modeling with standard Bayesian networks against modeling with a richer representation for managing and reasoning with uncertainty]
The Inference Collaborative Environment (ICE), MEBN software, was developed by Information Extraction & Transport, Inc. BMD/ICE development was sponsored by MDA. SPARTA engineers are using ICE to develop a fusion architecture for missile defense.

55. Agenda

1. Introduction
2. Basics of decision theory and graphical models
3. Knowledge engineering and model development
4. Multi-entity Bayesian networks and decision graphs
5. Inference using Bayes Rule
6. Combining expert knowledge and data
7. Conclusion

56. Bayesian Inference

Bayesian inference is a theory of rational belief dynamics.
Uncertainty about the state of the world is represented by a probability distribution over states:
– Probability is a rational agent's degree of belief about uncertain states of the world
Beliefs are updated over time by conditioning on new information about the world. The rule for updating beliefs with new information is called Bayes rule. In odds form:

P(H2|E) / P(H1|E) = [P(E|H2) / P(E|H1)] × [P(H2) / P(H1)]
(posterior odds ratio = likelihood ratio × prior odds ratio)

Evidence increases the odds of hypothesis H2 relative to hypothesis H1 if the evidence is more likely under H2 than under H1:
– Posterior probability can increase even with unlikely evidence if it is more unlikely under the alternate hypothesis
– Posterior probability can fail to increase even with likely evidence if it is more likely under the alternate hypothesis
Bayesian theory justifies the scientific process:
– The best way to confirm a hypothesis is to enumerate many plausible alternatives and find evidence to disconfirm them
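
The odds form translates directly into code. A minimal Python sketch (the numbers in the demo call are made up, and the odds-to-probability conversion assumes H1 and H2 are the only hypotheses):

    def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
        """Bayes rule in odds form: posterior odds = LR * prior odds."""
        return likelihood_ratio * prior_odds

    def odds_to_prob(odds: float) -> float:
        # Valid when H1 and H2 are mutually exclusive and exhaustive.
        return odds / (1 + odds)

    # Example: prior odds 1:4 for H2 vs. H1, evidence 8x as likely under H2.
    post = posterior_odds(0.25, 8.0)
    print(post, odds_to_prob(post))   # 2.0 -> P(H2|E) = 2/3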

57. Graphical View of Bayes Rule

Depravia has a population of 40 million people: 5 million in the Northeastern province and 35 million in the rest of Depravia. Zappist fundamentalists make up 60% of the population of the Northeastern province and 5% of the population of the rest of Depravia.
What is the probability that Frangolina is from the Northeastern province of Depravia if:
– All we know about her is that she lives somewhere in Depravia?
– We learn that she is a fundamentalist Zappist?
[Diagram: a population grid in which each symbol represents roughly 1 million Zappists or non-Zappists, with bar charts for the Province (Northeastern / Other) and Religion (Zappist / Other) nodes before and after the Religion = Zappist finding is entered]
Netica® file: ZappistProvince.dne
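
Working the numbers in code gives the answer the Netica® model would display; the following Python sketch is just that arithmetic:

    # Bayes rule for the Depravia example.
    p_ne = 5 / 40                 # prior P(Northeastern) = 0.125
    p_z_given_ne = 0.60           # P(Zappist | Northeastern)
    p_z_given_other = 0.05        # P(Zappist | rest of Depravia)

    # Total probability of being a Zappist:
    p_z = p_z_given_ne * p_ne + p_z_given_other * (1 - p_ne)

    # Posterior after learning Frangolina is a Zappist:
    p_ne_given_z = p_z_given_ne * p_ne / p_z
    print(f"P(NE) = {p_ne:.3f}, P(NE | Zappist) = {p_ne_given_z:.3f}")
    # -> prior 0.125, posterior ~0.632 (3 million of 4.75 million Zappists)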

58. Rare Events and the False Positive Index

Diagnostic tests can be evaluated by sensitivity and specificity:
– Sensitivity is the probability the condition will be detected if present
– Specificity is the probability the condition will not be detected if not present
– We can increase sensitivity by adjusting the threshold for declaring a positive, but at the cost of decreasing specificity
The false positive index is the number of false positives for every true positive. The chart plots the false positive index against the base rate for a test with 3 different thresholds.
The best way to increase both sensitivity and specificity is to integrate information from multiple sources; Bayesian networks do this in a principled way. Increasing the accuracy of single-source sensors may not help if a single source cannot observe all relevant features.
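
The false positive index follows from the definitions above: FPI = P(false positive) / P(true positive) = (1 − specificity)(1 − base rate) / (sensitivity × base rate). A Python sketch with a made-up operating point shows why rare events are so punishing:

    def false_positive_index(sensitivity, specificity, base_rate):
        """False positives per true positive."""
        true_pos = sensitivity * base_rate
        false_pos = (1 - specificity) * (1 - base_rate)
        return false_pos / true_pos

    # Hypothetical test (sensitivity 0.9, specificity 0.95) at several base rates:
    for base_rate in (0.5, 0.1, 0.01, 0.001):
        fpi = false_positive_index(0.9, 0.95, base_rate)
        print(f"base rate {base_rate}: {fpi:.1f} false positives per true positive")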

59. Soft Evidence

Ten-year-old Leslie comes downstairs complaining of a headache and chills. You put your hand on her forehead and decide that she probably has a fever. What's the probability she's sick enough to stay home from school?
We draw a Bayesian network with variables S (sick?), H (headache?), F (fever?), and C (chills?). How do we process the evidence that her forehead "feels hot"?
We could add a "virtual" child of F called T (touch forehead). We would assign possible values (e.g., very hot, hot, warm, cool) for T and assess P(T | F) for each value of F and T. But why should we have to assess P(T | F) for values of T that did not occur?
We can achieve the same effect using "soft evidence" for F:
– Soft evidence represents the relative likelihood of "Leslie's feel" if she has a fever and if she doesn't
– An 8-to-1 ratio could be entered as (0.8, 0.1), (0.4, 0.05), or any pair of values with an 8-to-1 ratio
Netica® file: Leslie.dne
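
Soft (virtual) evidence scales the current distribution of F by the likelihood vector and renormalizes; since only the ratio matters, any pair with the same ratio yields the same posterior. A sketch in Python, with an assumed prior P(F = true) = 0.2:

    def soft_update(prior, likelihood):
        """Multiply the prior by the likelihood vector and renormalize."""
        unnorm = [p * l for p, l in zip(prior, likelihood)]
        z = sum(unnorm)
        return [u / z for u in unnorm]

    prior_fever = [0.2, 0.8]                      # assumed prior, [True, False]

    # Any 8:1 likelihood vector yields the same posterior:
    print(soft_update(prior_fever, [0.8, 0.1]))   # [0.667, 0.333]
    print(soft_update(prior_fever, [0.4, 0.05]))  # same result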

60. Inference by Local Distributed Computation

Distributed inference algorithms perform fast computation:
– The algorithm computes the probability distribution of non-evidence variables given evidence variables, using Bayes rule
– Tractable exact inference on probability distributions of complex phenomena
– Approximate inference makes even the most difficult problems feasible
Graph separation provides the basis for defining the communication architecture for fusion systems:
– Compile the Bayesian network into a computational representation for inference
– Determine information requirements at variables and required flows along links
– Analyze tradeoffs in accuracy and computation/communication
[Diagram: a Bayesian network over variables A–H and its compiled junction tree, with cliques such as AB, DC, BEC, ECG, EGF, and CGH]
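
The essence of local computation is easiest to see on a chain, where each node only ever combines messages from its neighbors with its own local table. The chain and numbers in this Python sketch are assumptions for illustration, not the network in the figure:

    # Forward message passing on a chain A -> B -> C: compute P(C)
    # by local sums and products only (no global joint table is ever built).

    p_a = {True: 0.3, False: 0.7}
    p_b_given_a = {True: {True: 0.9, False: 0.1},
                   False: {True: 0.2, False: 0.8}}    # p_b_given_a[a][b]
    p_c_given_b = {True: {True: 0.6, False: 0.4},
                   False: {True: 0.1, False: 0.9}}    # p_c_given_b[b][c]

    # Message from A to B: marginalize A out locally.
    msg_ab = {b: sum(p_a[a] * p_b_given_a[a][b] for a in (True, False))
              for b in (True, False)}

    # Message from B to C: combine the incoming message with B's local table.
    p_c = {c: sum(msg_ab[b] * p_c_given_b[b][c] for b in (True, False))
           for c in (True, False)}

    print(p_c)   # P(C); each step touched only one local table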

61. Inference Algorithms

Exact, graph-theory based:
– Pearl's tree algorithm and its modifications
– Junction tree algorithm
– Influence diagram reduction
Exact, factorization based:
– Symbolic probabilistic inference
– Bucket elimination
Approximate:
– Monte Carlo simulation
– Variational methods
– Various special-case approaches
All these algorithms solve the canonical inference problem: find the posterior probability distribution for some variable(s) given direct or virtual evidence about other variable(s).

62. Agenda

1. Introduction
2. Basics of decision theory and graphical models
3. Knowledge engineering and model development
4. Multi-entity Bayesian networks and decision graphs
5. Inference using Bayes Rule
6. Combining expert knowledge and data
7. Conclusion

63. The Learning Problem

Why learning?
– Knowledge elicitation is difficult
– Experts can be biased
– There may be no experts for some problems
– In many applications data are plentiful
Learning tasks:
– Learn a Bayesian network from observations alone
– Combine expert knowledge and data
Structure of a learning algorithm:
– A method for searching over structures
– A method for evaluating the "goodness" of structures
» A Bayesian learner compares Structure1 and Structure2 by the ratio of their posterior probabilities; with equal structure priors this is the Bayes factor P(Data | Structure1) / P(Data | Structure2)
– A method for estimating parameters given structure
» A Bayesian learner computes the posterior distribution of the parameters given the structure
– A choice of output
» The single most probable structure and estimated probability tables?
» A sample of structures from the posterior distribution over structures, the expected value of the probability tables, standard deviations of probabilities?
» Other possibilities

64. Raw Material: The Observations

Easiest case:
– A random sample of cases from the network to be learned
– Each case contains an observed value for every variable in the network
Complexities:
– Missing observations: some variables are not observed in some cases
– Hidden variables: some variables are not observed in any case
– Non-random sampling: the sampled cases are not representative of the population for which the Bayesian network is being learned

65. Learning a Single Probability

Objective: from a sample of student responses to a problem, learn the probability that a similar student will answer correctly.
If we know nothing about the probability a priori, we may consider all probabilities equally likely (a uniform distribution).
Suppose 7 of 10 students answer correctly. The posterior is a Beta distribution with parameters 8 and 4, and the posterior expected value is 8/(8+4) = 0.67.
[Netica® screenshots: a PercentCorrect node with ten Correct(i) children, shown with the uniform prior (all percentages equally likely, each Correct(i) at 50%) and with the Beta(8,4) posterior with expected value 67%]
The possible values for a probability form a continuum between 0 and 1; Netica® approximates the posterior distribution with finitely many "bins".
Netica® file: Learning.dne
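
The Beta-binomial update behind this slide is one line of arithmetic; a Python sketch, using only the standard formulas for the Beta mean and variance:

    # Beta-binomial learning: uniform prior Beta(1, 1), then observe data.
    correct, wrong = 7, 3

    alpha, beta = 1 + correct, 1 + wrong        # posterior Beta(8, 4)
    mean = alpha / (alpha + beta)               # 8/12 = 0.667
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    print(f"posterior Beta({alpha}, {beta}), mean {mean:.3f}, sd {var**0.5:.3f}")
    # -> mean 0.667, sd 0.131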

66. Virtual Counts and Expert/Data Combination

Suppose 6 out of 10 students in a second sample answer correctly. Our new posterior is also a Beta distribution, now with parameters 14 and 8.
The rule for updating:
– The prior Beta distribution has "virtual counts" of 8 for True and 4 for False
– The data contain 6 True values and 4 False values
– Add the virtual counts to the actual counts to obtain posterior virtual counts of 14 and 8
The uniform distribution is a Beta distribution with a virtual count of 1 for True and 1 for False:
– The posterior probability of True is (1 + #True)/(2 + #Sample)
We can assess virtual counts from the expert and combine them with data:
– Larger virtual counts mean less uncertainty
– The expected value for a state is its virtual count divided by the total virtual count
The same idea works when there are more than two states if we use the Dirichlet distribution instead of the Beta distribution.
Why not just use observed frequencies to estimate probabilities?
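
The same counting rule covers the multi-state case. The Python sketch below treats expert-assessed virtual counts as Dirichlet parameters and folds in observed counts; the three-state example numbers are made up:

    def dirichlet_update(virtual_counts, observed_counts):
        """Posterior virtual counts = prior virtual counts + data counts."""
        return [v + n for v, n in zip(virtual_counts, observed_counts)]

    def expected_probs(counts):
        total = sum(counts)
        return [c / total for c in counts]

    # Two-state case from the slide: Beta(8, 4) prior, then 6 True / 4 False.
    print(dirichlet_update([8, 4], [6, 4]))          # [14, 8]
    print(expected_probs([14, 8]))                   # [0.636, 0.364]

    # Hypothetical three-state case: expert says roughly (5, 3, 2), then data.
    print(expected_probs(dirichlet_update([5, 3, 2], [12, 1, 7])))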

67. Triplot

A triplot is a convenient tool for visualizing the combination of prior and data to form a posterior distribution. It plots the prior, the posterior, and the "normalized likelihood" on the same axes.
Triplot for a vague but on-target expert:
– The expert distribution is spread out but likely to be near 70% True
– 5 out of 25 observations are True
– Posterior expected value is 23/30 = 77% True
"Blind" application of the updating formula can be a bad idea if the expert is wrong. Adding a "pinch of probability" on the hypothesis that the model is wrong adds robustness. Serious conflict should trigger an alert!
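
A triplot is easy to produce with standard tools. The sketch below uses numpy, scipy, and matplotlib (the library choice is mine, and the Beta(3.5, 1.5) prior is a guess at "vague but near 70%", not a value from the workshop):

    import numpy as np
    from scipy.stats import beta
    import matplotlib.pyplot as plt

    theta = np.linspace(0.001, 0.999, 500)

    a0, b0 = 3.5, 1.5            # assumed vague prior centered near 0.7
    k, n = 5, 25                 # observations: 5 True out of 25

    prior = beta.pdf(theta, a0, b0)
    norm_lik = beta.pdf(theta, k + 1, n - k + 1)   # likelihood, normalized
    posterior = beta.pdf(theta, a0 + k, b0 + n - k)

    for curve, label in [(prior, "prior"), (norm_lik, "normalized likelihood"),
                         (posterior, "posterior")]:
        plt.plot(theta, curve, label=label)
    plt.xlabel("P(True)")
    plt.legend()
    plt.show()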

68 George Mason University BDT Workshop GMU Department of Systems Engineering and Operations Research ©Kathryn Blackmond LaskeyJune 2002 Overconfident but approximately correct expert Overconfident and wrong expert With 1% “pinch” on uniform prior The “Pinch of Probability” With 1% “pinch” on uniform prior

69. Learning Structure

We can represent uncertainty about the structure of a Bayesian network as a probability distribution over structures:
– Assign a prior probability to each structure and to the local distributions conditional on structure
– Apply Bayes rule to obtain the posterior distribution over structures and parameters given the data
"Natural Occam's Razor":
– When sample sizes are relatively small, Bayesian learning tends to favor simpler models (models with fewer arcs and simpler local distributions)
– As observations accumulate, Bayesian learning will adjust the number of parameters as needed to account for the data
– In general, Bayesian learning tends to converge to the simplest model that is consistent with the observations
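
For discrete data with uniform parameter priors, the marginal likelihood of a structure has a closed form, so two candidate structures can be scored exactly. The Python sketch below compares "X and Y independent" against "X into Y" on a made-up binary data set, using the Beta-Bernoulli marginal likelihood h! t! / (h+t+1)! for every local table:

    from math import factorial

    def marg_lik(heads: int, tails: int) -> float:
        """Marginal likelihood of Bernoulli counts under a uniform prior."""
        return factorial(heads) * factorial(tails) / factorial(heads + tails + 1)

    # Hypothetical data: counts of (x, y) pairs over 20 cases.
    n = {(0, 0): 8, (0, 1): 2, (1, 0): 2, (1, 1): 8}
    nx = [n[(0, 0)] + n[(0, 1)], n[(1, 0)] + n[(1, 1)]]   # counts of X
    ny = [n[(0, 0)] + n[(1, 0)], n[(0, 1)] + n[(1, 1)]]   # counts of Y

    # Structure 1: X and Y independent (two unrelated coins).
    score_indep = marg_lik(nx[1], nx[0]) * marg_lik(ny[1], ny[0])

    # Structure 2: X -> Y (one coin for X, one coin for Y per value of X).
    score_arc = (marg_lik(nx[1], nx[0])
                 * marg_lik(n[(0, 1)], n[(0, 0)])
                 * marg_lik(n[(1, 1)], n[(1, 0)]))

    print("Bayes factor (arc vs. independent):", score_arc / score_indep)
    # > 1 for these counts: the data favor the arc, as the dependence suggests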

70. Agenda

1. Introduction
2. Basics of decision theory and graphical models
3. Knowledge engineering and model development
4. Multi-entity Bayesian networks and decision graphs
5. Inference using Bayes Rule
6. Combining expert knowledge and data
7. Conclusion

71. Application Areas

BNs and DGs are increasingly being applied to a broad array of problems:
– Automated and mixed-initiative multi-source fusion
– Automated and mixed-initiative data mining
– Models and simulations for systems engineering trade studies and system design
– Agent models for agent-based computing
– Cognitive modeling
– Corporate and government strategic planning
– Biometrics
– Cyber-security
– Decision support for many applications
– The human genome project
How could BNs and DGs be applied in your area of work?

72. Summary

BNs and DGs move from hand-crafted, special-purpose models to a genuine open-world reasoning capability:
– MEDGs are a logic for new-paradigm computing
A general-purpose language for formulating and specifying scientific, engineering, and policy models:
– Specify knowledge via modular components with well-defined interconnections
– Uncertainty and observability are intrinsic elements of the representation
– Combine physical, logical, and subjective knowledge in a unified and logically defensible manner
– Represent both causal and correlational knowledge
A modular representation for composing complex probability models from manageable sub-units:
– First-principles domain model
– Decision theory provides an approach to trading off expressive power against complexity in:
» Knowledge engineering
» Model construction
» Inference
» Learning
– A unified knowledge modeling methodology and process spans sub-symbolic through cognitive and social/organizational levels
Application experience to date shows strong promise. There are many open research issues.

73. References

Bayesian networks and decision graphs:
– Charniak, E. Bayesian Networks without Tears. AI Magazine.
– Jensen, F. Bayesian Networks and Decision Graphs. Springer-Verlag.
– Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. (The first book popularizing Bayesian networks.)
– Clemen, R. Making Hard Decisions.
Knowledge engineering:
– Special issue of IEEE Transactions on Knowledge and Data Engineering: Probability Models: Where Do the Numbers Come From?
Constructing BNs and DGs from modular and repeatable components:
– Glesner, S. and Koller, D. (1995) Constructing Flexible Dynamic Belief Networks from First-Order Probabilistic Knowledge Bases. In ECSQARU '95.
– Laskey, K.B., Mahoney, S.M., and Wright, E. (2001) Hypothesis Management in Situation-Specific BN Construction. In Koller, D. and Breese, J. (eds.), Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference. San Francisco, CA: Morgan Kaufmann.
Combining expert knowledge and data:
– Heckerman, D. A Tutorial on Learning Bayesian Networks. Technical Report MSR-TR-95-06, Microsoft Research.
– Laskey, K.B. and Mahoney, S.M. Knowledge and Data Fusion in Probabilistic Networks. Submitted to the Journal of Machine Learning Research special issue on knowledge and data fusion.
Web site for my GMU course (contains lecture notes, additional references, exercises, and useful links):
– http://ite.gmu.edu/~klaskey/CompProb

