Knowledge Engineering for Bayesian Networks

1 Knowledge Engineering for Bayesian Networks
Ann Nicholson, School of Computer Science and Software Engineering, Monash University

2 Overview
Representing uncertainty
Introduction to Bayesian networks: syntax, semantics, examples
The knowledge engineering process
Open research questions

3 Sources of Uncertainty
Ignorance
Inexact observations
Non-determinism
AI representations: probability theory, Dempster-Shafer, fuzzy logic
Ignorance: ignorance of the domain, of what the physical interactions of the world are (i.e. we cannot model it with a simple IF A THEN B). Inexact observations: we need to gather evidence about what is happening in the world by making observations with some sort of sensor, and these readings may not be exact or accurate. Non-determinism: the outcome of interactions, or of the agent's actions, may not be deterministic (at least at the level of modelling being done): IF do(A) THEN B or C. I am going to avoid the religious wars and today talk about probabilistic AI representations.

4 Probability theory for representing uncertainty
Assigns a numerical degree of belief between 0 and 1 to facts, e.g. "it will rain today" is T/F.
Prior (unconditional) probability: P("it will rain today") = 0.2
Posterior (conditional) probability: P("it will rain today" | "rain is forecast") = 0.8
Bayes' rule: P(H|E) = P(E|H) P(H) / P(E)
A probability of 0.8 means an 80% chance of being true, not 80% true (fuzzy logic). Bayes' rule means that given a prior belief in a hypothesis H, and having observed some evidence E, if we know the likelihood of the evidence given the hypothesis, we can update our belief in the hypothesis given E (and we do not need to know P(E), which can be normalised out). People have known about probability theory for a long time, so what is new about its use in AI? People thought that you had to specify the full joint distribution, i.e. the probability of each combination of all the variables of interest, which requires too many numbers. However, in the late 80s efficient methods emerged for representing and reasoning with more compact representations of the full distribution (based on work in graphical modelling and statistics that had been happening for most of the 80s): fewer numbers need to be provided, and the computation of new beliefs can be done fast.
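The Bayes' rule update, with P(E) normalised out over H and not-H as the notes describe, can be sketched in a few lines of Python. The numbers in the usage example are illustrative assumptions, not values from the slide:

```python
def posterior(prior_h, likelihood_e_given_h, likelihood_e_given_not_h):
    """Bayes' rule with P(E) normalised out:
    P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|~H) P(~H)]."""
    joint_h = likelihood_e_given_h * prior_h
    joint_not_h = likelihood_e_given_not_h * (1.0 - prior_h)
    return joint_h / (joint_h + joint_not_h)

# Hypothetical numbers: prior belief in H of 0.3, and evidence E that is
# four times as likely under H as under ~H.  Observing E raises the belief.
print(posterior(0.3, 0.8, 0.2))
```

Because the denominator is computed from the two likelihoods and the prior, P(E) never needs to be supplied separately, which is exactly the "normalise out" point in the notes.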

5 Bayesian networks
Directed acyclic graphs.
Nodes: random variables, e.g.
R: "it is raining", discrete values T/F
T: temperature, continuous or discrete variable
C: colour, discrete values {red, blue, green}
Arcs indicate dependencies (and can have a causal interpretation).
Absence of arcs: independence assumptions.

6 Bayesian networks: Conditional Probability Distribution (CPD)
Associated with each variable: the probability of each state given the parent states, a matrix attached to each node.
Compact representation of the joint probability distribution: P(Flu,Te,Th) = P(Th|Te) P(Te|Flu) P(Flu)
X = Flu, "Jane has the flu": P(Flu=T) = 0.05. Models a causal relationship.
Y = Te, "Jane has a high temp": P(Te=High|Flu=T) = 0.4, P(Te=High|Flu=F) = 0.01
Q = Th, "thermometer temp reading": P(Th=High|Te=High) = 0.95, P(Th=High|Te=Low) = 0.1. Models possible sensor error.
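The factorisation P(Flu,Te,Th) = P(Th|Te) P(Te|Flu) P(Flu) can be checked directly with the slide's numbers; a minimal Python sketch (the dictionary names are my own, only the probabilities come from the slide):

```python
# CPTs from the slide's Flu -> Te -> Th chain.
P_flu = {True: 0.05, False: 0.95}
P_te_high_given_flu = {True: 0.4, False: 0.01}
P_th_high_given_te = {'High': 0.95, 'Low': 0.1}

def joint(flu, te, th):
    """P(Flu, Te, Th) = P(Th|Te) P(Te|Flu) P(Flu)."""
    p_te = P_te_high_given_flu[flu] if te == 'High' else 1 - P_te_high_given_flu[flu]
    p_th = P_th_high_given_te[te] if th == 'High' else 1 - P_th_high_given_te[te]
    return P_flu[flu] * p_te * p_th

# Probability Jane has the flu, a high temperature, and a high reading:
print(joint(True, 'High', 'High'))  # 0.05 * 0.4 * 0.95 = 0.019
```

Only five numbers specify the whole distribution over the eight joint states, which is the compactness claim on the slide.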

7 BN inference
Evidence: the observation of a specific state.
Task: compute the posterior probabilities for the query node(s) given the evidence.
Kinds of inference on the flu network: diagnostic inference (from Th back to Flu), causal inference (from Flu forward to Te and Th), and intercausal inference (between competing causes of Te, e.g. Flu and TB).
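Diagnostic inference can be done by brute-force enumeration over the unobserved variables; this is only one (exact but naive) BN inference method, sketched here with the CPTs from the flu example:

```python
from itertools import product

# CPTs from the flu example: Flu -> Te -> Th.
P_flu = {True: 0.05, False: 0.95}
P_te = {(True, 'High'): 0.4, (True, 'Low'): 0.6,
        (False, 'High'): 0.01, (False, 'Low'): 0.99}
P_th = {('High', 'High'): 0.95, ('High', 'Low'): 0.05,
        ('Low', 'High'): 0.1, ('Low', 'Low'): 0.9}

def p_flu_given_th_high():
    """Diagnostic inference by enumeration: P(Flu=T | Th=High)."""
    def joint(flu, te):
        # P(Flu) P(Te|Flu) P(Th=High|Te), summed over the hidden Te below.
        return P_flu[flu] * P_te[(flu, te)] * P_th[(te, 'High')]
    num = sum(joint(True, te) for te in ('High', 'Low'))
    den = sum(joint(flu, te) for flu, te in product((True, False), ('High', 'Low')))
    return num / den

print(p_flu_given_th_high())  # well above the 0.05 prior
```

A high thermometer reading raises the belief in flu from the 0.05 prior, which is the diagnostic (evidence-to-cause) direction described on the slide.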

8 BN software
Several commercial packages: Netica, Hugin, Analytica (all with demo versions).
Free software: Smile, Genie, JavaBayes, …
[Add Almond and Murphy BN info sites]
Examples

9 Decision networks
An extension to the basic BN for decision making:
Decision nodes
Utility nodes
EU(Action) = Σ_o P(o | Action, E) U(o)
Choose the action with the highest expected utility.
Example
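The expected-utility rule above can be sketched directly; the actions, outcomes, and utility values here are hypothetical, invented only to illustrate the maximisation:

```python
def expected_utility(outcome_probs, utility):
    """EU(Action) = sum over outcomes o of P(o | Action, E) * U(o)."""
    return sum(p * utility[o] for o, p in outcome_probs.items())

# Hypothetical two-action decision problem (numbers are illustrative only).
actions = {
    'treat': {'recover': 0.9, 'no_recover': 0.1},
    'wait':  {'recover': 0.5, 'no_recover': 0.5},
}
utility = {'recover': 100, 'no_recover': 0}

# Choose the action with the highest expected utility.
best = max(actions, key=lambda a: expected_utility(actions[a], utility))
print(best)  # 'treat': EU 90 vs 50 for 'wait'
```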

10 Elicitation from experts
Variables: which variables are important? what values/states?
Structure: causal relationships? dependencies/independencies?
Parameters (probabilities): how to quantify relationships and interactions?
Preferences (utilities)

11 Knowledge Engineering Process
These stages are done iteratively.
The process stops when further expert input is no longer cost effective.
The process is difficult and time consuming.
As yet, it is not well integrated with the methods and tools developed by the Intelligent Decision Support community.
For the first five or more years, most research went into efficient updating, and the example networks were small and handcrafted. More recently there has been a focus on the KE process; see the IEEE Transactions special issue.

12 Knowledge discovery
There is much interest in automated methods for learning BNs from data: parameters and structure (causal discovery).
This is a computationally complex problem, so current methods have practical limitations, e.g. they limit the number of states, require variable ordering constraints, or do not specify all arc directions.
Evaluation methods.
Learning the parameters is fairly straightforward: given values for all the variables (lots of cases), do frequency counts, i.e. count how many times a node takes a particular value given its parents taking other values. Causal discovery is more difficult: it is basically a search through the space of possible structures, which is exponential. Also, for technical reasons, there is not just one unique network that represents a given joint probability distribution; there may be many. So which would you choose? The causal interpretation makes a difference, and corresponds to the most compact model. Other problems: missing data and hidden variables. Evaluation methods: take an existing network, generate data consistent with it (say 10,000 cases that reflect the probabilities in the BN), learn a BN from that data, and compare it to the original network. This is alright for theoretical results or for comparing algorithms, but…
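The frequency-count approach to parameter learning described in the notes can be sketched as follows; the helper name and the toy cases are my own inventions, not data from any of the case studies:

```python
from collections import Counter

def learn_cpt(cases, child, parents):
    """Estimate P(child | parents) by frequency counts over complete data:
    count how often the child takes each value for each parent configuration."""
    joint_counts = Counter()   # (parent values, child value) -> count
    parent_counts = Counter()  # parent values -> count
    for case in cases:
        pa = tuple(case[p] for p in parents)
        joint_counts[(pa, case[child])] += 1
        parent_counts[pa] += 1
    return {(pa, v): n / parent_counts[pa] for (pa, v), n in joint_counts.items()}

# Toy complete-data cases (hypothetical, for illustration only).
cases = [
    {'Flu': True, 'Te': 'High'}, {'Flu': True, 'Te': 'High'},
    {'Flu': True, 'Te': 'Low'},  {'Flu': False, 'Te': 'Low'},
    {'Flu': False, 'Te': 'Low'}, {'Flu': False, 'Te': 'High'},
]
cpt = learn_cpt(cases, child='Te', parents=['Flu'])
print(cpt[((True,), 'High')])  # 2 of the 3 Flu=True cases have Te=High -> 2/3
```

With missing data or hidden variables the simple counts above no longer apply, which is one of the problems the notes mention.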

13 The knowledge engineering process
1. Building the BN: variables, structure, parameters, preferences; a combination of expert elicitation and knowledge discovery.
2. Validation/evaluation: case-based evaluation, sensitivity analysis, accuracy testing.
3. Field testing: alpha/beta testing, acceptance testing.
4. Industrial use: collection of statistics.
5. Refinement: updating procedures, regression testing.
Proposed jointly with Kevin Korb.

14 Case Study: Seabreeze prediction
2000 Honours project, joint with the Bureau of Meteorology (PAKDD'2001 paper, TR).
A BN was built based on an existing simple expert rule.
Several years of data were available for Sydney seabreezes.
The CaMML and Tetrad-II programs were used to learn BNs from the data.
Comparative analysis showed the automated methods gave improved predictions.

15 Case Study: Intelligent tutoring
[System diagram: an adaptive Bayesian network takes as inputs a generic BN model of the student, the student's item answers, and optional information about the student (e.g. age, a decimal comparison test, classroom diagnostic test results); it diagnoses misconceptions, predicts outcomes, and identifies the most useful information. A system controller module selects the next item type and item, applies sequencing tactics, decides whether to present help, decides when to change to a new computer game ("Flying photographer", "Decimaliens", "Hidden number", "Number between", …), identifies when expertise is gained, gives feedback, and reports on the student to support the teacher and classroom teaching activities.]
Current project with Liz Sonenberg and Kaye Stacey (Education Faculty): development of an ITS for decimal misconceptions. The current implementation just uses the elicited network. Recently we have run some automated methods (classifying misconceptions, learning parameters, learning structure) with promising results; a paper on this has just been submitted to a conference. The challenge will be to apply the methods to data that has not been gathered with expert knowledge (e.g. algebra).

16 Case Study: Bayesian poker

17 Consulting experiences
In 1999/2000, Kevin Korb and I consulted for clients including NAB and North Ltd.
Process:
approached by a technical person interested in the technology
gave workshops on BN technology
brainstorming for BN elicitation (iterative)
the technical person was satisfied with the preliminary results
but BN technology was not "sold" to managers

18 Open Research Questions
Tools needed to support expert elicitation, to reduce reliance on a BN expert; for example, visualisation of explanatory methods.
Combining expert elicitation and automated methods.
Evaluation measures and methods.
Industry adoption of BN technology.
(These research questions are related to the KE process. There are also lots of open research questions about efficient approximate inference algorithms, learning BNs, etc.)
Tools: at the moment, BN software assumes a fair knowledge of BNs; there is no high-level wrapper for guiding a novice through the BN construction process. With Kevin Korb at Monash, we are looking at developing aspects of such a tool (Monash small grant). Explanation facilities such as "sensitivity analysis" measures are provided, but with no visualisation: they are supposed to show which nodes most influence the posterior probabilities for a particular query node, say, but what does this mean? (Show Netica example.) I am currently co-supervising a CS Masters student, Tali Boneh, with Liz Sonenberg, looking at this. Combining expert and automated methods: again, the tools and software need to be integrated. At the moment the good learning methods are not part of the BN software packages (currently in development; Netica says it will have some soon). But how should the tasks be combined? BN learning is very computationally complex, and expert knowledge can be used to constrain the search. Additionally, output from automated methods can be put to experts, who can see which models best match their domain understanding, etc. Once BN technology…

19 Visit to UniMelb, March to June (away some of April/May)
Work on a BN textbook (joint with Kevin Korb).
Continue ongoing research projects.
Talk with DIS academics who have any common interests.

