Bayesian Networks for Risk Assessment

Presentation transcript:

Bayesian Networks for Risk Assessment Government Actuary's Department 18 November 2014 Norman Fenton Queen Mary University of London and Agena Ltd In introducing myself I will mention the wide range of areas where we have applied BNs, but make it clear that I will restrict today's detailed examples to financial-type applications.

Outline Overview of Bayes and Bayesian networks Why Bayesian networks are needed for risk assessment Examples and real applications in financial risk Challenges and the future

Our book www.BayesianRisk.com

Overview of Bayes and Bayesian Networks

A classic risk assessment problem A particular disease has a 1 in 1000 rate of occurrence A screening test for the disease is 100% accurate for those with the disease; 95% accurate for those without What is the probability a person has the disease if they test positive?

Bayes Theorem H (hypothesis): we have a prior P(H) ("person has disease"). E (evidence): now we get some evidence E ("test result positive"). We know P(E|H), but what we want is the posterior P(H|E):

P(H|E) = P(E|H) × P(H) / P(E)
P(E) = P(E|H) × P(H) + P(E|not H) × P(not H)
P(H|E) = (1 × 0.001) / (1 × 0.001 + 0.05 × 0.999) = 0.001 / 0.05095 ≈ 0.02

So far I have expected you to believe the results of the computations. But what lies behind BNs? <CLICK> Of course it is Bayes theorem that provides the means of formally updating our belief in a hypothesis in the light of new evidence. <CLICK> We start with some hypothesis and a prior probability for it. To illustrate I will use the example of a person having or not having a disease. So for simplicity this variable has two values T or F. Suppose P(H) is 1/1000. <CLICK> We observe a piece of evidence. Could be a diagnostic test result. Again for simplicity assume this outcome is T or F. <CLICK> We know the probability of the evidence given the hypothesis – this is the test accuracy. Suppose e.g. the test is always positive if a person has the disease, so P(E|H) = 1 and P(E|not H) = 0.05. But what we really want to know is the revised probability of the hypothesis given the evidence. <CLICK> Bayes theorem gives us the necessary formula for this. <CLICK> So it exposes the fallacy in this and similar cases, giving us the correct answer. I will alert you here to a problem. Easy for statisticians and for mathematically literate people in this simple case, but for MOST people – and this includes, from my personal experience, highly intelligent barristers, judges and surgeons – this is completely hopeless. And it is no good us arguing that it is not.
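The slide's calculation can be reproduced in a few lines of Python (a minimal sketch; the function name is ours, not from the talk):

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem: P(H|E) = P(E|H)P(H) / P(E)."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Disease example from the slide: prior 1/1000, test always positive
# if diseased, 5% false-positive rate otherwise.
p = posterior(0.001, 1.0, 0.05)
print(round(p, 4))  # about 0.02, far lower than most people guess
```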

A Classic BN The key thing is that as soon as we have multiple related variables, the Bayes calculations become a problem

Bayesian Propagation Applying Bayes theorem to update all probabilities when new evidence is entered. Intractable even for small BNs. Breakthrough in late 1980s - fast algorithms. Tools implement efficient propagation. So we need a method for doing the calculations across a complex model. This is called Bayesian propagation. Now Bayes Theorem represents the reasoning needed when there is a single link between two uncertain nodes. In a BBN you have multiple links, and when you enter pieces of evidence you update all of the probabilities in the BBN by applying Bayes theorem recursively. This is called PROPAGATION. This is what you saw happen when I entered observations into the example BBN. Although the underlying theory of Bayes has been around for a long time, it turns out that propagation is computationally intractable even for small BBNs, so until very recently nobody could build and execute realistic BBN models even though everybody knew they were a great formalism for handling uncertainty. Fortunately, in the late 1980s researchers developed algorithms which meant that many classes of very large BBNs could be propagated efficiently. Software tools that implement these algorithms, such as the tool I just used, have since become available commercially, and this has led to an explosion of interest.

A Classic BN: Marginals

Dyspnoea observed

Also non-smoker

Positive x-ray

..but recent visit to Asia

The power of BNs Explicitly model causal factors Reason from effect to cause and vice versa ‘Explaining away’ Overturn previous beliefs Make predictions with incomplete data Combine diverse types of evidence Visible auditable reasoning Can deal with high impact low probability events (we do not require massive datasets) <CLICK> Explicitly model causal factors <CLICK> Unlike traditional statistical models we can Reason from effect to cause and vice versa <CLICK> ‘Explaining away’ In meteor example when we removed the option of underground cities the observation of no loss of life was explained away – it almost certainly had to be due to blowing up the meteor. <CLICK> Overturn previous beliefs in the light of new evidence – we have seen how a single piece of evidence can have a dramatic impact <CLICK> Make predictions with incomplete data: In fact the model starts to make predictions with NO data at all. <CLICK> Combine diverse types of evidence including both subjective beliefs and objective data <CLICK> Arrive at decisions based on visible auditable reasoning
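The kinds of reasoning listed on this slide can be illustrated with a brute-force enumeration over a tiny Asia-style network. All CPT numbers below are invented for illustration, not taken from the slides:

```python
from itertools import product

# Toy network in the spirit of the classic "Asia" BN (illustrative numbers).
P_S, P_A = 0.5, 0.01                       # P(smoker), P(recent Asia visit)
P_C = {True: 0.10, False: 0.01}            # P(cancer | smoker)
P_T = {True: 0.05, False: 0.01}            # P(TB | Asia visit)
def p_x(c, t):                             # P(positive x-ray | cancer, TB)
    return 0.98 if (c or t) else 0.05

def joint(s, a, c, t, x):
    p = (P_S if s else 1 - P_S) * (P_A if a else 1 - P_A)
    p *= P_C[s] if c else 1 - P_C[s]
    p *= P_T[a] if t else 1 - P_T[a]
    p *= p_x(c, t) if x else 1 - p_x(c, t)
    return p

def query(target, evidence):
    """P(target = True | evidence) by brute-force enumeration."""
    num = den = 0.0
    for s, a, c, t, x in product([True, False], repeat=5):
        world = dict(S=s, A=a, C=c, T=t, X=x)
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(s, a, c, t, x)
        den += p
        if world[target]:
            num += p
    return num / den

print(query('C', {}))                       # prior P(cancer)
print(query('C', {'X': True}))              # raised: reasoning effect -> cause
print(query('C', {'X': True, 'A': True}))   # lowered: TB 'explains away' the x-ray
```

The third query shows "explaining away": once the Asia visit makes TB a plausible cause of the positive x-ray, the posterior for cancer drops back down.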

Why causal Bayesian networks are needed for risk assessment

Problems with regression driven ‘risk assessment’ (charts contrasting an approach that is irrational for risk assessment with one that is rational for risk assessment)

‘Standard’ definition of risk “An event that can have negative consequences” Measured (or even defined) by: probability × impact. Similarly heat maps. Used for risk registers – but produces counterintuitive results – the more carefully you think about risk, the riskier the project appears!

..but this does not tell us what we need to know Armageddon risk: large meteor strikes the Earth. How do we get the probability number? The meteor is on a direct course to Earth, so should it be 1? Clearly it needs to be conditioned on other events and actions, but where are these in the ‘model’? How do we get the impact number? Clearly massive – or is it? The risk score is meaningless: it suggests there is no point in Bruce Willis doing anything at all. It does not tell us what we need to know: how do we best avoid massive loss of life … is it worth trying… The ‘standard approach’ makes no sense at all

Risk using causal analysis A risk is an event that can be characterised by a causal chain involving (at least): the event itself; at least one consequence event that characterises the impact; one or more trigger (i.e. initiating) events; one or more control events which may stop the trigger event from causing the risk event; one or more mitigating events which help avoid the consequence event. A risk (and, similarly, an opportunity) is an event that can be characterised by a causal chain involving (at least): the event itself; at least one consequence event that characterises the impact (so this will be something negative for a risk event and positive for an opportunity event); one or more trigger (i.e. initiating) events; one or more control events which may stop the trigger event from causing the risk event (for risk) or impediment events (for opportunity); one or more mitigating events which help avoid the consequence event (for risk) or impediment events (for opportunity).

Bayesian Net with causal view of risk Trigger: Meteor on collision course with Earth. Control: Blow up meteor. Mitigant: Build underground cities. Risk event: Meteor strikes Earth. Consequence: Loss of life. A risk is therefore characterised by a set of uncertain events. Each of these events has a set of outcomes. For simplicity we assume that these events have two outcomes – true and false (in practice we can extend the outcomes to incorporate more states). So, for example, “Loss of life” here means loss of at least 80% of the world population. The ‘uncertainty’ associated with a risk is not a separate notion (as assumed in the classic approach). Every event (and hence every object associated with risk) has uncertainty that is characterised by the event’s probability distribution. The sensible risk measures that we are proposing are simply the probabilities you get from running a risk map. Of course, before you can run a risk map you still have to provide some probability values. But, in contrast to the classic approach, the probability values you need to supply are relatively simple and they make sense. And you never have to define vague numbers for ‘impact’.
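A minimal sketch of this causal chain, with purely hypothetical probabilities, shows how the risk of the consequence depends on the control and mitigant – exactly what a single "probability × impact" score cannot express:

```python
# Hedged sketch of the slide's meteor risk chain; all numbers hypothetical.
# Trigger: meteor on collision course (taken as observed, so p = 1).
def p_loss_of_life(blow_up_meteor, build_underground_cities):
    p_trigger = 1.0
    # Control: blowing up the meteor may stop the trigger causing the risk event.
    p_strike = p_trigger * (0.1 if blow_up_meteor else 0.95)
    # Mitigant: underground cities reduce the consequence given a strike.
    p_loss_given_strike = 0.2 if build_underground_cities else 0.9
    return p_strike * p_loss_given_strike

for control in (False, True):
    for mitigant in (False, True):
        # Doing nothing is far riskier than acting on either lever.
        print(control, mitigant, p_loss_of_life(control, mitigant))
```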

Examples and real applications in financial risk

Causal Risk Register Note that ‘common causes’ are easily modelled

Simple stress test interest payment example Assumes capital sum $100m and a 10-month loan. Expected value of resulting payment is $12m with 95th percentile at $26m. Regulator stress test: “at least 4% interest rate”

Simple stress test interest payment example Expected value of resulting payment in the stress testing scenario is $59m with 95th percentile at $83m. This model can be built in a couple of minutes with AgenaRisk
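A rough Monte Carlo sketch of this kind of stress test. The rate distribution and the simple-interest payment formula below are invented stand-ins, not the AgenaRisk model from the slide; the point is only that conditioning on the regulator's "at least 4%" scenario shifts the whole payment distribution:

```python
import random

random.seed(42)
CAPITAL, MONTHS = 100e6, 10

def payment(annual_rate):
    # Simple interest on a 10-month loan (illustrative formula).
    return CAPITAL * annual_rate * MONTHS / 12

# Hypothetical prior over the annual interest rate (lognormal-ish).
rates = [random.lognormvariate(-3.2, 0.5) for _ in range(100_000)]

def summarise(sample):
    xs = sorted(sample)
    return sum(xs) / len(xs), xs[int(0.95 * len(xs))]

baseline = [payment(r) for r in rates]
stressed = [payment(r) for r in rates if r >= 0.04]   # regulator: rate >= 4%

print('baseline mean / 95th pct: %.1fm / %.1fm' % tuple(x / 1e6 for x in summarise(baseline)))
print('stressed mean / 95th pct: %.1fm / %.1fm' % tuple(x / 1e6 for x in summarise(stressed)))
```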

Stress testing with causal dependency

Op Risk Loss Event Model

Operational Risk VAR Models Aggregate scenario outcome Contributing outcomes Scenario dynamics

Stress and Scenario Modelling Travel Disruption Pandemic Reverse Stress Civil Unrest From one model you can extract specific individual scenarios (e.g. travel, pandemic, civil unrest) but also “all” explanations of a particular region of outcomes (i.e. reverse stress)

Business Performance Holistic map of the business enhances understanding of interrelationships between risks and provides a candidate model structure. Business Performance Indicators serve as ex-post indicators; we can then use the model to explain the drivers underlying business outcomes. Risk Register entries help explain uncertainty associated with business processes. Using a causal model as a way to create multivariate perspectives on performance that support a wide range of uses: conditional forecasts, explaining observed outcomes, etc. KPIs inform the current state of the system

Policyholder Behaviour Model based on expert judgement and calibrations to observed real-world incidents. Enables exploration of non-linear behaviours, transitions between states, exploration of outcomes that are suspected but have not really been seen in historical data. Very useful for setting modelling assumptions and planning assumptions where data is sparse

The challenges

Challenge 1: Resistance to Bayes’ subjective probabilities “.. even if I accept the calculations are ‘correct’ I don’t accept subjective priors” There is no such thing as a truly objective frequentist approach <CLICK> The notion of eliciting and using subjective judgement is anathema to many experts. This comes back to the old chestnut of frequentist vs subjective probability. <CLICK> But there is no such thing as a truly objective frequentist approach – even the coin toss requires subjective assumptions. The difference with Bayesians is that they are honest enough to state all their subjective assumptions. Once you have incorporated the subjective assumptions, Bayes is the only rational way of combining the probabilities and revising results in the light of new evidence.

Challenge 2: Building realistic models Common method: structure and probability tables all learnt from data only (‘machine learning’). DOES NOT WORK EVEN WHEN WE HAVE LOTS OF ‘RELEVANT’ DATA!

A typical data-driven study A table of patient records with columns Age, Delay in arrival, Injury type, Brain scan result, Arterial pressure, Pupil dilation, Outcome (death y/n), one row per patient. In a typical data-driven approach we have observations from a large number of patients – in the example here, taken from a study attempting to build a model to predict at-risk patients in A&E with head injuries. We have a bunch of variables representing observable factors about the patient and a record of the outcome. The idea is we want to use the data to learn a model to help identify patients most at risk of death.

A typical data-driven study Delay in arrival, Injury type, Brain scan result, Arterial pressure, Pupil dilation, Age, Outcome. Purely data-driven machine learning algorithms will be inaccurate and produce counterintuitive results, e.g. outcome more likely to be OK in the worst scenarios. Typical machine learning approaches might learn a model like this, but they are inevitably inaccurate – often producing counterintuitive results like outcome OK in the worst scenarios.

Causal model with intervention Delay in arrival, Injury type, Brain scan result, Arterial pressure, Pupil dilation, Age, Danger level, TREATMENT, Outcome ..crucial variables missing from the data. What the model was missing were crucial variables like danger level and treatment. Especially at-risk patients are of course more likely to get urgent treatment to avoid the worst outcomes. By relying on the data available rather than the data that is necessary, I continue to see very poor BN models learnt from data. Such models in fact perform no better than any of the other multitude of ML models, ranging from regression models through to NNs.
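The treatment confound described here is easy to demonstrate on synthetic data. Every probability below is invented purely for illustration:

```python
import random

random.seed(1)

# Synthetic A&E data illustrating the slide's point: the recorded data
# is missing the crucial 'treatment' intervention.
rows = []
for _ in range(200_000):
    severe = random.random() < 0.5
    # Severely injured patients almost always get urgent treatment.
    treated = random.random() < (0.95 if severe else 0.10)
    p_death = {(True, True): 0.05, (True, False): 0.90,
               (False, True): 0.05, (False, False): 0.15}[(severe, treated)]
    rows.append((severe, random.random() < p_death))

def death_rate(want_severe):
    deaths = [d for s, d in rows if s == want_severe]
    return sum(deaths) / len(deaths)

# Learned purely from (severity, outcome) pairs, the data says the
# 'worst' patients are LESS likely to die -- the treatment confound.
print('severe:', round(death_rate(True), 3))
print('mild:  ', round(death_rate(False), 3))
```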

Challenge 2: Building realistic models Need to incorporate experts judgment: Structure informed by experts, probability tables learnt from data Structure and tables built by experts Fenton NE, Neil M, and Caballero JG, "Using Ranked nodes to model qualitative judgements in Bayesian Networks“, IEEE TKDE 19(10), 1420-1432, Oct 2007

Challenge 3: Handling continuous nodes Static discretisation: inefficient and devastatingly inaccurate. Our developments in dynamic discretisation are starting to have a revolutionary effect. Neil, M., Tailor, M., & Marquez, D. (2007). “Inference in hybrid Bayesian networks using dynamic discretization”. Statistics and Computing, 17(3), 219–233. Neil, M., Tailor, M., Marquez, D., Fenton, N. E., & Hearty, P. (2008). “Modelling dependable systems using hybrid Bayesian networks”. Reliability Engineering and System Safety, 93(7), 933–939. <CLICK> Imagine two of your variables in an insurance risk model were ‘number of transactions’ (a possible cause) and ‘size of loss in pounds’ (effect variable). Imagine having to decide in advance appropriate discretisation intervals for each of these variables. There has to be a trade-off between the number of intervals and computational efficiency. You end up having to predict where the regions of highest probability mass lie and discretising most heavily in those. But then when you get observations outside of those regions everything collapses. Ever decreasing circles. The only analytic solution involves assuming normal distributions under very restrictive structural constraints. But it’s hopeless. This problem, more than anything else, has driven modellers to shun real BN modelling and tools in favour of Monte Carlo simulation methods and tools like WinBUGS. <CLICK> But recent developments enable you to simply specify the whole range and have the model do the discretisation dynamically. Efficient and accurate. Will demo outside.
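Why static discretisation is inaccurate can be seen with a small illustrative calculation. The distribution and bin edges below are our own choices, not from the papers cited; they show what happens when a query lands beyond the region the modeller discretised finely:

```python
import math

# True model: X ~ Lognormal(mu=0, sigma=1); we want P(X > 10).
def lognorm_cdf(x, mu=0.0, sigma=1.0):
    if x <= 0:
        return 0.0
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))

true_tail = 1 - lognorm_cdf(10)

# Static discretisation: intervals pre-chosen to cover [0, 8], concentrating
# bins where the mass was expected, with one catch-all bin above.
edges = [0, 0.5, 1, 1.5, 2, 3, 4, 6, 8, float('inf')]
bin_mass = [1 - lognorm_cdf(edges[i]) if edges[i + 1] == float('inf')
            else lognorm_cdf(edges[i + 1]) - lognorm_cdf(edges[i])
            for i in range(len(edges) - 1)]

# P(X > 10) under the discretised model: 10 falls inside the catch-all
# [8, inf) bin, so the model can only answer with that whole bin's mass.
approx_tail = bin_mass[-1]

print(true_tail, approx_tail)   # the coarse model badly overstates the tail
```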

Challenge 4: Risk Aggregation Estimate sum of a collection of financial assets or events, where each asset or event is modelled as a random variable Methods not designed to cope with the presence of Discrete Causally Connected Random Variables Solution: Bayesian Factorization and Elimination (BFE) algorithm - exploits advances in BNs and is as accurate on conventional problems as competing methods. Competing methods: FFT, Panjer's recursion and MC Peng Lin, Martin Neil and Norman Fenton (2014). “Risk aggregation in the presence of discrete causally connected random variables”. Annals of Actuarial Science, 8, pp 298-319

Conclusions Genuine risk assessment requires causal Bayesian networks. Bayesian networks are now used effectively in a range of real world problems. Must involve experts and not rely only on data. No major remaining technical barrier to widespread use. <CLICK> Indeed the subjective approach and Bayes is the only rational way to reason under uncertainty and the only rational way to do risk assessment. <CLICK> BNs in real use have been underreported. They are not just an academic research tool. <CLICK> Machine learning does not work. <CLICK> Many of the traditional genuine barriers have now been removed. Manual model building has been revolutionised by improvements in tool design and advances in methods for generating tables from minimal user input. The Achilles heel of continuous nodes has essentially been fixed. There are issues of computational complexity, but these are even worse in alternative approaches such as Monte Carlo. So the remaining problems are largely perceptual. To gain trust in Bayes we need visual non-math arguments. There should NEVER be any need for discussion about the Bayesian calculations, just as there should not be any need to discuss or challenge, say, how a calculator is used to compute a long division. Under no circumstances should we assume that decision-makers can do the calculations or understand the way such calculations are done. ONLY CONCEPTUAL PRESENTATION BARRIERS REMAIN. I have indicated how BN tools have already been used with some effect. I believe that in 50 years’ time professionals of all types, including those in insurance, law and even medicine, will look back in total disbelief that they could have ignored these available techniques of reasoning about risk for so long.

Follow up Get the book www.BayesianRisk.com Try the free BN software and all the models www.AgenaRisk.com Propose a case study for ERC Project BAYES-KNOWLEDGE www.eecs.qmul.ac.uk/~norman/projects/B_Knowledge.html