# Bayesian Networks for Risk Assessment


Bayesian Networks for Risk Assessment
Government Actuary's Department, 18 November 2014. Norman Fenton, Queen Mary University of London and Agena Ltd. In introducing myself I will mention the wide range of areas where we have applied BNs, but make it clear that I will restrict today’s detailed examples to financial-type applications.

Outline Overview of Bayes and Bayesian networks Why Bayesian networks are needed for risk assessment Examples and real applications in financial risk Challenges and the future

Our book

Overview of Bayes and Bayesian Networks

A classic risk assessment problem
A particular disease has a 1 in 1,000 rate of occurrence. A screening test for the disease is 100% accurate for those with the disease; 95% accurate for those without. What is the probability a person has the disease if they test positive?

Bayes Theorem
H (hypothesis): we have a prior P(H) (“person has disease”). E (evidence): we now get some evidence E (“test result positive”). We know P(E|H), but what we want is the posterior P(H|E):

P(H|E) = P(E|H) × P(H) / P(E)
       = P(E|H) × P(H) / [P(E|H) × P(H) + P(E|not H) × P(not H)]
       = (1 × 0.001) / (1 × 0.001 + 0.05 × 0.999)
       = 0.001 / 0.05095
       ≈ 0.02

So far I have expected you to believe the results of the computations. But what lies behind BNs? <CLICK> Of course it is Bayes theorem that provides the means of formally updating our belief in a hypothesis in the light of new evidence. <CLICK> We start with some hypothesis and a prior probability for it. To illustrate I will use the example of a person having or not having a disease; for simplicity this variable has two values, T or F. Suppose P(H) is 1/1000. <CLICK> We observe a piece of evidence; this could be a diagnostic test result. Again for simplicity assume this outcome is T or F. <CLICK> We know the probability of the evidence given the hypothesis – this is the test accuracy. Suppose, for example, the test is always positive if a person has the disease, so P(E|H) = 1, and P(E|not H) is 0.05. But what we really want to know is the revised probability of the hypothesis given the evidence. <CLICK> Bayes theorem gives us the necessary formula for this. <CLICK> So it exposes the fallacy in this and similar cases, giving us the correct answer. I will alert you here to a problem. This simple case is easy for statisticians and mathematically literate people, but for MOST people – and from my personal experience this includes highly intelligent barristers, judges and surgeons – it is completely hopeless. And it is no good us arguing that it is not.
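The slide's arithmetic can be checked mechanically. A minimal Python sketch (the talk itself uses AgenaRisk, not code; this is just the formula above):

```python
# Bayes' theorem for the screening example on the slide:
# prior P(H) = 0.001 (1 in 1,000), P(E|H) = 1.0 (test always positive
# for those with the disease), P(E|not H) = 0.05 (5% false positives).

def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|not H)P(not H)]."""
    numerator = p_e_given_h * prior
    evidence = numerator + p_e_given_not_h * (1 - prior)
    return numerator / evidence

p = posterior(0.001, 1.0, 0.05)
print(round(p, 4))  # 0.0196 -- under 2%, despite the "95% accurate" test
```

The surprise is that the false positives among the 999-in-1,000 healthy people swamp the single true positive, which is exactly the fallacy the slide exposes.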

A Classic BN Key thing is that as soon as we have multiple related variables the Bayes calculations become a problem

Bayesian Propagation
Applying Bayes theorem to update all probabilities when new evidence is entered. Intractable even for small BNs. Breakthrough in late 1980s – fast algorithms. Tools implement efficient propagation.
So we need a method for doing the calculations across a complex model. This is called Bayesian propagation. Now Bayes theorem represents the reasoning needed when there is a single link between two uncertain nodes. In a BBN you have multiple links, and when you enter pieces of evidence you update all of the probabilities in the BBN by applying Bayes theorem recursively. This is called PROPAGATION. This is what you saw happen when I entered observations into the example BBN. Although the underlying theory of Bayes has been around for a long time, it turns out that propagation is computationally intractable even for small BBNs, so until very recently nobody could build and execute realistic BBN models even though everybody knew they were a great formalism for handling uncertainty. Fortunately, in the late 1980s researchers developed algorithms which meant that many classes of very large BBNs could be propagated efficiently. Software tools that implement these algorithms, such as the tool I just used, have since become available commercially, and this has led to an explosion of interest.

A Classic BN: Marginals

Dyspnoea observed

Also non-smoker

Positive x-ray

..but recent visit to Asia

The power of BNs Explicitly model causal factors
Reason from effect to cause and vice versa ‘Explaining away’ Overturn previous beliefs Make predictions with incomplete data Combine diverse types of evidence Visible auditable reasoning Can deal with high-impact low-probability events (we do not require massive datasets) <CLICK> Explicitly model causal factors <CLICK> Unlike traditional statistical models we can reason from effect to cause and vice versa <CLICK> ‘Explaining away’: in the meteor example, when we removed the option of underground cities, the observation of no loss of life was explained away – it almost certainly had to be due to blowing up the meteor. <CLICK> Overturn previous beliefs in the light of new evidence – we have seen how a single piece of evidence can have a dramatic impact <CLICK> Make predictions with incomplete data: in fact the model starts to make predictions with NO data at all. <CLICK> Combine diverse types of evidence, including both subjective beliefs and objective data <CLICK> Arrive at decisions based on visible, auditable reasoning

Why causal Bayesian networks are needed for risk assessment

Problems with regression driven ‘risk assessment’
Irrational for risk assessment Rational for risk assessment

‘Standard’ definition of risk
“An event that can have negative consequences”. Measured (or even defined) by: probability × impact. Similarly heat maps. Used for risk registers – but these produce counterintuitive results – the more you think carefully about risk, the riskier the project!

..but this does not tell us what we need to know
Armageddon risk: Large meteor strikes the Earth. How do we get the probability number? The meteor is on a direct course to Earth, so should it be 1? Clearly it needs to be conditioned on other events and actions, but where are these in the ‘model’? How do we get the impact number? Clearly massive – or is it? The risk score is meaningless: it suggests there is no point in Bruce Willis doing anything at all. It does not tell us what we need to know: how do we best avoid massive loss of life … is it worth trying… The ‘standard approach’ makes no sense at all.

Risk using causal analysis
A risk (and, similarly, an opportunity) is an event that can be characterised by a causal chain involving (at least):
the event itself
at least one consequence event that characterises the impact (something negative for a risk event, positive for an opportunity event)
one or more trigger (i.e. initiating) events
one or more control events which may stop the trigger event from causing the risk event (for risk) or impediment events (for opportunity)
one or more mitigating events which help avoid the consequence event (for risk) or impediment events (for opportunity)

Bayesian Net with causal view of risk
Trigger: Meteor on collision course with Earth
Control: Blow up meteor
Mitigant: Build underground cities
Risk event: Meteor strikes Earth
Consequence: Loss of life
A risk is therefore characterised by a set of uncertain events. Each of these events has a set of outcomes. For simplicity we assume that these events have two outcomes – true and false (in practice we can extend the outcomes to incorporate more states). So, for example, “Loss of life” here means loss of at least 80% of the world population. The ‘uncertainty’ associated with a risk is not a separate notion (as assumed in the classic approach). Every event (and hence every object associated with risk) has uncertainty that is characterised by the event’s probability distribution. The sensible risk measures that we are proposing are simply the probabilities you get from running a risk map. Of course, before you can run a risk map you still have to provide some probability values. But, in contrast to the classic approach, the probability values you need to supply are relatively simple and they make sense. And you never have to define vague numbers for ‘impact’.
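The meteor chain is small enough to reason about by brute-force enumeration over its five Boolean nodes. This is a toy sketch: every probability below is invented for illustration (the talk gives no numbers), and `query` is a hypothetical helper, not AgenaRisk's API:

```python
from itertools import product

# Invented priors for the root nodes of the meteor risk chain.
P_COURSE = 0.5   # P(meteor on collision course with Earth)  -- trigger
P_BLOWUP = 0.5   # P(blow up meteor)                         -- control
P_CITIES = 0.5   # P(build underground cities)               -- mitigant

def p_strike(course, blowup):
    """P(meteor strikes Earth | trigger, control) -- the risk event."""
    if not course:
        return 0.0
    return 0.05 if blowup else 0.95

def p_loss(strike, cities):
    """P(loss of life | risk event, mitigant) -- the consequence."""
    return (0.2 if cities else 0.9) if strike else 0.01

def joint(course, blowup, cities, strike, loss):
    """Joint probability of one complete assignment of the five nodes."""
    p = P_COURSE if course else 1 - P_COURSE
    p *= P_BLOWUP if blowup else 1 - P_BLOWUP
    p *= P_CITIES if cities else 1 - P_CITIES
    ps = p_strike(course, blowup)
    p *= ps if strike else 1 - ps
    pl = p_loss(strike, cities)
    return p * (pl if loss else 1 - pl)

NAMES = ("course", "blowup", "cities", "strike", "loss")

def query(target, **evidence):
    """P(target = True | evidence), by brute-force enumeration."""
    num = den = 0.0
    for values in product((False, True), repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        den += p
        if world[target]:
            num += p
    return num / den

# 'Explaining away': given the meteor was on course and there was no loss
# of life, also learning that no cities were built pushes the explanation
# onto the blow-up.
print(round(query("blowup", course=True, loss=False), 3))                # 0.669
print(round(query("blowup", course=True, loss=False, cities=False), 3))  # 0.867
```

Removing the underground-cities option raises the probability that the meteor was blown up: with the mitigant ruled out, the observation of no loss of life must be explained by the control acting, which is the behaviour described in the earlier slides.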

Examples and real applications in financial risk

Causal Risk Register Note that ‘common causes’ are easily modelled

Assumes capital sum \$100m and a 10-month loan
Simple stress test interest payment example
Expected value of resulting payment is \$12m with 95th percentile at \$26m
Regulator stress test: “at least 4% interest rate”

Simple stress test interest payment example
Expected value of resulting payment in the stress testing scenario is \$59m with 95th percentile at \$83m. This model can be built in a couple of minutes with AgenaRisk.
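The mechanics of this stress test can be mimicked with a small Monte Carlo sketch. The rate distribution below is invented (the talk does not state the one used in the AgenaRisk model), so the outputs will not match the slide's \$12m/\$59m figures; the point is conditioning the payment on the regulator's "at least 4%" scenario:

```python
import random

random.seed(0)

CAPITAL = 100.0   # capital sum, $m (from the slide)
TERM = 10 / 12    # 10-month loan

def sample_rate(stress=False):
    """Draw an annual interest rate; under the regulator's stress test,
    condition on 'at least 4%' by rejection sampling."""
    while True:
        rate = random.lognormvariate(-2.5, 0.8)  # invented prior over rates
        if not stress or rate >= 0.04:
            return rate

def payments(n, stress=False):
    """n sampled interest payments on the 10-month loan, sorted."""
    return sorted(CAPITAL * sample_rate(stress) * TERM for _ in range(n))

base = payments(50_000)
stressed = payments(50_000, stress=True)
for name, xs in (("baseline", base), ("stressed", stressed)):
    mean = sum(xs) / len(xs)
    p95 = xs[int(0.95 * len(xs))]
    print(f"{name}: mean ${mean:.1f}m, 95th percentile ${p95:.1f}m")
```

Conditioning truncates the low-rate tail, so both the mean and the 95th percentile of the payment rise, which is the qualitative pattern the two slides show.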

Stress testing with causal dependency

Stress testing with causal dependency

Op Risk Loss Event Model

Operational Risk VAR Models
Aggregate scenario outcome Contributing outcomes Scenario dynamics

Stress and Scenario Modelling
Travel Disruption Pandemic Reverse Stress Civil Unrest From one model you can extract specific individual scenarios (e.g. travel, pandemic, civil unrest) but also “all” explanations of a particular region of outcomes (i.e. reverse stress)

Business Performance
Holistic map of the business enhances understanding of interrelationships between risks and provides a candidate model structure.
Business Performance Indicators serve as ex-post indicators; we can then use the model to explain the drivers underlying business outcomes.
Risk Register entries help explain uncertainty associated with business processes.
KPIs inform the current state of the system.
Using the causal model to create multivariate perspectives on performance supports a wide range of uses: conditional forecasts, explaining observed outcomes, etc.

Policyholder Behaviour
Model based on expert judgement and calibration to observed real-world incidents. Enables exploration of non-linear behaviours, transitions between states, and outcomes that are suspected but have not really been seen in historical data. Very useful for setting modelling assumptions and planning assumptions where data is sparse.

The challenges

Challenge 1: Resistance to Bayes’ subjective probabilities
“.. even if I accept the calculations are ‘correct’ I don’t accept subjective priors” There is no such thing as a truly objective frequentist approach <CLICK> The notion of eliciting and using subjective judgement is anathema to many experts. This comes back to the old chestnut of frequentist vs subjective probability. <CLICK> But there is no such thing as a truly objective frequentist approach. Even the coin toss requires subjective assumptions. The difference with Bayesians is that they are honest enough to state all their subjective assumptions. Once you have incorporated the subjective assumptions, Bayes is the only rational way of combining the probabilities and revising results in the light of new evidence.

Challenge 2: Building realistic models
Common method: Structure and probability tables all learnt from data only (‘machine learning’). DOES NOT WORK – even when we have lots of ‘relevant’ data!

A typical data-driven study
[Table of patient observations with columns Age, Delay in arrival, Injury type, Brain scan result, Arterial pressure, Pupil dilation, Outcome (death y/n); the example rows do not survive transcription.]
In a typical data-driven approach we have observations from a large number of patients – in the example here, taken from a study attempting to build a model to predict at-risk patients in A&E with head injuries. We have a bunch of variables representing observable factors about the patient and a record of the outcome. The idea is we want to use the data to learn a model to help identify the patients most at risk of death.

A typical data-driven study
Nodes: Delay in arrival, Injury type, Brain scan result, Arterial pressure, Pupil dilation, Age, Outcome
Purely data-driven machine learning algorithms will be inaccurate and produce counterintuitive results, e.g. outcome more likely to be OK in the worst scenarios. Typical machine learning approaches might learn a model like this, but they are inevitably inaccurate – often producing counterintuitive results like outcome OK in the worst scenarios.

Causal model with intervention
Nodes: Delay in arrival, Injury type, Brain scan result, Arterial pressure, Pupil dilation, Age, Danger level, TREATMENT, Outcome
..crucial variables missing from the data
What the model was missing were crucial variables like danger level and treatment. Especially at-risk patients are of course more likely to get urgent treatment to avoid the worst outcomes. By relying on the data available rather than the data that is necessary, I continue to see very poor BN models learnt from data. Such models in fact perform no better than any of the other multitude of ML models, ranging from regression models through to NNs.
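The "treatment as missing confounder" story can be demonstrated with simulated data. This is a minimal sketch with invented probabilities, not the A&E study's data: danger drives both treatment and death, treatment halves the death risk, yet a naive comparison makes treatment look harmful:

```python
import random

random.seed(1)

def simulate(n=100_000):
    """Simulated A&E-style records: (danger, treated, died).
    All probabilities are invented for illustration."""
    rows = []
    for _ in range(n):
        danger = random.random() < 0.3                        # latent severity
        treated = random.random() < (0.9 if danger else 0.1)  # urgent treatment
        p_death = 0.5 if danger else 0.02
        if treated:
            p_death *= 0.5                                    # treatment helps
        rows.append((danger, treated, random.random() < p_death))
    return rows

rows = simulate()

def death_rate(rows, treated=None, danger=None):
    """Observed death rate, optionally conditioned on treatment/danger."""
    sel = [died for dng, trt, died in rows
           if (treated is None or trt == treated)
           and (danger is None or dng == danger)]
    return sum(sel) / len(sel)

# Naive comparison: the treated group dies more often (they were the
# dangerous cases), so treatment 'looks' harmful ...
print(death_rate(rows, treated=True) > death_rate(rows, treated=False))   # True
# ... but conditioning on danger level shows treatment reduces the risk.
print(death_rate(rows, treated=True, danger=True)
      < death_rate(rows, treated=False, danger=True))                     # True
```

A purely data-driven learner that never sees danger level or treatment can only pick up the first, counterintuitive association, which is exactly the failure mode on the previous slide.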

Challenge 2: Building realistic models
Need to incorporate expert judgement:
Structure informed by experts, probability tables learnt from data
Structure and tables built by experts
Fenton NE, Neil M, and Caballero JG, “Using Ranked Nodes to Model Qualitative Judgements in Bayesian Networks”, IEEE TKDE 19(10), Oct 2007

Challenge 3: Handling continuous nodes
Static discretisation: inefficient and devastatingly inaccurate. Our developments in dynamic discretisation are starting to have a revolutionary effect.
Neil, M., Tailor, M., & Marquez, D. (2007). “Inference in hybrid Bayesian networks using dynamic discretization”. Statistics and Computing, 17(3), 219–233.
Neil, M., Tailor, M., Marquez, D., Fenton, N. E., & Hearty, P. (2008). “Modelling dependable systems using hybrid Bayesian networks”. Reliability Engineering and System Safety, 93(7), 933–939.
<CLICK> Imagine two of your variables in an insurance risk model were ‘number of transactions’ (a possible cause) and ‘size of loss in pounds’ (an effect variable). Imagine having to decide in advance appropriate discretisation intervals for each of these variables. There has to be a trade-off between the number of intervals and computational efficiency. You end up having to predict where the regions of highest probability mass lie and discretising most heavily in those. But then when you get observations outside of those regions everything collapses. Ever decreasing circles. The only analytic solution involves assuming normal distributions under very restrictive structural constraints. But it’s hopeless. This problem, more than anything else, has driven modellers to shun real BN modelling and tools in favour of Monte Carlo simulation methods and tools like WinBUGS. <CLICK> But recent developments enable you to simply specify the whole range and have the model do the discretisation dynamically. Efficient and accurate. Will demo outside.
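The idea behind dynamic discretisation can be illustrated in a few lines: rather than fixing intervals in advance, start coarse and repeatedly refine where the probability mass is. This is a toy sketch only; the cited papers drive the splitting with an entropy-based error measure inside junction-tree propagation, not by raw mass as here:

```python
import math

def density(x):
    """Example continuous node: Normal(50, 10) density (illustrative only)."""
    return math.exp(-((x - 50.0) ** 2) / 200.0) / math.sqrt(200.0 * math.pi)

def bin_mass(a, b, k=64):
    """Midpoint-rule integral of the density over [a, b]."""
    h = (b - a) / k
    return h * sum(density(a + (i + 0.5) * h) for i in range(k))

def dynamic_discretise(lo, hi, n_bins):
    """Start from one interval covering the whole range and repeatedly
    split the interval carrying the most probability mass in half
    (a crude stand-in for the papers' error-driven splitting)."""
    bins = [(lo, hi)]
    while len(bins) < n_bins:
        masses = [bin_mass(a, b) for a, b in bins]
        i = max(range(len(bins)), key=masses.__getitem__)
        a, b = bins.pop(i)
        mid = (a + b) / 2
        bins[i:i] = [(a, mid), (mid, b)]
    return sorted(bins)

bins = dynamic_discretise(0.0, 100.0, 16)
# Bins end up narrow where the mass is (around 50) and wide in the tails.
```

With 16 bins the intervals cluster around the mode and stay wide in the near-empty tails, which is the behaviour a static, evenly spaced discretisation can only approach with a huge number of intervals.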

Challenge 4: Risk Aggregation
Estimate the sum of a collection of financial assets or events, where each asset or event is modelled as a random variable
Competing methods (FFT, Panjer’s recursion and Monte Carlo) are not designed to cope with the presence of discrete causally connected random variables
Solution: the Bayesian Factorization and Elimination (BFE) algorithm – exploits advances in BNs and is as accurate on conventional problems as the competing methods
Peng Lin, Martin Neil and Norman Fenton (2014). “Risk aggregation in the presence of discrete causally connected random variables”. Annals of Actuarial Science, 8.
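For intuition, the conventional aggregation problem (before the discrete causal dependencies that BFE handles) is just repeated convolution of independent loss distributions. A minimal sketch with invented distributions:

```python
from functools import reduce

def convolve(d1, d2):
    """Exact distribution of the sum of two independent discrete losses,
    each given as {loss_amount: probability}."""
    out = {}
    for x1, p1 in d1.items():
        for x2, p2 in d2.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + p1 * p2
    return out

# Invented marginal loss distributions ($m) for three events; in the BFE
# setting these marginals would themselves come from a causal BN.
losses = [
    {0: 0.90, 10: 0.10},
    {0: 0.80, 5: 0.15, 20: 0.05},
    {0: 0.95, 50: 0.05},
]
total = reduce(convolve, losses)
print(round(total[0], 3))   # 0.684 = 0.90 * 0.80 * 0.95, P(no loss at all)
print(max(total))           # 80, the worst case: 10 + 20 + 50
```

Once the loss variables are causally connected rather than independent, this simple product structure breaks down, which is exactly the gap the BFE algorithm addresses.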

Genuine risk assessment requires causal Bayesian networks
Conclusions
Genuine risk assessment requires causal Bayesian networks. Bayesian networks are now used effectively in a range of real-world problems. Must involve experts and not rely only on data. No major remaining technical barrier to widespread use.
<CLICK> Indeed the subjective approach and Bayes is the only rational way to reason under uncertainty, and the only rational way to do risk assessment. <CLICK> BNs in real use have been under-reported. They are not just an academic research tool. <CLICK> Machine learning does not work. <CLICK> Many of the traditional genuine barriers have now been removed. Manual model building has been revolutionised by improvements in tool design and advances in methods for generating tables from minimal user input. The Achilles heel of continuous nodes has essentially been fixed. There are issues of computational complexity, but these are even worse in alternative approaches such as Monte Carlo. So the remaining problems are largely perceptual. To gain trust in Bayes we need visual, non-mathematical arguments. There should NEVER be any need for discussion about the Bayesian calculations, just as there should not be any need to discuss or challenge, say, how a calculator is used to compute a long division. Under no circumstances should we assume that decision-makers can do the calculations or understand the way such calculations are done. ONLY CONCEPTUAL PRESENTATION BARRIERS REMAIN. I have indicated how BN tools have already been used with some effect. I believe that in 50 years’ time professionals of all types, including those in insurance, law and even medicine, will look back in total disbelief that they could have ignored these available techniques of reasoning about risk for so long.

Try the free BN software and all the models
Follow up Get the book Try the free BN software and all the models Propose case study for ERC Project BAYES-KNOWLEDGE