
Bayesian networks and how they can help us to explore fish species interactions in the Northern Gulf of St. Lawrence. Dr Allan Tucker, Centre for Intelligent Data Analysis, Brunel University, West London, UK.

Talk Outline: introduce myself and the research group; introduce machine learning; describe Bayesian network models; document some preliminary results on the fish population data; conclusions.

Who Am I? Research Lecturer at Brunel University, West London; member of the Centre for IDA (est. 1994).

What is the Centre for IDA? Over 25 members (academics, postdocs and PhD students) with diverse backgrounds (e.g. maths, statistics, computing, biology, engineering); over 140 journal publications and a dozen research council grants since 2001; many collaborating partners in the UK, Europe, China and the USA; biannual symposia in Europe.

Some Previous Work in Machine Learning and Temporal Analysis. Oil refinery models: forecasting, explanation. Medical data: retinal (visual field) screening, forecasting. Bioinformatics: gene clusters, gene regulatory networks.

What is Machine Learning? Part 1

What is Machine Learning? (and why not statistics?) Data-oriented: extracting useful information from data, as automated as possible; useful when there is lots of data and little theory; making predictions about the future.

What can we do with ML? Classification and clustering; feature selection; prediction and forecasting; identifying structure in data.

E.g. Classification: given some labelled data (supervised learning), build a "model" that allows us to classify other, unlabelled data, e.g. a doctor diagnosing a patient based upon previous cases.

Classification, e.g. medical: a scatterplot of patients over two variables (measurements of the expression of two genes).

Classification: how do we classify them? Nearest neighbour, a linear boundary, or a more complex function?
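As a concrete illustration of the nearest-neighbour option, here is a minimal 1-NN sketch in Python. The two-gene expression values and class labels are made up for illustration; they are not data from the talk.

```python
import numpy as np

# Illustrative only: made-up expression values for two genes, labelled by class.
X_train = np.array([[0.2, 1.1], [0.4, 0.9], [1.5, 0.3], [1.7, 0.2]])
y_train = np.array(["healthy", "healthy", "disease", "disease"])

def nearest_neighbour(x, X, y):
    """Classify x by the label of its closest training point (1-NN)."""
    dists = np.linalg.norm(X - x, axis=1)  # Euclidean distance to every training point
    return y[np.argmin(dists)]

print(nearest_neighbour(np.array([1.4, 0.4]), X_train, y_train))  # -> "disease"
```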

Classification: a trivial case with the cod and shrimp data.

The Data: Northern Gulf (region a); two survey ships (Needler and Hammond), combined by normalising according to the overlap year; a short multivariate spatial time series; missing data.
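The slide does not give the exact normalisation used to combine the two ships' surveys, but a simple ratio-based sketch, assuming a single overlap year and illustrative values, might look like this:

```python
import numpy as np

# Hypothetical sketch: align two survey series using the ratio of their catches
# in the one year both ships surveyed (the "overlap year"). Values are made up.
years_needler = np.array([1984, 1985, 1986, 1987, 1990])
catch_needler = np.array([120., 95., 80., 60., 60.])   # e.g. a species index from the Needler
years_hammond = np.array([1990, 1991, 1992, 1993])
catch_hammond = np.array([40., 30., 10., 5.])           # same species from the Hammond

overlap = 1990
ratio = (catch_needler[years_needler == overlap][0] /
         catch_hammond[years_hammond == overlap][0])    # scale Hammond onto Needler units

catch_hammond_adj = catch_hammond * ratio
# Combined series: Needler up to the overlap year, rescaled Hammond afterwards.
combined_years = np.concatenate([years_needler, years_hammond[years_hammond > overlap]])
combined_catch = np.concatenate([catch_needler, catch_hammond_adj[years_hammond > overlap]])
print(dict(zip(combined_years.tolist(), combined_catch.round(1).tolist())))
```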

Background: the Northern Gulf is considered to be one ecosystem / fish community. It was quite heavily fished until about 1990, and most fish populations have collapsed since. Some say it has moved to an alternative stable state and is unlikely to come back to a cod-dominated community without some chance event beyond human control. There is lots of speculation: cold water; large increases in populations of predators. We examine the nature and strength of interactions between species in the two periods, and ask "what if?" questions: for other parts of the community to recover, would we need cod to have X strength of interaction with Y number of other species?

ML for the Northern Gulf Data: network building from knowledge and data about interactions; feature selection for classification of the species relevant to the cod collapse; state-space / dynamic models for predicting populations; hidden-variable analysis.

Bayesian Networks for Machine Learning Part 2

Bayesian Networks: a method to model a domain using probabilities; easily interpreted by non-statisticians; can be used to combine existing knowledge with data; essentially use independence assumptions to model the joint distribution of a domain.

Bayesian Networks: a simple two-variable joint distribution P(Collapse1, Collapse2), tabulated over Species1 / ¬Species1 and Species2 / ¬Species2 (table values shown on the slide). We can use it to ask many useful questions, but a full joint distribution over N variables with k states each requires on the order of k^N probabilities.
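To make the k^N point concrete, here is a back-of-the-envelope parameter count; the numbers (20 binary variables, at most 2 parents per node) are a generic illustration, not figures from the talk.

```python
# Parameter count: full joint distribution vs. a factored Bayesian network.
N, k, max_parents = 20, 2, 2
full_joint = k**N - 1                            # 1,048,575 free probabilities
bn_upper_bound = N * (k - 1) * k**max_parents    # at most 80 conditional probabilities
print(full_joint, bn_upper_bound)
```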

Bayesian Network for a Toy Domain. Structure: SpeciesA and SpeciesB are parents of SpeciesC; SpeciesC is a parent of SpeciesD and SpeciesE. Prior probabilities P(A) and P(B) (values shown on the slide). Conditional probability tables:
P(C | A, B): A=T, B=T: 0.95; A=T, B=F: 0.94; A=F, B=T: 0.29; A=F, B=F: 0.001.
P(D | C): C=T: 0.90; C=F: 0.05.
P(E | C): C=T: 0.70; C=F: 0.01.
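A minimal sketch of inference by enumeration on this toy network. The conditional tables are the ones above; P(A) and P(B) appeared only in the slide figure, so the 0.01 values below are hypothetical placeholders.

```python
from itertools import product

# CPTs from the slide; P(A), P(B) are hypothetical placeholders (not in the transcript).
P_A = {True: 0.01, False: 0.99}
P_B = {True: 0.01, False: 0.99}
P_C = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
P_D = {True: 0.90, False: 0.05}   # P(D=T | C)
P_E = {True: 0.70, False: 0.01}   # P(E=T | C)

def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) via the network factorisation."""
    pa = P_A[a]
    pb = P_B[b]
    pc = P_C[(a, b)] if c else 1 - P_C[(a, b)]
    pd = P_D[c] if d else 1 - P_D[c]
    pe = P_E[c] if e else 1 - P_E[c]
    return pa * pb * pc * pd * pe

def query(target, evidence):
    """P(target=True | evidence) by summing the joint over all assignments."""
    names = ["A", "B", "C", "D", "E"]
    num = den = 0.0
    for vals in product([True, False], repeat=5):
        assign = dict(zip(names, vals))
        if any(assign[k] != v for k, v in evidence.items()):
            continue
        p = joint(*vals)
        den += p
        if assign[target]:
            num += p
    return num / den

print(query("A", {"D": True, "E": True}))   # e.g. P(SpeciesA | SpeciesD and SpeciesE observed)
```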

Bayesian Networks: Bayesian network demo [Species_Net]. Use algorithms to learn structure and parameters from data, or build by hand (priors); continuous nodes (density functions) are also possible.

Informative Priors: to build BNs we can also use prior structures and probabilities, which are then updated with data. Priors are usually uniform (equal probability); informative priors are used to incorporate existing knowledge into BNs.
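A minimal sketch of how a prior probability is updated with data, using Beta/Dirichlet pseudo-counts. This is a standard construction for illustration; the talk does not specify the exact scheme used, and the counts below are made up.

```python
# Updating a conditional probability with data using Dirichlet pseudo-counts.
# A uniform prior gives equal pseudo-counts; an informative prior (e.g. from a
# marine biologist) gives larger counts to the expected outcome.
def posterior_prob(prior_counts, data_counts):
    """Posterior mean of P(state) for each state under a Dirichlet prior."""
    total = sum(p + d for p, d in zip(prior_counts, data_counts))
    return [(p + d) / total for p, d in zip(prior_counts, data_counts)]

# States: (shrimp high, shrimp low) given cod collapsed -- illustrative numbers only.
uniform_prior     = [1, 1]
informative_prior = [8, 2]        # expert expects shrimp to be high when cod has collapsed
data              = [12, 3]       # counts observed in the survey data (made up here)

print(posterior_prob(uniform_prior, data))      # data-dominated estimate
print(posterior_prob(informative_prior, data))  # estimate pulled towards expert knowledge
```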

Bayesian Networks for Classification and Feature Selection: a node representing the class label is attached to the data.

Dynamic Bayesian Networks for Forecasting: nodes represent variables at distinct time slices, with links between nodes over time; can be used to forecast into the future [Species_Dynamic_Net].
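A minimal sketch of how a two-slice DBN rolls forward to forecast. It assumes discretised species states (low / high) and made-up transition probabilities; it is not the network fitted in the talk.

```python
import numpy as np

# Illustrative two-slice DBN rollout: each species' state at time t+1 depends on
# its own state and one other species' state at time t. States: 0 = low, 1 = high.
P_cod_next    = {(0, 0): 0.10, (0, 1): 0.05, (1, 0): 0.70, (1, 1): 0.50}  # P(cod=high at t+1 | cod_t, shrimp_t)
P_shrimp_next = {(0, 0): 0.60, (0, 1): 0.80, (1, 0): 0.20, (1, 1): 0.40}  # P(shrimp=high at t+1 | cod_t, shrimp_t)

rng = np.random.default_rng(0)

def forecast(cod, shrimp, steps):
    """Sample one trajectory of (cod, shrimp) states forward in time."""
    traj = [(cod, shrimp)]
    for _ in range(steps):
        cod, shrimp = (int(rng.random() < P_cod_next[(cod, shrimp)]),
                       int(rng.random() < P_shrimp_next[(cod, shrimp)]))
        traj.append((cod, shrimp))
    return traj

print(forecast(cod=1, shrimp=0, steps=10))
```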

Hidden Markov Models: like a DBN but with hidden nodes; often used to model sequences. Hidden states H_{T-1} -> H_T form a Markov chain, with observations O_{T-1} and O_T emitted from the corresponding hidden states.

Typical Algorithms for HMMs: (1) Given an observed sequence and a model, how do we compute the probability of the sequence given the model? (2) Given the observed sequence and the model, how do we choose an optimal hidden state sequence? (3) How do we adjust the model parameters to maximise the probability of the observed sequence given the model?
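A minimal sketch of the forward algorithm, which answers the first question. This is the textbook recursion with illustrative parameters, not code or numbers from the talk.

```python
import numpy as np

# Forward algorithm: probability of an observation sequence given an HMM.
# Two hidden states, two observation symbols; all parameters are illustrative.
pi = np.array([0.6, 0.4])                   # initial hidden-state distribution
A  = np.array([[0.7, 0.3],                  # A[i, j] = P(state j at t+1 | state i at t)
               [0.2, 0.8]])
B  = np.array([[0.9, 0.1],                  # B[i, k] = P(observation k | state i)
               [0.3, 0.7]])

def forward(obs):
    """Return P(obs sequence | model) by summing over all hidden state paths."""
    alpha = pi * B[:, obs[0]]               # alpha_1(i) = pi_i * B_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]       # alpha_{t+1}(j) = sum_i alpha_t(i) * A[i, j] * B[j, o]
    return alpha.sum()

print(forward([0, 1, 1, 0]))
```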

Summary: different learning tasks can be used to solve real-world problems. Machine learning techniques are useful when there is lots of data and lots of gaps in knowledge. Bayesian networks provide a probabilistic framework that can perform most key ML tasks; they are also transparent and can incorporate expert knowledge.

Some Preliminary Results on Northern Gulf Data Part 3

Expert Knowledge: ask marine biologists to generate matrices of expected relationships. These can be compared against models learnt from data, and also used as priors to improve model quality.

Results: Expert networks

Results: Data networks (BN from correlation), 85% confidence, imputed from 70% data. Warning: data quality, spurious relations. Species in the learnt network include cod, haddock, witch flounder and shrimp (plus lumpfish, silver hake, Atlantic soft pout / bristlemouths, eel pout / ocean sunfish).

Example DBN: let's look at an example DBN [NGulfDynamic - range]. Structure encoded by knowledge, updated by data; explore it with queries. Supported by previous knowledge: "In the Northern Gulf of St. Lawrence, cod (code 438) and redfish (792, 793, 794, 795, 796) collapsed to very low levels in the mid 1990s. Subsequently the shrimp (8111) increased greatly in biomass, so one will see this signal in the data. It is hypothesised that these are exclusive community states where you never get high abundance of both at the same time owing to predatory interactions."

Feature Selection: given that we know that from 1990 the cod population collapsed, can we apply feature selection to see which species characterise this collapse? [Learn a BN and apply cross-validation.]
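A minimal sketch of wrapper feature selection with cross-validation. A naive Bayes classifier stands in for the Bayesian network classifiers used in the talk, and the data, species names and class labels below are placeholders, not the survey data.

```python
from itertools import combinations
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Hypothetical wrapper feature selection: score each small subset of species by
# how well it predicts "pre- vs post-1990" under cross-validation.
rng = np.random.default_rng(1)
species = ["shrimp", "redfish", "witch_flounder", "white_hake"]
X = rng.normal(size=(30, len(species)))          # yearly abundance indices (placeholder)
y = np.array([0] * 15 + [1] * 15)                # 0 = pre-collapse, 1 = post-collapse

best = max(
    (cols for r in (1, 2) for cols in combinations(range(len(species)), r)),
    key=lambda cols: cross_val_score(GaussianNB(), X[:, list(cols)], y, cv=5).mean(),
)
print("best feature subset:", [species[i] for i in best])
```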

Results 7: Feature Selection with Bootstrap. Wrapper method using BNs; filter method using log likelihood. The chart on the slide highlights redfish.

Results: Feature Selection. Change in the correlation of interactions between cod and the high-ranking species before and after 1990 (table shown on the slide).

Dynamic Models: given that the data is a time series, can we build dynamic models to forecast future states? Can we use an HMM to classify the time series?

Multivariate Time Series: the Northern Gulf is a process measured over time. Autocorrelation function (ACF, here for cod); cross-correlation function (CCF, here hake to cod); plots shown on the slide.
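A minimal sketch of sample ACF and CCF estimates at given lags. The series below are random placeholders standing in for the cod and hake indices; this is a generic estimator, not the exact one used in the talk.

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(x[lag:] * x[:-lag]) / np.sum(x * x) if lag else 1.0

def ccf(x, y, lag):
    """Sample correlation between x at time t and y at time t + lag (lag >= 0)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(x) - lag
    return np.sum(x[:n] * y[lag:]) / (np.std(x) * np.std(y) * len(x))

rng = np.random.default_rng(2)
cod, hake = rng.normal(size=30), rng.normal(size=30)   # placeholder abundance indices
print([round(acf(cod, k), 2) for k in range(6)])       # ACF for cod, lags 0..5
print([round(ccf(hake, cod, k), 2) for k in range(6)]) # CCF hake -> cod, lags 0..5
```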

Results 3: Fitting Dynamic Models. HMM; expert structure with CCF > 0.3 (maxlag = 5). LSS = (value shown on the slide).

Results 3: Fitting Dynamic Models. Learning a DBN from the CCF data. LSS = (value shown on the slide). Fluctuation: an early indicator of collapse?

Results 4: Examining the DBN. Network from data only. Dynamic links involve cod, hakes, haddock, white hake, redfish, witch flounder, shrimp and thorny skate.

Results 5: Fitting Dynamic Models. Learning a DBN from expert-biased CCF data, CCF > 0.5 (maxlag = 5). LSS = (value shown on the slide).

Results 6: Examining the DBN. Network from data biased by the expert. Dynamic links involve cod, witch flounder, herring, and mackerel / capelin.

Results 7: Linear Dynamic System. Instead of a discrete hidden state, a continuous hidden variable: could it be interpreted as a measure of fishing? Predator population (e.g. seals)? Water temperature?
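A minimal scalar sketch of a linear dynamic system with a continuous hidden state, filtered with the standard Kalman recursion. All parameters and observations are illustrative; this is not the model fitted in the talk.

```python
import numpy as np

# Scalar linear dynamic system: hidden state x_t (e.g. an unobserved "pressure" on
# the community), observation y_t (e.g. a species index). Parameters are illustrative.
a, c = 0.9, 1.0        # state transition and observation coefficients
q, r = 0.1, 0.5        # process and observation noise variances

def kalman_filter(ys, x0=0.0, p0=1.0):
    """Return filtered hidden-state means for a scalar LDS."""
    x, p, means = x0, p0, []
    for y in ys:
        x, p = a * x, a * a * p + q                    # predict
        k = c * p / (c * c * p + r)                    # Kalman gain
        x, p = x + k * (y - c * x), (1 - k * c) * p    # update with observation y
        means.append(x)
    return means

rng = np.random.default_rng(3)
ys = rng.normal(size=15)                               # placeholder observations
print([round(m, 2) for m in kalman_filter(ys)])
```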

Conclusions: hopefully this has conveyed the broad idea of machine learning, shown how it can be used to help analyse data such as the fish population data, and suggested that it is potentially applicable to other data studied here at MLI.

Potential Projects:
1. Spatio-temporal analysis: use spatio-temporal BNs to model fish stock data; nodes would represent species in specific "regions".
2. Combining expert knowledge and data for improved prediction.
3. Looking for un/stable states and the factors that influence them.
4. Machine learning techniques for other data generated here at MLI.

E.g. Spatial Analysis: spatial Bayesian network analysis [NGulfCodSpatial].

Acknowledgements: Daniel Duplisea for inviting me. Any questions?