WSEAS AIKED, Cambridge, 2010
Feature Importance in Bayesian Assessment of Newborn Brain Maturity from EEG
Livia Jakaite, Vitaly Schetinin and Carsten Maple
Department of Computer Science and Technology, University of Bedfordshire

Outline
- EEG assessment of brain maturity
- Why Bayesian Model Averaging (BMA) for the assessment?
- Problems in using BMA for assessing brain maturity
- Solution: using posterior information about features
- Computational experiments
- Conclusions

EEG assessment of brain maturity
- Newborn brain dismaturity signals neurophysiological abnormality
- Experts can assess newborn brain maturity by estimating a newborn's age from an EEG recording
- The accuracy of such an estimate is typically about two weeks
- Brain maturity is assessed as normal if the newborn's physical age is within the range of EEG-estimated ages; otherwise the maturity is assessed as abnormal (see the sketch below)
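A minimal sketch of this decision rule, assuming the roughly two-week accuracy above is used as the tolerance; the function and parameter names are illustrative, not taken from the paper:

```python
def assess_maturity(physical_age_weeks, eeg_estimated_age_weeks, tolerance_weeks=2.0):
    """Label maturity as normal when the physical age falls within the
    tolerance of the EEG-estimated age, and abnormal otherwise."""
    if abs(physical_age_weeks - eeg_estimated_age_weeks) <= tolerance_weeks:
        return "normal"
    return "abnormal"

# Example: a 40-week-old newborn whose EEG looks like that of a 36-week-old
print(assess_maturity(40, 36))  # -> abnormal
```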

…EEG assessment of brain maturity: EEG examples for different ages
[Figure: 20-second EEG segments recorded at 28, 36 and 40 weeks]

BMA for brain maturity assessment
- In theory, Bayesian Model Averaging (BMA) provides the most accurate assessments and estimates of their uncertainty
- In practice, Markov Chain Monte Carlo (MCMC) methods are used to approximate the posterior distribution by drawing random samples of models (see the sketch below)
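In code terms, the MCMC approximation boils down to averaging the class probabilities predicted by the sampled models. A minimal sketch, assuming each sampled model exposes a `predict_proba` method; this interface is illustrative, not the authors' implementation:

```python
import numpy as np

def bma_predict(sampled_models, x):
    """Approximate the BMA predictive distribution for input x by
    averaging the class-probability vectors of models drawn by MCMC."""
    probs = np.array([m.predict_proba(x) for m in sampled_models])
    return probs.mean(axis=0)          # shape: (n_classes,)

def predictive_entropy(mean_probs):
    """Entropy of the averaged prediction, a simple uncertainty measure."""
    p = np.clip(mean_probs, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))
```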

…BMA for brain maturity assessment: exploring the posterior probability
- The idea behind BMA is to average over multiple models that are diverse in their parameters
- To ensure unbiased estimates, the proportions of models sampled from the posterior distribution should be proportional to their likelihoods
- Then the assessments are most accurate, and the variation in the models' outcomes can be interpreted as the uncertainty of the assessment

…BMA for brain maturity assessment: exploring the posterior probability
- The exploration is made with moves chosen with predefined probabilities during a burn-in phase: change a splitting variable, change a splitting threshold, split a terminal node (birth move), or combine two terminal nodes (death move)
- Each move changes the model parameters and is accepted or rejected according to Bayes' rule
- During a post burn-in phase, the sampled models are collected to be averaged (see the sketch below)
[Figure: a decision tree whose nodes split on variables X_i with thresholds θ_i]
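A hedged sketch of one exploration step: a move type is drawn with its predefined probability, a proposal tree is built, and the proposal is accepted with the usual Metropolis acceptance probability. Here `propose_move`, `log_likelihood` and `log_prior` are placeholders for the tree operations described on this slide, not real library calls:

```python
import math
import random

# Illustrative predefined move probabilities (not the values used in the paper)
MOVE_PROBS = {"change variable": 0.25, "change threshold": 0.25,
              "birth": 0.25, "death": 0.25}

def mcmc_step(tree, data):
    """One burn-in step over decision-tree models: propose a modified tree
    and accept or reject it by the Metropolis rule (symmetric proposals assumed)."""
    move = random.choices(list(MOVE_PROBS), weights=list(MOVE_PROBS.values()))[0]
    proposal = propose_move(tree, move)                      # placeholder helper
    log_ratio = (log_likelihood(proposal, data) + log_prior(proposal)
                 - log_likelihood(tree, data) - log_prior(tree))
    if random.random() < math.exp(min(0.0, log_ratio)):
        return proposal   # move accepted
    return tree           # move rejected
```

During the post burn-in phase, the tree returned by each step would simply be appended to the ensemble that is later averaged.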

…BMA for brain maturity assessment: lack of prior information causes biased sampling
- To collect models in the right proportions, the model parameter space must be explored in detail
- When the model parameter space is large, a possible problem is that not all areas of the posterior density are explored, so the models are sampled disproportionally
- Prior information about feature importance helps to reduce the model parameter space

…BMA for brain maturity assessment: lack of prior information causes biased sampling
- However, in our case, no prior information on feature importance is available
- The EEG data are represented by spectral features and their statistical characteristics, 72 attributes in total, some of which make only a weak contribution
- To assess the feature importance, we can use Decision Trees (DTs) within BMA

Solution: using posterior information about features
- If an attribute is rarely used in the DTs included in the ensemble, we assume that this attribute makes a weak contribution
- When the number of weak attributes is large, the disproportion in the sampled models becomes significant
- Our hypothesis is that discarding the models that use weak EEG attributes will reduce the negative effect of disproportional sampling (see the sketch below)
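A minimal sketch of the usage-frequency idea, assuming each collected tree exposes the set of attributes it splits on; the `used_attributes` accessor and the cut-off value are illustrative assumptions:

```python
from collections import Counter

def attribute_usage(sampled_trees, n_attributes=72):
    """Posterior probability of use for each attribute, estimated as the
    fraction of sampled decision trees whose splits involve that attribute."""
    counts = Counter()
    for tree in sampled_trees:
        for attr in set(tree.used_attributes()):   # assumed accessor
            counts[attr] += 1
    return [counts[a] / len(sampled_trees) for a in range(n_attributes)]

def weak_attributes(usage, cutoff=0.05):
    """Attributes whose posterior usage falls below an illustrative cut-off."""
    return {a for a, u in enumerate(usage) if u < cutoff}
```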

Experiments
- A BMA ensemble was collected from DTs learned on the EEG data represented by the 72 attributes
- We calculated the posterior probability of each attribute being used in the DTs
- We refined the DT ensemble by removing those DTs that use weak attributes (a sketch follows below)
- For comparison, we reran BMA on the EEG data without the identified weak attributes
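A hedged sketch of the refinement step: trees that split on any of the identified weak attributes are dropped from the already collected ensemble, so no new MCMC run is needed (same illustrative tree interface as in the previous sketches):

```python
def refine_ensemble(sampled_trees, weak_attrs):
    """Keep only the decision trees that use no weak attributes."""
    return [t for t in sampled_trees
            if not (set(t.used_attributes()) & set(weak_attrs))]

# The alternative used for comparison, rerunning BMA on data with the weak
# attributes removed, requires a full new round of MCMC sampling.
```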

Performance of the BMA on age groups of 40–45 weeks
- DT ensemble, 6 classes (w40–w45)
- Performance: 27.4 ± 8.2 %
[Figure: per-class performance and entropy for the age groups w40–w45]

Posterior feature importance
[Figure: posterior probability of use for each attribute, grouped into spectral powers and statistical characteristics across the delta to alpha bands]

Performance of BMA with discarded attributes
[Figure: BMA performance (%) and entropy plotted against the feature-importance threshold; legible values include performances of 29.0 % and 25.8 ± 1.7 % and an entropy of 478.3]

Performance of BMA with the refined ensemble
[Figure: performance (%) and entropy of the refined ensemble plotted against the feature-importance threshold; legible values include a performance of 27.4 % and an entropy of 478.3]

…Performance of BMA with the refined ensemble
[Figure: histogram (count) of the performance, %, over the refined ensemble]

Conclusions
- The larger the number of weak attributes, the greater the negative impact on BMA performance
- Reducing the data dimensionality by discarding weak attributes improved BMA performance (by 1.6%) owing to the reduced model parameter space
- The proposed technique provides a comparable improvement in performance (1.8%) without the need to rerun the BMA

Acknowledgements
This research is funded by the Leverhulme Trust.