Part I: Classifier Performance
Mahesan Niranjan
Department of Computer Science, The University of Sheffield & Cambridge Bioinformatics Limited

Relevant Reading
– Bishop, Neural Networks for Pattern Recognition
– David Hand, Construction and Assessment of Classification Rules
– Lovell et al., technical report CUED/F-INFENG/TR.299
– Scott et al., technical report CUED/F-INFENG/TR.323

Pattern Recognition Framework

Two Approaches to Pattern Recognition
– Probabilistic: explicitly model the probabilities that appear in Bayes’ formula
– Discriminative: assume a parametric form for the class boundary and optimise it
In some specific cases (often not) the two reduce to the same answer.
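For reference, Bayes’ formula for the posterior probability of class C_k given input x (standard, not written out on the slide):

    P(C_k \mid x) = \frac{p(x \mid C_k)\, P(C_k)}{p(x)}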

Pattern Recognition: Simple Case
– Gaussian class-conditional distributions, isotropic with equal variances
– Optimal classifier: distance to the class means, giving a linear class boundary
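A worked form of this claim (standard result, not spelled out on the slide): with means \mu_1, \mu_2, common covariance \sigma^2 I and equal priors, assign x to class 1 when

    \|x-\mu_1\|^2 < \|x-\mu_2\|^2 \;\Longleftrightarrow\; (\mu_1-\mu_2)^\top x > \tfrac{1}{2}\left(\|\mu_1\|^2-\|\mu_2\|^2\right)

a linear boundary: the perpendicular bisector of the line joining the two means.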

Distance Can Be Misleading
– With anisotropic or unequal covariances, Euclidean distance to the mean misleads; use the Mahalanobis distance
– The optimal classifier for this case is the Fisher Linear Discriminant
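The two quantities named on the slide, in their standard form (for a shared within-class covariance \Sigma):

    d_M(x,\mu) = \sqrt{(x-\mu)^\top \Sigma^{-1} (x-\mu)}, \qquad w_{\text{Fisher}} \propto \Sigma^{-1}(\mu_1-\mu_2)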

Support Vector Machines: Maximum-Margin Perceptron
[Figure: two linearly separable classes, X and O, with the maximum-margin separating hyperplane]

Support Vector Machines: Nonlinear Kernel Functions
[Figure: two classes, X and O, that are not linearly separable; a kernel yields a nonlinear class boundary]

Support Vector Machines: Computations
– Training is a quadratic programming problem
– The class boundary is defined only by the data that lie close to it: the support vectors
– Kernels in data space equal scalar products in a higher-dimensional space
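The standard dual quadratic programme behind these bullets (not shown on the slide): maximise over the multipliers \alpha_i

    \max_{\alpha}\; \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i,x_j) \quad \text{s.t.}\; 0 \le \alpha_i \le C,\;\; \sum_i \alpha_i y_i = 0

Only the points with \alpha_i > 0 (the support vectors) enter the decision rule f(x) = \operatorname{sign}\left(\sum_i \alpha_i y_i K(x_i,x) + b\right).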

Support Vector Machines: The Hype
– Strong theoretical basis in computational learning theory; complexity controlled by the Vapnik–Chervonenkis dimension
– Not many parameters to tune
– High performance on many practical problems, high-dimensional problems in particular

Support Vector Machines: The Truth
– Worst-case bounds from learning theory are not very practical
– Several parameters to tune: which kernel? the internal workings of the optimiser; noise in the training data
– Performance? Depends on who you ask

SVM: A Data-Driven Kernel
– Fisher kernel [Jaakkola & Haussler]: a kernel based on a generative model of all the data
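The standard construction from Jaakkola & Haussler (not spelled out on the slide): score each point by the gradient of the generative log-likelihood,

    U_x = \nabla_\theta \log p(x \mid \theta), \qquad K(x, x') = U_x^\top \mathcal{I}^{-1} U_{x'}

where \mathcal{I} is the Fisher information matrix, often approximated by the identity.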

Classifier Performance
– Error rates can be misleading: imbalance in training/test data (e.g. 98% of the population healthy, 2% has the disease)
– The cost of misclassification can change after the design of the classifier
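A minimal sketch of the imbalance point (the prevalence numbers are from the slide; the code itself is illustrative, not part of the talk):

    import numpy as np

    rng = np.random.default_rng(0)
    y_true = rng.random(10_000) < 0.02      # True = diseased, 2% prevalence
    y_pred = np.zeros_like(y_true)          # classifier that always says "healthy"
    print(f"accuracy = {np.mean(y_pred == y_true):.3f}")  # ~0.98, yet no case is detected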

BCS, Exeter, July 2004Mahesan Niranjan14 x xx x x x x x x x x x Adverse Outcome Benign Outcome Threshold Class Boundary

The ROC Curve
– True positive rate plotted against false positive rate as the decision threshold is swept
– The area under the ROC curve has a neat statistical interpretation
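The interpretation alluded to is the Wilcoxon–Mann–Whitney statistic: the AUC equals the probability that a randomly drawn positive example scores above a randomly drawn negative one. A small pairwise estimator, with hypothetical score arrays:

    import numpy as np

    def auc_wmw(pos_scores, neg_scores):
        """Estimate P(score_pos > score_neg); ties count one half."""
        pos = np.asarray(pos_scores)[:, None]
        neg = np.asarray(neg_scores)[None, :]
        return np.mean((pos > neg) + 0.5 * (pos == neg))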

Convex Hull of ROC Curves
[Figure: ROC curves of several classifiers with their convex hull; axes: false positive rate vs. true positive rate]

Yeast Gene Example (MATLAB demo)

Part II: Particle Filters for Tracking and Sequential Problems
Mahesan Niranjan
Department of Computer Science, The University of Sheffield

Overview
– Motivation
– State space model
– Kalman filter and extensions
– Sequential MCMC methods: particle filter and variants

Motivation
– Neural networks for learning: function approximation, statistical estimation, dynamical systems, parallel processing
– Guaranteeing generalisation: regularise / control complexity; cross-validate to detect and avoid overfitting; bootstrap to deal with model / data uncertainty
– Many of the above tricks won’t work in a sequential setting

Interesting Applications
– Speech signal processing
– Medical signals: monitoring liver transplant patients
– Tracking the prices of options contracts in computational finance

Good References
– Bar-Shalom and Fortmann, Tracking and Data Association
– Jazwinski, Stochastic Processes and Filtering Theory
– Arulampalam et al., “A Tutorial on Particle Filters …”, IEEE Transactions on Signal Processing
– Arnaud Doucet, Technical Report 310, Cambridge University Engineering Department
– Benveniste et al., Adaptive Algorithms and Stochastic Approximations
– Simon Haykin, Adaptive Filter Theory

Matrix Inversion Lemma
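The lemma in its standard form (the slide's equation image did not survive transcription):

    (A + UCV)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + V A^{-1} U\right)^{-1} V A^{-1}

This is what turns the batch least-squares solution into the recursive update two slides on.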

Linear Regression
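The batch solution this slide builds on (standard; the equations were lost in transcription): for target vector y and design matrix X,

    \hat{w} = (X^\top X)^{-1} X^\top y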

Recursive Least Squares
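A standard statement of the recursion, with forgetting factor \lambda (the slide's own notation is lost): on receiving (x_t, y_t),

    k_t = \frac{P_{t-1} x_t}{\lambda + x_t^\top P_{t-1} x_t}, \qquad
    \hat{w}_t = \hat{w}_{t-1} + k_t\,(y_t - x_t^\top \hat{w}_{t-1}), \qquad
    P_t = \lambda^{-1}\left(P_{t-1} - k_t x_t^\top P_{t-1}\right)

The rank-one update of P_t is exactly the matrix inversion lemma applied to X^\top X.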

State Space Model
– State equation, driven by process noise
– Observation equation, corrupted by measurement noise
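In symbols (a generic form consistent with the labels on the slide):

    x_t = f(x_{t-1}) + w_t \;\; \text{(state, process noise } w_t\text{)}, \qquad
    y_t = h(x_t) + v_t \;\; \text{(observation, measurement noise } v_t\text{)}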

Simple Linear Gaussian Model
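The usual special case (equations lost in transcription): linear maps and Gaussian noise,

    x_t = A x_{t-1} + w_t,\;\; w_t \sim \mathcal{N}(0, Q), \qquad y_t = C x_t + v_t,\;\; v_t \sim \mathcal{N}(0, R)

for which the posterior stays Gaussian and the Kalman filter that follows is exact.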

Kalman Filter: Prediction and Correction
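The two steps in standard notation for the linear Gaussian model above:

    \text{Predict:}\;\; \hat{x}_{t|t-1} = A\hat{x}_{t-1|t-1}, \qquad P_{t|t-1} = A P_{t-1|t-1} A^\top + Q
    \text{Correct:}\;\; \hat{x}_{t|t} = \hat{x}_{t|t-1} + K_t\,(y_t - C\hat{x}_{t|t-1}), \qquad P_{t|t} = (I - K_t C)\,P_{t|t-1}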

Kalman Filter: Innovation and Kalman Gain
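The innovation, its covariance, and the gain in their standard forms:

    e_t = y_t - C\hat{x}_{t|t-1}, \qquad S_t = C P_{t|t-1} C^\top + R, \qquad K_t = P_{t|t-1} C^\top S_t^{-1}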

Bayesian Setting
– Prior, likelihood, and the innovation probability
– Run multiple models and switch (Bar-Shalom)
– Set noise levels to maximum likelihood values (Jazwinski)

Extended Kalman Filter
– Successful training of recurrent neural networks (Lee, Ford)
– Taylor series expansion around the operating point: first order or second order
– Iterated extended Kalman filter

Iterated Extended Kalman Filter
– Local linearisation of the state and / or observation equations
– Propagation and update as in the linear filter
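The linearisation step in standard notation (not shown on the slide): take Jacobians at the current estimates,

    F_t = \left.\frac{\partial f}{\partial x}\right|_{\hat{x}_{t-1|t-1}}, \qquad
    H_t = \left.\frac{\partial h}{\partial x}\right|_{\hat{x}_{t|t-1}}

and run the Kalman recursions with F_t, H_t in place of A, C; the iterated variant re-linearises about the updated estimate until convergence.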

Unscented Kalman Filter
– Generate a set of sigma points at time t-1, chosen so that they represent the current mean and covariance
– Propagate these points through the state equations
– Recompute the predicted mean and covariance from the propagated points

Recipe
– Define the sigma points; recompute the mean and covariance after propagation [equations lost in transcription]
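One standard formulation of that recipe, the unscented transform (the slide's own symbols are lost): for an n-dimensional state with mean \hat{x} and covariance P,

    \mathcal{X}_0 = \hat{x}, \qquad \mathcal{X}_{\pm i} = \hat{x} \pm \left(\sqrt{(n+\lambda)P}\right)_i, \quad i = 1,\dots,n
    W_0 = \frac{\lambda}{n+\lambda}, \qquad W_{\pm i} = \frac{1}{2(n+\lambda)}

Propagate each \mathcal{X}_j through f, then recompute \hat{x}' = \sum_j W_j f(\mathcal{X}_j) and P' = \sum_j W_j \left(f(\mathcal{X}_j)-\hat{x}'\right)\left(f(\mathcal{X}_j)-\hat{x}'\right)^\top + Q.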

Formant Tracking Example
[Figure: source-filter model of speech production: an excitation signal passed through a linear filter produces speech]

Formant Tracking Example (continued)
[Two slides of results figures, not transcribed]

Grid-Based Methods
– Discretise the continuous state into “cells”
– Integrate the probabilities over each partition
– Drawback: a fixed partitioning of the state space

Sampling Methods: Bayesian Inference
– Treat the parameters as uncertain; inference averages over the posterior [equations lost in transcription]
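The standard setting behind the slide's keywords:

    p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}, \qquad
    \mathbb{E}[f \mid D] = \int f(\theta)\, p(\theta \mid D)\, d\theta \approx \frac{1}{N}\sum_{i=1}^{N} f\left(\theta^{(i)}\right), \quad \theta^{(i)} \sim p(\theta \mid D)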

Basic Tool: Composition [Tanner]
– To generate samples of a marginal p(x) = \int p(x \mid y)\, p(y)\, dy: draw y' \sim p(y), then x' \sim p(x \mid y')

Importance Sampling
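The standard identity and estimator (the slide's derivation is lost): sample from a proposal q instead of the target p,

    \mathbb{E}_p[f] = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,dx \approx \sum_{i=1}^{N} \tilde{w}_i f(x_i), \qquad x_i \sim q, \quad \tilde{w}_i \propto \frac{p(x_i)}{q(x_i)}, \quad \sum_i \tilde{w}_i = 1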

Particle Filters
– Prediction: propagate samples through the dynamics; weight each sample by the likelihood of the observation
– Bootstrap filter (Gordon et al., tracking); CONDENSATION algorithm (Isard et al., vision)
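A minimal bootstrap-filter sketch for a scalar state (my illustration, not the talk's code; f_state, h_obs and the noise scales sigma_w, sigma_v are hypothetical placeholders for the model above):

    import numpy as np

    def bootstrap_filter(observations, n, f_state, h_obs, sigma_w, sigma_v, rng):
        """Weighted particle cloud tracking p(x_t | y_1..t) for a scalar state."""
        x = rng.normal(0.0, 1.0, n)                  # initial particle cloud
        means = []
        for y_t in observations:
            x = f_state(x) + rng.normal(0.0, sigma_w, n)          # predict via dynamics
            w = np.exp(-0.5 * ((y_t - h_obs(x)) / sigma_v) ** 2)  # likelihood weights
            w /= w.sum()
            means.append(np.sum(w * x))              # posterior mean estimate
            x = x[rng.choice(n, size=n, p=w)]        # resample: multiply heavy particles
        return np.array(means)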

Sequential Importance Sampling
– Recursive update of the weights
– Known only up to a constant of proportionality
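The standard recursion (the slide's formula is lost): with proposal q,

    w_t^{(i)} \propto w_{t-1}^{(i)}\; \frac{p\left(y_t \mid x_t^{(i)}\right)\, p\left(x_t^{(i)} \mid x_{t-1}^{(i)}\right)}{q\left(x_t^{(i)} \mid x_{t-1}^{(i)}, y_t\right)}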

Degeneracy in SIS
– The variance of the weights increases monotonically: all except one decay to zero very rapidly
– Monitor the effective number of particles and resample when it falls below a threshold
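The usual estimate behind "effective number of particles" (the threshold value that was on the slide is lost):

    N_{\text{eff}} \approx \frac{1}{\sum_i \left(w_t^{(i)}\right)^2}, \qquad \text{resample if } N_{\text{eff}} < N_{\text{threshold}}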

Sampling, Importance Re-Sampling (SIR)
– Multiply samples of high weight; kill off samples in parts of the space that are not relevant
– Danger: “particle collapse” (loss of diversity after repeated resampling)
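One common way to implement the multiply/kill step is systematic resampling; the slides do not say which scheme was used, so this is an assumed choice:

    import numpy as np

    def systematic_resample(weights, rng):
        """Indices that replicate heavy particles and drop light ones.
        weights must be normalised to sum to 1."""
        n = len(weights)
        positions = (rng.random() + np.arange(n)) / n   # one stratified sweep of [0, 1)
        return np.searchsorted(np.cumsum(weights), positions)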

Marginalising Part of the State Space
– Suppose it is possible to analytically integrate with respect to part of the state space
– Then sample with respect to the remaining part and integrate with respect to the rest: Rao-Blackwellisation
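The standard decomposition (the slide's symbols are lost): split the state x = (a, b) so that

    p(a, b \mid y) = p(b \mid a, y)\; p(a \mid y)

sample particles for a only, and handle p(b \mid a^{(i)}, y) in closed form (e.g. with a Kalman filter per particle); by the Rao-Blackwell theorem the resulting estimator's variance is never larger.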

Variations to the Basic Algorithm
– Integrate out part of the state space: Rao-Blackwellised particle filters (e.g. a multi-layer perceptron with a linear output layer)
– Variational importance sampling (Lawrence et al.)
– Auxiliary particle filters (Pitt et al.)
– Regularised particle filters
– Likelihood particle filters

Regularised PF: Basic Idea
– Fit a kernel density to the samples, resample from the smoothed density, and propagate in time

Conclusion / Summary
– A collection of powerful algorithms
– New and interesting signal processing problems