CS5540: Computational Techniques for Analyzing Clinical Data
Prof. Ramin Zabih (CS)
Prof. Ashish Raj (Radiology)

Today's topic
- Decisions based on densities
- Density estimation from data
  - Parametric (Gaussian)
  - Non-parametric
  - Non-parametric mode finding

Decisions from density
- Suppose we have a measure of an ECG
  - Different responses for shockable and normal
- Let's call our measure M (Measurement)
  - M(ECG) is a number
  - Tends to be 1 for normal sinus rhythm (NSR), 0 for anything else; we'll assume non-NSR needs shocking
  - It's reasonable to guess that M might be a Gaussian (why?)
  - Knowing how M looks allows decisions, even ones with nice optimality properties

What densities do we need?
- Consider the values of M we get for patients with NSR
  - This density, peaked around 1: f1(x) = Pr(M(ECG) = x | ECG is NSR)
  - Similarly for not NSR, peaked around 0: f0(x) = Pr(M(ECG) = x | ECG is not NSR)
- What might these look like?

Densities
[Figure: the two densities plotted against the value of M (x-axis: value of M, y-axis: probability); f1(x) = Pr(M(ECG) = x | ECG is NSR) is peaked near 1, and f0(x) = Pr(M(ECG) = x | ECG is not NSR) is peaked near 0]

Obvious decision rule
- Given a new ECG, compare the two densities at its measurement: classify it as NSR when f1(M(ECG)) > f0(M(ECG)), and as not NSR otherwise (this is the rule the later slides justify as the maximum likelihood choice)
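A minimal sketch of this rule, assuming for illustration that both densities are Gaussians with means 1 (NSR) and 0 (not NSR) and a common spread; the specific parameter values and function names below are made up, not taken from the course.

    import math

    def gaussian_pdf(x, mu, sigma):
        # density of a 1-D Gaussian with mean mu and standard deviation sigma
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    def classify(m, sigma=0.3):
        # f1 peaks near 1 (NSR), f0 peaks near 0 (not NSR); illustrative parameters
        f1 = gaussian_pdf(m, mu=1.0, sigma=sigma)
        f0 = gaussian_pdf(m, mu=0.0, sigma=sigma)
        return "NSR" if f1 > f0 else "not NSR"

    print(classify(0.8))   # -> NSR
    print(classify(0.2))   # -> not NSR

With equal spreads this reduces to thresholding M at 0.5, halfway between the two means.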

Choice of models
- Consider two possible models, e.g., Gaussian vs uniform
- Each one is a probability density
- More usefully, each one gives each possible observation a probability
  - E.g., for the uniform model they are all the same
- This means that each model assigns a probability to what you actually saw
  - A model that almost never predicts what you saw is probably a bad one!

Maximum likelihood choice
- The likelihood of a model is the probability that it generated what you observed
- Maximum likelihood: pick the model for which this is highest
- This justifies the obvious decision rule!
- Can derive this in terms of a loss function, which specifies the cost of being incorrect
  - If you guess X and the answer is Y, the loss typically depends on |X-Y|
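A short sketch of this comparison, assuming a small made-up sample and two candidate models (a narrow Gaussian near 1 versus a uniform density on [0, 2]); all numbers here are illustrative only.

    import math

    data = [0.9, 1.1, 0.95, 1.05, 1.0]   # made-up measurements clustered near 1

    def gaussian_pdf(x, mu, sigma):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # log-likelihood of the data under each model (sum of log densities)
    loglik_gauss   = sum(math.log(gaussian_pdf(x, mu=1.0, sigma=0.1)) for x in data)
    loglik_uniform = len(data) * math.log(1.0 / 2.0)   # uniform on [0, 2] has density 0.5

    # maximum likelihood choice: keep whichever model makes the data most probable
    print("Gaussian" if loglik_gauss > loglik_uniform else "Uniform")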

Parameter estimation
- Generalization of model choice
  - A Gaussian is an infinite set of models!
[Figure: a model with parameters Θ assigns a probability to the observations]

Summary
- We start with some training data (ECGs that we know are NSR or not)
  - We can compute M on each ECG
  - Get the densities f0, f1 from this data (how? we'll see shortly)
- Based on these densities we can classify a new test ECG
  - Pick the hypothesis with the highest likelihood

Sample data from models

Density estimation
- How do we actually estimate a density from data?
- Loosely speaking, if we histogram "enough" data we should get the density
  - But this depends on bin size
  - Sparsity is a killer issue in high dimension
- What if we know the result is a Gaussian?
  - Only two parameters (center, width)
- Estimating the density means estimating the parameters of the Gaussian
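A small sketch of the histogram approach, assuming NumPy; the sample and the bin counts are made up, and the point is just that the estimate changes with bin size.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=1.0, scale=0.3, size=200)   # made-up 1-D measurements

    # histogram estimates of the density; density=True normalizes so the bars integrate to 1
    for bins in (5, 50):
        density, edges = np.histogram(data, bins=bins, density=True)
        print(bins, "bins ->", np.round(density, 2))

Too few bins over-smooth the shape; too many leave most bins nearly empty, which is the sparsity problem in miniature.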

ML density estimation
- Among all of the infinitely many Gaussians that might have generated the observed data, we can easily compute the one with the highest likelihood!
  - What are the right parameters?
  - There is actually a nice closed-form solution!
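The slide doesn't write the solution out; for a univariate Gaussian fit to observations x1, ..., xn, the standard closed-form ML estimates are the sample mean and the (biased, 1/n) sample variance:

$$
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \bigl(x_i - \hat{\mu}\bigr)^2 .
$$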

Harder cases
- What if your data came from two Gaussians with unknown parameters?
  - This is like our ECG example if the training data was not labeled
- If you knew which data point came from which Gaussian, you could solve exactly
  - See previous slide!
- Similarly, if you knew the Gaussian parameters, you could tell which data point came from which Gaussian

Simplify: intermixed lines
- What about cases like this? (the slide shows points drawn from two intermixed lines)
  - Can we find both lines?
  - "Hidden" variable: which line owns a point

Lines and ownership
- Need to know both the lines and the "ownership"
  - Which point belongs to which line
  - We'll use partial ownership (non-binary)
- If we knew the lines we could apportion the ownership
  - A point is primarily owned by the closest line
- If we knew the ownership we could find the lines via weighted LS
  - Weights come from ownership
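In symbols, the weighted LS step for one line y ≈ a x + b with ownership weights w_i is (a standard weighted least-squares fact, not written out on the slide):

$$
(\hat{a}, \hat{b}) \;=\; \arg\min_{a,\,b} \;\sum_i w_i \,\bigl(y_i - (a x_i + b)\bigr)^2 ,
$$

solved in closed form by the weighted normal equations $A^\top W A \,[\hat{a}\;\;\hat{b}]^\top = A^\top W y$, where $A$ stacks the $x_i$ next to a column of ones and $W = \mathrm{diag}(w)$.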

Solution: guess and iterate
- Put two lines down at random
- Compute ownership given the lines
  - Each point divides its ownership among the lines, mostly to the closest line
- Compute lines given the ownership
  - Weighted LS: weights rise with ownership
- Iterate between these
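A minimal sketch of this guess-and-iterate loop, assuming NumPy; the ownership rule (weights falling off with the squared residual), the iteration count, and all data below are illustrative choices, not the course's exact recipe.

    import numpy as np

    def fit_two_lines(x, y, iters=50, seed=0):
        # EM-style loop: alternate soft ownership (E-step) and weighted LS fits (M-step)
        rng = np.random.default_rng(seed)
        lines = rng.normal(size=(2, 2))            # random initial (slope, intercept) per line
        A = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
        for _ in range(iters):
            # ownership from lines: the closer line gets most of the weight
            resid = np.stack([np.abs(y - (a * x + b)) for a, b in lines])
            weights = np.exp(-resid ** 2) + 1e-12   # small constant avoids divide-by-zero
            weights /= weights.sum(axis=0, keepdims=True)
            # lines from ownership: weighted least squares per line
            for k in range(2):
                W = np.diag(weights[k])
                lines[k] = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)
        return lines

    # made-up data drawn from two noisy lines
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, size=200)
    which = rng.integers(0, 2, size=200)
    true = np.array([[1.0, 0.0], [-0.5, 8.0]])     # (slope, intercept) of each line
    y = true[which, 0] * x + true[which, 1] + rng.normal(scale=0.2, size=200)
    print(fit_two_lines(x, y))

As the questions at the end of this section suggest, the result can depend on the random starting lines, so in practice one would try several initializations and keep the best fit.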

Ownership from lines
[Figure: ownership computed from the current lines; some points are evenly split, others mostly owned by the green or the red line]

Lines from ownership
[Figure: each line re-fit by weighted LS; points contribute with low, medium, or high weight according to their ownership]

Convergence

Questions about EM
- Does this converge?
  - Not obvious that it won't cycle between solutions, or run forever
- Does the answer depend on the starting guess?
  - I.e., is this non-convex?