
An Introduction to the EM Algorithm
Naala Brewer and Kehinde Salau

An Introduction to the EM Algorithm
Outline
History of the EM Algorithm
Theory behind the EM Algorithm
Biological examples, including derivations and coding in R, Matlab, and C++
Graphs of iterations and convergence

Brief History of the EM Algorithm
Method frequently referenced throughout the field of statistics
Term coined in a 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin

Breakdown of the EM Task
Goal: compute MLEs of unknown parameters in probabilistic models with latent variables or missing data
E-step: compute the expectation of the complete-data log-likelihood, filling in the unobserved data with their conditional expectations given the observed data and the current parameter estimates
M-step: maximize this expectation to obtain updated MLEs of the unknown parameters
Repeat!!

Generalization of the EM Algorithm
X - full sample (complete data, including the latent variables) ~ f(x; θ)
Y - observed sample (incomplete data) ~ f(y; θ), where Y = y(X)
We define Q(θ; θ_p) = E[ln f(X; θ) | Y, θ_p]
θ_{p+1} is obtained by solving ∂Q(θ; θ_p)/∂θ = 0

Generalization (cont.)
Iterations continue until |θ_{p+1} - θ_p| or |Q(θ_{p+1}; θ_p) - Q(θ_p; θ_p)| is sufficiently small
Thus, optimal values for Q(θ; θ_p) and θ are obtained
The likelihood is nondecreasing with each iteration, since Q(θ_{p+1}; θ_p) ≥ Q(θ_p; θ_p)
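The general iteration above can be written as a small loop. Below is a minimal sketch in R; estep() and mstep() are hypothetical placeholders for the problem-specific computation of the expected complete-data quantities and the maximizer of Q(θ; θ_p), and the stopping rule is the one stated on this slide (a scalar θ is assumed).

em <- function(theta0, estep, mstep, tol = 1e-8, maxit = 1000) {
  theta <- theta0
  for (p in 1:maxit) {
    s         <- estep(theta)   #E-step: expected complete-data quantities at theta_p
    theta_new <- mstep(s)       #M-step: maximizer of Q(theta; theta_p)
    done  <- abs(theta_new - theta) < tol   #stop when |theta_{p+1} - theta_p| is small
    theta <- theta_new
    if (done) break
  }
  theta
}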

Example 1 - Ecological Example
n - number of marked animals in 5 different regions; p - probability of survival
Suppose that only the number of animals that survive in 3 of the 5 regions is known (we may not be able to see or capture all of the animals in regions 1 and 2, so x_1 and x_2 are unobserved)
X = (?, ?, 30, 25, 39) = (x_1, x_2, x_3, x_4, x_5)
We estimate p using the EM Algorithm.

Binomial Distribution - Derivation
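The equations on this slide are not reproduced in the transcript. A sketch of the standard argument, assuming each region's survivor count X_i is independently Binomial(n, p) with n known:

Complete-data log-likelihood: ln f(x; p) = Σ_{i=1..5} [x_i ln p + (n - x_i) ln(1 - p)] + const
E-step: Q(p; p_k) replaces the unobserved counts x_1 and x_2 by E[X_i | p_k] = n p_k
M-step: maximizing Q gives p_{k+1} = (2 n p_k + x_3 + x_4 + x_5) / (5n)
At the fixed point, p* = (x_3 + x_4 + x_5) / (3n), the MLE based on the observed regions alone.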

Binomial Derivation (cont.)
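A minimal R implementation of the update sketched above, in the style of the multinomial code later in the talk. The slide does not state a value of n, so n = 50 marked animals per region is assumed here purely for illustration:

#initial data: survivors in regions 3-5; regions 1 and 2 unobserved
n  <- 50                      #assumed number of marked animals per region (hypothetical)
x  <- c(NA, NA, 30, 25, 39)
pk <- 0.5                     #initial value for unknown parameter
for (k in 1:20) {
  x12 <- n * pk                              #E-step: expected survivors in an unobserved region
  pk  <- (2 * x12 + sum(x[3:5])) / (5 * n)   #M-step: complete-data MLE of p
  print(pk)                                  #convergent values
}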

Binomial Distribution: Graph of Convergence of Unknown Parameter, p_k

Example 2 – Population of Animals
Rao (1965), Genetic Linkage Model
Suppose 197 animals are distributed multinomially into four categories, y = (125, 18, 20, 34) = (y_1, y_2, y_3, y_4)
A genetic model for the population specifies cell probabilities (1/2 + p/4, 1/4 - p/4, 1/4 - p/4, p/4)
Represent y as incomplete data: y_1 = x_1 + x_2 (x_1, x_2 unknown), y_2 = x_3, y_3 = x_4, y_4 = x_5.

Multinomial Distribution - Derivation

Multinomial Derivation (cont.)
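The derivation slides themselves are not reproduced in the transcript; the following sketch is consistent with the code on the next slides. The complete data x = (x_1, ..., x_5) are multinomial with cell probabilities (1/2, p/4, 1/4 - p/4, 1/4 - p/4, p/4), and y_1 = x_1 + x_2:

E-step: given p_k, X_2 | y_1 ~ Binomial(y_1, (p_k/4) / (1/2 + p_k/4)), so x_2^k = E[X_2 | y_1, p_k] = y_1 (p_k/4) / (1/2 + p_k/4)
M-step: the complete-data MLE of p compares the cells with probability proportional to p against those proportional to 1 - p: p_{k+1} = (x_2^k + y_4) / (x_2^k + y_2 + y_3 + y_4)
For y = (125, 18, 20, 34) the iteration converges to the root of 197 p^2 - 15 p - 68 = 0 in (0, 1), approximately p ≈ 0.6268.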

Multinomial Coding
Example 2 – Population of Animals
R Coding, Matlab Coding, C++ Coding

R Coding
#initial vector of data
y <- c(125, 18, 20, 34)
#Initial value for unknown parameter
pik <- 0.5
for (k in 1:10) {
  x2k <- y[1]*(0.25*pik)/(0.5 + 0.25*pik)   #E-step: expected x2 given current pik
  pik <- (x2k + y[4])/(x2k + sum(y[2:4]))   #M-step: updated estimate of p
  print(c(x2k, pik))                        #Convergent values
}

Matlab Coding
%initial vector of data
y = [125, 18, 20, 34];
%Initial value for unknown parameter
pik = 0.5;
for k = 1:10
  x2k = y(1)*(0.25*pik)/(0.5 + 0.25*pik)    %E-step: expected x2 given current pik
  pik = (x2k + y(4))/(x2k + sum(y(2:4)))    %M-step: updated estimate of p
end
%Convergent values
[x2k, pik]

C++ Coding
#include <iostream>

int main () {
  int x1, x2, x3, x4;
  float pik, x2k;
  std::cout << "enter vector of values, there should be four inputs\n";
  std::cin >> x1 >> x2 >> x3 >> x4;
  std::cout << "enter value for pik\n";
  std::cin >> pik;
  for (int counter = 0; counter < 10; counter++) {
    x2k = x1*(0.25*pik)/(0.5 + 0.25*pik);   //E-step: expected x2 given current pik
    pik = (x2k + x4)/(x2k + x2 + x3 + x4);  //M-step: updated estimate of p
    std::cout << "x2k is " << x2k << " and pik is " << pik << std::endl;
  }
  return 0;
}

Multinomial Distribution: Graphs of Convergence of Unknowns, p_k and x_2^k

Example 3 - Failure Times
Flury and Zoppè (2000)
Suppose the lifetime of bulbs follows an exponential distribution with mean θ
The failure times (u_1, ..., u_n) are known for n light bulbs
In another experiment, m light bulbs (v_1, ..., v_m) are tested; no individual failure times are recorded
Only the number of bulbs, r, that have failed by time t_0 is recorded

Exponential Distribution - Derivation
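The equations from this slide are not reproduced in the transcript. A sketch of the usual argument, under the setup above (the second experiment recording only r, the number of the m bulbs that have failed by t_0):

Complete-data log-likelihood: ln f(u, v; θ) = -(n + m) ln θ - (Σ u_i + Σ v_j)/θ
E-step, at the current estimate θ_k:
  for a bulb known to have failed by t_0: E[V | V ≤ t_0, θ_k] = θ_k - t_0 e^{-t_0/θ_k} / (1 - e^{-t_0/θ_k})
  for a bulb still working at t_0 (by memorylessness): E[V | V > t_0, θ_k] = t_0 + θ_k
M-step (complete-data MLE of the exponential mean):
  θ_{k+1} = [ Σ u_i + r E[V | V ≤ t_0, θ_k] + (m - r)(t_0 + θ_k) ] / (n + m)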

Exponential Derivation (cont.)
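A small R sketch of this iteration. The transcript does not include the data used in the talk, so the inputs below are simulated purely for illustration (the "true" θ, n, m, and t_0 are assumptions):

#simulated inputs (hypothetical; the talk's actual data are not in the transcript)
set.seed(1)
theta_true <- 2                                   #assumed mean lifetime, for simulation only
u  <- rexp(20, rate = 1/theta_true)               #n = 20 fully observed failure times
m  <- 30                                          #bulbs in the second experiment
t0 <- 1.5                                         #inspection time
r  <- sum(rexp(m, rate = 1/theta_true) <= t0)     #number failed by t0
n  <- length(u)
thetak <- 1                                       #initial value for unknown parameter
for (k in 1:50) {
  efail <- thetak - t0*exp(-t0/thetak)/(1 - exp(-t0/thetak))   #E-step: E[V | V <= t0]
  esurv <- t0 + thetak                                         #E-step: E[V | V > t0]
  thetak <- (sum(u) + r*efail + (m - r)*esurv)/(n + m)         #M-step: complete-data MLE
}
thetak   #convergent value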

Example 3 – Failure Times Graphs

Future Work
More elaborate biological examples
Develop lognormal models with predictive capabilities for optimal interrupted HIV treatments (ref. H.T. Banks), i.e., normal mixture models
Study of improved models:
Monte Carlo implementation of the E-step
Louis' Turbo EM

An Introduction to the EM Algorithm
References
[1] Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B (Methodological), Vol. 39, No. 1.
[2] Redner, R.A., Walker, H.F. (1984). Mixture Densities, Maximum Likelihood and the EM Algorithm. SIAM Review, Vol. 26, No. 2.
[3] Tanner, M.A. (1996). Tools for Statistical Inference. Third Edition. Springer-Verlag, New York.

Acknowledgements
The MTBI/SUMS Summer Research Program is supported by:
The National Science Foundation (DMS )
The National Security Agency (DOD-H )
The Sloan Foundation
Arizona State University
We particularly thank:
Dr. Randy Eubank
Dr. Carlos Castillo-Chavez