EM Algorithm Presented By: Haiguang Li Computer Science Department University of Vermont Fall 2011.

2 Copyright Note: This presentation is based on the paper: Dempster, A.P.; Laird, N.M.; Rubin, D.B. (1977). "Maximum Likelihood from Incomplete Data via the EM Algorithm". Journal of the Royal Statistical Society. Series B (Methodological) 39 (1): 1–38. JSTOR 2984875. MR 0501537. Sections 1 and 4 come from Professor Taiwen Yu’s “EM Algorithm”. Sections 2, 3, and 6 come from Professor Andrew W. Moore’s “Clustering with Gaussian Mixtures”. Section 5 was edited by me.

3 Contents
1. Introduction
2. Example: Silly Example
3. Example: Same Problem with Hidden Info
4. Example: Normal Sample
5. EM: Main Body
6. EM Algorithm Running on GMM

4 Introduction
The EM algorithm was explained and given its name in a classic 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin. They pointed out that the method had been "proposed many times in special circumstances" by earlier authors. EM is typically used to compute maximum likelihood estimates from incomplete samples. The EM algorithm estimates the parameters of a model iteratively. Starting from some initial guess, each iteration consists of:
– an E step (Expectation step)
– an M step (Maximization step)
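The E step / M step loop can be sketched on a small problem. The following is a minimal illustration, not the presenter's code: it assumes a one-dimensional mixture of two Gaussians, and the names `normal_pdf` and `em_two_gaussians`, the synthetic data, and the starting guess are all hypothetical choices made for this example.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(xs, mu, var, weight, n_iter=50):
    """EM for a 1-D mixture of two Gaussians (illustrative sketch).

    mu, var and weight are 2-element lists holding the initial guess.
    """
    mu, var, weight = list(mu), list(var), list(weight)
    for _ in range(n_iter):
        # E step: posterior probability that each point came from component 0.
        resp0 = []
        for x in xs:
            p0 = weight[0] * normal_pdf(x, mu[0], var[0])
            p1 = weight[1] * normal_pdf(x, mu[1], var[1])
            resp0.append(p0 / (p0 + p1))
        # M step: re-estimate each component from its weighted points.
        for k in (0, 1):
            resp = resp0 if k == 0 else [1.0 - r for r in resp0]
            n_k = sum(resp)
            mu[k] = sum(r * x for r, x in zip(resp, xs)) / n_k
            var[k] = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / n_k
            var[k] = max(var[k], 1e-6)  # guard against a collapsing component
            weight[k] = n_k / len(xs)
    return mu, var, weight


# Synthetic data (assumption): two well-separated clusters around 0 and 5.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)]
data += [rng.gauss(5.0, 1.0) for _ in range(300)]

mu_hat, var_hat, w_hat = em_two_gaussians(
    data, mu=[-1.0, 6.0], var=[1.0, 1.0], weight=[0.5, 0.5]
)
print(mu_hat, var_hat, w_hat)
```

A fixed iteration count keeps the sketch short; a practical version would more likely stop when the log-likelihood stops improving.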

5 Applications
– Filling in missing data in samples
– Discovering the value of latent variables
– Estimating the parameters of HMMs
– Estimating parameters of finite mixtures
– Unsupervised learning of clusters
– …

EM Algorithm Silly Example

EM Algorithm Same Problem with Hidden Info

EM Algorithm Normal Sample

15 Normal Sample: Sampling

16 Maximum Likelihood: Sampling. Given x, the likelihood is a function of μ and σ². We want to maximize it.

17 Log-Likelihood Function: Maximize the log-likelihood instead, by setting its partial derivatives with respect to μ and σ² to zero.
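The slide's formulas did not survive in the transcript. As a reminder of the standard derivation, assuming an i.i.d. sample x = (x_1, …, x_n) from N(μ, σ²), the likelihood, the log-likelihood, and the two stationarity conditions can be written out as follows. The symbols L, ℓ, and the hats are notation chosen here, not necessarily the slide's.

```latex
\begin{align}
L(\mu, \sigma^2 \mid x)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) \\
\ell(\mu, \sigma^2) = \log L
  &= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2 \\
\frac{\partial \ell}{\partial \mu}
  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0
  \;\Longrightarrow\; \hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \\
\frac{\partial \ell}{\partial \sigma^2}
  &= -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0
  \;\Longrightarrow\; \hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu)^2
\end{align}
```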

18 Max. the Log-Likelihood Function

19 Max. the Log-Likelihood Function
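A quick numerical check can make the maximization concrete. The sketch below assumes the usual closed-form maximizers for a normal sample (the sample mean, and the mean squared deviation divided by n). The function names and the synthetic sample are hypothetical, introduced only for illustration.

```python
import math
import random


def normal_log_likelihood(xs, mu, var):
    """Log-likelihood of an i.i.d. N(mu, var) sample."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi * var) - squared_error / (2.0 * var)


def normal_mle(xs):
    """Closed-form maximizers of the normal log-likelihood."""
    n = len(xs)
    mu_hat = sum(xs) / n
    # The MLE of the variance divides by n, not by n - 1.
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, var_hat


# Synthetic sample (assumption): mean 2, standard deviation 3.
rng = random.Random(1)
sample = [rng.gauss(2.0, 3.0) for _ in range(1000)]

mu_hat, var_hat = normal_mle(sample)
best = normal_log_likelihood(sample, mu_hat, var_hat)

# Moving away from the estimates should not increase the log-likelihood.
for d_mu, scale in [(0.1, 1.0), (-0.1, 1.0), (0.0, 1.1), (0.0, 0.9)]:
    perturbed = normal_log_likelihood(sample, mu_hat + d_mu, var_hat * scale)
    print(d_mu, scale, perturbed <= best)
```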

EM Algorithm Main Body

21 Begin with Classification

22 Solve the problem using another method– parametric method

23 Use our model for classification

24 EM Clustering Algorithm

25 E-M

28 What’s K-means?

EM Algorithm EM Running Example

38 References
1. Dempster, A.P.; Laird, N.M.; Rubin, D.B. (1977). "Maximum Likelihood from Incomplete Data via the EM Algorithm". Journal of the Royal Statistical Society. Series B (Methodological) 39 (1): 1–38. JSTOR 2984875. MR 0501537.
2. Sundberg, Rolf (1974). "Maximum likelihood theory for incomplete data from an exponential family". Scandinavian Journal of Statistics 1 (2): 49–58. JSTOR 4615553. MR 381110.
3. Sundberg, Rolf (1971). Maximum likelihood theory and applications for distributions generated when observing a function of an exponential family variable. Dissertation, Institute for Mathematical Statistics, Stockholm University.
4. Sundberg, Rolf (1976). "An iterative method for solution of the likelihood equations for incomplete data from exponential families". Communications in Statistics – Simulation and Computation 5 (1): 55–64. doi:10.1080/03610917608812007. MR 443190.
5. See the acknowledgement by Dempster, Laird and Rubin on pages 3, 5 and 11.
6. Kulldorff, G. (1961). Contributions to the theory of estimation from grouped and partially grouped samples. Almqvist & Wiksell.
7. Martin-Löf, Anders (1963). "Utvärdering av livslängder i subnanosekundsområdet" ("Evaluation of sub-nanosecond lifetimes"). ("Sundberg formula")
8. Martin-Löf, Per (1966). Statistics from the point of view of statistical mechanics. Lecture notes, Mathematical Institute, Aarhus University. ("Sundberg formula" credited to Anders Martin-Löf.)
9. Martin-Löf, Per (1970). Statistika Modeller (Statistical Models): Anteckningar från seminarier läsåret 1969–1970 (Notes from seminars in the academic year 1969–1970), with the assistance of Rolf Sundberg. Stockholm University. ("Sundberg formula")
10. Martin-Löf, P. The notion of redundancy and its use as a quantitative measure of the deviation between a statistical hypothesis and a set of observational data. With a discussion by F. Abildgård, A. P. Dempster, D. Basu, D. R. Cox, A. W. F. Edwards, D. A. Sprott, G. A. Barnard, O. Barndorff-Nielsen, J. D. Kalbfleisch and G. Rasch and a reply by the author. Proceedings of Conference on Foundational Questions in Statistical Inference (Aarhus, 1973), pp. 1–42. Memoirs, No. 1, Dept. Theoret. Statist., Inst. Math., Univ. Aarhus, Aarhus, 1974.
11. Martin-Löf, Per. The notion of redundancy and its use as a quantitative measure of the discrepancy between a statistical hypothesis and a set of observational data. Scand. J. Statist. 1 (1974), no. 1, 3–18.
12. Wu, C. F. Jeff (Mar. 1983). "On the Convergence Properties of the EM Algorithm". Annals of Statistics 11 (1): 95–103. doi:10.1214/aos/1176346060. JSTOR 2240463. MR 684867.
13. Neal, Radford; Hinton, Geoffrey (1999). "A view of the EM algorithm that justifies incremental, sparse, and other variants". In Michael I. Jordan (ed.), Learning in Graphical Models (Cambridge, MA: MIT Press): 355–368. ISBN 0262600323.
14. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2001). "8.5 The EM algorithm". The Elements of Statistical Learning. New York: Springer. pp. 236–243. ISBN 0-387-95284-5.
15. Jamshidian, Mortaza; Jennrich, Robert I. (1997). "Acceleration of the EM Algorithm by using Quasi-Newton Methods". Journal of the Royal Statistical Society: Series B (Statistical Methodology) 59 (2): 569–587. doi:10.1111/1467-9868.00083. MR 1452026.
16. Meng, Xiao-Li; Rubin, Donald B. (1993). "Maximum likelihood estimation via the ECM algorithm: A general framework". Biometrika 80 (2): 267–278. doi:10.1093/biomet/80.2.267. MR 1243503.
17. Hunter, D. R.; Lange, K. (2004). "A Tutorial on MM Algorithms". The American Statistician 58: 30–37.

39 The End Thanks very much!

40 Question #1: What are the main advantages of parametric methods?
– You can easily change the model to adapt to different distributions of the data.
– Knowledge representation is very compact. Once the model is selected, it is represented by a fixed number of parameters. The number of parameters does not grow as the amount of training data increases.

41 Question #2: What are the initialization methods for the EM algorithm?
– Random guess.
– Initialization by k-means: after a few iterations of k-means, use the resulting parameters to initialize EM.
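The k-means option can be sketched as follows. This is a minimal illustration under stated assumptions: the data are one-dimensional, and the names `assign`, `kmeans_1d`, and `init_from_kmeans`, together with the dictionary layout of the result, are hypothetical. It only produces starting values; the EM iterations themselves are not shown.

```python
import random


def assign(xs, centers):
    """Group each point with its nearest center."""
    clusters = [[] for _ in centers]
    for x in xs:
        nearest = min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
        clusters[nearest].append(x)
    return clusters


def kmeans_1d(xs, centers, n_iter=5):
    """Run a few k-means iterations on 1-D data."""
    centers = list(centers)
    for _ in range(n_iter):
        clusters = assign(xs, centers)
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[k]
                   for k, c in enumerate(clusters)]
    return centers, assign(xs, centers)


def init_from_kmeans(xs, centers, n_iter=5):
    """Turn k-means clusters into initial mixture parameters for EM."""
    centers, clusters = kmeans_1d(xs, centers, n_iter)
    params = []
    for center, cluster in zip(centers, clusters):
        if cluster:
            mean = sum(cluster) / len(cluster)
            var = sum((x - mean) ** 2 for x in cluster) / len(cluster)
        else:
            mean, var = center, 1.0  # arbitrary fallback for an empty cluster
        params.append({
            "mean": mean,
            "var": max(var, 1e-6),  # avoid a zero variance
            "weight": len(cluster) / len(xs),
        })
    return params


# Synthetic data (assumption): two clusters around 0 and 5.
rng = random.Random(2)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
data += [rng.gauss(5.0, 1.0) for _ in range(200)]

# A "random guess" would instead pick the starting centers at random.
params = init_from_kmeans(data, centers=[min(data), max(data)])
print(params)
```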

42 Question #3: What are the differences between EM and K-means?
– K-means is a simplified EM.
– K-means makes a hard decision, while EM makes a soft decision, when updating the parameters of the model.
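The hard versus soft distinction can be shown for a single data point. The sketch below is illustrative only: it assumes equal-weight Gaussian components with a shared variance, and the function names are hypothetical. Under that assumption, shrinking the variance pushes the soft assignment toward the hard one, which is one way to read "K-means is a simplified EM".

```python
import math


def hard_assignment(x, centers):
    """K-means style: the point belongs entirely to its nearest center."""
    nearest = min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
    return [1.0 if k == nearest else 0.0 for k in range(len(centers))]


def soft_assignment(x, centers, var=1.0):
    """EM style: responsibilities under equal-weight Gaussians, shared variance."""
    squared = [(x - c) ** 2 for c in centers]
    smallest = min(squared)
    # Subtracting the smallest distance keeps the exponentials well scaled.
    unnormalized = [math.exp(-(d - smallest) / (2.0 * var)) for d in squared]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


centers = [0.0, 5.0]
hard = hard_assignment(2.0, centers)
soft = soft_assignment(2.0, centers, var=1.0)
nearly_hard = soft_assignment(2.0, centers, var=0.01)

print(hard)         # all of the weight on the nearest center
print(soft)         # most, but not all, of the weight on the nearest center
print(nearly_hard)  # a tiny variance makes the soft assignment almost hard
```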
