ECE 5984: Introduction to Machine Learning
Dhruv Batra, Virginia Tech
Topics:
–(Finish) Expectation Maximization
–Principal Component Analysis (PCA)
Readings: Barber
Administrativia
Poster Presentation:
–May 8, 1:30-4:00pm
–310 Kelly Hall: ICTAS Building
–Print poster (or a bunch of slides)
–Format: Portrait; make one dimension = 36in (board size = 30x40)
–Less text, more pictures
Best Project Prize!
(C) Dhruv Batra
Administrativia
Final Exam
–When: May 11, 7:45-9:45am
–Where: In class
–Format: Pen-and-paper; open-book, open-notes, closed-internet. No sharing.
–What to expect: a mix of multiple choice or true/false questions, "prove this statement", "what would happen for this dataset?"
–Material: everything! Focus on the recent stuff (exponentially decaying weights? optimal policy?)
Recap of Last Time
K-means
1. Ask user how many clusters they'd like (e.g., k=5)
2. Randomly guess k cluster center locations
3. Each datapoint finds out which center it's closest to
4. Each center finds the centroid of the points it owns...
5. ...and jumps there
6. Repeat until terminated!
Slide Credit: Carlos Guestrin
K-means as Coordinate Descent
Optimize the objective function by alternating:
–Fix μ, optimize a (or C): Assignment Step
–Fix a (or C), optimize μ: Recenter Step
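The two alternating steps above can be sketched directly in code. This is a minimal illustrative implementation (the function name `kmeans` and the random initialization scheme are choices made here, not prescribed by the slides):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: alternate the assignment and recenter steps."""
    rng = np.random.default_rng(seed)
    # Randomly guess k cluster centers by picking k data points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assignment step: each datapoint finds its closest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Recenter step: each center jumps to the centroid of its points
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, assign
```

Each iteration can only decrease the sum of squared distances, which is why the procedure terminates (it is coordinate descent on that objective).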
GMM
Figure Credit: Kevin Murphy
K-means vs GMM
K-Means
–http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
GMM
–
EM
Expectation Maximization [Dempster '77]
–Often looks like "soft" K-means
–Extremely general, extremely useful algorithm: essentially THE go-to algorithm for unsupervised learning
Plan
–EM for learning GMM parameters
–EM for general unsupervised learning problems
EM for Learning GMMs
Simple update rules:
–E-Step: estimate P(z_i = j | x_i)
–M-Step: maximize the full likelihood, weighted by the posterior
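The two update rules can be sketched for a one-dimensional mixture. This is an illustrative implementation, not the slides' own code; the function name `em_gmm_1d` and the quantile-based initialization are assumptions made here for a self-contained example:

```python
import numpy as np

def em_gmm_1d(x, k, iters=50):
    """EM for a 1-D Gaussian mixture: alternate E- and M-steps."""
    pi = np.full(k, 1.0 / k)                        # mixing weights P(z = j)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread out initial means
    var = np.full(k, x.var())                       # broad initial variances
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(z_i = j | x_i)
        log_p = (-0.5 * (x[:, None] - mu) ** 2 / var
                 - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        log_p -= log_p.max(axis=1, keepdims=True)   # for numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: maximum likelihood weighted by the posterior
        nk = r.sum(axis=0)                          # effective counts per component
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```

On well-separated data the estimated means converge to the true component means, mirroring the iteration figures that follow.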
Gaussian Mixture Example: figure sequence showing EM from the start state through the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations
General Mixture Models

P(x | z)      P(z)         Name
Gaussian      Categorical  GMM
Multinomial   Categorical  Mixture of Multinomials
Categorical   Dirichlet    Latent Dirichlet Allocation
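All rows of the table share one generative process: draw a component label z from P(z), then draw x from the component distribution P(x | z). A minimal sketch of that process (the function name `sample_mixture` is an assumption here; Gaussian components are used, which gives the GMM row):

```python
import numpy as np

def sample_mixture(pi, mus, sigmas, n, seed=0):
    """Sample n points from a mixture: z ~ P(z), then x ~ P(x | z)."""
    rng = np.random.default_rng(seed)
    # Draw latent component labels from the categorical prior P(z)
    z = rng.choice(len(pi), size=n, p=pi)
    # Draw observations from the chosen component P(x | z), here Gaussian
    x = rng.normal(np.asarray(mus)[z], np.asarray(sigmas)[z])
    return z, x
```

Swapping the Gaussian draw for a multinomial or categorical draw gives the other rows of the table; the latent structure is identical.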
Mixture of Bernoullis
The General Learning Problem with Missing Data
Marginal likelihood: x is observed, z is missing:

log P(x | θ) = log Σ_z P(x, z | θ)
Applying Jensen's Inequality
Use: log Σ_z P(z) f(z) ≥ Σ_z P(z) log f(z)
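The inequality is easy to check numerically for any distribution P(z) and positive function f(z); the particular numbers below are arbitrary, chosen only to illustrate:

```python
import math

# Jensen's inequality for the (concave) log:
#   log( sum_z P(z) f(z) ) >= sum_z P(z) log f(z)
p = [0.2, 0.5, 0.3]   # any distribution over z
f = [1.0, 4.0, 0.5]   # any positive function of z

lhs = math.log(sum(pi * fi for pi, fi in zip(p, f)))
rhs = sum(pi * math.log(fi) for pi, fi in zip(p, f))
```

Since log is concave, the log of the average always dominates the average of the logs; equality holds only when f is constant on the support of P.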
Convergence of EM
Define the potential function F(θ, Q):

F(θ, Q) = Σ_z Q(z) log [ P(x, z | θ) / Q(z) ]

EM corresponds to coordinate ascent on F
–Thus, it maximizes a lower bound on the marginal log likelihood
EM is Coordinate Ascent
E-step: Fix θ^(t), maximize F over Q:

Q^(t+1)(z) = P(z | x, θ^(t))

–"Realigns" F with the likelihood: the bound becomes tight, F(θ^(t), Q^(t+1)) = log P(x | θ^(t))
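The E-step claim can be verified numerically: F(θ, Q) lower-bounds the marginal log-likelihood for any Q, and the posterior makes it tight. The joint distribution below is a made-up example for illustration:

```python
import math

# Hypothetical joint P(x, z | theta) for a fixed x and 3 values of z
p_xz = [0.1, 0.3, 0.2]
log_px = math.log(sum(p_xz))  # marginal log-likelihood log P(x | theta)

def F(Q):
    """Lower bound F(theta, Q) = sum_z Q(z) log( P(x, z) / Q(z) )."""
    return sum(q * math.log(p / q) for q, p in zip(Q, p_xz) if q > 0)

# The E-step choice: Q(z) = P(z | x, theta), the posterior
posterior = [p / sum(p_xz) for p in p_xz]
```

With Q set to the posterior, F equals log P(x) exactly; any other Q gives a strictly smaller value, which is what "realigns F with the likelihood" means.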
EM is Coordinate Ascent
M-step: Fix Q^(t+1), maximize F over θ
Corresponds to a weighted dataset:
–(x_1, z=1) with weight Q^(t+1)(z=1 | x_1)
–(x_1, z=2) with weight Q^(t+1)(z=2 | x_1)
–(x_1, z=3) with weight Q^(t+1)(z=3 | x_1)
–(x_2, z=1) with weight Q^(t+1)(z=1 | x_2)
–(x_2, z=2) with weight Q^(t+1)(z=2 | x_2)
–(x_2, z=3) with weight Q^(t+1)(z=3 | x_2)
–...
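The weighted-dataset view can be made concrete with a tiny example: each point contributes to every component's update, scaled by its responsibility. The data and responsibility values below are hypothetical, chosen only to show the arithmetic:

```python
import numpy as np

# Four 1-D points and hypothetical responsibilities Q(z = j | x_i)
# for k = 2 components (each row sums to 1)
x = np.array([0.0, 1.0, 9.0, 10.0])
Q = np.array([[0.90, 0.10],
              [0.80, 0.20],
              [0.10, 0.90],
              [0.05, 0.95]])

nk = Q.sum(axis=0)                      # effective number of points per component
mu = (Q * x[:, None]).sum(axis=0) / nk  # weighted-mean update for each component
```

Each component's mean is just the ordinary sample mean of the dataset in which every (x_i, z=j) pair is counted with weight Q(z=j | x_i), so component 0 lands near the small points and component 1 near the large ones.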
EM Intuition
What You Should Know
K-means for clustering:
–the algorithm
–converges because it is coordinate descent
EM for mixtures of Gaussians:
–how to "learn" maximum likelihood parameters (a local maximum of the likelihood) in the case of unlabeled data
EM is coordinate ascent
Remember: EM can get stuck in local optima, and empirically it DOES
General case for EM
Tasks

Task                      Input  Output  Output type
Classification            x      y       Discrete
Regression                x      y       Continuous
Clustering                x      c       Discrete ID
Dimensionality Reduction  x      z       Continuous
New Topic: PCA
Synonyms
–Principal Component Analysis
–Karhunen–Loève transform
–Eigen-Faces, Eigen-
PCA is a Dimensionality Reduction algorithm
Other Dimensionality Reduction algorithms:
–Linear Discriminant Analysis (LDA)
–Independent Component Analysis (ICA)
–Locally Linear Embedding (LLE)
–...
Dimensionality Reduction
Input data may have thousands or millions of dimensions!
–e.g., images have 5M pixels
Dimensionality reduction: represent data with fewer dimensions
–easier learning: fewer parameters
–visualization: hard to visualize more than 3D or 4D
–discover the "intrinsic dimensionality" of the data: high-dimensional data that is truly lower dimensional
Dimensionality Reduction Demo
–
–
PCA / KL-Transform
De-correlation view:
–Make features uncorrelated
–No projection yet
Max-variance view:
–Project data to lower dimensions
–Maximize variance in the lower dimensions
Synthesis / min-error view:
–Project data to lower dimensions
–Minimize reconstruction error
All three views lead to the same solution
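The max-variance and min-error views can both be read off one short computation: eigendecompose the covariance, project onto the top-d eigenvectors, and reconstruct. This is a minimal sketch (the function name `pca` and its return values are choices made here):

```python
import numpy as np

def pca(X, d):
    """PCA via eigendecomposition of the covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                        # center the data first
    C = Xc.T @ Xc / len(X)               # covariance matrix
    evals, evecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    W = evecs[:, ::-1][:, :d]            # top-d directions (max-variance view)
    Z = Xc @ W                           # low-dimensional projection
    X_hat = Z @ W.T + mean               # reconstruction (min-error view)
    return Z, X_hat, evals[::-1]
```

The same matrix W that maximizes the projected variance also minimizes the squared reconstruction error, which is the equivalence the slide asserts.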