Advanced Artificial Intelligence Lecture 8: Advanced Machine Learning

Similar presentations
Mixture Models and the EM Algorithm

Principal Component Analysis Based on L1-Norm Maximization Nojun Kwak IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008.
Expectation Maximization
An Overview of Machine Learning
Segmentation and Fitting Using Probabilistic Methods
Machine Learning and Data Mining Clustering
Principal Component Analysis CMPUT 466/551 Nilanjan Ray.
EE 290A: Generalized Principal Component Analysis Lecture 6: Iterative Methods for Mixture-Model Segmentation Sastry & Yang © Spring, 2011EE 290A, University.
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Principal Component Analysis
Normalized Cuts and Image Segmentation Jianbo Shi and Jitendra Malik, Presented by: Alireza Tavakkoli.
Region Segmentation. Find sets of pixels, such that All pixels in region i satisfy some constraint of similarity.
Unsupervised Learning: Clustering Rong Jin Outline  Unsupervised learning  K means for clustering  Expectation Maximization algorithm for clustering.
Announcements Project 2 more signup slots questions Picture taking at end of class.
Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Clustering.
Lecture 4 Unsupervised Learning Clustering & Dimensionality Reduction
Independent Component Analysis (ICA) and Factor Analysis (FA)
The Terms that You Have to Know! Basis, Linear independent, Orthogonal Column space, Row space, Rank Linear combination Linear transformation Inner product.
Unsupervised Learning
Bayesian belief networks 2. PCA and ICA
Part 3 Vector Quantization and Mixture Density Model CSE717, SPRING 2008 CUBS, Univ at Buffalo.
Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9
INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Computer Vision Segmentation Marc Pollefeys COMP 256 Some slides and illustrations from D. Forsyth, T. Darrel,...
Clustering & Dimensionality Reduction 273A Intro Machine Learning.
Computer Vision - A Modern Approach Set: Segmentation Slides by D.A. Forsyth Segmentation and Grouping Motivation: not information is evidence Obtain a.
Summarized by Soo-Jin Kim
Segmentation Techniques Luis E. Tirado PhD qualifying exam presentation Northeastern University.
CSE 185 Introduction to Computer Vision Pattern Recognition.
Segmentation using eigenvectors Papers: “Normalized Cuts and Image Segmentation”. Jianbo Shi and Jitendra Malik, IEEE, 2000 “Segmentation using eigenvectors:
CSC321: Neural Networks Lecture 12: Clustering Geoffrey Hinton.
CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 16 Nov, 3, 2011 Slide credit: C. Conati, S.
1 Recognition by Appearance Appearance-based recognition is a competing paradigm to features and alignment. No features are extracted! Images are represented.
EM and model selection CS8690 Computer Vision University of Missouri at Columbia Missing variable problems In many vision problems, if some variables.
CHAPTER 7: Clustering Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added) Last updated: February 25, 2014.
N– variate Gaussian. Some important characteristics: 1)The pdf of n jointly Gaussian R.V.’s is completely described by means, variances and covariances.
MACHINE LEARNING 8. Clustering. Motivation Based on E ALPAYDIN 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2  Classification problem:
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –Unsupervised Learning: Kmeans, GMM, EM Readings: Barber
CSE 185 Introduction to Computer Vision Face Recognition.
Clustering Features in High-Throughput Proteomic Data Richard Pelikan (or what’s left of him) BIOINF 2054 April
Made by: Maor Levy, Temple University Many data points, no labels 2.
Lecture 6 Spring 2010 Dr. Jianjun Hu CSCE883 Machine Learning.
EE4-62 MLCV Lecture Face Recognition – Subspace/Manifold Learning Tae-Kyun Kim 1 EE4-62 MLCV.
INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN © The MIT Press, Lecture.
Lecture 2: Statistical learning primer for biologists
Radial Basis Function ANN, an alternative to back propagation, uses clustering of examples in the training set.
Chapter 13 (Prototype Methods and Nearest-Neighbors )
CS Statistical Machine learning Lecture 12 Yuan (Alan) Qi Purdue CS Oct
Clustering Usman Roshan CS 675. Clustering Suppose we want to cluster n vectors in R d into two groups. Define C 1 and C 2 as the two groups. Our objective.
Part 3: Estimation of Parameters. Estimation of Parameters Most of the time, we have random samples but not the densities given. If the parametric form.
4.0 - Data Mining Sébastien Lemieux Elitra Canada Ltd.
Machine Learning Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron:
Unsupervised Learning Part 2. Topics How to determine the K in K-means? Hierarchical clustering Soft clustering with Gaussian mixture models Expectation-Maximization.
K-Means Segmentation.
Dimensionality Reduction
LECTURE 09: BAYESIAN ESTIMATION (Cont.)
LECTURE 10: DISCRIMINANT ANALYSIS
Principal Component Analysis (PCA)
Advanced Artificial Intelligence
Probabilistic Models with Latent Variables
Principal Component Analysis
Outline H. Murase, and S. K. Nayar, “Visual learning and recognition of 3-D objects from appearance,” International Journal of Computer Vision, vol. 14,
Spectral Clustering Eric Xing Lecture 8, August 13, 2010
INTRODUCTION TO Machine Learning
Feature space transformation methods
LECTURE 09: DISCRIMINANT ANALYSIS
Biointelligence Laboratory, Seoul National University
EM Algorithm and its Applications
Presentation transcript:

Advanced Artificial Intelligence Lecture 8: Advanced Machine Learning

Outline
- Clustering
- K-Means
- EM
- Spectral Clustering
- Dimensionality Reduction

The unsupervised learning problem: many data points, no labels.

K-Means: many data points, no labels.

K-Means
- Choose a fixed number of clusters.
- Choose cluster centers and point-cluster allocations to minimize error.
  - This can't be done by exhaustive search, because there are too many possible allocations.
- Algorithm (see the sketch below):
  - Fix the cluster centers; allocate each point to the closest cluster.
  - Fix the allocation; compute the best cluster centers.
- x could be any set of features for which we can compute a distance (be careful about scaling).
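
A minimal NumPy sketch of this alternation (Lloyd's algorithm); the function name, the random initialization, and the synthetic example data are assumptions, not part of the lecture:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate between assigning points to the nearest center
    and recomputing each center as the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # k data points as initial centers
    for _ in range(n_iters):
        # Allocation step: assign each point to the closest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Example on two well-separated synthetic blobs (illustration only)
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, labels = kmeans(X, k=2)
```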

K-Means

* From Marc Pollefeys, COMP 256.

K-Means
- Is an approximation to EM.
  - Model (hypothesis space): mixture of N Gaussians.
  - Latent variables: correspondence of data points and Gaussians.
- We notice:
  - Given the mixture model, it's easy to calculate the correspondence.
  - Given the correspondence, it's easy to estimate the mixture model.

Expectation Maximization: Idea
- Data are generated from a mixture of Gaussians.
- Latent variables: correspondence between data items and Gaussians.

Generalized K-Means (EM)
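
The slide's algorithm is shown as a figure that the transcript does not capture; as a rough stand-in, a minimal EM loop for spherical Gaussians with a fixed shared variance (a "soft" k-means) might look like the following, where the function name and the fixed sigma are assumptions:

```python
import numpy as np

def em_gmm(X, k, sigma=1.0, n_iters=50, seed=0):
    """EM for a mixture of k spherical Gaussians with known, shared
    variance sigma**2 and uniform mixing weights (soft k-means)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # E-step: responsibility of each component for each point
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logp = -sq / (2 * sigma ** 2)
        logp -= logp.max(axis=1, keepdims=True)   # for numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu, r
```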

Gaussians
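
For reference, the density of a d-dimensional Gaussian with mean μ and covariance Σ (the form the slide presumably displays) is

$$
p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\Big(-\tfrac{1}{2}(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\Big).
$$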

ML Fitting of Gaussians
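
Maximum-likelihood fitting of a single Gaussian to data x_1, …, x_N reduces to the sample mean and sample covariance:

$$
\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\hat{\Sigma} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^{\top}.
$$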

Learning a Gaussian Mixture (with known covariance)
- E-step and M-step update equations (the standard forms are sketched below).
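
As a sketch for the common special case of uniform mixing weights and a shared, known variance σ² (which may differ from the exact form on the slide), the standard updates are

$$
\text{E-step:}\quad E[z_{ij}] = \frac{\exp\!\big(-\lVert x_i-\mu_j\rVert^2/2\sigma^2\big)}{\sum_{k}\exp\!\big(-\lVert x_i-\mu_k\rVert^2/2\sigma^2\big)},
\qquad
\text{M-step:}\quad \mu_j \leftarrow \frac{\sum_i E[z_{ij}]\,x_i}{\sum_i E[z_{ij}]}.
$$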

Expectation Maximization
- Converges!
- Proof [Neal/Hinton, McLachlan/Krishnan]:
  - Each E/M step does not decrease the data likelihood.
  - Converges at a local minimum or saddle point.
- But subject to local minima.

Practical EM
- The number of clusters is unknown.
- Suffers (badly) from local minima.
- Algorithm:
  - Start a new cluster center if many points are "unexplained".
  - Kill a cluster center that doesn't contribute.
  - (Use an AIC/BIC criterion for all this, if you want to be formal; see the sketch below.)
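
As one hedged way to apply the AIC/BIC point in practice, scikit-learn's GaussianMixture exposes both criteria; the candidate range and placeholder data below are assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.randn(300, 2)  # placeholder data; substitute your own features

# Fit a mixture for each candidate K and keep the one with the lowest BIC
candidates = range(1, 8)
models = [GaussianMixture(n_components=k, random_state=0).fit(X) for k in candidates]
best_k = min(candidates, key=lambda k: models[k - 1].bic(X))
print("BIC-selected number of clusters:", best_k)
```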

Spectral Clustering

Spectral Clustering: Overview
- Pipeline: data → pairwise similarities → block detection.

Eigenvectors and Blocks
- Block matrices have block eigenvectors: in the two-block example the eigenvalues are λ1 = λ2 = 2 and λ3 = λ4 = 0.
- Near-block matrices have near-block eigenvectors: the eigenvalues are only slightly perturbed (e.g. by about -0.02). [Ng et al., NIPS 02] (Illustrated numerically below.)
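
A tiny numerical illustration of the block claim (the 4x4 affinity matrix is an arbitrary example, not from the slides):

```python
import numpy as np

# Two disconnected "blocks": within-block affinities 1.0 and 0.9
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.9, 0.9],
              [0.0, 0.0, 0.9, 0.9]])

eigvals, eigvecs = np.linalg.eigh(A)   # ascending eigenvalues: roughly [0, 0, 1.8, 2]
print(np.round(eigvals, 2))
# The two leading eigenvectors (last two columns) are indicator-like,
# each supported on exactly one block, so thresholding them recovers the blocks.
print(np.round(eigvecs[:, -2:], 2))
```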

Spectral Space
- Items can be put into blocks by their eigenvector coordinates (e1, e2).
- The resulting clusters are independent of the row ordering.

The Spectral Advantage
- The key advantage of spectral clustering is the spectral-space representation.
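
A minimal end-to-end sketch of the pipeline (Gaussian affinities, symmetric normalization, leading eigenvectors, then k-means in spectral space); the normalization choice and parameter values are assumptions and only loosely follow Ng et al.:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Cluster rows of X by running k-means in the space spanned by the
    leading eigenvectors of a normalized Gaussian affinity matrix."""
    # Pairwise squared distances and Gaussian affinities
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization: D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = D_inv_sqrt @ W @ D_inv_sqrt
    # Top-k eigenvectors (eigh returns eigenvalues in ascending order)
    _, vecs = np.linalg.eigh(L)
    E = vecs[:, -k:]
    # Normalize rows of the spectral embedding and cluster them
    E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(E)
    return labels
```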

Measuring Affinity
- Affinity can be measured from intensity, texture, and distance (formulas sketched below).
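
Each cue yields an affinity through a Gaussian kernel on the corresponding feature difference; roughly, for intensity and for distance (the cue-specific scales σ_I, σ_d are an assumption of this sketch):

$$
\text{aff}(x,y) = \exp\!\Big(-\frac{\lVert I(x)-I(y)\rVert^2}{2\sigma_I^2}\Big),
\qquad
\text{aff}(x,y) = \exp\!\Big(-\frac{\lVert x-y\rVert^2}{2\sigma_d^2}\Big).
$$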

Scale affects affinity

Dimensionality Reduction

Dimensionality Reduction with PCA

Linear: Principal Components
- Fit a multivariate Gaussian.
- Compute the eigenvectors of its covariance matrix.
- Project onto the eigenvectors with the largest eigenvalues (see the sketch below).
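
A minimal NumPy sketch of those three steps; the function name and the synthetic example are assumptions:

```python
import numpy as np

def pca(X, n_components):
    """Project data onto the eigenvectors of the covariance matrix
    with the largest eigenvalues."""
    mu = X.mean(axis=0)                     # fit the Gaussian: mean ...
    cov = np.cov(X - mu, rowvar=False)      # ... and covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort eigenvalues descending
    components = eigvecs[:, order[:n_components]]
    return (X - mu) @ components, components, mu

# Example: reduce 5-D synthetic data to 2 principal components
X = np.random.randn(200, 5)
Z, components, mu = pca(X, n_components=2)
```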